LabSess 1 - 2024-10-03 - Introduction to RStudio - DATS

Files:

Set working directory. Options:

Setup

Comments


Set directory

To set the working directory through code, use setwd():

setwd("~/Library/Mobile Documents/com~apple~CloudDocs/0_MATTEO/0_School/0_Università/1_Magistrale/1_UPC - Barcellona/1° Year/Data Analisys in Transport Systems/Lab Sessions/2024-10-03 - DATS")
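A minimal sketch of checking and changing the working directory. It uses tempdir() as a stand-in folder (an assumption, chosen only so the example runs on any machine); replace it with the path to your own lab folder.

```r
old_wd <- getwd()      # Remember the current working directory
setwd(tempdir())       # Move to another folder (here: R's temporary folder)
getwd()                # Print the working directory now in use
list.files()           # List the files R can see in that folder
setwd(old_wd)          # Go back to where we started
```

Running getwd() right after setwd() is a quick way to confirm that the path was typed correctly.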

Packages

Packages can be loaded through code:

library(car)
library(AER)

Loading a package with library() also ticks its checkbox in the "Packages" section, so you can verify there that it was loaded.

If a package doesn't appear in the list, it is not installed yet. Go to:
Install > type the name of the package > Install
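The same installation step can probably be done through code with install.packages(). The sketch below wraps it in a helper called load_or_install(), which is a hypothetical name made up for this example, and tries it on "stats", a package that ships with R, so nothing is downloaded.

```r
# load_or_install() is a hypothetical helper, not part of R or RStudio
load_or_install <- function(pkg) {
  # requireNamespace() returns FALSE when the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # Downloads from CRAN; needs an internet connection
  }
  library(pkg, character.only = TRUE)  # Load and attach the package
}

load_or_install("stats")  # "stats" ships with R, so nothing is downloaded here
# For the packages in these notes: load_or_install("car"); load_or_install("AER")
```

Installing is needed only once per computer, while library() has to be run again in every new R session.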

Run

Run one line or a selection

Place the cursor on a line, or select the lines to run

Click: Run the current line or selection (Cmd + Enter)

Import data

data(name of data) # e.g. data(Davis), the data set used in the rest of these notes

View data

Type in the console:

View(name of data)

This isn't efficient with very big data frames.
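For big data frames, a lighter option is to print only a summary of the structure or the first few rows. The sketch below uses iris, a data set built into R, as a stand-in (an assumption so that the example runs without extra packages); the same commands should work on any data frame.

```r
data(iris)        # Load the built-in stand-in data set
dim(iris)         # Number of rows and columns
head(iris, 5)     # First 5 rows
str(iris)         # Type of each column (num = numeric, Factor = qualitative)
```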

Help

Run:

?"Name of command"

Putting a "?" before the name of a command opens its page in the "Help" section. This also works for data sets.
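A few concrete calls, as a sketch. mean() and the built-in data set iris are used only as examples here; any command or data set name can take their place.

```r
?mean              # Help page of the command mean()
h <- help("mean")  # The same request written as a function call, stored in h
?iris              # Help page of a data set (here the built-in iris)
example("mean")    # Run the examples shown at the bottom of the help page
```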

Univariate Exploratory Data Analysis

Univariate: Understand variables one-by-one.

Numeric Indicators

Use summary(Davis):

Factor: the R term for a qualitative variable (its values are labels), e.g. M/F (male or female). For a factor, summary() just counts the number of observations for each label.
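The difference between the two kinds of variable can be seen on any data frame that has both. The sketch below uses the built-in iris data as a stand-in (an assumption, so the example runs without the car package): Sepal.Length is numeric and Species is a factor.

```r
data(iris)
summary(iris$Sepal.Length)  # Numeric: minimum, quartiles, median, mean, maximum
summary(iris$Species)       # Factor: number of observations for each label
is.factor(iris$Species)     # Check whether a variable is stored as a factor
levels(iris$Species)        # The labels of the factor
```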

Charts

We'll first see basic charts for one variable

Charts for factors

Possible charts for factors: barplot and pie chart.

The variable "sex" only exists as a column of Davis. It is NOT an object. R can only work with objects.

We can select a part of an object (for example, a column of a data frame) using the dollar symbol ($).
e.g.:

# Give summary of the variable "sex" in "Davis"
summary(Davis$sex)

To chart a factor we need a table, built with the command table(): it counts the number of observations for each label.

# Count the observations for each label of the variable "sex" in "Davis"
table(Davis$sex)

To assign a name to something, we can use the = sign or the <- arrow.

# Each of the following commands gives the name tt to the table

tt <- table(Davis$sex)

# or

tt = table(Davis$sex)
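Once the table has a name, it can be reused in later commands. A short sketch, using the built-in iris data as a stand-in (an assumption, so the example runs without the car package):

```r
data(iris)
tt <- table(iris$Species)       # Counts for each label
tt                              # Typing the name prints the object
names(tt)                       # The labels
prop.table(tt)                  # Shares instead of counts (they add up to 1)
round(100 * prop.table(tt), 1)  # The same shares as percentages
```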

Charts in base R are ugly. To make them beautiful we can use the ggplot2 package.
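For comparison, a minimal ggplot2 sketch of a bar chart. It assumes ggplot2 is installed and skips the chart otherwise; the built-in iris data is a stand-in, and the title and color are arbitrary choices.

```r
# Assumes ggplot2 is installed; the chart is skipped otherwise
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(iris, aes(x = Species)) +   # Data and the variable on the x axis
    geom_bar(fill = "cyan") +             # One bar per label, height = count
    labs(title = "Bar plot with ggplot2", x = "Species", y = "Count")
  print(p)                                # Draw the chart
}
```

Unlike barplot(), geom_bar() counts the observations itself, so no table() step is needed.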

Barplot

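barplot(tt) # Make a barplot of the table tt
barplot(tt,main="My first barplot") # Add a title
barplot(tt,main="My first barplot",col="cyan") # Same color for all bars
barplot(tt,main="My first barplot",col=heat.colors(2)) # One color per bar, taken from the heat.colors palette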
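A slightly more complete sketch, with axis labels and shares instead of counts. The built-in iris data is a stand-in (an assumption, so the example runs without the car package), and the labels are arbitrary.

```r
data(iris)
tt <- table(iris$Species)
mids <- barplot(prop.table(tt),           # Bar heights are shares, not counts
                main = "Share of each label",
                xlab = "Species", ylab = "Proportion",
                ylim = c(0, 1),
                col = heat.colors(length(tt)))  # One color per bar
mids   # barplot() returns the x position of the middle of each bar
```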

Pie chart

pie(tt) # Make a pie chart
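Slices are easier to read with percentages in the labels. A sketch of one way to build them, again on the built-in iris data as a stand-in (an assumption); lbl and pct are names made up for this example.

```r
data(iris)
tt <- table(iris$Species)
pct <- round(100 * prop.table(tt))          # Percentage of each label
lbl <- paste0(names(tt), " (", pct, "%)")   # Text shown next to each slice
pie(tt, labels = lbl, main = "My first pie chart",
    col = heat.colors(length(tt)))
```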

Charts for num variables

Histogram

It has bins (intervals of values), not bars

hist(Davis$height) # Make a histogram
hist(Davis$height,main="My first Histogram",col="magenta") # Add a title and a color
# Higher customization

hist(Davis$height,10,main="My first Histogram",col="magenta") # Ask for about 10 bins of the same length (R treats the number as a suggestion)

hist(Davis$height,breaks=seq(50,200,5),main="My first Histogram",col="magenta") # Bins from 50 to 200, each of length 5
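To see what the breaks argument produces, the histogram can be computed without drawing it. A sketch on the built-in iris data as a stand-in (an assumption, so the example runs without the car package); the breaks from 4 to 8 are chosen to cover that variable.

```r
data(iris)
x <- iris$Sepal.Length
h <- hist(x, breaks = seq(4, 8, 0.5), plot = FALSE)  # Compute the bins without drawing
h$breaks   # Limits of the bins
h$counts   # Number of observations in each bin
range(x)   # The breaks must cover this range, otherwise hist() stops with an error
```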

If I use freq=F (F = FALSE), it means that I'm not interested in the absolute number of observations in each bin, but in the density: the heights are rescaled so that the total area of the bins is equal to 1.
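With freq = FALSE the height of each bin is a density, so height times width, summed over the bins, should give 1. A quick check, sketched on the built-in iris data as a stand-in (an assumption):

```r
data(iris)
x <- iris$Sepal.Length
h <- hist(x, breaks = seq(4, 8, 0.5), freq = FALSE,
          main = "Histogram on the density scale", col = "magenta")
widths <- diff(h$breaks)   # Length of each bin
sum(h$density * widths)    # Total area of the bins
```

This scale is what makes it possible to draw a density curve on top of the histogram.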

Overlaying a distribution on a set of data:

hist(Davis$height,freq=F,breaks=seq(50,200,5),main="My first Histogram",col="magenta") # Histogram on the density scale
mm<-mean(Davis$height);ss<-sd(Davis$height);mm;ss # Calculate the mean and the standard deviation (needed for the density function of the normal distribution), assign them to the names mm and ss, and print them
curve(dnorm(x,mm,ss),add=T,col="blue",lwd=2) # Draw the normal density with mean mm and standard deviation ss
# In the last command: dnorm() is the density function of the normal distribution; add=T draws the curve on top of the existing histogram

We can run a test to check whether the data are compatible with a normal distribution, for example the Shapiro-Wilk test.
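In R the Shapiro-Wilk test is available as shapiro.test(). A minimal sketch on the built-in iris data as a stand-in (an assumption, so the example runs without the car package); the 0.05 threshold in the comment is a common convention, not something fixed by the test.

```r
data(iris)
x <- iris$Sepal.Length
res <- shapiro.test(x)   # Null hypothesis: the data come from a normal distribution
res$statistic            # The W statistic
res$p.value              # A small p-value (e.g. below 0.05) is evidence against normality
qqnorm(x); qqline(x)     # Visual check: points close to the line suggest normality
```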