R Stats Basics
R Language Basics#
R-lang is a free software environment for statistical computing and graphics.
Install R#
Download RStudio, an application for writing and running R programs
Use Swirl#
Swirl is an interactive, prompt-based way to learn R and other data science topics
To start, open R in a terminal by running R
or open RStudio
Install swirl
install.packages("swirl")
Load the swirl library
library(swirl)
Then start with swirl()
swirl()
Everything else will be guided
R Language#
The R interpreter works much like many others: you can type basic arithmetic at the prompt and it prints the result.
Syntax#
Assignment: <-
Assigning a value to a variable is done with <-
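A minimal sketch of assignment at the prompt. The variable names `x` and `y` are made up for illustration.

```r
# Assign with <- (the names x and y are illustrative only)
x <- 5
y <- x * 2 + 1

# Typing a name at the prompt prints its value
y
```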
Data Structures#
Any object containing data is a data structure
The simplest data structure is a vector. A single number is a vector of length 1.
A vector is created with the c() function (short for "concatenate" or "combine")
z = c(1.1, 4.5, 6)
You can concatenate vectors with c():
c(z, 255, z)
Numeric operations on vectors are applied to all elements in the vector. When arithmetic is done on vectors of the same length, each operation is applied element by element. If they are not the same length, the shorter vector is recycled to the length of the longer one.
Behind the scenes, R recycles a single number (a vector of length 1) into a vector as long as the other operand.
z <- c(5, 10, 15)
z * 2 + 100
# same as
z * c(2,2,2) + c(100,100,100)
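Recycling also applies when neither vector has length 1. The sketch below uses made-up vectors to show one case where the lengths divide evenly and one where they do not.

```r
long  <- c(1, 2, 3, 4)
short <- c(10, 20)

# short is recycled to c(10, 20, 10, 20) before the addition
recycled <- long + short
recycled

# When the longer length is not a multiple of the shorter one,
# R still recycles but emits a warning (suppressed here)
uneven <- suppressWarnings(c(1, 2, 3) + c(10, 20))
uneven
```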
Arithmetic Operators#
+, -, /, *
^: to the power of
sqrt(): square root
abs(): absolute value
Getting Help#
To get help on a function, type ? followed by the function name, without calling it (no parentheses)
E.g. ?c
Dollar Operator#
Grab specific items from output with the $ operator,
e.g. file.info("mytest.R")$mode
Workspace and Files#
Get working directory getwd()
List all objects in local workspace ls()
List all files in directory: dir()
or list.files()
Find what arguments a function takes: args(list.files)
Remember to not call the function
Create a directory: dir.create('testdir')
Set the working directory: setwd('testdir')
Create a file: file.create('mytest.R')
Check if a file exists: file.exists("mytest.R")
File info: file.info("mytest.R")
Rename a file: file.rename('mytest.R', 'mytest2.R')
Copy a file: file.copy('mytest2.R', 'mytest3.R')
Get relative path to a file: file.path('mytest3.R')
Create a path to a folder or file: file.path('folder1', 'folder2')
Create directory with recursive folders: dir.create(file.path('testdir2', 'testdir3'), recursive = TRUE)
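The file functions above can be tried safely inside a temporary directory. This is only a sketch: the directory name `testdir_demo` is hypothetical, and everything is deleted at the end.

```r
# Work inside a throwaway directory so nothing real is touched
base_dir <- file.path(tempdir(), "testdir_demo")   # hypothetical name
dir.create(base_dir, recursive = TRUE)

script <- file.path(base_dir, "mytest.R")
file.create(script)
created <- file.exists(script)

# Rename the file, then confirm the old name is gone
renamed <- file.path(base_dir, "mytest2.R")
file.rename(script, renamed)
old_gone <- !file.exists(script)

# Clean up the whole directory
unlink(base_dir, recursive = TRUE)
```

Building paths with file.path() rather than pasting strings keeps the code independent of the operating system's path separator.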
Top tip: It is often helpful to save the settings that you had before you began an analysis and then go back to them at the end. This trick is often used within functions; you save, say, the par() settings that you started with, mess around a bunch, and then set them back to the original values at the end. This isn’t the same as what we have done here, but it seems similar enough to mention.
Sequences#
Create a sequence of numbers with the : operator: 1:20
Get a sequence of real numbers
pi:10
[1] 3.141593 4.141593 5.141593 6.141593 7.141593 8.141593 9.141593
It stops before exceeding 10, incrementing by 1 each time
Returns a vector
Go back / decrement: 15:1
Help on special chars#
Use backticks
?`:`
Use seq() for more control
seq(1,20)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Get 30 equally spaced values between two numbers
seq(5, 10, length=30)
[1] 5.000000 5.172414 5.344828 5.517241 5.689655 5.862069 6.034483
[8] 6.206897 6.379310 6.551724 6.724138 6.896552 7.068966 7.241379
[15] 7.413793 7.586207 7.758621 7.931034 8.103448 8.275862 8.448276
[22] 8.620690 8.793103 8.965517 9.137931 9.310345 9.482759 9.655172
[29] 9.827586 10.000000
Check the length of a vector (here the 30-value sequence above, stored as my_seq)
length(my_seq)
[1] 30
Make a sequence of numbers of length of another vector
1:length(my_seq)
There are often several approaches to solving the same problem, particularly in R. Simple approaches that involve less typing are generally best. It’s also important for your code to be readable, so that you and others can figure out what’s going on without too much hassle.
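As an illustration of "several approaches", here are three ways to build an index sequence for `my_seq`. The empty-vector case at the end is an added example, not from the original notes.

```r
my_seq <- seq(5, 10, length.out = 30)

a     <- 1:length(my_seq)
b     <- seq(along.with = my_seq)
c_idx <- seq_along(my_seq)

identical(a, b)       # all three give the same integer sequence
identical(b, c_idx)

# One practical difference: on an empty vector, 1:length() counts down
empty <- numeric(0)
1:length(empty)       # 1 0
seq_along(empty)      # integer(0)
```

For that reason seq_along() is usually the safer choice inside loops.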
Replicate with rep()
A vector of 40 zeroes
rep(0, times = 40)
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[40] 0
Replicate a vector 10 times
> rep(c(0,1,2), times=10)
[1] 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2
Repeat `each` element 10 times in sequence
rep(c(0, 1, 2), each = 10)
[1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
## Vectors
The simplest and most common data structure
* atomic vectors - single data type
* lists - contain multiple data types
Logical vectors contain the values `TRUE`, `FALSE` and `NA` (Not Available)
num_vect <- c(0.5, 55, -10, 6)
tf <- num_vect < 1
tf
[1] TRUE FALSE TRUE FALSE
#### Logical operators:
Comparison: `<`, `>`, `<=`, `>=`
Exact equality: `==`
Inequality: `!=`
Or (Union): `A | B`
And (intersection): `A & B`
Not (Negation): `!A`
Character vectors
my_char <- c("My", "name", "is")
Collapse into a character vector of length 1
paste(my_char, collapse = " ")
Append a value:
my_name <- c(my_char, "stephen")
Joining an integer vector and a character vector, each of length 3, element by element:
paste(1:3, c("X", "Y", "Z"), sep="")
If they are not of equal length there is `vector recycling`
Printing letters with `vector recycling`:
> paste(LETTERS, 1:4, sep="-")
[1] "A-1" "B-2" "C-3" "D-4" "E-1" "F-2" "G-3" "H-4" "I-1" "J-2" "K-3" "L-4" "M-1"
[14] "N-2" "O-3" "P-4" "Q-1" "R-2" "S-3" "T-4" "U-1" "V-2" "W-3" "X-4" "Y-1" "Z-2"
## Missing values
Missing values play an important role in statistics and data analysis. Often,
missing values must not be ignored, but rather they should be carefully studied
to see if there's an underlying pattern or cause for their missingness.
In `R`, `NA` is used to represent any value that is 'not available' or 'missing'
(in the statistical sense).
Any operation involving `NA` generally yields `NA` as the result
> x <- c(44, NA, 5, NA)
> x
[1] 44 NA 5 NA
> x * 3
[1] 132 NA 15 NA
Create a vector with 1000 draws from the standard normal distribution
y <- rnorm(1000)
Then a vector of 1000 `NA`'s
z <- rep(NA, 1000)
Select 100 at random from both:
my_data <- sample(c(y,z), 100)
Check which are NA in a new vector using `is.na()`
my_na <- is.na(my_data)
Comparing with `my_data == NA` returns all `NA`
> The reason you got a vector of all NAs is that NA is not really a value, but just a
> placeholder for a quantity that is not available. Therefore the logical expression is
> incomplete and R has no choice but to return a vector of the same length as my_data
> that contains all NAs.
> 5 == NA
[1] NA
> The key takeaway is to be cautious when using logical expressions anytime NAs might creep in
Total number of `TRUE` values (that is, the number of `NA`s in `my_data`)
> sum(my_na)
[1] 45
#### Not a Number
There is another kind of missing value, `NaN` (not a number)
> 0 / 0
[1] NaN
In `R`, `Inf` stands for `infinity`
> Inf - Inf
[1] NaN
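A short sketch of how missing values are usually handled in practice, reusing the small vector from above. The `na.rm` argument is a standard option of `mean()`.

```r
x <- c(44, NA, 5, NA)

n_missing <- sum(is.na(x))          # count of NAs
mean_all  <- mean(x)                # NA, because NA propagates
mean_obs  <- mean(x, na.rm = TRUE)  # mean of the observed values only

# NaN counts as missing for is.na(), but NA is not NaN
is.na(0 / 0)
is.nan(NA)
```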
## Subsetting Vectors
Selecting first 10 elements of a vector
> x[1:10]
[1] 3.0949871 0.1960158 0.2084758 NA -0.2614606 NA -0.4809142
[8] NA NA 0.6007584
Getting all results that are not `NA`:
y <- x[!is.na(x)]
Get a vector of all positive values
y[y > 0]
> Since NA is not a value, but rather a placeholder for an unknown quantity, the expression NA > 0 evaluates to NA
Only values of x that are both non-missing AND greater than zero.
> x[!is.na(x) & x > 0]
[1] 3.09498711 0.19601584 0.20847579 0.60075844 1.72316551 0.87532455 0.27598833
[8] 0.58037652 0.10702578 0.08164542 1.65696398
Many programming languages use what's called **zero-based indexing**, which means that the first element of a vector is considered element 0. R uses **one-based indexing**, which (you guessed it!) means the first element of a vector is considered element 1
Get the 3rd, 5th and 7th elements of vector
x[c(3, 5, 7)]
But you can still ask for the `0th element` (no error is thrown; you get an empty vector)
> x[0]
numeric(0)
Getting an element that does not exist:
> x[3000]
[1] NA
To get all elements except a few, use `negative` indices
x[c(-2, -10)]
The shorthand for the above is:
x[-c(2, 10)]
Named vector
vect <- c(foo = 11, bar = 2, norf = NA)
> vect
foo bar norf
11 2 NA
Get just the names of a named vector
> names(vect)
[1] "foo" "bar" "norf"
You can give names to elements retrospectively
vect2 <- c(11, 2, NA)
names(vect2) <- c("foo", "bar", "norf")
To check whether 2 vectors are the same, use `identical()`
> identical(vect, vect2)
[1] TRUE
Get a named element
vect["bar"]
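The subsetting styles above (logical, positional, negative, by name) can be combined in one small sketch. The values in `x` are made up for illustration.

```r
x <- c(3.1, NA, -0.5, 0.2, NA, 1.7)   # made-up values

# Logical index with an NA guard
positives <- x[!is.na(x) & x > 0]

# Without the is.na() guard, NA positions come back as NA
with_na <- x[x > 0]

vect <- c(foo = 11, bar = 2, norf = NA)
by_name <- vect[c("foo", "bar")]      # select by name
dropped <- vect[-3]                   # drop by position
```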
## Matrices and Data Frames
Both represent *rectangular* data types, meaning that they are used to store tabular data, with rows and columns.
* matrices: can only contain a single class of data
* data frames: can consist of many different classes of data
Find the dimensions of a variable
`dim()` function tells us how many dimensions an object has
> my_vector <- 1:20
> my_vector
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> dim(my_vector)
NULL
A vector does not have a dimension so it is `NULL`
Get length:
> length(my_vector)
[1] 20
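> The `dim()` function allows you to get OR set the `dim` attribute for an R object.
For example, `dim(my_vector) <- c(4, 5)` gives the vector 4 rows and 5 columns.
You can then check the result with `dim()` or with `attributes()`: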
> attributes(my_vector)
$dim
[1] 4 5
Now it is a `matrix`: rows and columns
> my_vector
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
Check the class of the object (R 4.0.0 and later also report `"array"` for a matrix)
> class(my_vector)
[1] "matrix"
Open docs for matrix:
> ?matrix
Create the same matrix directly with `matrix()`:
> my_matrix2 <- matrix(1:20, 4, 5)
Column combine for named rows (here `my_matrix` is a copy of the 4 x 5 matrix above):
> patients <- c("Bill", "Gina", "Kelly", "Sean")
> cbind(patients, my_matrix)
patients
[1,] "Bill" "1" "5" "9" "13" "17"
[2,] "Gina" "2" "6" "10" "14" "18"
[3,] "Kelly" "3" "7" "11" "15" "19"
[4,] "Sean" "4" "8" "12" "16" "20"
This coerces all the data to type string / character, because a matrix can only contain a single class of data
So we need a `data frame`
my_data <- data.frame(patients, my_matrix)
> my_data
patients X1 X2 X3 X4 X5
1 Bill 1 5 9 13 17
2 Gina 2 6 10 14 18
3 Kelly 3 7 11 15 19
4 Sean 4 8 12 16 20
Confirm the class:
> class(my_data)
[1] "data.frame"
Add column names
> cnames <- c("patient", "age", "weight", "bp", "rating", "test")
> colnames(my_data) <- cnames
> my_data
patient age weight bp rating test
1 Bill 1 5 9 13 17
2 Gina 2 6 10 14 18
3 Kelly 3 7 11 15 19
4 Sean 4 8 12 16 20
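The whole matrix-to-data-frame walk-through can be condensed into one runnable sketch. It follows the steps above; only the final `sapply()` check is an addition.

```r
# Turn a vector into a 4 x 5 matrix by setting its dim attribute
my_vector <- 1:20
dim(my_vector) <- c(4, 5)

# Build the same matrix directly
my_matrix2 <- matrix(1:20, nrow = 4, ncol = 5)
identical(my_vector, my_matrix2)

# A data frame lets the name column and the numeric columns coexist
patients <- c("Bill", "Gina", "Kelly", "Sean")
my_data <- data.frame(patients, my_matrix2)
colnames(my_data) <- c("patient", "age", "weight", "bp", "rating", "test")

# Each column keeps its own class
sapply(my_data, class)
my_data$age
```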
## Logic
The basics of logic are not covered here.
In `R`:
* `&` evaluates `AND` element by element across the entire vector
* `&&` evaluates `AND` for single values: older versions of R use only the first element of a longer vector, while R 4.3.0 and later raise an error
> TRUE & c(TRUE, FALSE, FALSE)
[1] TRUE FALSE FALSE
and (in versions of R before 4.3.0)
> TRUE && c(TRUE, FALSE, FALSE)
[1] TRUE
* `|` evaluates to OR across the entire vector
* `||` is the single-value version of OR and, like `&&`, only looked at the first member of a vector before R 4.3.0
**All AND operators are evaluated before OR operators**
There is a `isTRUE` function
* `isTRUE()` will only return TRUE if the statement passed to it as an argument is TRUE
> isTRUE(NA)
[1] FALSE
> isTRUE(3)
[1] FALSE
`xor()` function stands for exclusive OR
> xor(TRUE, TRUE)
[1] FALSE
Get a random permutation of the integers 1 to 10, stored as `ints`
> ints <- sample(10)
> ints
[1] 4 6 8 7 2 9 10 5 3 1
`which()` function takes a logical vector as an argument and returns the indices of the vector that are TRUE
Finding which ints are greater than 7
> which(ints > 7)
[1] 3 6 7
* `any()` function will return TRUE if one or more of the elements in the logical vector is TRUE
* `all()` function will return TRUE if every element in the logical vector is TRUE
> any(ints < 0)
[1] FALSE
> all(ints > 0)
[1] TRUE
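A small sketch tying the logical helpers together, using the same `ints` values printed above (hard-coded here so the result is reproducible).

```r
ints <- c(4, 6, 8, 7, 2, 9, 10, 5, 3, 1)   # the sample shown above

big <- ints > 7
idx <- which(big)          # positions of the TRUE values
vals <- ints[idx]          # the values at those positions

# & is evaluated before |, so this reads as TRUE | (FALSE & FALSE)
precedence <- TRUE | FALSE & FALSE

# Some, but not all, elements exceed 7
some_not_all <- any(big) && !all(big)
```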
## Functions
> Sys.Date()
[1] "2018-03-16"
Get the `mean()`
> mean(c(2, 4, 5))
[1] 3.666667
Writing a function:
function_name <- function(arg1, arg2){
# Manipulate arguments in some way
# Return a value
}
Use the function:
function_name(value1, value2)
> Note: There is no explicit `return()` here. The last expression evaluated will be returned!
John Chambers, the creator of S (the language `R` grew out of), said:
> To understand computations in R, two slogans are helpful:
> 1. Everything that exists is an object.
> 2. Everything that happens is a function call.
You can view a function's source code by just typing the function name
Setting default arguments
remainder <- function(num, divisor=2) {
num %% divisor
}
You can use named parameters:
remainder(divisor = 11, num = 5)
Check what arguments a function expects with:
> args(remainder)
function (num, divisor = 2)
You can pass functions as arguments
evaluate <- function(func, dat){
func(dat)
}
Running it:
> evaluate(sd, c(1.4, 3.6, 7.9, 8.8))
[1] 3.514138
Anonymous functions:
> evaluate(function(x){x + 1}, 6)
[1] 7
`paste` function: Concatenate vectors after converting to character
The first argument is `...` (an ellipsis), which allows an indefinite number of arguments to be passed into a function. Any number of strings can be passed to `paste` and a concatenated string is returned.
> Rule of thumb in R programming: arguments that come after an ellipsis must be passed by name, so they are usually given default values.
Unpacking arguments:
args <- list(...)
alpha <- args[["alpha"]]
beta <- args[["beta"]]
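The unpacking pattern above only makes sense inside a function that accepts `...`. Below is a minimal sketch; the function names `describe` and `join_words` are hypothetical and exist only for this example.

```r
# Hypothetical function that unpacks named arguments from ...
describe <- function(...) {
  args  <- list(...)
  alpha <- args[["alpha"]]
  beta  <- args[["beta"]]
  paste("alpha is", alpha, "and beta is", beta)
}

described <- describe(alpha = 1, beta = "two")

# An argument placed after ... must be given by name, so it has a default
join_words <- function(..., sep = "-") {
  paste(..., sep = sep)
}

joined <- join_words("a", "b")
spaced <- join_words("a", "b", sep = " ")
```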
The `+`, `-`, `*`, and `/` symbols are called binary operators because they take two inputs, an input from the left and an input from the right.
#### User defined Binary Operators
"%mult_add_one%" <- function(left, right){ # Notice the quotation marks!
left * right + 1
}
I could then use this binary operator like `4 %mult_add_one% 5` which would
evaluate to 21.
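A runnable version of the operator above, with two extra illustrations that are not in the original notes: the operator applied to a vector, and a built-in operator called by name.

```r
"%mult_add_one%" <- function(left, right) {
  left * right + 1
}

single <- 4 %mult_add_one% 5

# Operators are ordinary functions, so they work element by element too
vectorised <- c(1, 2, 3) %mult_add_one% 2

# Built-in operators can be called like functions using backticks
plus <- `+`(2, 3)
```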
# Lapply and Sapply
`loop` functions
Used for implementing the `Split-Apply-Combine strategy for data analysis`
We will be using the [UCI flags dataset](http://archive.ics.uci.edu/ml/datasets/Flags)
View the first 6 lines of a dataset:
head(flags)
Dimensions:
> dim(flags)
[1] 194 30
194 rows and 30 columns
> To open a more complete description of the dataset in a separate text file, type `viewinfo()`
Class type:
> class(flags)
[1] "data.frame"
But what is the `class` of each variable or column in the dataset?
`lapply()` takes a list as input and applies a function to each element of the list.
A data frame is really just a list of vectors: `as.list(flags)`
Remember to only give the name of the function you want to apply (don't call it with parentheses):
> cls_list <- lapply(flags, class)
> cls_list
$name
[1] "factor"
$landmass
[1] "integer"
$zone
[1] "integer"
$area
[1] "integer"
$population
[1] "integer"
$language
[1] "integer"
$religion
[1] "integer"
$bars
[1] "integer"
$stripes
[1] "integer"
$colours
[1] "integer"
$red
[1] "integer"
$green
[1] "integer"
$blue
[1] "integer"
$gold
[1] "integer"
$white
[1] "integer"
$black
[1] "integer"
$orange
[1] "integer"
$mainhue
[1] "factor"
$circles
[1] "integer"
$crosses
[1] "integer"
$saltires
[1] "integer"
$quarters
[1] "integer"
$sunstars
[1] "integer"
$crescent
[1] "integer"
$triangle
[1] "integer"
$icon
[1] "integer"
$animate
[1] "integer"
$text
[1] "integer"
$topleft
[1] "factor"
$botright
[1] "factor"
The `l` in `lapply` stands for `list`
Simplified to a character vector:
> as.character(cls_list)
[1] "factor" "integer" "integer" "integer" "integer" "integer" "integer" "integer"
[9] "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer"
[17] "integer" "factor" "integer" "integer" "integer" "integer" "integer" "integer"
[25] "integer" "integer" "integer" "integer" "factor" "factor"
The `s` in `sapply` stands for `simplify`. Here it simplifies the result to a character vector.
> cls_vect <- sapply(flags, class)
> class(cls_vect)
[1] "character"
> if the result is a list where every element is of length one, then sapply() returns a vector. If the result is a list where every element is a vector of the same length (> 1), sapply() returns a matrix. If sapply() can't figure things out, then it just returns a list, no different from what lapply() would give you.
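The three simplification cases can be seen side by side. The `flags` data is not built into R, so this sketch assumes the bundled `mtcars` dataset instead; the column choice is arbitrary.

```r
# mtcars ships with R; it stands in for flags here
cols <- mtcars[, c("mpg", "cyl", "gear")]

v <- sapply(cols, mean)     # every result has length 1  -> named vector
m <- sapply(cols, range)    # every result has length 2  -> matrix
l <- sapply(cols, unique)   # results differ in length   -> list
```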
Count the flags that contain `orange`:
> sum(flags$orange)
[1] 26
Get only certain columns but keep all the rows:
> flag_colors <- flags[, 11:17]
> lapply(flag_colors, sum)
$red
[1] 153
$green
[1] 91
$blue
[1] 99
$gold
[1] 91
$white
[1] 146
$black
[1] 52
$orange
[1] 26
Using `sapply`:
> sapply(flag_colors, sum)
red green blue gold white black orange
153 91 99 91 146 52 26
> sapply(flag_colors, mean)
red green blue gold white black orange
0.7886598 0.4690722 0.5103093 0.4690722 0.7525773 0.2680412 0.1340206
The `range()` function returns the minimum and maximum of its first argument. Here `flag_shapes` holds the shape columns, `flags[, 19:23]`:
> shape_mat <- sapply(flag_shapes, range)
> shape_mat
circles crosses saltires quarters sunstars
[1,] 0 0 0 0 0
[2,] 4 2 1 4 50
`unique()` returns a vector of only the 'unique' elements
> unique(c(3, 4, 5, 5, 5, 6, 6))
[1] 3 4 5 6
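Use `lapply()` with an anonymous function, here returning the second element of each item in a list `unique_vals` (for example, the result of `lapply(flags, unique)`):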
> lapply(unique_vals, function(elem) elem[2])
## vapply and tapply
`vapply()` allows you to specify the format of the result explicitly
This lets you be more strict: it throws an error when a result does not match, for example when it is not a single numeric value
> vapply(flags, unique, numeric(1))
Error in vapply(flags, unique, numeric(1)) : values must be length 1,
but FUN(X[[1]]) result is length 194
To get the class of each column, requiring exactly one character string per column:
> vapply(flags, class, character(1))
> As a data analyst, you'll often wish to split your data up into groups based on the value of some variable, then apply a function to the members of each group.
Count the rows in each group based on landmass:
> table(flags$landmass)
1 2 3 4 5 6
31 17 35 52 39 20
Splitting data into groups by landmass and running stats on it:
> tapply(flags$animate, flags$landmass, mean)
This gives the mean of `animate` for each landmass
Get a summary of population for flags with/without red in them:
> tapply(flags$population, flags$red, summary)
$`0`
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 0.00 3.00 27.63 9.00 684.00
$`1`
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0 0.0 4.0 22.1 15.0 1008.0
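Both functions can be tried without the `flags` data. The sketch below assumes the built-in `iris` dataset as a stand-in and wraps the failing `vapply()` call in `tryCatch()` so the script keeps running.

```r
# One class name per column, enforced by character(1)
classes <- vapply(iris, class, character(1))

# vapply() fails fast when a result has the wrong shape
res <- tryCatch(
  vapply(iris, unique, numeric(1)),
  error = function(e) "error"
)

# Split Sepal.Length by Species and take the mean of each group
means <- tapply(iris$Sepal.Length, iris$Species, mean)
means
```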
## Looking at Data
> Whenever you're working with a new dataset, the first thing you should do is look at it! What is the format of the data? What are the dimensions? What are the variable names? How are the variables stored? Are there missing data? Are there any flaws in the data?
List variables in your workspace: `> ls()`
Check the class of the data:
> class(plants)
[1] "data.frame"
> It's very common for data to be stored in a data frame. It is the default class for data read into R using functions like read.csv() and read.table(), which you'll learn about in another lesson.
Check rows and columns:
> dim(plants)
[1] 5166 10
> nrow(plants)
[1] 5166
> ncol(plants)
[1] 10
Size in memory:
> object.size(plants)
644232 bytes
Get column names:
> names(plants)
[1] "Scientific_Name" "Duration" "Active_Growth_Period"
[4] "Foliage_Color" "pH_Min" "pH_Max"
[7] "Precip_Min" "Precip_Max" "Shade_Tolerance"
[10] "Temp_Min_F"
By default `head()` shows you the first 6 lines. You can get the first 10 with:
> head(plants, 10)
Same for tail:
> tail(plants, 15)
Get a summary of the dataset and missing values:
> summary(plants)
> Categorical values are called factors in R
Sometimes `summary()` truncates the list of categories and lumps the rest into `Other`. In that case use:
> table(plants$Active_Growth_Period)
Perhaps the most useful function is `str()`, e.g. `str(plants)`, which compactly shows the structure of the data
`str()` can be used on many other data structures
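The same first-look routine works on any data frame. Since `plants` comes from the swirl course, this sketch assumes the built-in `iris` dataset instead.

```r
# First look at an unfamiliar data frame (iris stands in for plants)
dim(iris)               # rows and columns
names(iris)             # column names
head(iris, 3)           # first 3 rows
summary(iris$Species)   # counts per factor level
table(iris$Species)     # same counts as a table
str(iris)               # compact overview of every column
```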
## Simulation
Creating random numbers
sample(x, size, replace = FALSE, prob = NULL)
Roll 4 dice (6 sided):
> sample(1:6, 4, replace=TRUE)
[1] 6 2 3 3
Choose 4 numbers, from 1 to 6, each number is replaced after selection so it can show up more than once
Get 10 numbers from 1 to 20 without replacement, so no number appears twice:
> sample(1:20, 10)
[1] 1 7 20 14 13 10 6 2 15 18
`LETTERS` is a predefined variable in R containing a vector of all 26 letters of the English alphabet
permute a sample of letters:
> sample(LETTERS)
[1] "I" "L" "B" "R" "F" "S" "Q" "J" "G" "M" "A" "H" "W" "U" "O" "P" "K" "T" "Y" "X" "E"
[22] "D" "Z" "N" "C" "V"
If `size` is not given, `R` takes a sample equal in size to the vector being sampled.
Simulate 100 flips of an unfair coin, where `1` comes up with probability 0.7:
flips <- sample(c(0, 1), 100, replace=TRUE, prob=c(0.3, 0.7))
### Rbinom
Random binomial distribution: `rbinom`
> Each probability distribution in R has an r*** function (for "random"), a d*** function (for "density"), a p*** (for "probability"), and q*** (for "quantile").
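Binomial distribution - Number of successes in a given number of independent trials
You only specify the number of trials (`size`) and the probability of success (`prob`)
To see the number of successes in 100 trials: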
> rbinom(1, size = 100, prob = 0.7)
To store the result of each of the 100 flips instead:
> flips2 <- rbinom(100, size = 1, prob = 0.7)
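The two coin-flip approaches and the companion `d`/`p`/`q` functions can be compared in one sketch. The seed value is arbitrary and only makes the run reproducible.

```r
set.seed(42)   # arbitrary seed, for reproducibility only

# Two ways to simulate 100 flips of the same unfair coin
flips  <- sample(c(0, 1), 100, replace = TRUE, prob = c(0.3, 0.7))
flips2 <- rbinom(100, size = 1, prob = 0.7)
sum(flips)
sum(flips2)

# Companion functions for the binomial distribution
dbinom(70, size = 100, prob = 0.7)             # P(exactly 70 successes)
pbinom(70, size = 100, prob = 0.7)             # P(at most 70 successes)
med <- qbinom(0.5, size = 100, prob = 0.7)     # median number of successes
```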
### RNorm
The standard normal distribution has mean 0 and standard deviation 1
10 random numbers in a normal distribution:
> rnorm(10)
[1] 0.53665009 -2.39624561 -1.50745602 -1.27852621 -0.85378324 -0.04011113 0.49547350
[8] -0.21447406 -0.81949348 0.75271073
### RPois
Poisson Distribution - Expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.
Generate 5 numbers with a mean of 10:
> rpois(5, lambda=10)
[1] 9 7 6 12 6
To get that 100 times use:
> my_pois <- replicate(100, rpois(5, 10))
Get the column means:
> cm <- colMeans(my_pois)
Plot a histogram of column means:
> hist(cm)
All the other standard probability distributions are built into R:
* Exponential: `rexp()`
* Chi-squared: `rchisq()`
* Gamma: `rgamma()`
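A sketch of what `replicate()` returns in the Poisson example above: each repetition becomes one column, which is why `colMeans()` is the right summary. The seed is arbitrary.

```r
set.seed(1)   # arbitrary seed, for reproducibility only

# 100 repetitions of "draw 5 Poisson values with mean 10"
my_pois <- replicate(100, rpois(5, lambda = 10))
dim(my_pois)              # 5 rows (draws) by 100 columns (repetitions)

cm <- colMeans(my_pois)   # one mean per repetition
length(cm)
mean(cm)                  # should land near 10
```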
## Dates and Times
Timeseries data or temporal information
Dates are represented by the ‘Date’ class and times are represented by the ‘POSIXct’ and ‘POSIXlt’ classes. Internally, dates are stored as the number of days since 1970-01-01 and times are stored as either the number of seconds since 1970-01-01 (for ‘POSIXct’) or a list of seconds, minutes, hours, etc. (for ‘POSIXlt’).
> d1 <- Sys.Date()
> d1
[1] "2018-03-19"
> class(d1)
[1] "Date"
See the internal representation of the object
> unclass(d1)
[1] 17609
The total number of days since: 1970-01-01
Create a date before epoch:
> d2 <- as.Date("1969-01-01")
> unclass(d2)
[1] -365
System time:
> t1 <- Sys.time()
> t1
[1] "2018-03-19 12:16:16 SAST"
> class(t1)
[1] "POSIXct" "POSIXt"
Coerce the result to POSIXlt, which stores the time as a list of components
(useful for pulling out a single part, such as the minutes)
> t2 <- as.POSIXlt(Sys.time())
> t2
[1] "2018-03-19 12:17:49 SAST"
> unclass(t2)
$sec
[1] 49.87161
$min
[1] 17
$hour
[1] 12
$mday
[1] 19
$mon
[1] 2
$year
[1] 118
$wday
[1] 1
$yday
[1] 77
$isdst
[1] 0
$zone
[1] "SAST"
$gmtoff
[1] 7200
attr(,"tzone")
[1] "" "SAST" "SAST"
> str(unclass(t2))
List of 11
$ sec : num 49.9
$ min : int 17
$ hour : int 12
$ mday : int 19
$ mon : int 2
$ year : int 118
$ wday : int 1
$ yday : int 77
$ isdst : int 0
$ zone : chr "SAST"
$ gmtoff: int 7200
- attr(*, "tzone")= chr [1:3] "" "SAST" "SAST"
Just get minutes:
> t2$min
[1] 17
Return day of the week:
> weekdays(d1)
[1] "Monday"
Similarly with months and quarters:
> months(t1)
[1] "March"
> quarters(t2)
[1] "Q1"
strptime() converts character vectors to POSIXlt. In that sense, it is similar to as.POSIXlt(), except that the input doesn’t have to be in a particular format (YYYY-MM-DD).
> t3 <- "October 17, 1986 08:24"
> t4 <- strptime(t3, "%B %d, %Y %H:%M")
> t4
[1] "1986-10-17 08:24:00 SAST"
> class(t4)
[1] "POSIXlt" "POSIXt"
Comparison of time:
> Sys.time() > t1
[1] TRUE
Time difference:
> Sys.time() - t1
Time difference of 9.086724 mins
Find time difference in specific unit:
> difftime(Sys.time(), t1, units = 'days')
Time difference of 0.006632809 days
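A reproducible sketch of date arithmetic using fixed dates instead of the current time. The timestamps and the `UTC` time zone are assumptions chosen so the results do not depend on where the code runs.

```r
# Fixed date, so the results do not depend on today's date
d <- as.Date("2018-03-19")
days_since_epoch <- unclass(d)   # days since 1970-01-01
later <- d + 30                  # adding a number adds days

# Two fixed timestamps in an explicit time zone (UTC is an assumption)
t_start <- strptime("1986-10-17 08:24", "%Y-%m-%d %H:%M", tz = "UTC")
t_end   <- strptime("1986-10-17 10:54", "%Y-%m-%d %H:%M", tz = "UTC")

gap <- difftime(t_end, t_start, units = "mins")
gap
```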
Base Graphics#
Not covered are more advanced graphics:
- lattice
- ggplot2
- ggvis
Load dataset cars:
data(cars)
Get help page for cars:
`?cars`
Create basic chart:
> plot(cars)
If the dataset has 2 columns, plot() assumes you want to plot one against the other. Since we do not provide labels for either axis, R uses the names of the columns.
plot is short for scatterplot
Can be plotted with:
> plot(x = cars$speed, y=cars$dist)
Setting labels:
> plot(x = cars$speed, y=cars$dist, xlab='Speed', ylab='Stopping Distance')
Plot so points are red:
> plot(cars, col = 2)
Plot and limit x-axis:
> plot(cars, xlim=c(10, 15))
Plot with triangles:
> plot(cars, pch=2)
Boxplot#
You can pass the entire data frame
boxplot(), like many R functions, also takes a “formula” argument, generally an expression with a tilde (“~”) which indicates the relationship between the input variables. This allows you to enter something like mpg ~ cyl to plot the relationship between cyl (number of cylinders) on the x-axis and mpg (miles per gallon) on the y-axis.
> boxplot(formula=mpg ~ cyl, data = mtcars)
A histogram can be used for a single vector:
> hist(mtcars$mpg)
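The plotting calls above can be combined into one script. As a sketch, it draws to a null PDF device so it runs without a display; remove the `pdf(NULL)` and `dev.off()` lines to see the plots interactively.

```r
pdf(NULL)   # null device: nothing is written to disk

# Scatterplot with labels, red points and triangle symbols
plot(dist ~ speed, data = cars,
     xlab = "Speed", ylab = "Stopping Distance",
     col = 2, pch = 2)

# boxplot() and hist() also return the numbers behind the picture
bp <- boxplot(mpg ~ cyl, data = mtcars)
h  <- hist(mtcars$mpg)

dev.off()

bp$names        # the groups on the x-axis
sum(h$counts)   # every car falls into exactly one bin
```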