RStats
R Stats Basics
R Language Basics
R is a free software environment for statistical computing and graphics.
Install R
Download RStudio, an application for writing R programs
Use Swirl
Swirl is an interactive, prompt-based way to learn about R and other data science topics
To start, open R in a terminal with R
or open RStudio
Install swirl
install.packages("swirl")
Load the swirl library
library(swirl)
Then start with swirl()
swirl()
Everything else will be guided
R Language
The R interpreter works much like many others in that you can do basic maths with it.
Syntax
Assignment: <-
Assigning a value to a variable is done with <-
Data Structures
Any object containing data is a data structure
The simplest data structure is a vector. A single number is a vector of length 1.
A vector is created with the c() function (short for concatenate or combine)
z = c(1.1, 4.5, 6)
You can concatenate vectors with c():
c(z, 255, z)
Numeric operations on vectors are applied to all elements in the vector. When arithmetic is done on vectors of the same length, each operation is applied element by element. If they are not the same length, the shorter vector is recycled to the length of the longer one.
Behind the scenes, R recycles a single number into a vector of the matching length.
```
z <- c(5, 10, 15)
z * 2 + 100
# same as
z * c(2, 2, 2) + c(100, 100, 100)
```
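Recycling also applies when both operands are longer than one element. A minimal sketch, using made-up vectors `a` and `b` that are not part of the notes above:

```r
# The shorter vector b is repeated until it matches the length of a.
a <- c(1, 2, 3, 4)
b <- c(0, 10)
recycled <- a + b                  # b is treated as c(0, 10, 0, 10)
explicit <- a + c(0, 10, 0, 10)
print(recycled)                    # 1 12 3 14

# If the longer length is not a multiple of the shorter one, R still
# recycles but emits a warning (suppressed here to keep the output clean).
partial <- suppressWarnings(c(1, 2, 3) + c(10, 20))
print(partial)                     # 11 22 13
```

The warning in the second case is usually a sign that the two vectors were not meant to be combined, so it is worth reading rather than silencing in real code.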
Arithmetic Operators
+, -, /, *
^: to the power of
sqrt(): square root
abs(): absolute value
Getting Help
To get help on a function type: ?
and the function name without calling it
Eg. ?c
Dollar Operator
Grab specific items from output with the $
operator
eg file.info("mytest.R")$mode
Workspace and Files
Get working directory getwd()
List all objects in local workspace ls()
List all files in directory: dir()
or list.files()
Find what arguments a function takes: args(list.files)
Remember to not call the function
Create a directory: dir.create('testdir')
Set the working directory: setwd('testdir')
Create a file: file.create('mytest.R')
Check if a file exists: file.exists("mytest.R")
File info: file.info("mytest.R")
Rename a file: file.rename('mytest.R', 'mytest2.R')
Copy a file: file.copy('mytest2.R', 'mytest3.R')
Get relative path to a file: file.path('mytest3.R')
Create a path to a folder or file: file.path('folder1', 'folder2')
Create directory with recursive folders: dir.create(file.path('testdir2', 'testdir3'), recursive = TRUE)
Top tip: It is often helpful to save the settings that you had before you began an analysis and then go back to them at the end. This trick is often used within functions; you save, say, the par() settings that you started with, mess around a bunch, and then set them back to the original values at the end. This isn’t the same as what we have done here, but it seems similar enough to mention.
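The save-and-restore idea from the tip can be sketched with the working directory itself. This is only an illustration: the scratch folder under `tempdir()` is an assumption, chosen so that nothing is written into a real project.

```r
# Remember where we started so we can go back at the end.
old_dir <- getwd()

# Work inside a scratch directory (assumed location: the session temp dir).
scratch <- file.path(tempdir(), "testdir")
dir.create(scratch, showWarnings = FALSE)
setwd(scratch)

# Create a file and confirm that it exists.
file.create("mytest.R")
created <- file.exists("mytest.R")
print(created)

# Clean up, then restore the original working directory.
unlink("mytest.R")
setwd(old_dir)
```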
Sequences
Create a sequence of numbers with the : operator
1:20
Get a sequence of real numbers
pi:10
[1] 3.141593 4.141593 5.141593 6.141593 7.141593 8.141593 9.141593
It stops before going above 10, incrementing by 1 each time
Returns a vector
Go back / decrement: 15:1
Help on special chars
Use backticks
?`:`
Use seq()
for more control
seq(1,20)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Get 30 equally spaced values between 2 numbers
```
seq(5, 10, length=30)
[1] 5.000000 5.172414 5.344828 5.517241 5.689655 5.862069 6.034483
[8] 6.206897 6.379310 6.551724 6.724138 6.896552 7.068966 7.241379
[15] 7.413793 7.586207 7.758621 7.931034 8.103448 8.275862 8.448276
[22] 8.620690 8.793103 8.965517 9.137931 9.310345 9.482759 9.655172
[29] 9.827586 10.000000
```
Check the length of a vector (my_seq is assumed to hold the sequence above)
length(my_seq)
[1] 30
Make a sequence of numbers of length of another vector
1:length(my_seq)
There are often several approaches to solving the same problem, particularly in R. Simple approaches that involve less typing are generally best. It’s also important for your code to be readable, so that you and others can figure out what’s going on without too much hassle.
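As one example of several approaches to the same problem, `seq_along()` gives the same index sequence as `1:length()`. The sketch below assumes `my_seq` was created with `seq()` as above; the `empty` vector is made up to show where the two approaches differ.

```r
my_seq <- seq(5, 10, length.out = 30)

# Both give the indices 1 to 30 for a non-empty vector.
idx_colon <- 1:length(my_seq)
idx_along <- seq_along(my_seq)

# For an empty vector, 1:length() counts down from 1 to 0,
# while seq_along() correctly returns nothing.
empty <- numeric(0)
colon_empty <- 1:length(empty)    # 1 0
along_empty <- seq_along(empty)   # integer(0)
print(colon_empty)
print(along_empty)
```

This matters mostly inside loops, where `1:length(x)` on an empty `x` would run the loop body twice.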
Replicate with rep()
A vector of 40 zeroes
rep(0, times = 40)
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[40] 0
Replicate a vector 10 times
> rep(c(0,1,2), times=10)
[1] 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2 0 1 2
Create 10 of each element in sequence
rep(c(0, 1, 2), each = 10)
[1] 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
## Vectors
The simplest and most common data structure. There are two kinds:
* atomic vectors: contain a single data type
* lists: can contain multiple data types
Logical vectors contain the values TRUE, FALSE and NA (Not Available)
num_vect <- c(0.5, 55, -10, 6)
tf <- num_vect < 1
tf
[1] TRUE FALSE TRUE FALSE
Logical operators:
Comparison: <, >, <=, >=
Exact equality: ==
Inequality: !=
Or (union): A | B
And (intersection): A & B
Not (negation): !A
Character vectors
my_char <- c("My", "name", "is")
Concatenate into a character vector of length 1
paste(my_char, collapse = " ")
Append a value:
my_name <- c(my_char, "stephen")
Join an integer vector and a character vector of length 3 element by element:
paste(1:3, c("X", "Y", "Z"), sep="")
If they are not of equal length, the shorter vector is recycled
Printing letters with vector recycling:
> paste(LETTERS, 1:4, sep="")
[1] "A1" "B2" "C3" "D4" "E1" "F2" "G3" "H4" "I1" "J2" "K3" "L4" "M1"
[14] "N2" "O3" "P4" "Q1" "R2" "S3" "T4" "U1" "V2" "W3" "X4" "Y1" "Z2"
Missing values
Missing values play an important role in statistics and data analysis. Often, missing values must not be ignored, but rather they should be carefully studied to see if there’s an underlying pattern or cause for their missingness.
In R, NA is used to represent any value that is ‘not available’ or ‘missing’ (in the statistical sense).
Any operation involving NA generally yields NA as the result
> x <- c(44, NA, 5, NA)
> x
[1] 44 NA 5 NA
> x * 3
[1] 132 NA 15 NA
Create a vector with 1000 draws from the standard normal distribution
y <- rnorm(1000)
Then a vector of 1000 NAs
z <- rep(NA, 1000)
Select 100 at random from both:
my_data <- sample(c(y, z), 100)
Check which are NA in a new vector using is.na()
my_na <- is.na(my_data)
Comparing with my_data == NA returns a vector of all NA
The reason you got a vector of all NAs is that NA is not really a value, but just a placeholder for a quantity that is not available. Therefore the logical expression is incomplete and R has no choice but to return a vector of the same length as my_data that contains all NAs.
> 5 == NA
[1] NA
The key takeaway is to be cautious when using logical expressions any time NAs might creep in
Total number of TRUE values (TRUE counts as 1 and FALSE as 0)
> sum(my_na)
[1] 45
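The same counting idea on a small vector, together with the `na.rm` argument that many summary functions accept. A minimal sketch; the vector repeats the `x` example from earlier in this section.

```r
x <- c(44, NA, 5, NA)

# is.na() marks the missing entries; summing the logicals counts them.
n_missing <- sum(is.na(x))         # 2

# A single NA makes the plain mean NA as well.
mean_raw <- mean(x)                # NA

# na.rm = TRUE drops the missing values before computing.
mean_clean <- mean(x, na.rm = TRUE)
print(mean_clean)                  # 24.5
```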
Not a Number
There is a second kind of missing value, NaN, which stands for ‘not a number’
> 0 / 0
[1] NaN
In R, Inf stands for infinity
> Inf - Inf
[1] NaN
Subsetting Vectors
Selecting first 10 elements of a vector
> x[1:10]
[1] 3.0949871 0.1960158 0.2084758 NA 0.2614606 NA 0.4809142
[8] NA NA 0.6007584
Getting all results that are not NA:
y <- x[!is.na(x)]
Get a vector of all positive values
y[y > 0]
Since NA is not a value, but rather a placeholder for an unknown quantity, the expression NA > 0 evaluates to NA
Only values of x that are both non-missing AND greater than zero:
> x[!is.na(x) & x > 0]
[1] 3.09498711 0.19601584 0.20847579 0.60075844 1.72316551 0.87532455 0.27598833
[8] 0.58037652 0.10702578 0.08164542 1.65696398
Many programming languages use what’s called zero-based indexing, which means that the first element of a vector is considered element 0. R uses one-based indexing, which (you guessed it!) means the first element of a vector is considered element 1
Get the 3rd, 5th and 7th elements of vector
x[c(3, 5, 7)]
You can still ask for the 0th element
(no error is thrown, and an empty vector is returned)
> x[0]
numeric(0)
Getting an element that does not exist:
> x[3000]
[1] NA
To get all elements except a few, use negative indices
x[c(-2, -10)]
The shorthand for the above is:
x[-c(2, 10)]
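The indexing styles above can be compared side by side on a small vector. The vector `v` is made up for this sketch.

```r
v <- c(10, 20, 30, 40, 50)

# Positive indices select elements.
picked <- v[c(1, 3)]               # 10 30

# Negative indices drop elements; both spellings are equivalent.
dropped_a <- v[c(-1, -3)]          # 20 40 50
dropped_b <- v[-c(1, 3)]

# A logical vector keeps the elements where the condition is TRUE.
large <- v[v > 25]                 # 30 40 50
print(dropped_a)
```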
Named vector
vect <- c(foo = 11, bar = 2, norf = NA)
> vect
foo bar norf
11 2 NA
Get just the names of a named vector
> names(vect)
[1] "foo" "bar" "norf"
You can give names to elements retrospectively
vect2 <- c(11, 2, NA)
names(vect2) <- c("foo", "bar", "norf")
To check whether 2 vectors are the same, use identical()
> identical(vect, vect2)
[1] TRUE
Get a named element
vect["bar"]
Matrices and Data Frames
Both represent rectangular data types, meaning that they are used to store tabular data, with rows and columns.
* matrices: can only contain a single class of data
* data frames: can consist of many different classes of data
Find the dimensions of a variable
The dim() function tells us the dimensions of an object
> my_vector <- 1:20
> my_vector
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> dim(my_vector)
NULL
A vector does not have a dimension so it is NULL
Get length:
> length(my_vector)
[1] 20
The dim() function allows you to get OR set the dim attribute for an R object. Give the vector 4 rows and 5 columns:
> dim(my_vector) <- c(4, 5)
You can also inspect the result with attributes():
> attributes(my_vector)
$dim
[1] 4 5
Now it is a matrix with 4 rows and 5 columns
> my_vector
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
Check the class of the object (R 4.0.0 and later print "matrix" "array")
> class(my_vector)
[1] "matrix"
Open docs for matrix:
> ?matrix()
Create the matrix
> my_matrix2 = matrix(1:20, 4, 5)
Column combine to add patient names (my_matrix is assumed to be the 4 x 5 matrix built above)
> patients <- c("Bill", "Gina", "Kelly", "Sean")
> cbind(patients, my_matrix)
patients
[1,] "Bill" "1" "5" "9" "13" "17"
[2,] "Gina" "2" "6" "10" "14" "18"
[3,] "Kelly" "3" "7" "11" "15" "19"
[4,] "Sean" "4" "8" "12" "16" "20"
This coerces all the data to type character, because a matrix can only hold one class
So we need a data frame
my_data <- data.frame(patients, my_matrix)
> my_data
patients X1 X2 X3 X4 X5
1 Bill 1 5 9 13 17
2 Gina 2 6 10 14 18
3 Kelly 3 7 11 15 19
4 Sean 4 8 12 16 20
Confirm the class:
> class(my_data)
[1] "data.frame"
Add column names
> cnames <- c("patient", "age", "weight", "bp", "rating", "test")
> colnames(my_data) <- cnames
> my_data
patient age weight bp rating test
1 Bill 1 5 9 13 17
2 Gina 2 6 10 14 18
3 Kelly 3 7 11 15 19
4 Sean 4 8 12 16 20
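The difference between the two containers can be checked directly. A minimal sketch that rebuilds the objects from this section (assuming `my_matrix` is the 4 x 5 matrix of 1 to 20):

```r
patients <- c("Bill", "Gina", "Kelly", "Sean")
my_matrix <- matrix(1:20, nrow = 4, ncol = 5)

# cbind() on a character vector and a matrix coerces everything to character.
combined <- cbind(patients, my_matrix)
combined_type <- typeof(combined)
print(combined_type)               # "character"

# A data frame keeps each column's own class.
my_data <- data.frame(patients, my_matrix)
col_classes <- sapply(my_data, class)
print(col_classes)
```

The numeric columns stay "integer" in the data frame; the class of the `patients` column depends on the R version (character from R 4.0.0 onwards, factor by default before that).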
Logic
The basics of logic are not covered here.
In R:
* & evaluates AND element by element across the entire vector
* && evaluates AND using just the first element of a vector
> TRUE & c(TRUE, FALSE, FALSE)
[1] TRUE FALSE FALSE
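and, in versions of R before 4.3.0 (newer versions raise an error when && is given a vector longer than one):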
> TRUE && c(TRUE, FALSE, FALSE)
[1] TRUE

| evaluates to OR across the entire vector
The || version of OR only evaluates the first member of a vector
All AND operators are evaluated before OR operators
There is an isTRUE() function
isTRUE() will only return TRUE if the statement passed to it as an argument is TRUE
> isTRUE(NA)
[1] FALSE
> isTRUE(3)
[1] FALSE
xor()
function stands for exclusive OR
> xor(TRUE, TRUE)
[1] FALSE
Get a random sample of the ints 1 to 10
> ints <- sample(10)
> ints
[1] 4 6 8 7 2 9 10 5 3 1
which()
function takes a logical vector as an argument and returns the indices of the vector that are TRUE
Finding which ints are greater than 7
> which(ints > 7)
[1] 3 6 7
any()
function will return TRUE if one or more of the elements in the logical vector is TRUE
all()
function will return TRUE if every element in the logical vector is TRUE
> any(ints < 0)
[1] FALSE
> all(ints > 0)
[1] TRUE
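The logical helpers above combine naturally. The sketch below fixes `ints` to the values printed earlier so the result is repeatable; the variable names are made up for illustration.

```r
# Same values as the sample shown above, fixed so the output is repeatable.
ints <- c(4, 6, 8, 7, 2, 9, 10, 5, 3, 1)

# & compares element by element and returns a logical vector.
big_and_even <- (ints > 7) & (ints %% 2 == 0)
hits <- which(big_and_even)
print(hits)                        # 3 7

# && works on single TRUE/FALSE values, which suits if() conditions.
all_valid <- length(ints) == 10 && all(ints > 0)
if (all_valid) {
  message("ints holds ten positive numbers")
}

# xor() is TRUE when exactly one side is TRUE.
one_sided <- xor(any(ints > 9), any(ints < 0))
print(one_sided)                   # TRUE
```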
Functions
> Sys.Date()
[1] "2018-03-16"
Get the mean()
> mean(c(2, 4, 5))
[1] 3.666667
Writing a function:
function_name <- function(arg1, arg2){
# Manipulate arguments in some way
# Return a value
}
Use the function:
function_name(value1, value2)
Note: An explicit return() is optional. The value of the last expression evaluated is returned!
John Chambers, the creator of S (the language R is based on), said:
To understand computations in R, two slogans are helpful: 1. Everything that exists is an object. 2. Everything that happens is a function call.
You can view a function’s source code by typing the function name without parentheses
Setting default arguments
remainder <- function(num, divisor=2) {
num %% divisor
}
You can use named parameters:
remainder(divisor = 11, num = 5)
Check what arguments a function expects with:
> args(remainder)
function (num, divisor = 2)
You can pass functions as arguments
evaluate <- function(func, dat){
func(dat)
}
Running it:
> evaluate(sd, c(1.4, 3.6, 7.9, 8.8))
[1] 3.514138
Anonymous functions:
> evaluate(function(x){x + 1}, 6)
[1] 7
paste() function: concatenates vectors after converting them to character
Its first argument is ... (an ellipsis), which allows an indefinite number of arguments to be passed into a function. Any number of strings can be passed to paste() and a single concatenated string is returned.
Arguments that come after an ellipsis can only be passed by name, so they are normally given default values.
Unpacking arguments:
args <- list(...)
alpha <- args[["alpha"]]
beta <- args[["beta"]]
You have already met binary operators such as the +, -, *, and / symbols. They are called binary operators because they take two inputs, an input from the left and an input from the right.
User-defined binary operators
"%mult_add_one%" <- function(left, right){ # Notice the quotation marks!
left * right + 1
}
I could then use this binary operator like 4 %mult_add_one% 5, which would evaluate to 21.
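The pieces of this section (ellipsis, named arguments after it, unpacking with `list(...)`, and a custom binary operator) fit together as follows. The helper names `telegram` and `add_alpha_beta` are hypothetical and exist only for this sketch.

```r
# Hypothetical helper: passes any number of words through ... to paste().
# sep comes after the ellipsis, so it has a default and must be named.
telegram <- function(..., sep = " ") {
  paste("START", ..., "STOP", sep = sep)
}
msg <- telegram("Good", "morning")
print(msg)                         # "START Good morning STOP"

# Hypothetical helper: unpacks named values from the ellipsis.
add_alpha_beta <- function(...) {
  args <- list(...)
  alpha <- args[["alpha"]]
  beta <- args[["beta"]]
  alpha + beta
}
total <- add_alpha_beta(alpha = 1, beta = 2)

# The binary operator from the text above.
"%mult_add_one%" <- function(left, right) {
  left * right + 1
}
result <- 4 %mult_add_one% 5
print(result)                      # 21
```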
Lapply and Sapply
Both are loop functions
Used for implementing the Split-Apply-Combine strategy for data analysis
We will be using the [UCI flags dataset](http://archive.ics.uci.edu/ml/datasets/Flags)
View the first 6 lines of a dataset:
head(flags)
Dimensions:
> dim(flags)
[1] 194 30
194 rows and 30 columns
To open a more complete description of the dataset in a separate text file, type
viewinfo()
Class type:
> class(flags)
[1] "data.frame"
But what is the class
of each variable or column in the dataset?
lapply()
takes a list as input and applies a function to each element of the list.
A data frame is really just a list of vectors (see as.list(flags))
Remember to only give the name of the function you want to apply (don’t call it with parentheses):
> cls_list <- lapply(flags, class)
> cls_list
$name
[1] "factor"
$landmass
[1] "integer"
$zone
[1] "integer"
$area
[1] "integer"
$population
[1] "integer"
$language
[1] "integer"
$religion
[1] "integer"
$bars
[1] "integer"
$stripes
[1] "integer"
$colours
[1] "integer"
$red
[1] "integer"
$green
[1] "integer"
$blue
[1] "integer"
$gold
[1] "integer"
$white
[1] "integer"
$black
[1] "integer"
$orange
[1] "integer"
$mainhue
[1] "factor"
$circles
[1] "integer"
$crosses
[1] "integer"
$saltires
[1] "integer"
$quarters
[1] "integer"
$sunstars
[1] "integer"
$crescent
[1] "integer"
$triangle
[1] "integer"
$icon
[1] "integer"
$animate
[1] "integer"
$text
[1] "integer"
$topleft
[1] "factor"
$botright
[1] "factor"
The l in lapply stands for list
Simplified to a character vector:
> as.character(cls_list)
[1] "factor" "integer" "integer" "integer" "integer" "integer" "integer" "integer"
[9] "integer" "integer" "integer" "integer" "integer" "integer" "integer" "integer"
[17] "integer" "factor" "integer" "integer" "integer" "integer" "integer" "integer"
[25] "integer" "integer" "integer" "integer" "factor" "factor"
The s in sapply stands for simplify. Here it simplifies the result to a character vector.
> cls_vect <- sapply(flags, class)
> class(cls_vect)
[1] "character"
If the result is a list where every element is of length one, then sapply() returns a vector. If the result is a list where every element is a vector of the same length (> 1), sapply() returns a matrix. If sapply() can’t figure things out, then it just returns a list, no different from what lapply() would give you.
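The three outcomes described above can be seen on a toy list. The list `nums` and the variable names are made up for this sketch and do not come from the flags data.

```r
nums <- list(a = 1:3, b = 4:6)

# Every result has length one, so sapply() returns a named vector.
totals <- sapply(nums, sum)
print(totals)                      # a = 6, b = 15

# Every result has length two, so sapply() returns a matrix
# with one column per list element.
ranges <- sapply(nums, range)
print(ranges)

# Results of different lengths cannot be simplified, so a list comes back.
ragged <- sapply(list(a = 1:3, b = 1:5), function(v) v[v > 2])
print(ragged)
```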
See the number of flags that contain orange:
> sum(flags$orange)
[1] 26
Get only certain columns but keep all the rows:
> flag_colors <- flags[, 11:17]
> lapply(flag_colors, sum)
$red
[1] 153
$green
[1] 91
$blue
[1] 99
$gold
[1] 91
$white
[1] 146
$black
[1] 52
$orange
[1] 26
Using sapply:
> sapply(flag_colors, sum)
red green blue gold white black orange
153 91 99 91 146 52 26
> sapply(flag_colors, mean)
red green blue gold white black orange
0.7886598 0.4690722 0.5103093 0.4690722 0.7525773 0.2680412 0.1340206
The range()
function returns the minimum and maximum of its first argument
> flag_shapes <- flags[, 19:23]
> shape_mat <- sapply(flag_shapes, range)
> shape_mat
circles crosses saltires quarters sunstars
[1,] 0 0 0 0 0
[2,] 4 2 1 4 50
unique()
returns a vector of only the ‘unique’ elements
> unique(c(3, 4, 5, 5, 5, 6, 6))
[1] 3 4 5 6
Use with anonymous functions (unique_vals is assumed to be the result of lapply(flags, unique)):
> lapply(unique_vals, function(elem) elem[2])
vapply and tapply
vapply()
allows you to specify the format of the result explicitly
It lets you be more strict and throws an error when the result is not, for example, a single numeric value
> vapply(flags, unique, numeric(1))
Error in vapply(flags, unique, numeric(1)) : values must be length 1,
but FUN(X[[1]]) result is length 194
To explicitly get the data types as a single element character vector
> vapply(flags, class, character(1))
As a data analyst, you’ll often wish to split your data up into groups based on the value of some variable, then apply a function to the members of each group.
See amount in each group based on landmass:
> table(flags$landmass)
1 2 3 4 5 6
31 17 35 52 39 20
Splitting data into groups by landmass and running stats on each group:
> tapply(flags$animate, flags$landmass, mean)
See the mean of animate per landmass
Get a summary of population for flags with/without red in them:
> tapply(flags$population, flags$red, summary)
$`0`
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 0.00 3.00 27.63 9.00 684.00
$`1`
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0 0.0 4.0 22.1 15.0 1008.0
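The split, apply, combine pattern can be tried without the flags data. The sketch below uses a small made-up data frame `scores`, so the group means are easy to verify by hand.

```r
# Made-up data: two groups with a handful of values each.
scores <- data.frame(
  group = c("a", "b", "a", "b", "a"),
  value = c(1, 10, 3, 20, 5)
)

# How many rows fall into each group.
counts <- table(scores$group)
print(counts)                      # a: 3, b: 2

# Mean of value within each group.
group_means <- tapply(scores$value, scores$group, mean)
print(group_means)                 # a: 3, b: 15

# The same result with an explicit split() followed by a strict vapply().
strict_means <- vapply(split(scores$value, scores$group), mean, numeric(1))
```

`vapply()` is the safer choice inside functions, because it fails loudly if a group ever returns something other than a single number.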
Looking at Data
Whenever you’re working with a new dataset, the first thing you should do is look at it! What is the format of the data? What are the dimensions? What are the variable names? How are the variables stored? Are there missing data? Are there any flaws in the data?
List variables in your workspace:
> ls()
Check the structure of the data:
> class(plants)
[1] "data.frame"
It’s very common for data to be stored in a data frame. It is the default class for data read into R using functions like read.csv() and read.table(), which you’ll learn about in another lesson.
Check rows and columns:
> dim(plants)
[1] 5166 10
> nrow(plants)
[1] 5166
> ncol(plants)
[1] 10
Size in memory:
> object.size(plants)
644232 bytes
Get column names:
> names(plants)
[1] "Scientific_Name" "Duration" "Active_Growth_Period"
[4] "Foliage_Color" "pH_Min" "pH_Max"
[7] "Precip_Min" "Precip_Max" "Shade_Tolerance"
[10] "Temp_Min_F"
By default head()
shows you the first 6 lines. You can get the first 10 with:
> head(plants, 10)
Same for tail:
> tail(plants, 15)
Get a summary of the dataset and missing values:
> summary(plants)
Categorical variables are called factors in R
Sometimes summary() truncates the list of categories by grouping the rest under (Other)
in that case use:
> table(plants$Active_Growth_Period)
The best overall view comes from str()
str()
can also be used on many other data structures
Simulation
Creating random numbers
sample(x, size, replace = FALSE, prob = NULL)
Roll 4 dice (6 sided):
> sample(1:6, 4, replace=TRUE)
[1] 6 2 3 3
Choose 4 numbers, from 1 to 6, each number is replaced after selection so it can show up more than once
Get 10 numbers from 1 to 20 that won’t appear again:
> sample(1:20, 10)
[1] 1 7 20 14 13 10 6 2 15 18
LETTERS
is a predefined variable in R containing a vector of all 26 letters of the English alphabet
permute a sample of letters:
> sample(LETTERS)
[1] "I" "L" "B" "R" "F" "S" "Q" "J" "G" "M" "A" "H" "W" "U" "O" "P" "K" "T" "Y" "X" "E"
[22] "D" "Z" "N" "C" "V"
If size is not given, R takes a sample equal in size to the vector being sampled.
Simulate 100 flips of an unfair coin:
flips <- sample(c(0, 1), 100, replace=TRUE, prob=c(0.3, 0.7))
Rbinom
Random binomial distribution: rbinom
Each probability distribution in R has an r* function (for “random”), a d* function (for “density”), a p* (for “probability”), and q* (for “quantile”).
A binomial random variable represents the number of successes in a given number of independent trials
So rbinom() only reports the number of successes
To see the number of successes in 100 flips:
> rbinom(1, size = 100, prob = 0.7)
To store the result of each individual flip:
> flips2 <- rbinom(100, size = 1, prob = 0.7)
RNorm
The standard normal distribution has mean 0 and standard deviation 1
10 random numbers in a normal distribution:
> rnorm(10)
[1] 0.53665009 2.39624561 1.50745602 1.27852621 0.85378324 0.04011113 0.49547350
[8] 0.21447406 0.81949348 0.75271073
RPois
Poisson distribution: expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.
Generate 5 numbers with a mean of 10:
> rpois(5, lambda=10)
[1] 9 7 6 12 6
To repeat that 100 times, use:
> my_pois <- replicate(100, rpois(5, 10))
Get the column means:
> cm <- colMeans(my_pois)
Plot a histogram of column means:
> hist(cm)
All the other standard probability distributions are built into R:
* Exponential: rexp()
* Chi-squared: rchisq()
* Gamma: rgamma()
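Simulated values change on every run unless the random seed is fixed. A minimal sketch, assuming an arbitrary seed of 42 and a larger number of flips than in the notes so that the observed proportion settles near the true probability:

```r
# Fixing the seed makes the random draws repeatable (42 is arbitrary).
set.seed(42)
flips <- sample(c(0, 1), 10000, replace = TRUE, prob = c(0.3, 0.7))
prop_heads <- mean(flips)          # close to 0.7
print(prop_heads)

# The same experiment with rbinom(): 10000 trials of size 1.
flips2 <- rbinom(10000, size = 1, prob = 0.7)
prop_heads2 <- mean(flips2)

# Resetting the seed reproduces exactly the same draws.
set.seed(42)
first <- rnorm(5)
set.seed(42)
second <- rnorm(5)
same_draws <- identical(first, second)
print(same_draws)                  # TRUE
```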
Dates and Times
Time-series data or temporal information
Dates are represented by the 'Date' class and times are represented by the 'POSIXct' and 'POSIXlt' classes. Internally, dates are stored as the number of days since 1970-01-01 and times are stored as either the number of seconds since 1970-01-01 (for 'POSIXct') or a list of seconds, minutes, hours, etc. (for 'POSIXlt').
> d1 <- Sys.Date()
> d1
[1] "2018-03-19"
> class(d1)
[1] "Date"
See internal look of class
> unclass(d1)
[1] 17609
The total number of days since 1970-01-01
Create a date before the epoch:
> d2 <- as.Date("1969-01-01")
> unclass(d2)
[1] -365
System time:
> t1 <- Sys.time()
> t1
[1] "2018-03-19 12:16:16 SAST"
> class(t1)
[1] "POSIXct" "POSIXt"
Coerce the result to POSIXlt
(this exposes components such as seconds, minutes and hours as a list)
> t2 <- as.POSIXlt(Sys.time())
> t2
[1] "2018-03-19 12:17:49 SAST"
> unclass(t2)
$sec
[1] 49.87161
$min
[1] 17
$hour
[1] 12
$mday
[1] 19
$mon
[1] 2
$year
[1] 118
$wday
[1] 1
$yday
[1] 77
$isdst
[1] 0
$zone
[1] "SAST"
$gmtoff
[1] 7200
attr(,"tzone")
[1] "" "SAST" "SAST"
> str(unclass(t2))
List of 11
$ sec : num 49.9
$ min : int 17
$ hour : int 12
$ mday : int 19
$ mon : int 2
$ year : int 118
$ wday : int 1
$ yday : int 77
$ isdst : int 0
$ zone : chr "SAST"
$ gmtoff: int 7200
 - attr(*, "tzone")= chr [1:3] "" "SAST" "SAST"
Just get minutes:
> t2$min
[1] 17
Return day of the week:
> weekdays(d1)
[1] "Monday"
Similarly with months and quarters:
> months(t1)
[1] "March"
> quarters(t2)
[1] "Q1"
strptime()
converts character vectors to POSIXlt. In that sense, it is similar to as.POSIXlt(), except that the input doesn’t have to be in a particular format (YYYY-MM-DD).
> t3 <- "October 17, 1986 08:24"
> t4 <- strptime(t3, "%B %d, %Y %H:%M")
> t4
[1] "1986-10-17 08:24:00 SAST"
> class(t4)
[1] "POSIXlt" "POSIXt"
Comparison of time:
> Sys.time() > t1
[1] TRUE
Time difference:
> Sys.time() - t1
Time difference of 9.086724 mins
Find time difference in specific unit:
> difftime(Sys.time(), t1, units = 'days')
Time difference of 0.006632809 days
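The date arithmetic above can be reproduced with fixed dates instead of the current time. In this sketch the time zone "UTC" and the numeric-only format string are assumptions, chosen so the result does not depend on the machine's locale or clock.

```r
# Fixed dates taken from the examples above.
d1 <- as.Date("2018-03-19")
d2 <- as.Date("1969-01-01")

# Internally a Date is the number of days since 1970-01-01.
days_d1 <- unclass(d1)             # 17609
days_d2 <- unclass(d2)             # -365

# difftime() lets you choose the unit of the difference.
gap <- difftime(d1, d2, units = "days")
gap_days <- as.numeric(gap)
print(gap_days)                    # 17974

# strptime() returns POSIXlt, so components are available with $.
t4 <- strptime("1986-10-17 08:24", "%Y-%m-%d %H:%M", tz = "UTC")
print(t4$hour)                     # 8
print(t4$min)                      # 24
```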
Base Graphics
Not covered here are more advanced graphics packages:
* lattice
* ggplot2
* ggvis
Load the dataset cars:
data(cars)
Get help page for cars:
`?cars`
Create basic chart:
> plot(cars)
If the dataset has 2 columns, R assumes you want to plot one against the other. Since we do not provide labels for either axis, R uses the names of the columns
plot
is short for scatterplot
Can be plotted with:
> plot(x = cars$speed, y=cars$dist)
Setting labels:
> plot(x = cars$speed, y=cars$dist, xlab='Speed', ylab='Stopping Distance')
Plot so points are red:
> plot(cars, col = 2)
Plot and limit the x-axis:
> plot(cars, xlim=c(10, 15))
Plot with triangles:
> plot(cars, pch=2)
Boxplot
You can pass the entire data frame
boxplot(), like many R functions, also takes a “formula” argument, generally an expression with a tilde (“~”) which indicates the relationship between the input variables. This allows you to enter something like mpg ~ cyl to plot the relationship between cyl (number of cylinders) on the x-axis and mpg (miles per gallon) on the y-axis.
> boxplot(formula=mpg ~ cyl, data = mtcars)
A histogram can be used for a single vector:
> hist(mtcars$mpg)
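Both plots can be drawn side by side by changing the graphics settings with par() and restoring them afterwards. A minimal sketch; the axis labels and titles are made up for illustration.

```r
# par() returns the previous settings, so keep them for later.
old_par <- par(mfrow = c(1, 2))    # one row, two plots

# Left panel: mpg grouped by number of cylinders.
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")

# Right panel: distribution of mpg on its own.
hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")

# Put the original settings back.
par(old_par)
```

When run with Rscript rather than interactively, the plots are written to a file (Rplots.pdf by default) instead of a window.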