Access to SQLite is provided via the RSQLite package. Start R as root and install it from within the R session:
 sudo R
 install.packages("RSQLite")
Note: To fetch all rows of a result set, append n=-1 to your fetch statement.
 library(RSQLite)
 driver <- dbDriver("SQLite")
 con <- dbConnect(driver, dbname="some_sqlite.db")
 dbListTables(con)                                             # show tables
 rs <- dbSendQuery(con, "SELECT * from sensordata limit 10")   # send a query
 data <- fetch(rs, n=3)    # fetch 3 rows from the result (use -1 to fetch all rows)
 dbHasCompleted(rs)        # TRUE once all rows have been fetched, FALSE if rows are left
 # cleanup
 dbClearResult(rs)
 dbDisconnect(con)
 dbUnloadDriver(driver)
 library(RSQLite)
 driver <- dbDriver("SQLite")
 con <- dbConnect(driver, dbname="/home/soma/Desktop/testdata.db")
 data(USArrests)                           # prepare some data frame
 dbWriteTable(con, "arrests", USArrests)   # insert it as a table
 # cleanup (no result set was opened here, so there is nothing to clear)
 dbDisconnect(con)
 dbUnloadDriver(driver)
sudo aptitude install r-cran-rmysql
 library(RMySQL)
 roadid <- 1234
 laneid <- 1
 drv <- dbDriver("MySQL")
 con <- dbConnect(drv, dbname="flow_timeseries", user="123", password="123", host="10.10.10.10")
 # paste() separates its arguments with a single space by default
 sql <- paste("SELECT * from timeseries WHERE roadid =", roadid, "AND laneid =", laneid, "ORDER BY day")
 res <- dbSendQuery(con, sql)
 data <- fetch(res, n = -1)   # fetch all rows
 dbClearResult(res)
 dbDisconnect(con)
Install the R development package and the PostgreSQL server development headers, then start R as root:
 sudo apt-get install r-base-dev postgresql-server-dev-8.3
 sudo R
In an R shell, install the necessary packages
install.packages("RPostgreSQL", dependencies=TRUE)
Talk to a PostgreSQL server from within R:
 library(RPostgreSQL)
 drv <- PostgreSQL()
 # alternatively, with the RdbiPgSQL package:
 #   library(RdbiPgSQL)
 #   drv <- PgSQL()
 con <- dbConnect(drv, dbname="db", user="1234", password="1234", host="10.10.10.10")
 res <- dbSendQuery(con, "SELECT * FROM ...")
 data <- fetch(res, n = -1)   # with RdbiPgSQL use dbGetResult(res) instead
 dbDisconnect(con)
Package RODBC on CRAN provides an interface to database sources supporting an ODBC interface. This is very widely available, and allows the same R code to access different database systems. RODBC runs on both Unix/Linux and Windows, and almost all database systems provide support for ODBC.
sudo apt-get install odbcinst1debian2 tdsodbc
As a simple example of using ODBC under Windows with an Excel spreadsheet, we can read from a spreadsheet by
 library(RODBC)
 channel <- odbcConnectExcel("bdr.xls")
 ## list the spreadsheets
 > sqlTables(channel)
   TABLE_CAT TABLE_SCHEM        TABLE_NAME   TABLE_TYPE REMARKS
 1   C:\\bdr          NA           Sheet1$ SYSTEM TABLE      NA
 2   C:\\bdr          NA           Sheet2$ SYSTEM TABLE      NA
 3   C:\\bdr          NA           Sheet3$ SYSTEM TABLE      NA
 4   C:\\bdr          NA Sheet1$Print_Area        TABLE      NA
 ## either of the following retrieves the contents of Sheet1
 sh1 <- sqlFetch(channel, "Sheet1")
 sh1 <- sqlQuery(channel, "select * from [Sheet1$]")
Connections provide a flexible way for R to read data from a variety of sources, providing more complete control over the nature of the connection than simply specifying a file name as input to functions like <tt>read.table</tt> and <tt>scan</tt>.
Skip last lines of a data file (e.g. last two lines):
 con <- textConnection(rev(rev(readLines('data.txt'))[-(1:2)]))
 data <- read.table(con)
 close(con)
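A minimal alternative sketch for dropping the last two lines, assuming only base R: <tt>head</tt> with a negative <tt>n</tt> removes trailing elements, and <tt>read.table</tt> can take the remaining lines through its <tt>text</tt> argument. The temporary file and its contents are made up for illustration.

```r
# Hypothetical sample file: three data rows followed by two footer lines
path <- tempfile(fileext = ".txt")
writeLines(c("1 2", "3 4", "5 6", "footer a", "footer b"), path)

lines <- readLines(path)
kept <- head(lines, -2)           # negative n drops that many lines from the end
data <- read.table(text = kept)   # parse the remaining lines
unlink(path)                      # remove the temporary file
```

This avoids the double <tt>rev</tt> and needs no explicit connection to close.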
Read data from gzip-compressed file:
 gz <- gzfile("datafile.csv.gz", "r")
 raw <- textConnection(readLines(gz))
 close(gz)
 dataset <- read.table(raw, sep=";", as.is=TRUE, header=TRUE)
 close(raw)
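As a shorter variant, <tt>read.table</tt> can also be handed the <tt>gzfile</tt> connection directly, in which case it opens and closes the connection itself. The sketch below first writes a small compressed file so that it is self-contained; the file name and contents are invented for illustration.

```r
# Create a small, hypothetical gzip-compressed CSV file to read back
path <- tempfile(fileext = ".csv.gz")
gz_out <- gzfile(path, "w")
writeLines(c("id;name", "1;a", "2;b"), gz_out)
close(gz_out)

# An unopened connection is opened and closed by read.table itself
dataset <- read.table(gzfile(path), sep = ";", header = TRUE, as.is = TRUE)
unlink(path)
```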
 df = data.frame(...)
 row_count = nrow(df)
 col_count = ncol(df)
 dim(df)   # both counts at once: c(row_count, col_count)
R and its contributed packages have a number of datetime (i.e. date or date/time) classes. The POSIX classes are the two classes <tt>POSIXct</tt> and <tt>POSIXlt</tt> and their common superclass <tt>POSIXt</tt>. These support times and dates including time zones and standard vs. daylight saving time.
References: R News, The Newsletter of the R Project, Volume 4/1, June 2004, ISSN 1609-3631, http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf
 d1 <- as.Date("2008-05-18")
 class(d1)   # Output: [1] "Date"
 d2 <- strptime("2008-01-01 14:30", "%Y-%m-%d %H:%M")
 class(d2)   # Output: [1] "POSIXlt" "POSIXt"  (older R versions list "POSIXt" first)
 d3 <- as.POSIXct("2008-01-01 14:30", tz="GMT")
 class(d3)   # Output: [1] "POSIXct" "POSIXt"
See <tt>?strptime</tt> for formatting details.
These functions are vectorized, so they also work on vectors of strings:
strptime(c("2008-01-01 14:30","2008-02-02 0:30"), "%Y-%m-%d %H:%M")
If you want to store your dates in a data.frame you should not use <tt>POSIXlt</tt>. The reason is that a POSIXlt object is actually a list of components (sec, min, hour, mday, mon, year, wday, yday, isdst) rather than a single vector. So if you want to add your dates to a data.frame you will need to convert them to POSIXct:
data$time = as.POSIXct(strptime(data$time_string, "%H:%M:%S"))
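The conversion can be checked on a small made-up data frame. This is only a sketch: the column name <tt>time_string</tt> follows the line above, the values are invented, and the time zone is fixed to GMT so the result does not depend on the local settings.

```r
# Hypothetical data frame with times stored as strings
df <- data.frame(time_string = c("08:15:00", "17:45:30"), stringsAsFactors = FALSE)

lt <- strptime(df$time_string, "%H:%M:%S", tz = "GMT")   # POSIXlt
lt_is_list <- is.list(unclass(lt))    # POSIXlt is a list of components underneath

df$time <- as.POSIXct(lt)             # POSIXct is one number per time stamp
hm <- format(df$time, "%H:%M")        # formatted back to hours and minutes
```

Since the strings contain no date, <tt>strptime</tt> fills in the current day.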
 # weekday names depend on the locale; the output below is from a German locale
 format(d1, "%a %Y/%m/%d")   # [1] "So 2008/05/18"
 format(d2, "%A %Y/%m/%d")   # [1] "Dienstag 2008/01/01"
 b1 <- ISOdate(1977,7,13)
 b2 <- ISOdate(2003,8,14)
 b2 - b1
 # Time difference of 9528 days
 class(b2 - b1)
 # [1] "difftime"
If an alternative unit of time is desired, the <tt>difftime</tt> function can be called, using the optional <tt>units=</tt> argument with any of the following values: “auto”, “secs”, “mins”, “hours”, “days”, or “weeks”.
 difftime(b2,b1,units="weeks")
 # Time difference of 1361.143 weeks
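When the difference is needed as a plain number, for example for further arithmetic, the <tt>difftime</tt> object can be converted with <tt>as.numeric</tt>, optionally stating the unit. A small sketch using the same two dates:

```r
b1 <- ISOdate(1977, 7, 13)
b2 <- ISOdate(2003, 8, 14)

d <- difftime(b2, b1, units = "days")
days  <- as.numeric(d)                    # plain number of days
weeks <- as.numeric(d, units = "weeks")   # same interval expressed in weeks

units(d) <- "hours"                       # change the unit of the object in place
hours <- as.numeric(d)
```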
The <tt>by=</tt> argument to the <tt>seq</tt> function can be specified either as a <tt>difftime</tt> value, or in any units of time that the <tt>difftime</tt> function accepts, making it very easy to generate sequences of dates.
 seq(as.Date("1976-07-04"), by="days", length=10)
 #  [1] "1976-07-04" "1976-07-05" "1976-07-06" "1976-07-07" "1976-07-08"
 #  [6] "1976-07-09" "1976-07-10" "1976-07-11" "1976-07-12" "1976-07-13"
 seq(as.Date("2000-06-01"), to=as.Date("2000-08-01"), by="2 weeks")
 # [1] "2000-06-01" "2000-06-15" "2000-06-29" "2000-07-13" "2000-07-27"
 seq(as.POSIXct("2009-03-23 00:00:00", tz="GMT"), length=96, by="15 mins")
 # [1] "2009-03-23 00:00:00 GMT" "2009-03-23 00:15:00 GMT"
 # [3] "2009-03-23 00:30:00 GMT" "2009-03-23 00:45:00 GMT"
 # ...
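Two further variants, sketched under the assumption that only base R is used: a step given as an explicit <tt>difftime</tt> object, and a calendar step of one month, which <tt>seq</tt> accepts for dates as a character string.

```r
# Step given as a difftime object (3 days)
step <- as.difftime(3, units = "days")
every3 <- seq(as.Date("2000-01-01"), by = step, length.out = 3)

# Calendar step: first day of four consecutive months
monthly <- seq(as.Date("2000-01-01"), by = "month", length.out = 4)
```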
Formulas in R can be thought of as a “little language” since they obey a different structure and syntax from expressions. Expressions when evaluated produce some result such as a number, vector or list which is then displayed by the print function. Formulas on the other hand are used as a concise and intuitive way of specifying a statistical model. For example, consider a multiple linear regression of y on a numeric variable x1 and its squared value, x1^2 and a categorical variable x2. Note that in R categorical variables are called factors. This regression is specified by:
 y ~ x1 + I(x1^2) + x2
and could be fit using the lm function (linear model, regression):
 lm(y ~ x1 + I(x1^2) + x2)
In the formula notation, “~” separates the two sides of the model: the left-hand side is the dependent variable (the response) and the right-hand side lists the independent variables (the predictors). The I(x1^2) means that the inside expression is interpreted as an ordinary arithmetic expression in R, so that ^ means exponentiation instead of having its special formula meaning. Including a factor variable like x2 is very convenient since we don't have to be bothered about specifying all the indicator variables as we would have to do in other statistical software.
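To see what R builds from such a formula, one can fit the model on simulated data and inspect the column names of the design matrix. The data below are invented purely for illustration; the point of the sketch is that the factor x2 with levels a, b and c is expanded into indicator columns automatically.

```r
set.seed(1)
# Simulated data: x1 numeric, x2 a factor with three levels
d <- data.frame(x1 = runif(30), x2 = factor(rep(c("a", "b", "c"), 10)))
d$y <- 1 + 2 * d$x1 - 3 * d$x1^2 + as.numeric(d$x2) + rnorm(30, sd = 0.1)

fit <- lm(y ~ x1 + I(x1^2) + x2, data = d)

# Columns of the design matrix: intercept, x1, x1 squared,
# and one indicator column per non-reference level of x2
cols <- colnames(model.matrix(fit))
```

The first level of the factor (here "a") serves as the reference level and gets no column of its own.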
 x = c(1,2,3)
 y = c(1,2,3)
 plot(x, y, col='red', type='l')

 # draw data from matrix mat
 # plot each column
 # use lines between data points ... type='l'
 # limit range of y-coordinate ... ylim=c(0,120)
 matplot(mat, type='l', ylim=c(0,120))
If you are running plots in a script, you will want R to pause until you have viewed one plot, before it creates the next:
 par(ask=TRUE)
 for(i in 1:3) {
   plot( _something_ )
 }
 library(plotrix)
 # det_id and det_group must be defined beforehand
 sql <- paste("select detector_id, 1.0*q/60 as q, v, count(*) as cnt from sensordata_raw where q>0 and detector_id=",det_id," group by detector_id, q, v")
 # data is assumed to hold the result of running this query (e.g. via dbSendQuery and fetch)
 # get the max values for the calculation of the counts
 max_q <- max(data$q)+1
 max_v <- max(data$v)+1
 len <- length(data$v)
 # define a matrix to hold the data and fill it
 h <- array(0, dim=c(max_q,max_v))
 for (i in 1:len) {
   h[data$q[i]+1,data$v[i]+1] = data$cnt[i]
 }
 # as we use exponential scale, calculate the max value we need for the levels
 levels = 20
 f <- (exp(levels/5)-1)/max(h)
 breaks <- (exp(c(0:levels)/5)-1)/f
 colors = rev(heat.colors(levels))
 filled.contour(1:max_q, 1:max_v, h,
   main = paste("Fundamental Diagram Sensor ",det_id," (",det_group,")",sep=""),
   xlab = "count", ylab = "speed",
   levels = breaks, nlevels = levels, col = colors)
 year <- c(2000, 2001, 2002, 2003, 2004)
 rate <- c(9.34, 8.50, 7.62, 6.93, 6.60)
 plot(year, rate)
 abline(lsfit(year, rate)$coefficients, col="red")
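The same line can be obtained with <tt>lm</tt> and the formula notation. The following sketch fits both ways on the data above and keeps the coefficients for comparison; no plot is drawn here.

```r
year <- c(2000, 2001, 2002, 2003, 2004)
rate <- c(9.34, 8.50, 7.62, 6.93, 6.60)

fit <- lm(rate ~ year)                      # linear model via formula
lm_coef <- coef(fit)                        # intercept and slope
ls_coef <- lsfit(year, rate)$coefficients   # least-squares fit as used above
```

On an existing plot, <tt>abline(fit, col="red")</tt> draws the fitted line directly from the model object.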
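 library(sp)
 library(maptools)
 # read the shapefile
 map <- read.shape("shp/at_districts_lambert.shp")
 # convert it to a polygon list
 p <- Map2poly(map)
 # get the polygon centroids (used to place the labels)
 p_centers <- get.Pcent(map)
 # val is assumed to be a numeric vector with one value per polygon
 # class breaks (deciles) and one colour per break
 brks <- round(quantile(val, probs=seq(0,1,0.1)), digits=2)
 col <- rev(heat.colors(length(brks)))
 # assign a colour to each polygon according to its class
 col_val <- col[findInterval(val, brks, all.inside=TRUE)]
 # plot the map and label each polygon with its value
 plot(p, col=col_val, forcefill=FALSE, axes=FALSE)
 labels <- as.character(round(val, digits=1))
 text(p_centers[,1], p_centers[,2], labels, col="darkgreen", cex=0.75)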
CRAN Task View http://cran.r-project.org/web/views/Cluster.html Cluster Analysis & Finite Mixture Models
 hclust()
 kmeans()
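A brief sketch of how the two functions are typically called, using the <tt>iris</tt> data set that ships with R. The choice of three clusters, the scaling step and the linkage method are assumptions made for the example only.

```r
x <- scale(iris[, 1:4])          # standardize the four numeric columns

set.seed(42)                     # kmeans starts from random centers
km <- kmeans(x, centers = 3, nstart = 10)

hc <- hclust(dist(x), method = "average")   # hierarchical clustering on distances
groups <- cutree(hc, k = 3)                 # cut the tree into 3 groups

comparison <- table(km$cluster, groups)     # cross-tabulate the two results
```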
flexclust: Flexible Cluster Algorithms
The main function kcca implements a general framework for k-centroids cluster analysis supporting arbitrary distance measures and centroid computation. Further cluster methods include hard competitive learning, neural gas, and QT clustering. There are numerous visualization methods for cluster results (neighborhood graphs, convex cluster hulls, barcharts of centroids, …), and bootstrap methods for the analysis of cluster stability.
CRAN Task View http://cran.nedmirror.nl/web/views/MachineLearning.html Machine Learning & Statistical Learning
= String Manipulation =
== Concatenate ==
To create a string from different chunks use paste:
 > some_string <- "blabla"
 > paste("a", "b", 14, some_string, sep="-")
 [1] "a-b-14-blabla"
 > paste(c("a", "b", "c"), "1", sep="_")
 [1] "a_1" "b_1" "c_1"
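The difference between <tt>sep</tt> and <tt>collapse</tt> is a common source of confusion: <tt>sep</tt> goes between the arguments of one call, while <tt>collapse</tt> joins the elements of the resulting vector into a single string. A short sketch with made-up values:

```r
parts <- c("a", "b", "c")

pairwise <- paste(parts, 1:3, sep = "_")    # element-wise: "a_1" "b_2" "c_3"
joined   <- paste(parts, collapse = "+")    # one string:   "a+b+c"
ids      <- paste0("id", 1:2)               # paste0 uses no separator: "id1" "id2"
```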
Nice collection of howtos: http://help.nceas.ucsb.edu/R:_Spatial
Read in shapefiles, merge them and export to image:
 library(maptools)
 shppointfile <- "./testdata/points.shp"   # simple points file
 shppolyfile <- "./testdata/polys.shp"     # simple polygons file
 shplinefile <- "./testdata/lines.shp"     # simple lines file
 # simpleLines@data
 #   Name     Value
 # 0 Highway  1
 # 1 Highway  1
 # 2 Arterial 2
 # 3 Arterial 2
 # 4 Arterial 2
 # 5 Arterial 2
 png('./output/test.png')
 # returns a SpatialLinesDataFrame, see http://sekhon.berkeley.edu/library/sp/html/SpatialLinesDataFrame-class.html
 simpleLines <- readShapeLines(shplinefile)
 colours <- c('red','yellow','green','blue','black')   # colors for different road classes
 plot(simpleLines, col=colours[simpleLines@data$Value], main="Route")
 # use the same colour lookup for the legend as for the lines
 classes <- unique(simpleLines@data$Value)
 legend("topright", fill=colours[classes], legend=as.character(classes))
 # add point layer
 simplePoints <- readShapePoints(shppointfile)
 plot(simplePoints, pch=20, add=TRUE)   # pch == plotting character
 # polygons are possible as well
 #simplePolys <- readShapePoly(shppolyfile)
 #plot(simplePolys, col='blue', add=TRUE)
 dev.off()
 myts = ts(data=c(1,2,3,4), start=16, end=19)   # 4 observations at times 16 to 19
 extended_ts = window(some_ts, start=0, end=96, extend=TRUE)
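A small sketch of what <tt>window</tt> does with and without <tt>extend=TRUE</tt>. The series is made up for illustration; the name <tt>some_ts</tt> follows the line above.

```r
some_ts <- ts(1:5, start = 10)    # observations at times 10, 11, ..., 14

# Without extend: cut out a sub-range of the series
sub_ts <- window(some_ts, start = 11, end = 13)

# With extend = TRUE: the range may exceed the series, missing values become NA
extended_ts <- window(some_ts, start = 8, end = 16, extend = TRUE)
```

Here <tt>extended_ts</tt> covers times 8 to 16, with NA at the positions that lie outside the original series.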
Most debugging takes place either through calls to <tt>browser</tt> or <tt>debug</tt>. Both of these functions rely on the same internal mechanism and both provide the user with a special prompt. Any command can be typed at the prompt.
There are five special commands that R interprets differently:
;<tt><RETURN></tt>
:Go to the next statement if the function is being debugged. Continue execution if the browser was invoked.
;<tt>c</tt>, <tt>cont</tt>
:Continue the execution.
;<tt>n</tt>
:Execute the next statement in the function. This works from the browser as well.
;<tt>where</tt>
:Show the call stack.
;<tt>Q</tt>
:Halt execution and jump to the top-level immediately.
A call to the function <tt>browser</tt> causes R to halt execution at that point and to provide the user with a special prompt.
 > foo <- function(s) {
     c <- 3
     browser()
   }
 > foo(4)
 Called from: foo(4)
 Browse[1]> s
 [1] 4
 Browse[1]> get("c")
 [1] 3
The debugger can be invoked on any function by using the command <tt>debug(fun)</tt>. Subsequently, each time that function is evaluated the debugger is invoked. The debugger allows you to control the evaluation of the statements in the body of the function. Before each statement is executed the statement is printed out and a special prompt provided.
 > debug(mean.default)
 > mean(1:10)
 debugging in: mean.default(1:10)
 debug: {
     if (na.rm)
         x <- x[!is.na(x)]
     trim <- trim[1]
     n <- length(c(x, recursive = TRUE))
     if (trim > 0) {
         if (trim >= 0.5)
             return(median(x, na.rm = FALSE))
         lo <- floor(n * trim) + 1
         hi <- n + 1 - lo
         x <- sort(x, partial = unique(c(lo, hi)))[lo:hi]
         n <- hi - lo + 1
     }
     sum(x)/n
 }
 Browse[1]>
 debug: if (na.rm) x <- x[!is.na(x)]
 Browse[1]>
 debug: trim <- trim[1]
 Browse[1]>
 debug: n <- length(c(x, recursive = TRUE))
 Browse[1]> c
 exiting from: mean.default(1:10)
 [1] 5.5
Debugging is turned off by a call to <tt>undebug</tt> with the function as an argument.
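Whether a function is currently flagged for debugging can be checked with <tt>isdebugged</tt>. The sketch below uses a hypothetical function <tt>foo</tt> and deliberately calls it only after <tt>undebug</tt>, so that it also runs in a non-interactive session.

```r
foo <- function(s) {
  s + 1
}

debug(foo)
flag_on <- isdebugged(foo)    # the debugging flag is set

undebug(foo)
flag_off <- isdebugged(foo)   # the flag has been removed again

result <- foo(4)              # runs normally, no debugger prompt
```

For a one-off inspection, <tt>debugonce(foo)</tt> sets the flag for the next call only.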
Generate a matrix with n rows and m columns, where each entry is drawn from a normal distribution:
 replicate(m, rnorm(n))
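An equivalent construction, sketched here with arbitrary values for n and m, draws all n*m numbers at once and lets <tt>matrix</tt> arrange them column by column. With the same seed both approaches consume the random numbers in the same order.

```r
n <- 4   # number of rows (arbitrary, for illustration)
m <- 3   # number of columns

set.seed(1)
a <- replicate(m, rnorm(n))                     # one column per replicate

set.seed(1)
b <- matrix(rnorm(n * m), nrow = n, ncol = m)   # filled column by column

same <- all(a == b)   # compare the two matrices entry by entry
```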