I spent almost all of my blogging time last month following an online course on Coursera, Computing for Data Analysis (with R), taught by Dr. Roger Peng of the Johns Hopkins Department of Biostatistics. I had already checked out a bunch of Coursera courses just to see what a MOOC looks like, but this R course was the first and only one I took seriously: I reviewed all the slides (although I didn’t watch the videos, since I didn’t have enough time) and, most important, finished all the programming assignments (and got a full score!).

I am not a frequent R user; before this I had only used some R packages for learning purposes (mostly in data mining). This Computing for Data Analysis course focused mostly on data manipulation, and that’s why I chose it to dig into R. R by design is not a tool for data management (where SAS excels), but it’s nice to have (and it doesn’t hurt, since it’s free). During the course I put most of my effort into R data structures (vector, matrix, array, list, data frame) and by-group processing (the *ply functions). A few resources I found extremely useful as a starter:

  • http://stackoverflow.com/questions/tagged/r: when I google a specific R programming technique, I mostly end up on Stack Overflow, a new-generation Q&A hub that replaces user forums and mailing lists.
  • _An introduction to R_ by Longhow Lam, a free book. I like its Chapter 2, Data Objects, which covers R data types and data structures.
  • _Data Manipulation with R_ by Phil Spector, one of the Use R! books. I used its Chapter 8, Data Aggregation, for by-group processing in R.
  • RStudio, a must-have R IDE, the best of the best.
  • Package plyr, tools for splitting, applying and combining data. Actually, from a SAS programmer’s point of view, it is awful to use R to manipulate data. Use plyr to alleviate the pain.
  • Package sqldf, like SAS Proc SQL against data frames. Reason, ibid. Using sqldf won’t enhance your R programming skills, but sometimes, when you feel down about R, you may want to run a piece of the SQL you really like.
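
To make the data structures and by-group processing mentioned above a bit more concrete, here is a minimal sketch on made-up data. The `scores` data frame, its `grp` and `score` columns, and all the values are hypothetical and not taken from the course. The base R calls run as-is; the plyr call is guarded so it only runs if the package happens to be installed.

```r
# The five basic structures, built from toy values (all hypothetical).
v <- c(1, 3, 2, 4, 6)                       # vector
m <- matrix(1:6, nrow = 2)                  # matrix: 2 rows, 3 columns
a <- array(1:24, dim = c(2, 3, 4))          # array: 3 dimensions
l <- list(name = "toy", values = v)         # list: mixed element types
scores <- data.frame(                       # data frame: columns of equal length
  grp   = c("a", "a", "b", "b", "b"),
  score = v,
  stringsAsFactors = FALSE
)

# By-group processing in base R: split by grp, apply mean(), combine.
# aggregate() returns a data frame with one row per group.
by_group <- aggregate(score ~ grp, data = scores, FUN = mean)

# tapply() does the same computation but returns a named array.
group_means <- tapply(scores$score, scores$grp, mean)

# The plyr version of the same split-apply-combine step,
# run only when plyr is available.
if (requireNamespace("plyr", quietly = TRUE)) {
  by_group_plyr <- plyr::ddply(scores, "grp", plyr::summarise,
                               mean_score = mean(score))
}

print(by_group)
```

For a SAS programmer, `aggregate()` here plays roughly the role of a `PROC MEANS` with a `CLASS` statement: one summary row per group.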