Data analysis with R – getting started

After trying out several online courses, tutorials, and books, and learning a lot of useless information on what R is capable of, I realized the best way to really learn R is to actually apply it. In these posts I’ll walk through how to analyze a health-related data set from start to finish, and you can apply every step to your own data set. No prior experience with R is required.

First install R and RStudio on your computer if you haven’t already. R is the actual program running in the background, and RStudio is a more user-friendly “integrated development environment” (IDE) that will make your life a lot easier. The latest preview release of RStudio (v0.99.879) contains a few good upgrades by the way, such as viewing more than 1000 cases in the spreadsheet, but it is still a beta version at the time of writing. Besides the many good guides on using R that you can find online, RStudio has released a few handy cheat sheets (such as the data visualization sheet) that you can download as well.

To follow along with these blog posts you can use any data set you like. If you don’t have a data set at hand, do a quick Google search in your field of interest; many are publicly available. Some are only available after signing up or explaining what you’re going to do with the data. For your own project, it helps to be interested in the research topic, because you will come up with more interesting and meaningful research questions to answer along the way.

After installing R and RStudio, the second step is to become familiar with the study design of your data set and the variables that have been measured, and to develop general hypotheses to test. Data sets often come with background information, so start by reading these materials and come up with a few questions you want to answer.
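To give a rough idea of what a first look at the variables can look like once a data set is loaded in R, here is a minimal sketch. The data frame `study` and its three variables are made up purely for illustration; with your own data you would run the same functions on whatever object you loaded.

```r
# Hypothetical stand-in for a real data set: three made-up participants.
study <- data.frame(
  id     = 1:3,
  age    = c(34, 51, 47),
  smoker = factor(c("no", "yes", "no"))
)

names(study)    # the variable names
nrow(study)     # the number of cases (rows)
str(study)      # the type of each variable and its first few values
summary(study)  # ranges and means for numbers, counts for factors
```

All four functions are part of base R, so nothing extra needs to be installed to try this.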

As the data set I’m using contains a baseline and several follow-up questionnaires, one of my goals is to determine correlations of specific variables at different time points. I’ll also do regression, mediation and moderation analyses. As I’m learning along the way, any comments on steps that are missing, incorrect or inefficient are much appreciated! I’ll do my best to update posts to include these suggestions.
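As a preview of the simplest of these analyses, the sketch below correlates a baseline score with a follow-up score and then fits a basic linear regression. The numbers are simulated, not taken from my data set, and the names `scores`, `baseline`, and `followup` are placeholders I chose for the example.

```r
set.seed(1)  # make the simulated numbers reproducible

# Simulated scores, NOT real data: a baseline measurement and a follow-up
# that is constructed to depend partly on the baseline.
n <- 100
baseline <- rnorm(n, mean = 50, sd = 10)
followup <- 0.6 * baseline + rnorm(n, mean = 20, sd = 8)
scores <- data.frame(baseline, followup)

# Pearson correlation between the two time points;
# use = "complete.obs" skips cases with a missing value in either variable.
r <- cor(scores$baseline, scores$followup, use = "complete.obs")
r

# Simple linear regression: follow-up score predicted from baseline score.
fit <- lm(followup ~ baseline, data = scores)
summary(fit)
```

Real questionnaire data will usually contain missing values at follow-up, which is why the `use` argument is set explicitly here.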

 

Next: Creating a new project in RStudio>>
