Converting & opening data files for use with R


Everyone seems to have a different preference for data analysis software. Most people stick with whatever they learned for their first large research project. Some are pushed toward a specific package because their department or field uses it, which makes it easier to exchange code with colleagues. None of the most widely used packages (R, Stata, SPSS, SAS, MATLAB) is without flaws, although some are worse than others. In general, the easier a package is to learn, the more limitations it has. R is very powerful but relatively hard to learn. The main benefits of R are that it is free and open source, it can handle large data sets, and there is a lot of documentation and help from other R users online in case you get stuck. That said, if you just need to run a relatively simple statistical test on a small data set, it is easier to get started with, for example, SPSS.

The downside of having all these software packages is that data sets sometimes come in the wrong format. Data sets are often only available in SAS, SPSS or Stata format, and not as an .RData file. RStudio lets you import these files through the menu:

File → Import Dataset → From Stata / SAS / SPSS / Text (CSV), etc.

However, you will sometimes find that value labels, variable labels or other information were lost during the import.
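In recent RStudio versions, the menu import for SPSS, SAS and Stata files generates code that calls the haven package, so you can run the same import from the console and inspect what happened to the labels. The following is only a sketch: it assumes haven is installed and uses the iris.sav example file that ships with haven rather than your own data.

```r
# Assumes the haven package is installed: install.packages("haven")
library(haven)

# Example file bundled with haven, used here so the sketch is self-contained
sav_path <- system.file("examples", "iris.sav", package = "haven")
iris_sav <- read_sav(sav_path)

# Labelled columns keep their value labels as attributes;
# as_factor() turns such a column into a regular R factor
species <- as_factor(iris_sav$Species)
class(iris_sav$Species)
levels(species)
```

For your own file, replace sav_path with the path to your .sav file. Data imported this way shows up as a tbl_df rather than a plain data.frame.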

You can also convert the file with software such as Stat/Transfer before importing it into R, if you have access to it. This may do a slightly better job than the RStudio interface.

Another option is to use the “foreign” package to import your data. It should already be installed and listed in the ‘Packages’ tab, where you can check the box next to it to load it. If you can’t find it in the list, use:

install.packages("foreign")
library(foreign)

The general form for importing an SPSS file is:

mydata <- read.spss("file path/file name to be converted", use.value.labels = FALSE, to.data.frame = TRUE)

For example:

data1 <- read.spss("/Users/MyName/Documents/data1.sav", use.value.labels = FALSE, to.data.frame = TRUE)
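To see which label information survived an import with read.spss(), you can look at the attributes of the result. The sketch below uses electric.sav, an example file that ships with the foreign package, in place of your own data. Variable labels, when the file contains them, are stored in the "variable.labels" attribute of the data frame.

```r
library(foreign)

# Example SPSS file bundled with the foreign package (stand-in for your own file)
sav_path <- system.file("files", "electric.sav", package = "foreign")

# Same arguments as above: keep the numeric codes, return a data frame
electric <- read.spss(sav_path, use.value.labels = FALSE, to.data.frame = TRUE)

# Variable labels, if the file has any, are kept as an attribute
head(attr(electric, "variable.labels"))

# Quick look at the imported columns
str(electric)
```

Setting use.value.labels = TRUE instead converts variables with value labels into R factors, which is convenient for tables and plots but hides the original numeric codes.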

For details on other formats, click the package name “foreign” in the Packages tab in RStudio; this loads the documentation in the same pane.
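As an illustration of another format, the foreign package reads Stata files with read.dta(). The sketch below first writes a small made-up data frame to a temporary .dta file so that it runs on its own; the names toy and dta_path are hypothetical. Note that read.dta() only understands Stata files up to version 12, so files saved by newer Stata versions need a different package such as haven.

```r
library(foreign)

# Hypothetical toy data, written to a temporary Stata file
# so that the example does not depend on an existing file
toy <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))
dta_path <- tempfile(fileext = ".dta")
write.dta(toy, dta_path)

# read.dta() is the Stata counterpart of read.spss()
stata_data <- read.dta(dta_path)
str(stata_data)
```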

If you have more exotic data, try a Google search: there may be a package (or built-in functionality in R) that can handle your type of data file.

Once you’ve imported a dataset, it will appear as a data.frame or tbl_df (depending on the package you used) in the Environment tab of the Environment/History pane. If you don’t see this tab, go to Tools → Global Options… → Pane Layout and check the Environment checkbox. To look at the data in spreadsheet style, click the small table icon to the right of the data frame’s name and details in the Environment tab, and it will open in the Source pane. You can also do this by typing in the console:

View(data.frame name)

Make sure View is written with a capital V, otherwise you’ll get an error. You can now scroll through your variables and rows to get a feel for your dataset.
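Besides the spreadsheet view, a few base R functions give a quick overview of a data set straight from the console. The sketch below uses a small made-up data frame called data1 as a stand-in for your imported data.

```r
# Hypothetical stand-in for an imported data set
data1 <- data.frame(
  id    = 1:4,
  group = c("a", "b", "a", "b"),
  score = c(2.5, 3.0, 4.5, 1.0)
)

dim(data1)            # number of rows and columns
names(data1)          # variable names
str(data1)            # type and first values of each variable
head(data1, 3)        # first three rows
summary(data1$score)  # basic descriptive statistics for one variable

# View() opens the spreadsheet-style viewer; it only makes sense
# in an interactive session such as RStudio
if (interactive()) View(data1)
```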

 
