6.2 Using “R” to Analyze Data
6.2.1 What Is “R”?
You collected a data set using a GCDC logger and realized, “Wow, that's a lot
of data! Now what?” Data analysis is tedious, and the process is particular to
each user's application. Don't expect to find a magic software solution that
will reduce your data to the perfect answer. However, don't despair. Several
options are available that, combined with a little user effort, provide
powerful and versatile analysis capabilities.
Spreadsheets, such as Microsoft Excel or OpenOffice Calc, are great choices for plotting moderately
sized data sets. Their user interfaces are highly polished, and customized plots are easy to create.
However, most spreadsheets can handle only about 100,000 lines of data before performance begins to
slow. Furthermore, scripting complex analysis procedures in a spreadsheet is cumbersome. We
recommend trying “R” because it is more powerful than a spreadsheet and easy to learn.
“R” is a high-level programming language used most commonly for statistical analysis of data. R is
based on the “S” language, which was developed at Bell Laboratories in the 1970s. R provides a
simple workspace environment in which large data sets can be manipulated using simple math commands
as well as libraries of more complex functions. R is widely used by statisticians and data miners, and
the language is well supported by the open-source community. The software is compact, free, and
available for Windows, Mac, and Linux (visit www.r-project.org).
Matlab is another common software application for analyzing data, but it is usually reserved for
universities or businesses with copious budgets (it's expensive software!). Octave is a free, open-source
adaptation of Matlab with nearly the same capabilities. However, Octave is a significantly larger
download and more complicated to install than R. We favor R because it's small, easy to learn, and
free.
Gulf Coast Data Concepts
Page 25
X16-1D, Rev B
Figure 24: R Command Line Interface