Friday, June 11, 2021
10:00 am – 5:15 pm Eastern Time
R is a free and open source statistical programming language that has now surpassed all of its commercial equivalents in popularity. However, its use in cancer surveillance remains limited. This course will provide a gentle introduction to R and its application to cancer surveillance, including reading in data, cleaning data, calculating rates, and producing graphics. The second part of the course will use passenger and crew data from the Titanic disaster to illustrate the concept of predictive modeling.
- Gain a basic understanding of the free and open source R programming language.
- Gain a basic understanding of RStudio, a widely used graphical user interface for R.
- Learn how to read in external data, such as output from SEER*Stat, and perform what is known as data wrangling: filter by specific rows and columns, manage missing data, group data into categories, handle outliers, and so on.
- Perform typical cancer surveillance calculations, such as rates.
- Produce simple graphics.
- Build a predictive model in R, using the engaging example of the Titanic passengers and crew.
Participants should have both R and RStudio pre-installed on a desktop or laptop computer. Instructions for installing this software will be provided a few weeks before the course date. Participants should have a basic familiarity with Windows or macOS, including how to install new software and how each organizes its files and folders. No prior programming experience or statistical expertise is presumed.
The course covers the essential minimum skills needed to begin using R back at the home office. These skills will be integrated into two worked examples: first, reading in, wrangling, and displaying some cancer surveillance data; then, using Titanic passenger data to build a simple predictive model.
- Francis P. Boscoe, Pumphandle