class: center, middle, inverse, title-slide

# R4SPA: R Packages and Training to enable Statistical Programming in R
### Kieran Martin
### 2018/08/16

---

# Outline

.font150[

Who are we?

What is the problem we are facing?

How are we solving it?

]

---

# Who are we?

- This work today is a collaboration between two people:
  - Kieran Martin and Craig Gower
  - Check out https://github.com/gowerc and https://github.com/kieranjmartin
  - My Twitter is @kjmartinstats
- Both Data Analytic Specialists at Roche

.footnote[Slides today are hosted here: https://kieranjmartin.github.io/R4SPA-talk/r4spa.html]

---

# What is the problem?

Statistical Programmers want to use R!

<br><br>

--

- Lots of attendance on R training courses

- Lots of engagement in discussion around R

--

<br><br><br>

.content-box-green[But very little... actual R outputs!]

---

# What is the problem?

Lack of use cases for R

--

One key piece of programming work is setting up and QCing analysis datasets

--

Reluctance to use R for this task

<br><br><br>

--

.content-box-blue-centre[
**Why?**
]

---

# Why?

.content-box-blue[
Belief that data manipulation in R is **difficult**
]

--

<br>

.content-box-blue[
Lack of tools (no `proc compare` equivalent)
]

--

<br>

.content-box-blue[
Training was disconnected from real tasks
]

---

# What are we doing?

.content-box-green[
In-house training on **data manipulation** using the **tidyverse**
]

--

**What makes this different?**

--

- Focused on **one** task: data derivation

--

- tidyverse makes code **easier** to read (for those with less R experience)

--

- Exercises based on our data and specifications

--

- Plan to train people with a specific **use case** for the training

---

# What are we doing?
**diffdf**

.content-box-green[
Package for **comparing datasets**

Gives **informative feedback** on where issues are
]

--

Main page: https://gowerc.github.io/diffdf/

Now on CRAN: https://CRAN.R-project.org/package=diffdf

Check out GitHub: https://github.com/gowerc/diffdf

---

# diffdf: missing columns

<br>

```r
library(diffdf)
test_data2 <- test_data
test_data2 <- test_data2[, -6]
diffdf(test_data, test_data2)
```

```
## Differences found between the objects!
##
## A summary is given below.
##
## There are columns in BASE that are not in COMPARE !!
## All rows are shown in table below
##
##   =========
##    COLUMNS
##   ---------
##     DATE
##   ---------
```

---

# diffdf: missing rows

<br>

```r
test_data2 <- test_data
test_data2 <- test_data2[1:(nrow(test_data2) - 2), ]
diffdf(test_data, test_data2, keys = "ID")
```

```
## Differences found between the objects!
##
## A summary is given below.
##
## There are rows in BASE that are not in COMPARE !!
## All rows are shown in table below
##
##   ====
##    ID
##   ----
##    29
##    30
##   ----
```

---

# diffdf: different values

<br>

```r
test_data2 <- test_data
test_data2[5, 2] <- 6
difval <- diffdf(test_data, test_data2, keys = "ID")
difval$NumDiff
```

```
## # A tibble: 1 x 2
##   Variable `No of Differences`
## * <chr>                  <int>
## 1 GROUP1                     1
```

```r
difval$VarDiff_GROUP1
```

```
## # A tibble: 1 x 4
##   VARIABLE    ID  BASE COMPARE
## * <chr>    <int> <dbl>   <dbl>
## 1 GROUP1       5     1       6
```

---

# diffdf: different attributes

<br>

```r
test_data2 <- test_data
attr(test_data$ID, "label") <- "This is a interesting label"
attr(test_data2$ID, "label") <- "A different label"
diffdf(test_data, test_data2, keys = "ID")
```

```
## Differences found between the objects!
##
## A summary is given below.
##
## There are columns in BASE and COMPARE with differing attributes !!
## All rows are shown in table below
##
##   =====================================================================
##    VARIABLE  ATTR_NAME          VALUES.BASE             VALUES.COMP
##   ---------------------------------------------------------------------
##       ID       label    This is a interesting label  A different label
##   ---------------------------------------------------------------------
```

---

# Plans for the future

.content-box-green[
Roll out training across sites
]

--

<br>

.content-box-blue[
Build more packages to address common problems
]

--

<br>

.content-box-green[
Build more training focusing on different tasks in R
]
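---

# Sketch: derive with dplyr, QC with diffdf

A minimal sketch of the use case described earlier, not code from the talk: one dataset is derived twice, once with dplyr and once independently in base R, and `diffdf()` checks that the two agree. The data and the names `raw`, `prod`, `qc`, `AGEGR1` and `SEXN` are hypothetical and chosen only for illustration.

```r
library(dplyr)
library(diffdf)

# Hypothetical raw data, one row per subject (not from the talk)
raw <- data.frame(
  ID  = 1:6,
  AGE = c(34, 67, 45, 71, 58, 29),
  SEX = c("M", "F", "F", "M", "F", "M"),
  stringsAsFactors = FALSE
)

# "Production" derivation written with dplyr
prod <- raw %>%
  mutate(
    AGEGR1 = if_else(AGE >= 65, ">=65", "<65"),
    SEXN   = if_else(SEX == "M", 1, 2)
  )

# Independent QC derivation written in base R
qc <- raw
qc$AGEGR1 <- ifelse(qc$AGE >= 65, ">=65", "<65")
qc$SEXN   <- ifelse(qc$SEX == "M", 1, 2)

# Compare the two derivations, matching rows on the key variable
result <- diffdf(prod, qc, keys = "ID", suppress_warnings = TRUE)
diffdf_has_issues(result)

# Introduce a deliberate discrepancy to see what a failed QC looks like
qc_bad <- qc
qc_bad$AGEGR1[2] <- "<65"
bad <- diffdf(prod, qc_bad, keys = "ID", suppress_warnings = TRUE)
diffdf_has_issues(bad)

# Rows of the production dataset that disagree with the QC copy
issue_rows <- diffdf_issuerows(prod, bad)
```

`diffdf_has_issues()` returns `FALSE` when the two derivations agree and `TRUE` otherwise, so it can gate a QC script. `suppress_warnings = TRUE` is used here only to keep the sketch quiet when differences are found.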