4 - Advanced R

Objectives

  • An Introduction to data.table
  • An Introduction to dplyr
  • A Brief Overview of Parallel Computing with R, and some Big Data considerations
  • A basic understanding of single-threaded vs. multithreaded computing, parallelism, and benchmarking

Topics

  • data.table
    • The [i, j, by] idiom
    • fread and fwrite
    • dcast and melt
    • examples / case study in appendix
  • dplyr
    • filter, select, mutate
    • the pipe (native |> since R 4.1.0)
    • examples / case study
  • Parallel computing with R, which is itself single-threaded
    • Its math libraries (e.g. a multithreaded BLAS) may not be
    • the parallel package as a good starting point: mclapply, parLapply
    • simple benchmarking
    • big data / external memory / bigmemory
  • Efficient R Programming (Gillespie/Lovelace book)
    • Chapter 3: Efficient Programming
    • Chapter 5: Efficient I/O
    • Chapter 6: Efficient Data Carpentry
    • Chapter 7: Efficient Optimization
    • Efficient data wrangling
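
As a rough illustration of the parallel computing topics above (mclapply, parLapply, simple benchmarking), the following sketch times a serial lapply against parLapply from the parallel package that ships with R. The function slow_square, the input vector, and the worker count of 2 are hypothetical choices made for this example only; actual speedups depend on the machine and on how costly each task is.

```r
library(parallel)  # part of the standard R distribution

# Hypothetical stand-in for an expensive computation (illustration only)
slow_square <- function(x) {
  Sys.sleep(0.05)  # simulate work
  x^2
}

inputs <- 1:8
n_workers <- 2L  # assumption: at least two cores are available

# Serial baseline
t_serial <- system.time(res_serial <- lapply(inputs, slow_square))

# parLapply: socket cluster of separate R worker processes (all platforms)
cl <- makeCluster(n_workers)
t_par <- system.time(res_par <- parLapply(cl, inputs, slow_square))
stopCluster(cl)

# mclapply: forks the current process; Windows only supports mc.cores = 1
mc_cores <- if (.Platform$OS.type == "windows") 1L else n_workers
res_mc <- mclapply(inputs, slow_square, mc.cores = mc_cores)

# All three approaches should return the same results
stopifnot(identical(res_serial, res_par), identical(res_serial, res_mc))

# Elapsed wall-clock seconds for the serial and parLapply runs
print(c(serial = t_serial[["elapsed"]], parLapply = t_par[["elapsed"]]))
```

For very cheap tasks the cost of starting workers and shipping data to them can outweigh the gain, which is one reason to benchmark before parallelising.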

Core Material

Lecture Slides

Lecture Videos

Extras

Additional Resources