R Shiny App Download Large Rdata Files
Loading large data frames when building Shiny Apps can have a significant impact on the app initialization time. When we ran into this issue in a recent project, we decided to review the available methods for reading data from csv files (as provided by our client) into R. In this article, we identify the most efficient of these methods using benchmarking and explain our workflow.
We will compare the following:
- read.csv from utils, which was the standard way of reading csv files to R in RStudio,
- read_csv from readr, which replaced the former method as a standard way of doing it in RStudio,
- load and readRDS from base, and
- read_feather from feather and fread from data.table.
Data
We need to generate some random data to commence with our test…
set.seed(123)
df <- data.frame(
  replicate(10, sample(0:2000, 15 * 10^5, replace = TRUE)),
  replicate(10, stringi::stri_rand_strings(1000, 5))
)
…and save the files on a disk to evaluate the load time. Besides the csv format, we will also need feather, RDS and RData files.
path_csv <- '../assets/data/fast_load/df.csv'
path_feather <- '../assets/data/fast_load/df.feather'
path_rdata <- '../assets/data/fast_load/df.RData'
path_rds <- '../assets/data/fast_load/df.rds'

library(feather)
library(data.table)

write.csv(df, file = path_csv, row.names = FALSE)
write_feather(df, path_feather)
save(df, file = path_rdata)
saveRDS(df, path_rds)
Next, we can check the resulting file sizes:
files <- c(
  '../assets/data/fast_load/df.csv',
  '../assets/data/fast_load/df.feather',
  '../assets/data/fast_load/df.RData',
  '../assets/data/fast_load/df.rds'
)
info <- file.info(files)
info$size_mb <- info$size / (1024 * 1024)
print(subset(info, select = c("size_mb")))

##                                       size_mb
## ../assets/data/fast_load/df.csv     1780.3005
## ../assets/data/fast_load/df.feather 1145.2881
## ../assets/data/fast_load/df.RData    285.4836
## ../assets/data/fast_load/df.rds      285.4837
Both the csv and feather files take up much more storage space than the RDS and RData files: the csv file is roughly 6 times and the feather file roughly 4 times as large.
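The ratios follow directly from the sizes reported above. A minimal sketch of the arithmetic, with the sizes copied by hand from the printed output (the names size_mb and ratio_to_rds are only illustrative):

```r
# File sizes in MB, copied from the output printed above
size_mb <- c(csv = 1780.3005, feather = 1145.2881,
             RData = 285.4836, rds = 285.4837)

# Size of each file relative to the RDS file, rounded to one decimal place
ratio_to_rds <- round(size_mb / size_mb[["rds"]], 1)
print(ratio_to_rds)
```

This gives about 6.2 for csv and about 4 for feather, which matches the "6 times" and "4 times" figures.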
Looking to learn more about importing data into R? This DataCamp tutorial covers all you need to know, from importing simple text files to more advanced SPSS and SAS files.
Benchmark
We will use the microbenchmark library to compare the read times in 10 rounds for the following methods:
- utils::read.csv
- readr::read_csv
- data.table::fread
- base::load
- base::readRDS
- feather::read_feather
library(microbenchmark)
benchmark <- microbenchmark(
  readCSV = utils::read.csv(path_csv),
  readrCSV = readr::read_csv(path_csv, progress = FALSE),
  fread = data.table::fread(path_csv, showProgress = FALSE),
  loadRdata = base::load(path_rdata),
  readRds = base::readRDS(path_rds),
  readFeather = feather::read_feather(path_feather),
  times = 10
)
print(benchmark, signif = 2)

## Unit: seconds
##         expr   min    lq       mean median    uq   max neval
##      readCSV 200.0 200.0 211.187125  210.0 220.0 240.0    10
##     readrCSV  27.0  28.0  29.770890   29.0  32.0  33.0    10
##        fread  15.0  16.0  17.250016   17.0  17.0  22.0    10
##    loadRdata   4.4   4.7   5.018918    4.8   5.5   5.9    10
##      readRds   4.6   4.7   5.053674    5.1   5.3   5.6    10
##  readFeather   1.5   1.8   2.988021    3.4   3.6   4.1    10
And the winner is… feather! However, using feather requires prior conversion of the file to the feather format.
Using load or readRDS can improve performance (second and third place in terms of speed) and has an added benefit of storing a smaller/compressed file. In both cases, it is necessary to first convert the file to the proper format.
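The two base functions differ slightly in how they hand the data back. A small sketch on a toy data frame (not the benchmark data; the object names and temporary files are only illustrative):

```r
# Small illustrative data frame
df_small <- data.frame(id = 1:5, value = c(2.5, 3.1, 4.7, 1.2, 9.9))

tmp_rdata <- tempfile(fileext = ".RData")
tmp_rds   <- tempfile(fileext = ".rds")

save(df_small, file = tmp_rdata)  # stores the object together with its name
saveRDS(df_small, tmp_rds)        # stores only the object itself

# load() recreates the object under its saved name in the given environment
# and returns the names of the objects it restored
env <- new.env()
loaded_names <- load(tmp_rdata, envir = env)

# readRDS() returns the object, so it can be assigned to any name
restored <- readRDS(tmp_rds)

print(loaded_names)
print(all.equal(env$df_small, restored))
```

In a Shiny app, readRDS can be the more convenient of the two, because the name of the loaded object is decided in the app code rather than in the file.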
When it comes to reading from the csv format, fread significantly beats read_csv and read.csv, and thus is the best option to read a csv file.
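To put the differences in perspective, the median times from the benchmark output can be expressed as speedups over read.csv. This is only a sketch of the arithmetic: the medians are copied by hand from the output above, which was printed with two significant digits, so the results are approximate.

```r
# Median read times in seconds, copied from the benchmark output above
median_s <- c(read.csv = 210, read_csv = 29, fread = 17,
              load = 4.8, readRDS = 5.1, read_feather = 3.4)

# How many times faster each method is than utils::read.csv
speedup <- round(median_s[["read.csv"]] / median_s, 1)
print(sort(speedup, decreasing = TRUE))
```

On these numbers fread is roughly 12 times faster than read.csv and read_csv roughly 7 times faster, while read_feather is faster by a factor of about 60.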
Ultimately, we chose to work with feather files. The csv to feather conversion process is quick, and we did not have a strict limitation on storage space. Had storage been a constraint, either the RDS or the RData format would probably have been a more appropriate choice.
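When storage does matter, the compress argument of saveRDS controls the trade-off between file size and write time. A minimal sketch on a small stand-in data set (not the benchmark data; the object names and temporary files are only illustrative):

```r
set.seed(123)
# Small stand-in for the benchmark data
df_small <- data.frame(replicate(5, sample(0:2000, 1e5, replace = TRUE)))

path_none <- tempfile(fileext = ".rds")
path_gzip <- tempfile(fileext = ".rds")
path_xz   <- tempfile(fileext = ".rds")

saveRDS(df_small, path_none, compress = FALSE)  # no compression
saveRDS(df_small, path_gzip)                    # default: gzip compression
saveRDS(df_small, path_xz, compress = "xz")     # typically smaller, slower to write

# Resulting file sizes in KB
sizes_kb <- setNames(
  file.size(c(path_none, path_gzip, path_xz)) / 1024,
  c("none", "gzip", "xz")
)
print(round(sizes_kb))
```

How much each setting saves depends on the data, so it is worth checking on a sample of the real files before settling on one.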
The final workflow was:
- reading a csv file provided by our customer using fread,
- writing it to feather using write_feather, and
- loading a feather file on app initialization using read_feather.
The first two tasks were done once and outside of the Shiny App context.
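The workflow above could be sketched roughly as follows. This is an illustration rather than our production code: it assumes the data.table and feather packages are installed, and the file paths and the helper name convert_csv_to_feather are hypothetical.

```r
# One-time conversion script, run outside the Shiny app.
# Both paths are hypothetical placeholders.
csv_path     <- "data/client_data.csv"
feather_path <- "data/client_data.feather"

# Hypothetical helper: read a csv with fread and write it out as feather
convert_csv_to_feather <- function(csv_path, feather_path) {
  # fread() was the fastest csv reader in the benchmark;
  # data.table = FALSE returns a plain data frame
  df <- data.table::fread(csv_path, showProgress = FALSE, data.table = FALSE)
  feather::write_feather(df, feather_path)
  invisible(feather_path)
}

# Convert only if the (hypothetical) csv file is actually present
if (file.exists(csv_path)) {
  convert_csv_to_feather(csv_path, feather_path)
}

# In the app itself (for example at the top of app.R or in global.R),
# the feather file is then read once on initialization:
# df <- feather::read_feather(feather_path)
```

Keeping the conversion in a separate script means the slow csv parsing never happens while the app is starting.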
There is also quite an interesting benchmark done by Hadley on reading complete files into R. Please note that if you use the functions defined in that post, you will end up with a character-type object and will have to apply string manipulations to obtain a regular data frame.
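To illustrate what "a character-type object" means in practice, here is a small sketch using base R's readChar. We are assuming that the whole-file readers in that post behave similarly; the temporary file and object names are only illustrative.

```r
# Small csv written to a temporary file for illustration
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, label = c("a", "b", "c")),
          csv_path, row.names = FALSE)

# Read the complete file into a single character string
raw_text <- readChar(csv_path, nchars = file.size(csv_path))
print(class(raw_text))
print(length(raw_text))

# The string still has to be parsed to obtain a data frame
df_parsed <- read.csv(text = raw_text)
print(df_parsed)
```

Reading the raw text can be very fast, but the parsing step that follows is where most of the work of a csv reader happens.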
Olga Mierzwa-Sulima
Engineering Manager,
Data4Good Lead
Source: https://appsilon.com/fast-data-loading-from-files-to-r/
Posted by: morepeople54.blogspot.com