On Sat, Sep 2, 2017 at 8:58 PM, Jennifer Lyon <[email protected]> wrote:
> I have a 2.1GB JSON file. Typically I use readLines() and
> jsonlite::fromJSON() to extract data from a JSON file.
If your data consists of one JSON object per line, the format is called 'ndjson'. There are several packages specialized for reading ndjson files:

  - corpus::read_ndjson
  - ndjson::stream_in
  - jsonlite::stream_in

In particular, the 'corpus' package handles large files really well because it has an option to memory-map the file instead of reading all of its data into memory.

If the data is too large to read at once, you can preprocess it using https://stedolan.github.io/jq/ to extract only the fields that you need. You really don't need hadoop/spark/etc. for this.

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
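As a minimal sketch of the jsonlite route above (the file name "data.ndjson" and its two toy records are made up for illustration), stream_in() parses the file page by page rather than loading it all as one string, which is what makes it usable on multi-gigabyte ndjson files:

```r
library(jsonlite)

# Create a tiny ndjson file: one JSON object per line (illustrative data)
writeLines(c('{"id": 1, "name": "a"}',
             '{"id": 2, "name": "b"}'),
           "data.ndjson")

# stream_in() reads the connection incrementally (in pages of records),
# so the whole file never has to sit in memory as a single string
df <- stream_in(file("data.ndjson"), verbose = FALSE)

str(df)  # a data.frame with columns id and name, one row per line
```

The same call works unchanged on a 2GB file; for even tighter memory use, corpus::read_ndjson(..., mmap = TRUE) maps the file instead of copying it into RAM.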
