Jeroen: Thank you for pointing me to ndjson, which I had not heard of and which fits my case exactly.
My experience:

  jsonlite::stream_in - segfaults
  ndjson::stream_in   - my fault: I am running Ubuntu 14.04, which is too
                        old for the package to compile
  corpus::read_ndjson - works!!!

Of course corpus::read_ndjson does a different simplification than
jsonlite::fromJSON, so I have to change some code, but it works
beautifully, at least in simple tests. The memory-map option may be of
use in the future (see the sketch in the P.S. below).

Another correspondent said that a string in R can be at most 2^31 - 1
bytes long, which is why any "solution" that tries to load the whole
file into R as a single string first will fail.

Thanks for suggesting a path forward for me!

Jen

On Sun, Sep 3, 2017 at 2:15 AM, Jeroen Ooms <jeroeno...@gmail.com> wrote:
> On Sat, Sep 2, 2017 at 8:58 PM, Jennifer Lyon
> <jennifer.s.l...@gmail.com> wrote:
> > I have a 2.1GB JSON file. Typically I use readLines() and
> > jsonlite::fromJSON() to extract data from a JSON file.
>
> If your data consists of one json object per line, this is called
> 'ndjson'. There are several packages specialized to read ndjson files:
>
> - corpus::read_ndjson
> - ndjson::stream_in
> - jsonlite::stream_in
>
> In particular, the 'corpus' package handles large files really well
> because it has an option to memory-map the file instead of reading all
> of its data into memory.
>
> If the data is too large to read, you can preprocess it with
> https://stedolan.github.io/jq/ to extract only the fields that you
> need.
>
> You really don't need hadoop/spark/etc for this.
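P.S. In case it is useful to anyone else on the list, here is a minimal
sketch of the corpus calls that worked for me. The file name
"big.ndjson" is made up, and I am going by the package documentation
for the mmap argument:

  library(corpus)

  # Read the whole ndjson file (one JSON object per line);
  # returns a data-frame-like object.
  data <- read_ndjson("big.ndjson")

  # Or memory-map the file instead of loading all of it, which is the
  # option Jeroen mentioned for large files.
  data <- read_ndjson("big.ndjson", mmap = TRUE)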
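And if the file were too large even for that, Jeroen's jq suggestion
could cut it down before it ever reaches R. A sketch, with hypothetical
field names "id" and "score":

  # First, in a shell (not in R), keep only the fields you need:
  #   jq -c '{id: .id, score: .score}' big.ndjson > slim.ndjson

  # Then read the much smaller file as before.
  library(corpus)
  slim <- read_ndjson("slim.ndjson", mmap = TRUE)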