How much data, and what is the database source? Spark is probably the fastest way.
--
Rahul Singh
rahul.si...@anant.us
Anant Corporation

On Apr 12, 2018, 7:28 AM -0400, Sujay Bawaskar <sujaybawas...@gmail.com>, wrote:
> Hi,
>
> We are using DIH with SortedMapBackedCache, but as the data size grows we
> have to give the Solr JVM more and more heap.
> Can we use multiple CSV files instead of database queries, and then join the
> data in those CSV files using zipper? The bottom line is to create one CSV
> file per entity in data-config.xml and join those CSV files with zipper.
> We also tried the EHCache-based DIH cache, but since EHCache uses MMap IO it
> does not work well alongside MMapDirectoryFactory and ends up exhausting the
> physical memory on the machine.
> Please suggest how we can handle the use case of importing a huge amount of
> data into Solr.
>
> --
> Thanks,
> Sujay P Bawaskar
> M: +91-77091 53669
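
For reference, DIH's zipper (merge) join is declared with join="zipper" on the nested entity, with both parent and child rows already sorted by the join key; because the two inputs are streamed and merged in order, nothing has to be held in a heap cache. Below is a rough sketch of what that could look like with CSV files read by LineEntityProcessor. The file paths, column names, and regexes are invented for illustration, and the zipper examples in the DIH documentation use SqlEntityProcessor, so whether the same attributes behave identically with LineEntityProcessor is exactly the open question in this thread — treat it as a starting point, not a confirmed recipe.

<!-- Hypothetical sketch, not a verified config: two CSV files, each pre-sorted
     by the join key, merged with join="zipper" instead of an in-memory cache.
     All paths, field names, and regexes are illustrative assumptions. -->
<dataConfig>
  <dataSource name="csv" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Parent entity: one product per line in /data/product.csv, sorted by id -->
    <entity name="product"
            processor="LineEntityProcessor"
            dataSource="csv"
            url="/data/product.csv"
            transformer="RegexTransformer">
      <!-- LineEntityProcessor exposes each line as "rawLine";
           RegexTransformer splits it into id and name columns -->
      <field column="rawLine" regex="^([^,]*),(.*)$" groupNames="id,name"/>

      <!-- Child entity: /data/price.csv, also sorted by product_id, streamed
           and merged against the parent rather than cached in heap -->
      <entity name="price"
              processor="LineEntityProcessor"
              dataSource="csv"
              url="/data/price.csv"
              transformer="RegexTransformer"
              join="zipper"
              where="product_id=product.id">
        <field column="rawLine" regex="^([^,]*),(.*)$" groupNames="product_id,price"/>
      </entity>
    </entity>
  </document>
</dataConfig>

The point of the zipper algorithm is that both inputs are consumed strictly in sort order, so only the current parent row and the matching child rows are in memory at any time; memory use stays flat however many rows are joined, which is the main difference from SortedMapBackedCache.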