How much data and what is the database source? Spark is probably the fastest 
way.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation
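
For illustration, the Spark route suggested above might look roughly like the
untested sketch below. It assumes the Lucidworks spark-solr connector and a
JDBC driver are on the classpath; the JDBC URL, credentials, partition bounds,
ZooKeeper string, and collection name are placeholders rather than anything
from this thread.

import org.apache.spark.sql.SparkSession

object DbToSolr {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("db-to-solr").getOrCreate()

    // Read the source table in parallel JDBC partitions instead of one huge result set.
    val df = spark.read.format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/mydb")  // placeholder JDBC URL
      .option("dbtable", "my_table")                        // placeholder table or query
      .option("user", "dbuser")
      .option("password", "dbpassword")
      .option("partitionColumn", "id")                      // numeric column to split on
      .option("lowerBound", "1")
      .option("upperBound", "100000000")
      .option("numPartitions", "16")
      .load()

    // Write the rows straight into Solr through the spark-solr connector.
    df.write.format("solr")
      .option("zkhost", "zk1:2181,zk2:2181,zk3:2181/solr")  // placeholder ZK ensemble
      .option("collection", "mycollection")                 // placeholder collection
      .mode("overwrite")
      .save()

    spark.stop()
  }
}

Because each JDBC partition is read and indexed independently, no single JVM
has to hold the whole data set in memory, which is the heap problem described
in the quoted message below.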

On Apr 12, 2018, 7:28 AM -0400, Sujay Bawaskar <sujaybawas...@gmail.com>, wrote:
> Hi,
>
> We are using DIH with SortedMapBackedCache, but as the data size grows we
> have to give the Solr JVM more and more heap memory.
> Could we use multiple CSV files instead of database queries, and then join
> the data in those CSV files with a zipper join? The bottom line would be to
> create one CSV file per entity in data-config.xml and join these CSV files
> using zipper. [A rough sketch of such a config follows after this message.]
> We also tried the EHCache-based DIH cache, but since EHCache uses MMap IO it
> does not play well with MMapDirectoryFactory and ends up exhausting the
> machine's physical memory.
> Please suggest how we can handle the use case of importing a huge amount of
> data into Solr.
>
> --
> Thanks,
> Sujay P Bawaskar
> M:+91-77091 53669
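
For reference, the kind of data-config.xml described above (one CSV file per
entity, merged with a zipper join instead of an in-heap cache) might look
roughly like the untested sketch below. The file paths, field names, and
regexes are placeholders; the join="zipper" and where="..." attributes mirror
the documented SqlEntityProcessor zipper example, both CSV files would have to
be pre-sorted by the join key, and whether LineEntityProcessor honors the
zipper join at all is exactly the open question here.

<dataConfig>
  <!-- Reads the pre-generated CSV files from local disk -->
  <dataSource name="csv" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Parent entity: one CSV file, pre-sorted by the join key (id) -->
    <entity name="parent"
            dataSource="csv"
            processor="LineEntityProcessor"
            url="/data/parent.csv"
            transformer="RegexTransformer">
      <!-- Split each raw CSV line into fields -->
      <field column="rawLine" regex="^([^,]*),(.*)$" groupNames="id,name"/>

      <!-- Child entity: second CSV, also sorted by parent_id, streamed and
           merged row by row rather than cached in the heap -->
      <entity name="child"
              dataSource="csv"
              processor="LineEntityProcessor"
              url="/data/child.csv"
              transformer="RegexTransformer"
              join="zipper"
              where="parent_id=parent.id">
        <field column="rawLine" regex="^([^,]*),(.*)$" groupNames="parent_id,detail"/>
      </entity>
    </entity>
  </document>
</dataConfig>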
