CSV -> Spark -> Solr: https://github.com/lucidworks/spark-solr/blob/master/docs/examples/csv.adoc

If speed is not an issue, there are other methods. Spring Batch / Spring Data might have all the tools you need to get the speed you want without Spark.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation
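A minimal sketch of that CSV -> Spark -> Solr path, along the lines of the
linked csv.adoc. It assumes a spark-shell with the spark-solr connector on
the classpath; the paths, join key, zkhost and collection name below are
placeholders, not values from this thread:

    // Read one CSV extract per source entity into a DataFrame.
    val parent = spark.read
      .option("header", "true").option("inferSchema", "true")
      .csv("/data/export/parent.csv")                 // placeholder path
    val child = spark.read
      .option("header", "true").option("inferSchema", "true")
      .csv("/data/export/child.csv")                  // placeholder path

    // Denormalize the per-entity extracts into one DataFrame of documents,
    // the same shape DIH's nested entities would otherwise produce.
    val docs = parent.join(child, Seq("id"), "left_outer")  // placeholder join key

    // Index the result through the spark-solr connector.
    val options = Map(
      "zkhost" -> "localhost:9983",                   // placeholder ZooKeeper address
      "collection" -> "mycollection"                  // placeholder collection name
    )
    docs.write.format("solr").options(options).mode("overwrite").save()

The same denormalized DataFrame could also be built straight from MySQL with
spark.read.format("jdbc") instead of going through CSV extracts first.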
On Apr 12, 2018, 1:10 PM -0400, Sujay Bawaskar <sujaybawas...@gmail.com>, wrote:
> Thanks Rahul. The data source is a JdbcDataSource backed by a MySQL
> database, and the data size is around 100GB.
> I am not very familiar with Spark, but are you suggesting that we should
> create documents by merging distinct RDBMS tables using RDDs?
>
> On Thu, Apr 12, 2018 at 10:06 PM, Rahul Singh <rahul.xavier.si...@gmail.com>
> wrote:
>
> > How much data, and what is the database source? Spark is probably the
> > fastest way.
> >
> > --
> > Rahul Singh
> > rahul.si...@anant.us
> >
> > Anant Corporation
> >
> > On Apr 12, 2018, 7:28 AM -0400, Sujay Bawaskar <sujaybawas...@gmail.com>,
> > wrote:
> > > Hi,
> > >
> > > We are using DIH with SortedMapBackedCache, but as the data size grows
> > > we need to give the Solr JVM more and more heap.
> > > Can we use multiple CSV files instead of database queries, and join the
> > > data in those CSV files using zipper? The bottom line is to create a CSV
> > > file for each entity in data-config.xml and join these CSV files with a
> > > zipper join.
> > > We also tried the EHCache-based DIH cache, but since EHCache uses MMap
> > > IO it does not play well with MMapDirectoryFactory and ends up
> > > exhausting the physical memory on the machine.
> > > Please suggest how we can handle the use case of importing a huge
> > > amount of data into Solr.
> > >
> > > --
> > > Thanks,
> > > Sujay P Bawaskar
> > > M: +91-77091 53669
>
> --
> Thanks,
> Sujay P Bawaskar
> M: +91-77091 53669
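On the zipper question: DIH has a streaming merge join (join="zipper" on a
child entity, added in SOLR-4799) that avoids the in-memory cache entirely,
provided both result sets are sorted by the join key. A rough sketch using
SQL entities; the table, column and connection details are invented for
illustration:

    <dataConfig>
      <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://dbhost/db" user="solr" password="***"/>
      <document>
        <!-- Both queries MUST be ordered by the join key for zipper to work. -->
        <entity name="parent" query="SELECT id, name FROM parent ORDER BY id">
          <entity name="child" join="zipper"
                  query="SELECT parent_id, detail FROM child ORDER BY parent_id"
                  where="parent_id=parent.id"/>
        </entity>
      </document>
    </dataConfig>

Because zipper streams both result sets in a single pass, heap usage stays
flat as the data grows. Whether the same join can be driven from CSV files
(e.g. via LineEntityProcessor) rather than SQL queries is something you
would have to verify.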