Hi,

We are facing a huge performance issue while indexing data into Solr. We
have around 15 million records in a PostgreSQL database that have to be
indexed into a Solr 5.3.1 server.
Indexing currently takes around 16 hours to complete.

Note that all fields are stored so as to support atomic updates.

Current approach:
We use an ETL tool (Pentaho) to fetch the data from the database in chunks
of 1000 records, convert them into XML format, and push them to Solr. This
runs in 10 parallel threads.
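
For clarity, the batching and threading described above can be sketched as
follows (a minimal illustration only — the real pipeline is built in
Pentaho, and the batch/thread constants and the `push_batch` stub are
placeholders, not our actual code):

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 1000   # records per chunk, as in our current setup
THREADS = 10        # parallel push threads, as in our current setup

def chunks(records, size):
    """Yield successive batches of `size` records."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

def push_batch(batch):
    """Stand-in for converting a batch to <add><doc>...</doc></add> XML
    and POSTing it to Solr's /update handler; here it only counts docs."""
    return len(batch)

def index_all(records):
    """Push all batches in parallel; return total documents sent."""
    with ThreadPoolExecutor(max_workers=THREADS) as pool:
        return sum(pool.map(push_batch, chunks(records, BATCH_SIZE)))
```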

System params
Solr Version: 5.3.1
Size on disk: 425 GB

The database, ETL machine, and Solr server each have 16 cores and 30 GB RAM.
Database and Solr disks: RAID

Any pointers on the best approach to indexing this kind of data would be
helpful.

-- 
Regards,
Aneesh Mon N
Chennai
+91-8197-188-588
