Hi,

We are facing a huge performance issue while indexing data into Solr. We have around 15 million records in a PostgreSQL database that have to be indexed into a Solr 5.3.1 server, and the indexing currently takes around 16 hours to complete.
Note that all fields are stored in order to support atomic updates.

Current approach: we use an ETL tool (Pentaho) to fetch the data from the database in chunks of 1,000 records, convert them to Solr XML, and push them to Solr. This runs in 10 parallel threads (roughly the logic in the sketch below).

System params:
Solr version: 5.3.1
Size on disk: 425 GB
Database, ETL machine, and Solr: 16 cores and 30 GB RAM each
Database and Solr disks: RAID

Any pointers on the best approach for indexing this kind of data would be helpful.
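For reference, what our pipeline does is roughly equivalent to the following SolrJ sketch. The host, core, table, and field names here are made up, and ConcurrentUpdateSolrClient stands in for the batching and threading that Pentaho handles for us today:

import java.sql.*;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details -- substitute real ones.
        String solrUrl = "http://solr-host:8983/solr/mycore";
        String jdbcUrl = "jdbc:postgresql://db-host:5432/mydb";

        // ConcurrentUpdateSolrClient buffers documents in an internal queue
        // (here 10,000 docs) and streams them to Solr from 10 background
        // threads, so the JDBC reader never blocks on HTTP round-trips.
        try (ConcurrentUpdateSolrClient solr =
                 new ConcurrentUpdateSolrClient(solrUrl, 10000, 10);
             Connection conn = DriverManager.getConnection(jdbcUrl, "user", "pass")) {

            // Turn autocommit off so the PostgreSQL driver honours the fetch
            // size and streams rows instead of materialising all 15M at once.
            conn.setAutoCommit(false);
            try (Statement stmt = conn.createStatement()) {
                stmt.setFetchSize(1000);
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT id, title, body FROM records")) {
                    while (rs.next()) {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", rs.getString("id"));
                        doc.addField("title", rs.getString("title"));
                        doc.addField("body", rs.getString("body"));
                        solr.add(doc); // queued; sent in batches by the client
                    }
                }
            }
            solr.blockUntilFinished(); // drain the queue
            solr.commit();             // single hard commit at the end
        }
    }
}

--
Regards,
Aneesh Mon N
Chennai
+91-8197-188-588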