It takes me about 50 hours to index 9 GB of data (about 2,000,000 documents) with an n-gram filter (minGramSize=6, maxGramSize=10). The tokens going into the n-gram filter are long: they are not single words, and the largest is about 300,000 bytes including whitespace. (If I count correctly, grams of length 6 through 10 expand every byte of input into roughly five terms, so the filter multiplies the indexing workload about fivefold.) I split the data into 4 files and posted them concurrently with post.sh; the curl transfer logs are below, followed by two sketches of the setup I have in mind. I also tried writing my own Lucene indexer (single-threaded), and the time was almost the same. What is the general bottleneck for indexing in Solr? Doesn't Solr handle index update requests concurrently?
1. Posting file /ngram_678910/file1.xml to http://localhost:8988/solr/update
     % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
    51 3005M    0     0   51 1557M      0  18902 46:19:14 23:59:46 22:19:28      0

2. Posting file /ngram_678910/file2.xml to http://localhost:8988/solr/update
     % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
    62 2623M    0     0   62 1632M      0  19839 38:31:16 23:58:01 14:33:15  76629

3. Posting file /ngram_678910/file3.xml to http://localhost:8988/solr/update
     % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
    65 2667M    0     0   65 1737M      0  21113 36:48:23 23:58:06 12:50:17  25537

4. Posting file /ngram_678910/file4.xml to http://localhost:8988/solr/update
     % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                    Dload  Upload   Total   Spent    Left  Speed
    58 2766M    0     0   58 1625M      0  19752 40:47:34 23:58:28 16:49:06  81435
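For reference, here is roughly my analysis chain rebuilt in plain Lucene. This is only a minimal sketch against a recent Lucene API: the KeywordTokenizer is my assumption of how the long unsplit tokens are produced, and the field name and input are placeholders; only the min/max gram sizes (6 and 10) are from my actual schema.

import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.ngram.NGramTokenFilter;

public class NGramChainDemo {
    // Analyzer mirroring the field type: the whole input becomes one token,
    // which is then expanded into grams of length 6 through 10.
    static Analyzer ngramAnalyzer() {
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new KeywordTokenizer();
                TokenStream grams = new NGramTokenFilter(source, 6, 10, false);
                return new TokenStreamComponents(source, grams);
            }
        };
    }

    public static void main(String[] args) throws IOException {
        // Count the terms one input produces: each byte of input starts grams
        // of five different lengths, so the output is roughly 5x the input.
        String input = "x".repeat(10_000);  // stand-in for one long token
        long terms = 0;
        try (Analyzer a = ngramAnalyzer();
             TokenStream ts = a.tokenStream("body", input)) {
            ts.reset();
            while (ts.incrementToken()) terms++;
            ts.end();
        }
        System.out.println(terms + " grams from " + input.length() + " chars");
    }
}

Running this on a 10,000-character input prints just under 50,000 grams, which is where my "about five terms per byte" estimate above comes from.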
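And this is the shape of the multi-threaded Lucene indexer I was expecting to need instead of my single-threaded one. Again only a sketch: the index path, thread count, field name, RAM buffer size, and document source are placeholders (a real version would stream documents from my four XML files and check the returned Futures for errors); the one thing I am relying on is that IndexWriter is documented as safe to share across threads.

import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class ConcurrentIndexer {
    public static void main(String[] args) throws Exception {
        Analyzer analyzer = NGramChainDemo.ngramAnalyzer();  // from the sketch above
        IndexWriterConfig cfg = new IndexWriterConfig(analyzer);
        cfg.setRAMBufferSizeMB(512);  // larger in-memory buffer, fewer segment flushes
        IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("/tmp/ngram-index")), cfg);

        int threads = 4;  // one worker per input slice, like the four post.sh runs
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            int slice = t;
            pool.submit(() -> {
                // placeholder loop; a real worker would parse its own input file
                for (int i = 0; i < 1000; i++) {
                    Document doc = new Document();
                    doc.add(new TextField("body",
                            "placeholder content " + slice + "/" + i, Field.Store.NO));
                    writer.addDocument(doc);  // safe to call from many threads
                }
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
        writer.close();
    }
}

The point of the sketch is just that all four workers feed one shared IndexWriter, rather than four separate HTTP streams into /update, which is what my post.sh runs amount to.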