It takes me 50 hours to index 9 GB of data in total (about 2,000,000 documents)
with an n-gram filter set to min=6, max=10. The tokens going into the n-gram
filter are long (not single words; up to 300,000 bytes including whitespace).
I split the data into 4 files and used post.sh to send all four updates at the
same time. I also tried writing my own Lucene indexer (single-threaded); the
time was almost the same. I would like to know: what is the general bottleneck
for indexing in Solr? Doesn't Solr handle index update requests concurrently?
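For reference, here is a minimal sketch (written against a recent Lucene API)
of the kind of indexer I mean, except that mine was single-threaded; this
version shares one IndexWriter across 4 threads, which is roughly what I
assumed Solr does with concurrent update requests. The paths, field name,
analyzer, and the readDocs() helper are placeholders, not my real code:

import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ConcurrentIndexer {

    // Placeholder: parse one of the 4 update XML files into document bodies.
    static List<String> readDocs(String path) {
        return Collections.emptyList();
    }

    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(Paths.get("/tmp/ngram-index"));
        // Placeholder analyzer; the real chain would end in an
        // NGramTokenFilter with minGram=6, maxGram=10.
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        cfg.setRAMBufferSizeMB(256); // bigger buffer -> fewer segment flushes

        try (IndexWriter writer = new IndexWriter(dir, cfg)) {
            // IndexWriter is thread-safe: several threads may call
            // addDocument() on the same writer at once.
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 1; i <= 4; i++) {
                String file = "/ngram_678910/file" + i + ".xml";
                pool.submit(() -> {
                    try {
                        for (String body : readDocs(file)) {
                            Document doc = new Document();
                            doc.add(new TextField("content", body, Field.Store.NO));
                            writer.addDocument(doc);
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(7, TimeUnit.DAYS);
        }
    }
}

Assuming Solr funnels all four posts into one IndexWriter on the server side,
the sketch above should exercise roughly the same path as my four concurrent
curl posts. The progress output from those four posts follows: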

1.
Posting file /ngram_678910/file1.xml to http://localhost:8988/solr/update
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 51 3005M    0     0   51 1557M      0  18902 46:19:14 23:59:46 22:19:28     0
2.
Posting file /ngram_678910/file2.xml to http://localhost:8988/solr/update
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 62 2623M    0     0   62 1632M      0  19839 38:31:16 23:58:01 14:33:15 76629
3.
Posting file /ngram_678910/file3.xml to http://localhost:8988/solr/update
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 65 2667M    0     0   65 1737M      0  21113 36:48:23 23:58:06 12:50:17 25537
4.
Posting file /ngram_678910/file4.xml to http://localhost:8988/solr/update
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 58 2766M    0     0   58 1625M      0  19752 40:47:34 23:58:28 16:49:06 81435

