>On Apr 1, 2009, at 9:39 AM, Fergus McMenemie wrote:
>
>> Grant,
>>
>> Redoing the work with your patch applied does not seem to
>> make a difference! Is this the expected result?
>
>No, I didn't expect SOLR-1095 to fix the problem. Overwrite=false plus
>SOLR-1095 does make a difference, though, AFAICT from your last line, right?
>
>>
>> I did run it again using the full file, this time using my iMac:-
>>      643465    took  22min 14sec             2008-04-01
>>      734796          73min 58sec             2009-01-15
>>      758795          70min 55sec             2009-03-26
>> Again using only the first 1M records, with commit=false&overwrite=true:-
>>      643465    took  2m51.516s               2008-04-01
>>      734796          7m29.326s               2009-01-15
>>      758795          8m18.403s               2009-03-26
>>      SOLR-1095       7m41.699s
>> this time with commit=true&overwrite=true.
>>      643465    took  2m49.200s               2008-04-01
>>      734796          8m27.414s               2009-01-15
>>      758795          9m32.459s               2009-03-26
>>      SOLR-1095       7m58.825s
>> this time with commit=false&overwrite=false.
>>      643465    took  2m46.149s               2008-04-01
>>      734796          3m29.909s               2009-01-15
>>      758795          3m26.248s               2009-03-26
>>      SOLR-1095       2m49.997s
>>
Grant,

Hmmm, the big difference is made by &overwrite=false. But can you
explain why &overwrite=false makes such a difference? I am starting
off with an empty index, and I have checked the content: there are
no duplicates in the uniqueKey field.

I guess that with &overwrite=false a few checks can be dropped from
the indexing process, and since I am confident that my content
contains no duplicates this is a good speed-up.
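
(For reference, roughly the request I mean is sketched below; the
host, port and update path are just the stock example values, and
separator=%09 matches the tab-separated file:)

  curl 'http://localhost:8983/solr/update/csv?commit=false&overwrite=false&separator=%09' \
       --data-binary @geonames.txt \
       -H 'Content-type: text/plain; charset=utf-8'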

http://wiki.apache.org/solr/UpdateCSV says that if overwrite is
true (the default) then documents are overwritten based on the
uniqueKey. However, what will Solr/Lucene do if the uniqueKey
field is not unique and overwrite=false?
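
My guess, and it is only a guess, is that both documents simply get
added, so the duplicate stays in the index and a query on that key
returns more than one hit. After a run, something like the check
below would show it, assuming UNI (the third field, the one my perl
check below counts) is the uniqueKey, and again using the stock
host/port:

  curl 'http://localhost:8983/solr/select?q=UNI:60524&rows=0'

where numFound > 1 would mean a duplicated key had been kept.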

fergus: perl -nlaF"\t" -e 'print "$F[2]";' geonames.txt | wc -l
 1000000
fergus: perl -nlaF"\t" -e 'print "$F[2]";' geonames.txt | sort -u | wc -l
 1000000
fergus: /usr/bin/head geonames.txt
RC UFI UNI LAT LONG DMS_LAT DMS_LONG MGRS JOG FC DSG PC CC1 ADM1 ADM2 POP ELEV CC2 NT LC SHORT_FORM GENERIC SORT_NAME FULL_NAME FULL_NAME_ND MODIFY_DATE
1 -1307828 60524 12.466667 -69.9 122800 -695400 19PDP0219578323 ND19-14 T MT  AA 00      PALUMARGA Palu Marga Palu Marga 1995-03-23
1 -1307756 -1891720 12.5 -70.016667 123000 -700100 19PCP8952982056 ND19-14 P PPLX
(tab-separated; long lines re-joined here, empty fields collapsed)

PS. Do you want me to do some kind of chop through the different
versions to see where the slowdown happened, or are you happy you
have nailed it?
-- 

===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================
