>On Apr 1, 2009, at 9:39 AM, Fergus McMenemie wrote:
>
>> Grant,
>>
>> Redoing the work with your patch applied does not seem to
>> make a difference! Is this the expected result?
>
>No, I didn't expect Solr 1095 to fix the problem. Overwrite = false +
>1095, does, however, AFAICT by your last line, right?
>
>>
>> I did run it again using the full file, this time using my Imac:-
>>    643465 took  22min 14sec  2008-04-01
>>    734796       73min 58sec  2009-01-15
>>    758795       70min 55sec  2009-03-26
>> Again using only the first 1M records with
>> commit=false&overwrite=true:-
>>    643465 took  2m51.516s    2008-04-01
>>    734796       7m29.326s    2009-01-15
>>    758795       8m18.403s    2009-03-26
>>    SOLR-1095    7m41.699s
>> this time with commit=true&overwrite=true.
>>    643465 took  2m49.200s    2008-04-01
>>    734796       8m27.414s    2009-01-15
>>    758795       9m32.459s    2009-03-26
>>    SOLR-1095    7m58.825s
>> this time with commit=false&overwrite=false.
>>    643465 took  2m46.149s    2008-04-01
>>    734796       3m29.909s    2009-01-15
>>    758795       3m26.248s    2009-03-26
>>    SOLR-1095    2m49.997s
>>

Grant,
Hmmm, the big difference is made by &overwrite=false. But can you
explain why &overwrite=false makes such a difference? I am starting
off with an empty index, and I have checked the content: there are no
duplicates in the uniqueKey field.

I guess that if &overwrite=false then a few checks can be removed from
the indexing process, and if I am confident that my content contains
no duplicates then this is a good speed-up.

http://wiki.apache.org/solr/UpdateCSV says that if overwrite is true
(the default) then documents are overwritten based on the uniqueKey.
However, what will Solr/Lucene do if the uniqueKey is not unique and
overwrite=false?

fergus: perl -nlaF"\t" -e 'print "$F[2]";' geonames.txt | wc -l
1000000
fergus: perl -nlaF"\t" -e 'print "$F[2]";' geonames.txt | sort -u | wc -l
1000000
fergus: /usr/bin/head geonames.txt
RC UFI UNI LAT LONG DMS_LAT DMS_LONG MGRS JOG FC DSG PC CC1 ADM1 ADM2 POP ELEV CC2 NT LC SHORT_FORM GENERIC SORT_NAME FULL_NAME FULL_NAME_ND MODIFY_DATE
1 -1307828 60524 12.466667 -69.9 122800 -695400 19PDP0219578323 ND19-14 T MT AA 00 PALUMARGA Palu Marga Palu Marga 1995-03-23
1 -1307756 -1891720 12.5 -70.016667 123000 -700100 19PCP8952982056 ND19-14 P PPLX

PS. Do you want me to do some kind of chop through the different
versions to see where the slowdown happened, or are you happy you have
nailed it?

--
===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================
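
For anyone wanting to repeat the duplicate check before indexing with overwrite=false, here is a minimal shell sketch. It assumes the uniqueKey lives in the third tab-separated column (UNI, the same column the perl one-liners print) and that the first line of the file is a header. The function name count_dups is made up for illustration. Unlike the one-liners, it skips the header row and prints how many distinct key values occur more than once.

```shell
# Hypothetical helper: count key values that appear more than once.
# Assumptions: tab-separated input, header on line 1, key in column 3 (UNI).
count_dups() {
  file=$1
  col=${2:-3}                 # 1-based column number, defaults to 3
  tail -n +2 "$file" |        # drop the header row
    cut -f "$col" |           # keep only the key column (tab is cut's default delimiter)
    sort |                    # uniq needs sorted input
    uniq -d |                 # print one line per duplicated value
    wc -l |                   # count those lines
    tr -d ' '                 # strip the padding some wc implementations add
}
```

Calling `count_dups geonames.txt` should print 0 when every key is unique; anything else means overwrite=false would be indexing the same key more than once.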