Can you try adding &overwrite=false and running against the latest
version? My current working theory is that Solr/Lucene has changed
how deletes are handled such that work that was deferred before is now
not deferred as often. In fact, you are not seeing this cost paid (or
at least not noticing it) because you are not committing, but I
believe you do see it when you are closing down Solr, which is why it
takes so long to exit. I also think that Lucene adding fsync() into
the equation may cause some slowdown, but that is a penalty we are
willing to pay, as it gives us higher data integrity.
So, depending on how you have your data, I think a workaround is to:
1. Add a field that contains a single term identifying the data type for
this particular CSV file, e.g. something like field: type, value:
fergs-csv
2. Before indexing, issue a Delete By Query: type:fergs-csv
and then add your CSV file using overwrite=false.
This amounts to a batch delete followed by a batch add, but without
each add having to issue its own delete.
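For illustration, the delete-then-add workaround above might look roughly like the sketch below. It is not taken from Fergus's scripts: the base URL, the function names, and the choice to commit on the CSV request are all assumptions, and it presumes the CSV already carries a type column set to fergs-csv.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen
from xml.sax.saxutils import escape

# Assumption: a single-core Solr listening on the default example port.
SOLR_BASE = "http://localhost:8983/solr"


def delete_by_query_request(query, base=SOLR_BASE):
    """Build (url, body) for an XML Delete By Query sent to /update."""
    body = "<delete><query>%s</query></delete>" % escape(query)
    return base + "/update", body


def csv_add_url(base=SOLR_BASE, overwrite=False, commit=True):
    """Build the /update/csv URL with the overwrite and commit parameters."""
    params = {
        "overwrite": str(overwrite).lower(),
        "commit": str(commit).lower(),
    }
    return base + "/update/csv?" + urlencode(params)


def post(url, body, content_type):
    """POST a text body to Solr and return the raw response bytes."""
    request = Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": content_type},
    )
    with urlopen(request) as response:
        return response.read()


def reindex_csv(csv_path, type_value="fergs-csv"):
    """Batch delete everything of one type, then batch add the CSV file."""
    # Step 1: one delete for the whole data set instead of one per document.
    url, body = delete_by_query_request("type:%s" % type_value)
    post(url, body, "text/xml; charset=utf-8")
    # Step 2: add the CSV without per-document overwrite checks.
    with open(csv_path, encoding="utf-8") as handle:
        post(csv_add_url(), handle.read(), "text/plain; charset=utf-8")
```

Calling reindex_csv("data.csv") (a hypothetical file name) would issue the two requests in order; the commit on the second request makes both the delete and the add visible together.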
In the meantime, I'm trying to pin down a specific change and see if
there is anything that might help it perform better.
-Grant
On Mar 30, 2009, at 4:52 PM, Fergus McMenemie wrote:
Grant,
After all my playing about at boot camp, I gave things a rest. It
was not till months later that I got back to looking at Solr again.
So after 643465 (2008-Apr-01) the next version I tried was 694377
(2008-Sep-11). Nothing in between. Yep, so 643465 is the latest
version I tried that still performs. Every later revision is slower.
However, I need to repeat the tests using 643465, 694377 and whatever
is the latest version. On my MacBook I am only seeing a 2x slowdown
of 643465 vs. today, whereas I had been seeing a 3x slowdown using
my iMac.
Fergus
Fergus,
Is rev 643465 the absolute latest you tried that still performs?
i.e. is every revision after it slower?
-Grant
On Mar 30, 2009, at 12:45 PM, Grant Ingersoll wrote:
Fergus,
I think the problem may actually be due to something that was
introduced by a change to Solr's StopFilterFactory and the way it
loads the stop words set. See https://issues.apache.org/jira/browse/SOLR-1095
I am in the process of testing it out and will let you know.
-Grant
On Mar 28, 2009, at 11:00 AM, Grant Ingersoll wrote:
Hey Fergus,
Finally got a chance to run your scripts, etc. per the thread:
http://www.lucidimagination.com/search/document/5c3de15a4e61095c/upgrade_from_1_2_to_1_3_gives_3x_slowdown_script#8324a98d8840c623
I can reproduce your slowdown.
One oddity: on the old version (rev 643465), there is an exception
during startup:
Mar 28, 2009 10:44:31 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:129)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:953)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:968)
    at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:50)
    at org.apache.solr.core.SolrCore$3.call(SolrCore.java:797)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
    at java.lang.Thread.run(Thread.java:637)
I see two things in CHANGES.txt that might apply, but I'm not sure:
1. I think commons-csv was upgraded
2. The CSV loader stuff was refactored to share common code
I'm still investigating.
-Grant
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search
--
===============================================================
Fergus McMenemie Email:fer...@twig.me.uk
Techmore Ltd Phone:(UK) 07721 376021
Unix/Mac/Intranets Analyst Programmer
===============================================================