Can you try adding &overwrite=false and running against the latest version? My current working theory is that Solr/Lucene has changed how deletes are handled, so that work that used to be deferred is now deferred less often. In fact, you are not seeing this cost paid (or at least not noticing it) during indexing because you are not committing, but I believe you do see it when you shut Solr down, which is why it takes so long to exit. I also think that Lucene adding fsync() into the equation may cause some slowdown, but that is a penalty we are willing to pay, as it gives us better data integrity.

So, depending on how your data is organized, I think a workaround is to:
Add a field that contains a single term identifying the data type for this particular CSV file, e.g. field: type, value: fergs-csv. Then, before indexing, you can issue a Delete By Query for type:fergs-csv and then add your CSV file using overwrite=false. This amounts to a batch delete followed by a batch add, without each add having to issue its own delete.
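
As a rough sketch of that two-step workaround (not something I have run against your setup), the two requests could be built like this in Python. The base URL, the helper names, and the commit=true parameter are assumptions for illustration; the type field and the fergs-csv value are just the example names from above, and the CSV file is assumed to carry the type column itself.

```python
from urllib.parse import urlencode
from urllib.request import Request
from xml.sax.saxutils import escape

# Assumed location of the Solr update handler; adjust for your install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_delete_by_query(query, base_url=SOLR_UPDATE_URL):
    """Build a POST that deletes every document matching `query`."""
    # Escape the query so characters such as & or < stay valid XML.
    body = "<delete><query>%s</query></delete>" % escape(query)
    return Request(
        base_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


def build_csv_add(csv_bytes, base_url=SOLR_UPDATE_URL):
    """Build a POST that loads CSV data without per-document overwrite checks."""
    # overwrite=false skips the delete Solr would otherwise issue for each add;
    # commit=true (an assumption here) makes the delete and the adds visible together.
    params = urlencode({"overwrite": "false", "commit": "true"})
    return Request(
        "%s/csv?%s" % (base_url, params),
        data=csv_bytes,
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )
```

To use it, pass build_delete_by_query("type:fergs-csv") to urllib.request.urlopen first, then do the same with build_csv_add() on the bytes of the CSV file. The order matters: with overwrite=false nothing removes the old copies if the delete is skipped.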

In the meantime, I'm trying to pin down a specific change and see if there is anything that might help it perform better.

-Grant

On Mar 30, 2009, at 4:52 PM, Fergus McMenemie wrote:

Grant,

After all my playing about at boot camp, I gave things a rest. It
was not till months later that I got back to looking at Solr again.
So after 643465 (2008-Apr-01) the next version I tried was 694377
(2008-Sep-11). Nothing in between. So yes, 643465 is the latest
version I tried that still performs. Every later revision is slower.

However I need to repeat the tests using 643465, 694377 and whatever
is the latest version. On my MacBook I am only seeing a 2x slowdown
between 643465 and today's version, whereas I had been seeing a 3x
slowdown using my iMac.

Fergus


Fergus,

Is rev 643465 the absolute latest you tried that still performs? i.e.
every revision after is slower?

-Grant

On Mar 30, 2009, at 12:45 PM, Grant Ingersoll wrote:

Fergus,

I think the problem may actually be due to something that was
introduced by a change to Solr's StopFilterFactory and the way it
loads the stop words set.  See https://issues.apache.org/jira/browse/SOLR-1095

I am in the process of testing it out and will let you know.

-Grant

On Mar 28, 2009, at 11:00 AM, Grant Ingersoll wrote:

Hey Fergus,

Finally got a chance to run your scripts, etc. per the thread:
http://www.lucidimagination.com/search/document/5c3de15a4e61095c/upgrade_from_1_2_to_1_3_gives_3x_slowdown_script#8324a98d8840c623

I can reproduce your slowdown.

One oddity: on the old version (rev 643465), there is an exception
during startup:
Mar 28, 2009 10:44:31 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:129)
     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
     at org.apache.solr.core.SolrCore.execute(SolrCore.java:953)
     at org.apache.solr.core.SolrCore.execute(SolrCore.java:968)
     at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:50)
     at org.apache.solr.core.SolrCore$3.call(SolrCore.java:797)
     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
     at java.lang.Thread.run(Thread.java:637)

I see two things in CHANGES.txt that might apply, but I'm not sure:
1. I think commons-csv was upgraded
2. The CSV loader stuff was refactored to share common code

I'm still investigating.

-Grant

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search

--

===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================
