Hello Yonik,
Thank you for looking into this. Your question about whether I'm using
stock Solr pointed me in the right direction. I am in fact using a
patched version of Solr to get hierarchical facet support
(http://issues.apache.org/jira/browse/SOLR-64). I took out the 4
hierarchical facet fields from the schema and the import was back to
its normal time of less than a minute. This same configuration worked
fine with the 5/1 patched build.
Here is the field definition:
<fieldType name="hierarchy" class="solr.HierarchicalFacetField"
omitNorms="true" positionIncrementGap="0" indexed="true"
stored="false" delimiter="/" />
<!-- fields -->
<field name="category" type="hierarchy" indexed="true" stored="true"
multiValued="true"/>
<field name="category_seo" type="hierarchy" indexed="true"
stored="true" multiValued="true"/>
<!-- facet fields -->
<field name="category_facet" type="hierarchy" indexed="true"
stored="false" multiValued="true"/>
<field name="category_seo_facet" type="hierarchy" indexed="true"
stored="false" multiValued="true"/>
<copyField source="category" dest="category_facet"/>
<copyField source="category_seo" dest="category_seo_facet"/>
CSV file snippet:
category,category_seo
"T-Shirt Mens/Crew Neck/","t-shirt-mens/crew-neck/"
Thanks again!
Nasseam
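P.S. In case it helps with reproducing this, here is a minimal sketch
for generating a test file with the same two columns as the CSV snippet
above. The category names, the row count, and the output filename are
made up for illustration; only the "Parent/Child/" shape and the
lowercase, hyphenated SEO form come from my data. A real import file
would also need an id and whatever other fields the schema requires.

```python
import csv
import itertools

# Hypothetical category names; only the "Parent/Child/" shape is from the real data.
PARENTS = ["T-Shirt Mens", "T-Shirt Womens", "Hoodie Mens"]
CHILDREN = ["Crew Neck", "V Neck", "Long Sleeve"]


def seo_slug(path):
    """Lowercase a category path and replace spaces with hyphens, keeping '/' delimiters."""
    return path.lower().replace(" ", "-")


def generate_rows(count):
    """Yield (category, category_seo) pairs, cycling through the sample categories."""
    combos = itertools.cycle(itertools.product(PARENTS, CHILDREN))
    for _, (parent, child) in zip(range(count), combos):
        category = f"{parent}/{child}/"
        yield category, seo_slug(category)


def write_csv(out, count):
    """Write a header line plus `count` fully quoted data rows to the file-like `out`."""
    out.write("category,category_seo\n")
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(generate_rows(count))


if __name__ == "__main__":
    # 20000 rows matches the size of the import that was slow for me.
    with open("categories.csv", "w", newline="") as f:
        write_csv(f, 20000)
```

Running it writes categories.csv in the current directory; the first
data row has the same form as the snippet above.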
On Oct 6, 2009, at 3:22 PM, Yonik Seeley wrote:
On Tue, Oct 6, 2009 at 1:06 PM, Nasseam Elkarra
<nass...@bodukai.com> wrote:
I had a dev build of 1.4 from 5/1/2009 and importing a 20K-row CSV file
took less than a minute. Updating to the latest as of yesterday, the
import is really slow and I had to cancel it after a half hour. This
prevented me from upgrading a few months ago as well.
I haven't had any success at replicating this problem.
I just tried a 100K row CSV file, consisting of an id and a few text
fields. The total size of the file is 79MB.
On trunk (today): 22 seconds to index, another 5-7 seconds to commit
5/21 version: 28 seconds to index, another 8 seconds to commit
Then I modified the 5/1 schema to closer match the trunk schema
(removing defaults, copyfields that could slow things down).
Modified 5/1 version: 25 seconds to index, another 8 seconds to commit
I only did 2 runs with trunk and 2 with one from 5/1, so the accuracy
is probably low... but good enough to see there wasn't a problem in
this test.
We really need more info to help reproduce this.
Are you using stock solr? Do you have any custom plugins, analyzers,
token filters, etc?
You're going to need to provide something so others can reproduce this.
-Yonik
http://www.lucidimagination.com