Hello Yonik,

Thank you for looking into this. Your question about whether I'm using stock Solr pointed me in the right direction. I am in fact using a patched version of Solr to get hierarchical facet support (http://issues.apache.org/jira/browse/SOLR-64). I took the 4 hierarchical facet fields out of the schema and the import went back to its normal time of less than a minute. This same configuration worked fine with the 5/1 patched build.

Here is the field definition:
<fieldType name="hierarchy" class="solr.HierarchicalFacetField" omitNorms="true" positionIncrementGap="0" indexed="true" stored="false" delimiter="/" />

<!-- fields -->
<field name="category" type="hierarchy" indexed="true" stored="true" multiValued="true"/>
<field name="category_seo" type="hierarchy" indexed="true" stored="true" multiValued="true"/>

<!-- facet fields -->
<field name="category_facet" type="hierarchy" indexed="true" stored="false" multiValued="true"/>
<field name="category_seo_facet" type="hierarchy" indexed="true" stored="false" multiValued="true"/>

<copyField source="category" dest="category_facet"/>
<copyField source="category_seo" dest="category_seo_facet"/>

CSV file snippet:
category,category_seo
"T-Shirt Mens/Crew Neck/","t-shirt-mens/crew-neck/"
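
In case it helps others reproduce this, here is a rough sketch of a script that generates a synthetic CSV file in the same shape as the snippet above. The function names, the numbered category values, and the output file name are made up for illustration. The real data and schema have other fields that are not shown here, so treat this as a starting point rather than an exact reproduction.

```python
import csv
import io


def make_rows(n):
    """Yield n synthetic (category, category_seo) pairs.

    The values are invented; they only mimic the "/"-delimited
    hierarchy paths from the snippet above.
    """
    for i in range(n):
        category = f"T-Shirt Mens/Crew Neck {i}/"
        # Assumption: the SEO form is the lowercased path with spaces as hyphens.
        category_seo = category.lower().replace(" ", "-")
        yield category, category_seo


def write_csv(n, out):
    """Write an unquoted header plus n fully quoted data rows to `out`."""
    out.write("category,category_seo\n")
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in make_rows(n):
        writer.writerow(row)


if __name__ == "__main__":
    # 20K rows, matching the size of the import described in this thread.
    with open("categories.csv", "w", newline="") as f:
        write_csv(20000, f)
```

Importing the generated file once with the four hierarchy fields in the schema and once without them should show whether the slowdown follows the patched field type.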

Thanks again!
Nasseam

On Oct 6, 2009, at 3:22 PM, Yonik Seeley wrote:

On Tue, Oct 6, 2009 at 1:06 PM, Nasseam Elkarra <nass...@bodukai.com> wrote:
I had a dev build of 1.4 from 5/1/2009 and importing a 20K-row CSV file took
less than a minute. After updating to the latest build as of yesterday, the
import is really slow and I had to cancel it after a half hour. This
prevented me from upgrading a few months ago as well.

I haven't had any success at replicating this problem.

I just tried a 100K row CSV file, consisting of an id and a few text
fields.  The total size of the file is 79MB.

On trunk (today): 22 seconds to index, another 5-7 seconds to commit
5/21 version: 28 seconds to index, another 8 seconds to commit

Then I modified the 5/1 schema to closer match the trunk schema
(removing defaults, copyfields that could slow things down).
Modified 5/1 version: 25 seconds to index, another 8 seconds to commit

I only did 2 runs with trunk and 2 with one from 5/1, so the accuracy
is probably low... but good enough to see there wasn't a problem in
this test.

We really need more info to help reproduce this.
Are you using stock solr?  Do you have any custom plugins, analyzers,
token filters, etc?

You're going to need to provide something so others can reproduce this.

-Yonik
http://www.lucidimagination.com
