As a follow-up: I continued tweaking the data-config.xml and have been able
to make the commit fail with as few as three fields in the sdc.xml, with only
one multivalued field. Even stranger, some fields work and some do not.
For instance, in my dc.xml:

<field column="Taxon"
xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Taxonomy/Lineage/Taxon"
/>
.
.
.
<field column="GenPept"
xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Protein_Name_and_ID/GenPept"
/>

and in the schema.xml:
<field name="GenPept" type="text" indexed="true" stored="false"
multiValued="true" />
.
.
.
<field name="Taxon" type="text" indexed="true" stored="false"
multiValued="true" />
but Taxon works and GenPept does not. What could possibly account for this
discrepancy? Again, the error logs from the server are exactly what was seen
in the first post.
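
For reference, the stripped-down sdc.xml looks roughly like this. This is a
sketch rather than a verbatim copy: the entity name, file path, forEach
expression, and the third (unique key) field are placeholders standing in
for whatever is actually in my file.

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="iproclass"
            processor="XPathEntityProcessor"
            stream="true"
            url="/path/to/iproclass.xml"
            forEach="/iProClassDatabase/iProClassEntry">
      <!-- placeholder unique key field -->
      <field column="id"
             xpath="/iProClassDatabase/iProClassEntry/@id"/>
      <field column="Taxon"
             xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Taxonomy/Lineage/Taxon"/>
      <field column="GenPept"
             xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Protein_Name_and_ID/GenPept"/>
    </entity>
  </document>
</dataConfig>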

What is going on?


KyleMorrison wrote:
> 
> Yes, this is the most recent version of Solr, with stream="true" set, and
> with stopword, lowercase, and removeDuplicates filters applied to all
> multivalued fields. Could the filters possibly be causing this? I will
> remove them and see what happens.
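> 
> For reference, the analyzer on the "text" fieldType used by those fields
> looks roughly like this (a sketch paraphrased from memory; the exact
> attributes on the stop filter may differ in my actual schema.xml):
> 
> <fieldType name="text" class="solr.TextField">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!-- stopword removal, lowercasing, and duplicate removal, as described above -->
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>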
> 
> Kyle
> 
> 
> Shalin Shekhar Mangar wrote:
>> 
>> Hmm, strange.
>> 
>> This is Solr 1.3.0, right? Do you have any transformers applied to these
>> multi-valued fields? Do you have stream="true" in the entity?
>> 
>> On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison <[EMAIL PROTECTED]>
>> wrote:
>> 
>>>
>>> I apologize for spamming this mailing list with my problems, but I'm at
>>> my wits' end. I'll get right to the point.
>>>
>>> I have an XML file of roughly 1 GB which I wish to index. If that is
>>> successful, I will move to a larger file closer to 20 GB. However, when I
>>> run my data-config (let's call it dc.xml) over it, the import only
>>> manages to get about 27 rows out of roughly 200K. The exact same
>>> data-config (dc.xml) works perfectly on smaller data files of the same
>>> type.
>>>
>>> This data-config is quite large, maybe 250 fields. When I run a smaller
>>> data-config (let's call it sdc.xml) over the 1 GB file, it works
>>> perfectly. The only conclusion I can draw from this is that the
>>> data-config method just doesn't scale well.
>>>
>>> When the dc.xml fails, the server logs spit out:
>>>
>>> Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=95
>>> Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>>> INFO: Starting Full Import
>>> Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
>>> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
>>> Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>>> SEVERE: Full Import failed
>>> java.util.ConcurrentModificationException
>>>     at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>>>     at java.util.AbstractList$Itr.next(AbstractList.java:343)
>>>     at org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
>>>     at org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
>>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
>>>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>>>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>>>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>>>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>>> Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=77
>>> Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>>> INFO: Starting Full Import
>>> Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
>>> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
>>> Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>>> SEVERE: Full Import failed
>>> java.util.ConcurrentModificationException
>>>     at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>>>     at java.util.AbstractList$Itr.next(AbstractList.java:343)
>>>     at org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
>>>     at org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
>>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
>>>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>>>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>>>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>>>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>>>
>>> This mass of exceptions does NOT occur when I perform the same
>>> full-import with sdc.xml. As far as I can tell, the only difference
>>> between the two files is the number of fields they contain.
>>>
>>> Any guidance or information would be greatly appreciated.
>>> Kyle
>>>
>>>
>>> P.S. The schema.xml in use specifies almost all fields as multiValued and
>>> has a copyField for almost every field. I can change this if it is
>>> causing my problem, but I would prefer not to.
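>>>
>>> For what it's worth, the copyFields are all of this simple form (the
>>> "text" destination here is just illustrative of a catch-all field, not
>>> necessarily its real name):
>>>
>>> <copyField source="Taxon" dest="text"/>
>>> <copyField source="GenPept" dest="text"/>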
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Indexing-Large-Files-with-Large-DataImport%3A-Problems-tp19746831p19746831.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>> 
>> 
>> -- 
>> Regards,
>> Shalin Shekhar Mangar.
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Indexing-Large-Files-with-Large-DataImport%3A-Problems-tp19746831p19749991.html
Sent from the Solr - User mailing list archive at Nabble.com.
