I guess it is a threading problem. I can give you a patch; you can raise a bug.

--Noble
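For context on the trace quoted further down: java.util.ConcurrentModificationException comes from ArrayList's fail-fast iterator, which throws as soon as the list is structurally modified while it is still being iterated, whether by another thread or by the loop body itself. Below is a minimal, self-contained Java sketch of that failure mode; it is not the Solr DocBuilder code, only an illustration of the pattern being described.

import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Minimal sketch of the failure mode in the quoted trace: an ArrayList's
// fail-fast iterator throws ConcurrentModificationException when the list
// is structurally modified while being iterated. NOT the Solr DocBuilder
// code, just an illustration of the pattern.
public class ComodificationDemo {
    public static void main(String[] args) throws InterruptedException {
        final List<String> values = new ArrayList<String>();
        for (int i = 0; i < 100000; i++) {
            values.add("value-" + i);
        }

        // A second thread appends to the same (non thread-safe) list while
        // the main thread is still iterating it.
        Thread writer = new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i < 100000; i++) {
                    values.add("extra-" + i);
                }
            }
        });
        writer.start();

        try {
            long total = 0;
            for (String v : values) {   // the iterator checks modCount on every next()
                total += v.length();
            }
            // The race does not always trigger on a given run.
            System.out.println("no collision this run, total=" + total);
        } catch (ConcurrentModificationException e) {
            // Same exception class as in the DataImportHandler trace.
            e.printStackTrace();
        }
        writer.join();

        // Note: the same exception also occurs with no second thread at all
        // if the loop body itself calls values.add(...) while iterating.
    }
}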
On Wed, Oct 1, 2008 at 2:11 AM, KyleMorrison <[EMAIL PROTECTED]> wrote:
>
> As a follow-up: I continued tweaking the data-config.xml, and have been able
> to make the commit fail with as few as 3 fields in the sdc.xml, with only
> one multivalued field. Even more strange, some fields work and some do not.
> For instance, in my dc.xml:
>
> <field column="Taxon"
> xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Taxonomy/Lineage/Taxon" />
> .
> .
> .
> <field column="GenPept"
> xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Protein_Name_and_ID/GenPept" />
>
> and in the schema.xml:
>
> <field name="GenPept" type="text" indexed="true" stored="false"
> multiValued="true" />
> .
> .
> .
> <field name="Taxon" type="text" indexed="true" stored="false"
> multiValued="true" />
>
> but Taxon works and GenPept does not. What could possibly account for this
> discrepancy? Again, the error logs from the server are exactly those seen in
> the first post.
>
> What is going on?
>
>
> KyleMorrison wrote:
>>
>> Yes, this is the most recent version of Solr, with stream="true" and the
>> stopwords, lowercase and removeDuplicates filters applied to all
>> multivalued fields. Would the filters possibly be causing this? I will not
>> use them and see what happens.
>>
>> Kyle
>>
>>
>> Shalin Shekhar Mangar wrote:
>>>
>>> Hmm, strange.
>>>
>>> This is Solr 1.3.0, right? Do you have any transformers applied to these
>>> multi-valued fields? Do you have stream="true" in the entity?
>>>
>>> On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison <[EMAIL PROTECTED]>
>>> wrote:
>>>
>>>>
>>>> I apologize for spamming this mailing list with my problems, but I'm at
>>>> my wits' end. I'll get right to the point.
>>>>
>>>> I have an XML file which is ~1GB which I wish to index. If that is
>>>> successful, I will move to a larger file of closer to 20GB. However, when
>>>> I run my data-config (let's call it dc.xml) over it, the import only
>>>> manages to get about 27 rows out of roughly 200K. The exact same
>>>> data-config (dc.xml) works perfectly on smaller data files of the same
>>>> type.
>>>>
>>>> This data-config is quite large, maybe 250 fields. When I run a smaller
>>>> data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works
>>>> perfectly. The only conclusion I can draw from this is that the
>>>> data-config method just doesn't scale well.
>>>>
>>>> When the dc.xml fails, the server logs spit out:
>>>>
>>>> Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute
>>>> INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=95
>>>> Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>>>> INFO: Starting Full Import
>>>> Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
>>>> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
>>>> Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>>>> SEVERE: Full Import failed
>>>> java.util.ConcurrentModificationException
>>>>         at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>>>>         at java.util.AbstractList$Itr.next(AbstractList.java:343)
>>>>         at org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
>>>>         at org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
>>>>         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
>>>>         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>>>>         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>>>>         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>>>>         at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>>>>         at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>>>>
>>>> [The same INFO lines and the identical ConcurrentModificationException
>>>> stack trace repeat for a second full-import attempt at 11:41:18, QTime=77.]
>>>>
>>>> This mass of exceptions DOES NOT occur when I perform the same full-import
>>>> with sdc.xml. As far as I can tell, the only difference between the two
>>>> files is the number of fields they contain.
>>>>
>>>> Any guidance or information would be greatly appreciated.
>>>> Kyle
>>>>
>>>>
>>>> PS: The schema.xml in use specifies almost all fields as multiValued, and
>>>> has a copyField for almost every field.
>>>> I can fix this if it is causing my problem, but I would prefer not to.
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>>
>>
>

--
--Noble Paul
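A common defensive pattern for this kind of fail-fast iterator error is to iterate over a snapshot of the list, so that values added while the loop is running cannot invalidate the iterator. The sketch below shows only that general technique, under the assumption that something is appending to a field-value list mid-iteration; it is not the actual DataImportHandler patch Noble refers to, and addFieldValues is a hypothetical stand-in, not a Solr API.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of a common defensive pattern for the trace shown above: iterate
// over a snapshot of the list so additions made while the loop runs (by the
// same thread, e.g. copyField-style fan-out, or by another thread) cannot
// trip the fail-fast iterator. Illustration only; not the real DIH patch.
public class SnapshotIterationSketch {

    // Hypothetical helper, not a Solr method.
    static int addFieldValues(List<Object> fieldValues) {
        int added = 0;
        // Copy first, then iterate the copy; the original may keep growing.
        for (Object value : new ArrayList<Object>(fieldValues)) {
            // ... here the real code would add 'value' to the document ...
            added++;
        }
        return added;
    }

    public static void main(String[] args) {
        List<Object> values =
                new ArrayList<Object>(Arrays.asList("a", "b", "c"));
        System.out.println("added " + addFieldValues(values) + " values");
    }
}

The snapshot only protects the read loop; if another thread really can modify the list concurrently, the list itself still needs external synchronization or a concurrent collection such as CopyOnWriteArrayList.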