DataImport troubleshooting
I have searched the forum and the internet at large to find an answer to my simple problem, but have been unable. I am trying to get a simple data import to work, and have not been able to. I have Solr installed on an Apache server on Unix, and I am able to commit and search for files using the usual Simple* tools. These files begin with ... and so on.

For the data import, I have inserted the following into solrconfig.xml:

```xml
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/R1/home/shoshana/kyle/Documents/data-config.xml</str>
  </lst>
</requestHandler>
```

and the data-config looks like this:

```xml
<dataConfig>
  <dataSource baseUrl="http://helix.ccb.sickkids.ca:8080/" encoding="UTF-8" />
  <document>
    <entity forEach="/iProClassDatabase/iProClassEntry/"
            url="/R1/home/shoshana/kyle/Documents/exampleIproResult.xml">
      <field xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Protein_Name_and_ID/UniProtKB/UniProtKB_Accession" />
      <field xpath="/iProClassDatabase/iProClassEntry/CROSS_REFERENCES/Enzyme_Function/EC/Nomenclature" />
      <field xpath="/iProClassDatabase/iProClassEntry/CROSS_REFERENCES/Bibliography/References/PMID" />
      <field xpath="/iProClassDatabase/iProClassEntry/SEQUENCE/Sequence_Length" />
    </entity>
  </document>
</dataConfig>
```

I apologize for the ugly XML. Nonetheless, when I go to http://host:8080/solr/dataimport, I get a 404, and when I go to http://host:8080/solr/admin/dataimport.jsp and try to "debug", nothing happens. I have edited out the host name because I don't know if my employer would be ok with it. Any guidance?

Thanks in advance,
Kyle

--
View this message in context: http://www.nabble.com/DataImport-troubleshooting-tp19630990p19630990.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImport troubleshooting
Thank you for the help. The problem was actually just stupidity on my part: it seems I was running the wrong startup and shutdown shell scripts for the server, and thus the server was never actually getting restarted. Once I restarted it properly, I could at least access those pages. I'm getting some wonky output, but I assume this will be sorted out.

Kyle

Shalin Shekhar Mangar wrote:
> Are there any exceptions in the log file when you start Solr?
>
> On Tue, Sep 23, 2008 at 9:31 PM, KyleMorrison <[EMAIL PROTECTED]> wrote:
> [...]
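Shalin's first question — whether the log shows exceptions when Solr starts — can be checked with a quick grep. A minimal sketch; the sample log lines below are invented for illustration, and in practice you would point grep at your servlet container's own log (e.g. Tomcat's catalina.out, whose location varies by install):

```shell
#!/bin/sh
# Write a small invented sample log so the snippet is self-contained;
# in real use, replace /tmp/solr-sample.log with your container's log file.
cat > /tmp/solr-sample.log <<'EOF'
INFO: Solr home set to 'solr/'
SEVERE: Could not start SOLR. Check solr/home property
java.lang.RuntimeException: Can't find resource 'solrconfig.xml'
EOF
# Count lines that look like startup errors.
grep -cE "SEVERE|Exception" /tmp/solr-sample.log
```

Running this against the sample prints 2, one for each error line; a clean startup log should print 0.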
Indexing Multiple Fields with the Same Name
I'm trying to index fields as such:

```
6100966
375010
2338917
1943701
1357528
3301821
2450046
8940112
6251457
293
6262769
2693214
2839489
6283093
2666401
6343085
1721838
6377309
3882429
6302075
```

And in the XML schema we see:

```xml
<field name="PMID" ... stored="false" mulitValued="true"/>
```

However, when I search for entries in PMID, the only one that ever gets stored is the last one in the list. For instance, q=PMID:6302075 returns a document, whereas q=PMID:3882429 does not. Shouldn't the data import handler take care of this, or am I misunderstanding the function of mulitValued="true"?

Kyle
Re: Indexing Multiple Fields with the Same Name
That was indeed the error; I apologize for wasting your time. Thank you very much for the help.

Kyle

Shalin Shekhar Mangar wrote:
> Is that a mis-spelling?
>
> mulitValued="true"
>
> On Thu, Sep 25, 2008 at 12:12 AM, KyleMorrison <[EMAIL PROTECTED]> wrote:
> [...]
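The fix, then, is the correctly spelled multiValued attribute; with the misspelled "mulitValued", Solr silently ignores the unknown attribute and treats the field as single-valued, keeping only the last value. A sketch of the corrected schema.xml declaration — the field type and indexed flag here are assumptions, since the original post's markup was stripped:

```xml
<!-- "multiValued" (note the spelling) lets the field hold every PMID value.
     type and indexed are assumed; stored="false" comes from the thread. -->
<field name="PMID" type="string" indexed="true" stored="false" multiValued="true"/>
```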
Indexing Large Files with Large DataImport: Problems
I apologize for spamming this mailing list with my problems, but I'm at my wits' end. I'll get right to the point.

I have an XML file of ~1GB which I wish to index. If that is successful, I will move to a larger file of closer to 20GB. However, when I run my data-config (let's call it dc.xml) over it, the import only manages to get about 27 rows out of roughly 200K. The exact same data-config (dc.xml) works perfectly on smaller data files of the same type.

This data-config is quite large, maybe 250 fields. When I run a smaller data-config (let's call it sdc.xml) over the 1GB file, sdc.xml works perfectly. The only conclusion I can draw from this is that the data-config method just doesn't scale well.

When dc.xml fails, the server logs spit out:

```
Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=95
Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
java.util.ConcurrentModificationException
    at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
    at java.util.AbstractList$Itr.next(AbstractList.java:343)
    at org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
    at org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=77
Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
SEVERE: Full Import failed
java.util.ConcurrentModificationException
    [identical stack trace repeated for the second import attempt]
```

This mass of exceptions does NOT occur when I perform the same full-import with sdc.xml. As far as I can tell, the only difference between the two files is the number of fields they contain. Any guidance or information would be greatly appreciated.

Kyle

PS: The schema.xml in use specifies almost all fields as multivalued, and has a copyField for almost every field. I can fix this if it is causing my problem, but I would prefer not to.
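The ConcurrentModificationException in the trace is Java's fail-fast iterator check firing: a list was structurally modified while something was still iterating over it (here, apparently inside DocBuilder.addFieldValue). A minimal standalone sketch of that failure mode — this is illustrative only, not Solr's actual code:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class ComodDemo {
    // Returns true if mutating the list while iterating over it
    // triggers ConcurrentModificationException.
    static boolean mutateWhileIterating() {
        List<String> values = new ArrayList<>();
        values.add("a");
        values.add("b");
        try {
            for (String v : values) {      // for-each uses a fail-fast iterator
                values.add(v + "-copy");   // structural modification mid-iteration
            }
        } catch (ConcurrentModificationException e) {
            return true;                   // iterator detected the change
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(mutateWhileIterating()); // prints true
    }
}
```

This kind of bug can depend on timing and data volume, which would explain why small imports succeed while the large one fails.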
Re: Indexing Large Files with Large DataImport: Problems
Yes, this is the most recent version of Solr (1.3.0), with stream="true" in the entity, and stopwords, lowercase and removeDuplicate filters applied to all multivalued fields. Could the filters possibly be causing this? I will try without them and see what happens.

Kyle

Shalin Shekhar Mangar wrote:
> Hmm, strange.
>
> This is Solr 1.3.0, right? Do you have any transformers applied to these
> multi-valued fields? Do you have stream="true" in the entity?
>
> On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison <[EMAIL PROTECTED]> wrote:
> [...]
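For reference, a sketch of what an entity with stream="true" looks like in data-config.xml when importing a large XML file. The entity name, processor and column name below are placeholders I've filled in for illustration; only the forEach and url values come from the thread:

```xml
<!-- stream="true" makes XPathEntityProcessor read the file incrementally
     instead of loading it into memory — important for GB-sized inputs.
     name/processor/column values here are assumed, not from the thread. -->
<entity name="iproclass"
        processor="XPathEntityProcessor"
        stream="true"
        forEach="/iProClassDatabase/iProClassEntry/"
        url="/R1/home/shoshana/kyle/Documents/exampleIproResult.xml">
  <field column="sequence_length"
         xpath="/iProClassDatabase/iProClassEntry/SEQUENCE/Sequence_Length" />
</entity>
```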
Re: Indexing Large Files with Large DataImport: Problems
As a follow-up: I continued tweaking the data-config.xml, and have been able to make the commit fail with as few as 3 fields in sdc.xml, with only one multivalued field. Even more strange, some fields work and some do not. For instance, in my dc.xml: . . . and in the schema.xml: . . . but taxon works and genpept does not. What could possibly account for this discrepancy? Again, the error logs from the server are exactly those seen in the first post. What is going on?

Kyle

KyleMorrison wrote:
> Yes, this is the most recent version of Solr, stream="true" and stopwords,
> lowercase and removeDuplicate being applied to all multivalued fields?
> Would the filters possibly be causing this? I will not use them and see
> what happens.
>
> Kyle
> [...]