DataImport troubleshooting

2008-09-23 Thread KyleMorrison

I have searched the forum and the internet at large to find an answer to my
simple problem, but have been unable to find one. I am trying to get a simple dataimport
to work, and have not been able to. I have Solr installed on an Apache
server on Unix. I am able to commit and search for files using the usual
Simple* tools. These files begin with ... and so on.

On the data import, I have inserted

  <requestHandler name="/dataimport"
      class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">/R1/home/shoshana/kyle/Documents/data-config.xml</str>
    </lst>
  </requestHandler>

into solrconfig, and the data import looks like this:

<dataConfig>
  <dataSource baseUrl="http://helix.ccb.sickkids.ca:8080/" encoding="UTF-8" />
  <document>
    <entity processor="XPathEntityProcessor"
            forEach="/iProClassDatabase/iProClassEntry/"
            url="/R1/home/shoshana/kyle/Documents/exampleIproResult.xml">
      <field column="..."
xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Protein_Name_and_ID/UniProtKB/UniProtKB_Accession" />
      <field column="..."
xpath="/iProClassDatabase/iProClassEntry/CROSS_REFERENCES/Enzyme_Function/EC/Nomenclature" />
      <field column="..."
xpath="/iProClassDatabase/iProClassEntry/CROSS_REFERENCES/Bibliography/References/PMID" />
      <field column="..."
xpath="/iProClassDatabase/iProClassEntry/SEQUENCE/Sequence_Length" />
    </entity>
  </document>
</dataConfig>

I apologize for the ugly xml. Nonetheless, when I go to
http://host:8080/solr/dataimport, I get a 404, and when I go to
http://host:8080/solr/admin/dataimport.jsp and try to "debug", nothing
happens. I have edited out the host name because I don't know if my
employer would be OK with it. Any guidance?

Thanks in advance,
Kyle
-- 
View this message in context: 
http://www.nabble.com/DataImport-troubleshooting-tp19630990p19630990.html
Sent from the Solr - User mailing list archive at Nabble.com.
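
[Editorial note: the data-config in the message above asks DIH to treat each
node matched by forEach as a "row" and to evaluate each field's xpath to fill
that row's columns. A rough sketch of that row/column mapping in Python, with
a made-up miniature sample standing in for the real iProClass file:]

```python
import xml.etree.ElementTree as ET

# Made-up miniature stand-in for the real iProClass XML.
SAMPLE = """
<iProClassDatabase>
  <iProClassEntry>
    <SEQUENCE><Sequence_Length>120</Sequence_Length></SEQUENCE>
  </iProClassEntry>
  <iProClassEntry>
    <SEQUENCE><Sequence_Length>87</Sequence_Length></SEQUENCE>
  </iProClassEntry>
</iProClassDatabase>
"""

root = ET.fromstring(SAMPLE)
rows = []
# forEach="/iProClassDatabase/iProClassEntry/" -> one row per entry node
for entry in root.findall("iProClassEntry"):
    # each field xpath is then evaluated to fill that row's columns
    rows.append({"sequence_length": entry.findtext("SEQUENCE/Sequence_Length")})

print(rows)  # one dict per iProClassEntry
```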



Re: DataImport troubleshooting

2008-09-23 Thread KyleMorrison

Thank you for the help. The problem was actually just stupidity on my part: it
seems I was running the wrong startup and shutdown scripts for the server, and
thus the server was never actually getting restarted. I restarted it properly
and can at least access those pages now. I'm getting some wonky output, but I
assume this will be sorted out.

Kyle



Shalin Shekhar Mangar wrote:
> 
> Are there any exceptions in the log file when you start Solr?
> 
> On Tue, Sep 23, 2008 at 9:31 PM, KyleMorrison <[EMAIL PROTECTED]> wrote:
> 
>>
>> I have searched the forum and the internet at large to find an answer to
>> my
>> simple problem, but have been unable. I am trying to get a simple
>> dataimport
>> to work, and have not been able to. I have Solr installed on an Apache
>> server on Unix. I am able to commit and search for files using the usual
>> Simple* tools. These files begin with ... and so on.
>>
>> On the data import, I have inserted
>>  <requestHandler name="/dataimport"
>>      class="org.apache.solr.handler.dataimport.DataImportHandler">
>>    <lst name="defaults">
>>      <str name="config">/R1/home/shoshana/kyle/Documents/data-config.xml</str>
>>    </lst>
>>  </requestHandler>
>>
>> into solrconfig, and the data import looks like this:
>> <dataConfig>
>>   <dataSource baseUrl="http://helix.ccb.sickkids.ca:8080/" encoding="UTF-8" />
>>   <document>
>>     <entity processor="XPathEntityProcessor"
>>             forEach="/iProClassDatabase/iProClassEntry/"
>>             url="/R1/home/shoshana/kyle/Documents/exampleIproResult.xml">
>>       <field column="..."
>> xpath="/iProClassDatabase/iProClassEntry/GENERAL_INFORMATION/Protein_Name_and_ID/UniProtKB/UniProtKB_Accession" />
>>       <field column="..."
>> xpath="/iProClassDatabase/iProClassEntry/CROSS_REFERENCES/Enzyme_Function/EC/Nomenclature" />
>>       <field column="..."
>> xpath="/iProClassDatabase/iProClassEntry/CROSS_REFERENCES/Bibliography/References/PMID" />
>>       <field column="..."
>> xpath="/iProClassDatabase/iProClassEntry/SEQUENCE/Sequence_Length" />
>>     </entity>
>>   </document>
>> </dataConfig>
>>
>> I apologize for the ugly xml. Nonetheless, when I go to
>> http://host:8080/solr/dataimport, I get a 404, and when I go to
>> http://host:8080/solr/admin/dataimport.jsp and try to "debug", nothing
>> happens. I have editted out the host name because I don't know if the
>> employer would be ok with it. Any guidance?
>>
>> Thanks in advance,
>> Kyle
>>
>>
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/DataImport-troubleshooting-tp19630990p19635170.html
Sent from the Solr - User mailing list archive at Nabble.com.



Indexing Multiple Fields with the Same Name

2008-09-24 Thread KyleMorrison

I'm trying to index fields as such:
<PMID>6100966</PMID>
<PMID>375010</PMID>
<PMID>2338917</PMID>
<PMID>1943701</PMID>
<PMID>1357528</PMID>
<PMID>3301821</PMID>
<PMID>2450046</PMID>
<PMID>8940112</PMID>
<PMID>6251457</PMID>
<PMID>293</PMID>
<PMID>6262769</PMID>
<PMID>2693214</PMID>
<PMID>2839489</PMID>
<PMID>6283093</PMID>
<PMID>2666401</PMID>
<PMID>6343085</PMID>
<PMID>1721838</PMID>
<PMID>6377309</PMID>
<PMID>3882429</PMID>
<PMID>6302075</PMID>

And in the xml schema we see
   <field name="PMID" ... stored="false" mulitValued="true"/>

However, when I search for entries in PMID, the only one that ever gets
stored is the last one in the list. For instance, q=PMID:6302075 returns a
document, whereas q=PMID:3882429 does not. Shouldn't the data import handler
take care of this, or am I misunderstanding the function of
mulitValued="true"?

Kyle

-- 
View this message in context: 
http://www.nabble.com/Indexing-Multiple-Fields-with-the-Same-Name-tp19655285p19655285.html
Sent from the Solr - User mailing list archive at Nabble.com.
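
[Editorial note: the symptom described above matches single-valued semantics,
where each new value overwrites the previous one, while a multivalued field
accumulates every value. A toy illustration in Python (dicts and lists, not
Solr internals):]

```python
pmids = ["6100966", "375010", "3882429", "6302075"]

# Single-valued: each assignment overwrites the previous value,
# so only the last PMID survives -- the symptom described above.
single = {}
for pmid in pmids:
    single["PMID"] = pmid
print(single["PMID"])  # prints 6302075, the last value only

# Multivalued: values accumulate, so every PMID stays searchable.
multi = {"PMID": []}
for pmid in pmids:
    multi["PMID"].append(pmid)
print(len(multi["PMID"]))  # prints 4
```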



Re: Indexing Multiple Fields with the Same Name

2008-09-30 Thread KyleMorrison

That was indeed the error; I apologize for wasting your time. Thank you very
much for the help.
Kyle



Shalin Shekhar Mangar wrote:
> 
> Is that a mis-spelling?
> 
> mulitValued="true"
> 
> On Thu, Sep 25, 2008 at 12:12 AM, KyleMorrison <[EMAIL PROTECTED]> wrote:
> 
>>
>> I'm trying to index fields as such:
>>    <PMID>6100966</PMID>
>>    <PMID>375010</PMID>
>>    <PMID>2338917</PMID>
>>    <PMID>1943701</PMID>
>>    <PMID>1357528</PMID>
>>    <PMID>3301821</PMID>
>>    <PMID>2450046</PMID>
>>    <PMID>8940112</PMID>
>>    <PMID>6251457</PMID>
>>    <PMID>293</PMID>
>>    <PMID>6262769</PMID>
>>    <PMID>2693214</PMID>
>>    <PMID>2839489</PMID>
>>    <PMID>6283093</PMID>
>>    <PMID>2666401</PMID>
>>    <PMID>6343085</PMID>
>>    <PMID>1721838</PMID>
>>    <PMID>6377309</PMID>
>>    <PMID>3882429</PMID>
>>    <PMID>6302075</PMID>
>>
>> And in the xml schema we see
>>   <field name="PMID" ... stored="false"
>> mulitValued="true"/>
>>
>> However, when I search for entries in PMID, the only one that ever gets
>> stored is the last one in the list. For instance, q=PMID:6302075 returns
>> a
>> document, whereas q=PMID:3882429 does not. Shouldn't the data import
>> handler
>> take care of this, or am I misunderstanding the function of
>> mulitValued="true"?
>>
>> Kyle
>>
>>
>>
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Indexing-Multiple-Fields-with-the-Same-Name-tp19655285p19743517.html
Sent from the Solr - User mailing list archive at Nabble.com.
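
[Editorial note: the root cause above is a general XML hazard worth flagging:
attributes a consumer does not recognize are simply ignored, so a misspelled
mulitValued never raises an error; Solr just falls back to the single-valued
default. A sketch of that silent fallback in Python:]

```python
import xml.etree.ElementTree as ET

# The misspelled attribute from the schema above.
field = ET.fromstring('<field name="PMID" stored="false" mulitValued="true"/>')

# A consumer looking up the correct spelling silently gets the default:
# the typo'd attribute is never flagged, just ignored.
multi_valued = field.get("multiValued", "false") == "true"
print(multi_valued)  # prints False
```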



Indexing Large Files with Large DataImport: Problems

2008-09-30 Thread KyleMorrison

I apologize for spamming this mailing list with my problems, but I'm at my
wits end. I'll get right to the point.

I have an xml file which is ~1GB which I wish to index. If that is
successful, I will move to a larger file of closer to 20GB. However, when I
run my data-config (let's call it dc.xml) over it, the import only manages to
get about 27 rows out of roughly 200K. The exact same data-config (dc.xml)
works perfectly on smaller data files of the same type.

This data-config is quite large, maybe 250 fields. When I run a smaller
data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works
perfectly. The only conclusion I can draw from this is that the data-config
method just doesn't scale well.

When the dc.xml fails, the server logs spit out:

Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0
QTime=95
Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter
doFullImport
INFO: Starting Full Import
Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2
deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter
doFullImport
SEVERE: Full Import failed
java.util.ConcurrentModificationException
at
java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at
org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
at
org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0
QTime=77
Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter
doFullImport
INFO: Starting Full Import
Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2
deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter
doFullImport
SEVERE: Full Import failed
java.util.ConcurrentModificationException
at
java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at
org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
at
org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)

This mass of exceptions DOES NOT occur when I perform the same full-import
with sdc.xml. As far as I can tell, the only difference between the two
files is the number of fields they contain.

Any guidance or information would be greatly appreciated.
Kyle


PS The schema.xml in use specifies almost all fields as multivalued, and has
a copyField for almost every field. I can fix this if it is causing my
problem, but I would prefer not to.
-- 
View this message in context: 
http://www.nabble.com/Indexing-Large-Files-with-Large-DataImport%3A-Problems-tp19746831p19746831.html
Sent from the Solr - User mailing list archive at Nabble.com.
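
[Editorial note: the ConcurrentModificationException in the log above is
Java's fail-fast iterator complaining that a list was mutated while being
iterated, inside DocBuilder itself, which points at Solr-side code rather
than the config syntax. Python guards dict iteration the same way, which
makes the failure mode easy to demonstrate (hypothetical field values):]

```python
# Hypothetical field values, for illustration only.
fields = {"taxon": "9606", "genpept": "ABC123"}

raised = None
try:
    for name in list(fields)[:1] or fields:
        pass
    for name in fields:
        # Mutating the collection mid-iteration -- the same hazard
        # Java's fail-fast iterators detect in DocBuilder.
        fields[name + "_copy"] = fields[name]
except RuntimeError as exc:
    raised = type(exc).__name__

print(raised)  # prints RuntimeError: dict changed size during iteration
```

The usual fix, in either language, is to iterate over a snapshot (e.g.
`list(fields)` here, or a copied List in Java) while mutating the original.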



Re: Indexing Large Files with Large DataImport: Problems

2008-09-30 Thread KyleMorrison

Yes, this is the most recent version of Solr, with stream="true" set, and with
stopwords, lowercase and removeDuplicates filters applied to all multivalued
fields. Could the filters be causing this? I will try running without them and
see what happens.

Kyle


Shalin Shekhar Mangar wrote:
> 
> Hmm, strange.
> 
> This is Solr 1.3.0, right? Do you have any transformers applied to these
> multi-valued fields? Do you have stream="true" in the entity?
> 
> On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison <[EMAIL PROTECTED]> wrote:
> 
>>
>> I apologize for spamming this mailing list with my problems, but I'm at
>> my
>> wits end. I'll get right to the point.
>>
>> I have an xml file which is ~1GB which I wish to index. If that is
>> successful, I will move to a larger file of closer to 20GB. However, when
>> I
>> run my data-config(let's call it dc.xml) over it, the import only manages
>> to
>> get about 27 rows, out of roughly 200K. The exact same
>> data-config(dc.xml)
>> works perfectly on smaller data files of the same type.
>>
>> This data-config is quite large, maybe 250 fields. When I run a smaller
>> data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works
>> perfectly. The only conclusion I can draw from this is that the
>> data-config
>> method just doesn't scale well.
>>
>> When the dc.xml fails, the server logs spit out:
>>
>> Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/dataimport params={command=full-import}
>> status=0
>> QTime=95
>> Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter
>> doFullImport
>> INFO: Starting Full Import
>> Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2
>> deleteAll
>> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
>> Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter
>> doFullImport
>> SEVERE: Full Import failed
>> java.util.ConcurrentModificationException
>>at
>> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>>at java.util.AbstractList$Itr.next(AbstractList.java:343)
>>at
>>
>> org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
>>at
>>
>> org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
>>at
>>
>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
>>at
>>
>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>>at
>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>>at
>>
>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>>at
>>
>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>>at
>>
>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>> Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/solr path=/dataimport params={command=full-import}
>> status=0
>> QTime=77
>> Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter
>> doFullImport
>> INFO: Starting Full Import
>> Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2
>> deleteAll
>> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
>> Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter
>> doFullImport
>> SEVERE: Full Import failed
>> java.util.ConcurrentModificationException
>>at
>> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>>at java.util.AbstractList$Itr.next(AbstractList.java:343)
>>at
>>
>> org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
>>at
>>
>> org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
>>at
>>
>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
>>at
>>
>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>>at
>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>>at
>>
>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>>at
>>
>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)

Re: Indexing Large Files with Large DataImport: Problems

2008-09-30 Thread KyleMorrison

As a follow up: I continued tweaking the data-config.xml, and have been able
to make the commit fail with as little as 3 fields in the sdc.xml, with only
one multivalued field. Even more strange, some fields work and some do not.
For instance, in my dc.xml:


.
.
.


and in the schema.xml:

.
.
.

but taxon works and genpept does not. What could possibly account for this
discrepancy? Again, the error logs from the server are exactly those seen in
the first post.

What is going on?


KyleMorrison wrote:
> 
> Yes, this is the most recent version of Solr, with stream="true" set, and
> with stopwords, lowercase and removeDuplicates filters applied to all
> multivalued fields. Could the filters be causing this? I will try running
> without them and see what happens.
> 
> Kyle
> 
> 
> Shalin Shekhar Mangar wrote:
>> 
>> Hmm, strange.
>> 
>> This is Solr 1.3.0, right? Do you have any transformers applied to these
>> multi-valued fields? Do you have stream="true" in the entity?
>> 
>> On Tue, Sep 30, 2008 at 11:01 PM, KyleMorrison <[EMAIL PROTECTED]>
>> wrote:
>> 
>>>
>>> I apologize for spamming this mailing list with my problems, but I'm at
>>> my
>>> wits end. I'll get right to the point.
>>>
>>> I have an xml file which is ~1GB which I wish to index. If that is
>>> successful, I will move to a larger file of closer to 20GB. However,
>>> when I
>>> run my data-config(let's call it dc.xml) over it, the import only
>>> manages
>>> to
>>> get about 27 rows, out of roughly 200K. The exact same
>>> data-config(dc.xml)
>>> works perfectly on smaller data files of the same type.
>>>
>>> This data-config is quite large, maybe 250 fields. When I run a smaller
>>> data-config (let's call it sdc.xml) over the 1GB file, the sdc.xml works
>>> perfectly. The only conclusion I can draw from this is that the
>>> data-config
>>> method just doesn't scale well.
>>>
>>> When the dc.xml fails, the server logs spit out:
>>>
>>> Sep 30, 2008 11:40:18 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/dataimport params={command=full-import}
>>> status=0
>>> QTime=95
>>> Sep 30, 2008 11:40:18 AM org.apache.solr.handler.dataimport.DataImporter
>>> doFullImport
>>> INFO: Starting Full Import
>>> Sep 30, 2008 11:40:18 AM org.apache.solr.update.DirectUpdateHandler2
>>> deleteAll
>>> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
>>> Sep 30, 2008 11:40:20 AM org.apache.solr.handler.dataimport.DataImporter
>>> doFullImport
>>> SEVERE: Full Import failed
>>> java.util.ConcurrentModificationException
>>>at
>>> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>>>at java.util.AbstractList$Itr.next(AbstractList.java:343)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DocBuilder.addFieldValue(DocBuilder.java:402)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DocBuilder.addFields(DocBuilder.java:373)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:304)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:178)
>>>at
>>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:136)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>>>at
>>>
>>> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>>> Sep 30, 2008 11:41:18 AM org.apache.solr.core.SolrCore execute
>>> INFO: [] webapp=/solr path=/dataimport params={command=full-import}
>>> status=0
>>> QTime=77
>>> Sep 30, 2008 11:41:18 AM org.apache.solr.handler.dataimport.DataImporter
>>> doFullImport
>>> INFO: Starting Full Import
>>> Sep 30, 2008 11:41:18 AM org.apache.solr.update.DirectUpdateHandler2
>>> deleteAll
>>> INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
>>> Sep 30, 2008 11:41:19 AM org.apache.solr.handler.dataimport.DataImporter
>>> doFullImport
>>> SEVERE: Full Import failed
>>> java.util.ConcurrentModificationException
>>>at
>>> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:3