I do want to import all documents.  My understanding, correct me if I'm
wrong, is that a certain number of documents can be included in a single
atomic update.  Instead of having all 16 million of my documents be part of
a single update (which, being so big, could more easily fail), I was
thinking it would be better to be able to stipulate how many docs are part
of each update, so my 16-million-doc import would consist of 16M/100
updates.
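To illustrate the grouping I have in mind, here is a rough sketch in plain
Python (not actual DataImportHandler code; the `batches` helper is made up
for illustration):

```python
def batches(docs, size=100):
    """Yield successive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # don't drop the trailing partial batch
        yield batch

# With size=100, a 16M-doc import becomes 160,000 separate updates,
# so a single failure costs at most 100 docs, not the whole import.
```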


Shalin Shekhar Mangar wrote:
> 
> Hi Mike,
> 
> Just curious to know the use-case here. Why do you want to limit updates
> to 100 instead of importing all documents?
> 
> On Tue, Jun 24, 2008 at 10:23 AM, mike segv <[EMAIL PROTECTED]> wrote:
> 
>>
>> That fixed it.
>>
>> If I'm inserting millions of documents, how do I control docs/update?
>> E.g. if there are 50K docs per file, I'm thinking that I should probably
>> code up my own DataSource that allows me to stipulate docs/update.  Like
>> say, 100 instead of 50K.  Does this make sense?
>>
>> Mike
>>
>>
>> Noble Paul നോബിള്‍ नोब्ळ् wrote:
>> >
>> > Hi,
>> > You have not registered any data sources; the second entity needs a
>> > data source. Remove its dataSource="null" and add a name for the second
>> > entity (good practice). There is no need for a baseDir attribute on the
>> > second entity either. See the modified xml added below.
>> > --Noble
>> >
>> > <dataConfig>
>> >   <dataSource type="FileDataSource"/>
>> >   <document>
>> >     <entity name="f" processor="FileListEntityProcessor" fileName=".*xml"
>> >             newerThan="'NOW-10DAYS'" recursive="true" rootEntity="false"
>> >             dataSource="null" baseDir="/san/tomcat-services/solr-medline">
>> >       <entity name="x" processor="XPathEntityProcessor"
>> >               forEach="/MedlineCitation" url="${f.fileAbsolutePath}">
>> >         <field column="pmid" xpath="/MedlineCitation/PMID"/>
>> >       </entity>
>> >     </entity>
>> >   </document>
>> > </dataConfig>
>> >
>> > On Tue, Jun 24, 2008 at 6:39 AM, mike segv <[EMAIL PROTECTED]> wrote:
>> >>
>> >> I'm trying to use the FileListEntityProcessor to add some xml
>> >> documents to a solr index.  I'm running a nightly version of solr-1.3
>> >> with SOLR-469 and SOLR-563.  I've been able to successfully run the
>> >> slashdot httpDataSource example.  My data-config.xml file loads
>> >> without errors.  When I attempt the full-import command I get the
>> >> exception below.  Thanks for any help.
>> >>
>> >> Mike
>> >>
>> >> WARNING: No lockType configured for
>> >> /san/tomcat-services/solr-medline/solr/data/index/ assuming 'simple'
>> >> Jun 23, 2008 7:59:49 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
>> >> SEVERE: Full Import failed
>> >> java.lang.RuntimeException: java.lang.NullPointerException
>> >>        at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:97)
>> >>        at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:212)
>> >>        at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:166)
>> >>        at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:149)
>> >>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:286)
>> >>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:312)
>> >>        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
>> >>        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:140)
>> >>        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:335)
>> >>        at org.apache.solr.handler.dataimport.DataImporter.rumCmd(DataImporter.java:386)
>> >>        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>> >> Caused by: java.lang.NullPointerException
>> >>        at java.io.Reader.<init>(Reader.java:61)
>> >>        at java.io.BufferedReader.<init>(BufferedReader.java:76)
>> >>        at com.bea.xml.stream.MXParser.checkForXMLDecl(MXParser.java:775)
>> >>        at com.bea.xml.stream.MXParser.setInput(MXParser.java:806)
>> >>        at com.bea.xml.stream.MXParserFactory.createXMLStreamReader(MXParserFactory.java:261)
>> >>        at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:93)
>> >>        ... 10 more
>> >>
>> >> Here is my data-config:
>> >>
>> >> <dataConfig>
>> >> <document>
>> >> <entity name="f" processor="FileListEntityProcessor" fileName=".*xml"
>> >> newerThan="'NOW-10DAYS'" recursive="true" rootEntity="false"
>> >> dataSource="null" baseDir="/san/tomcat-services/solr-medline">
>> >>  <entity processor="XPathEntityProcessor" forEach="/MedlineCitation"
>> >> url="${f.fileAbsolutePath}" dataSource="null">
>> >>     <field column="pmid" xpath="/MedlineCitation/PMID"/>
>> >>  </entity>
>> >> </entity>
>> >> </document>
>> >> </dataConfig>
>> >>
>> >> And a snippet from an xml file:
>> >> <MedlineCitation Owner="PIP" Status="MEDLINE">
>> >> <PMID>12236137</PMID>
>> >> <DateCreated>
>> >> <Year>1980</Year>
>> >> <Month>01</Month>
>> >> <Day>03</Day>
>> >> </DateCreated>
>> >>
>> >>
>> >> --
>> >> View this message in context:
>> >> http://www.nabble.com/Attempting-dataimport-using-FileListEntityProcessor-tp18081671p18081671.html
>> >> Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > --Noble Paul
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Attempting-dataimport-using-FileListEntityProcessor-tp18081671p18083747.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Attempting-dataimport-using-FileListEntityProcessor-tp18081671p18095951.html
Sent from the Solr - User mailing list archive at Nabble.com.
