I do want to import all documents. My understanding of how things work (correct me if I'm wrong) is that a certain number of documents can be included in a single atomic update. Instead of having all 16 million of my documents be part of a single update (which, being so big, could more easily fail), I was thinking it would be better to be able to stipulate how many docs are part of each update, so my 16 million doc import would consist of 16M/100 updates.
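To make concrete what I mean by "stipulate how many docs are part of each update", here is a rough standalone sketch. The `BatchSplitter` class, its method name, and the batch size of 100 are purely illustrative (nothing from DataImportHandler); each resulting batch would then be sent to Solr as one `<add>...</add>` update.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only: split a large collection of documents into
 * fixed-size batches so each batch can be posted to Solr as one
 * &lt;add&gt; update, instead of one giant 16M-doc update.
 * Class and method names are hypothetical, not part of DataImportHandler.
 */
public class BatchSplitter {

    /** Partition docs into consecutive batches of at most batchSize each. */
    public static List<List<String>> partition(List<String> docs, int batchSize) {
        List<List<String>> batches = new ArrayList<List<String>>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            // subList is a view over the original list; cheap to create
            batches.add(docs.subList(i, Math.min(i + batchSize, docs.size())));
        }
        return batches;
    }
}
```

With 16M docs and a batch size of 100 this yields 160,000 small updates; any single failure then costs at most one batch's worth of work, not the whole import.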
Shalin Shekhar Mangar wrote:
>
> Hi Mike,
>
> Just curious to know the use-case here. Why do you want to limit updates to
> 100 instead of importing all documents?
>
> On Tue, Jun 24, 2008 at 10:23 AM, mike segv <[EMAIL PROTECTED]> wrote:
>
>> That fixed it.
>>
>> If I'm inserting millions of documents, how do I control docs/update? E.g.
>> if there are 50K docs per file, I'm thinking that I should probably code up
>> my own DataSource that allows me to stipulate docs/update. Like say, 100
>> instead of 50K. Does this make sense?
>>
>> Mike
>>
>> Noble Paul നോബിള് नोब्ळ् wrote:
>>>
>>> hi,
>>> You have not registered any datasources. The second entity needs a
>>> datasource. Remove the dataSource="null" and add a name for the second
>>> entity (good practice). No need for the baseDir attribute on the second
>>> entity. See the modified xml added below.
>>> --Noble
>>>
>>> <dataConfig>
>>>   <dataSource type="FileDataSource"/>
>>>   <document>
>>>     <entity name="f" processor="FileListEntityProcessor" fileName=".*xml"
>>>             newerThan="'NOW-10DAYS'" recursive="true" rootEntity="false"
>>>             dataSource="null" baseDir="/san/tomcat-services/solr-medline">
>>>       <entity name="x" processor="XPathEntityProcessor"
>>>               forEach="/MedlineCitation"
>>>               url="${f.fileAbsolutePath}">
>>>         <field column="pmid" xpath="/MedlineCitation/PMID"/>
>>>       </entity>
>>>     </entity>
>>>   </document>
>>> </dataConfig>
>>>
>>> On Tue, Jun 24, 2008 at 6:39 AM, mike segv <[EMAIL PROTECTED]> wrote:
>>>>
>>>> I'm trying to use the FileListEntityProcessor to add some xml documents
>>>> to a solr index. I'm running a nightly version of solr-1.3 with SOLR-469
>>>> and SOLR-563. I've been able to successfully run the slashdot
>>>> httpDataSource example. My data-config.xml file loads without errors.
>>>> When I attempt the full-import command I get the exception below.
>>>> Thanks for any help.
>>>> Mike
>>>>
>>>> WARNING: No lockType configured for
>>>> /san/tomcat-services/solr-medline/solr/data/index/ assuming 'simple'
>>>> Jun 23, 2008 7:59:49 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
>>>> SEVERE: Full Import failed
>>>> java.lang.RuntimeException: java.lang.NullPointerException
>>>>     at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:97)
>>>>     at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:212)
>>>>     at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:166)
>>>>     at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:149)
>>>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:286)
>>>>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:312)
>>>>     at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
>>>>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:140)
>>>>     at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:335)
>>>>     at org.apache.solr.handler.dataimport.DataImporter.rumCmd(DataImporter.java:386)
>>>>     at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
>>>> Caused by: java.lang.NullPointerException
>>>>     at java.io.Reader.<init>(Reader.java:61)
>>>>     at java.io.BufferedReader.<init>(BufferedReader.java:76)
>>>>     at com.bea.xml.stream.MXParser.checkForXMLDecl(MXParser.java:775)
>>>>     at com.bea.xml.stream.MXParser.setInput(MXParser.java:806)
>>>>     at com.bea.xml.stream.MXParserFactory.createXMLStreamReader(MXParserFactory.java:261)
>>>>     at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:93)
>>>>     ... 10 more
>>>>
>>>> Here is my data-config:
>>>>
>>>> <dataConfig>
>>>>   <document>
>>>>     <entity name="f" processor="FileListEntityProcessor" fileName=".*xml"
>>>>             newerThan="'NOW-10DAYS'" recursive="true" rootEntity="false"
>>>>             dataSource="null" baseDir="/san/tomcat-services/solr-medline">
>>>>       <entity processor="XPathEntityProcessor" forEach="/MedlineCitation"
>>>>               url="${f.fileAbsolutePath}" dataSource="null">
>>>>         <field column="pmid" xpath="/MedlineCitation/PMID"/>
>>>>       </entity>
>>>>     </entity>
>>>>   </document>
>>>> </dataConfig>
>>>>
>>>> And a snippet from an xml file:
>>>> <MedlineCitation Owner="PIP" Status="MEDLINE">
>>>>   <PMID>12236137</PMID>
>>>>   <DateCreated>
>>>>     <Year>1980</Year>
>>>>     <Month>01</Month>
>>>>     <Day>03</Day>
>>>>   </DateCreated>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/Attempting-dataimport-using-FileListEntityProcessor-tp18081671p18081671.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>> --
>>> --Noble Paul
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Attempting-dataimport-using-FileListEntityProcessor-tp18081671p18083747.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
> --
> Regards,
> Shalin Shekhar Mangar.

--
View this message in context:
http://www.nabble.com/Attempting-dataimport-using-FileListEntityProcessor-tp18081671p18095951.html
Sent from the Solr - User mailing list archive at Nabble.com.
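For anyone following along: the NullPointerException above comes from a null Reader reaching the StAX parser (MXParser.setInput), which is what happens when the entity has no usable datasource to open the file. Below is a rough standalone sketch (plain javax.xml.stream, nothing from DataImportHandler; the `PmidExtractor` class name is made up) of the record-per-element behavior the forEach="/MedlineCitation" plus xpath="/MedlineCitation/PMID" config is asking for: one PMID value per citation element. It only handles the simple shape in the snippet above, where PMID elements appear once per citation.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/**
 * Hypothetical sketch: stream a Medline-style XML file and collect one
 * PMID string per citation, roughly what XPathEntityProcessor does with
 * forEach="/MedlineCitation" and xpath="/MedlineCitation/PMID".
 */
public class PmidExtractor {

    public static List<String> extractPmids(Reader xml) throws XMLStreamException {
        List<String> pmids = new ArrayList<String>();
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        boolean inPmid = false;
        StringBuilder pmid = new StringBuilder();
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && "PMID".equals(r.getLocalName())) {
                inPmid = true;
                pmid.setLength(0);
            } else if (event == XMLStreamConstants.CHARACTERS && inPmid) {
                pmid.append(r.getText());
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && "PMID".equals(r.getLocalName())) {
                inPmid = false;
                pmids.add(pmid.toString());
            }
        }
        return pmids;
    }
}
```

The point of the streaming (pull-parser) approach is that the whole file never has to fit in memory, which matters with 50K citations per file.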