Ok, I got your point. DataImportHandler currently creates documents and adds them one by one to Solr. A commit/optimize is called once after all documents are finished. If a document fails to add because of an exception, the import fails.

You can still achieve the functionality you want by setting maxDocs under the autoCommit section in solrconfig.xml.
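To illustrate, here is a minimal sketch of what that might look like, assuming the stock DirectUpdateHandler2 setup in solrconfig.xml (the value 100 is just an example, use whatever batch size you are comfortable with):

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- commit automatically once 100 added documents are pending -->
    <autoCommit>
      <maxDocs>100</maxDocs>
      <!-- you can also (or instead) commit after a time interval, in ms -->
      <!-- <maxTime>60000</maxTime> -->
    </autoCommit>
  </updateHandler>

With this in place Solr issues commits on its own as documents accumulate, independent of DataImportHandler's single commit at the end, so work committed early in a long import is not tied to the whole import succeeding.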
On Tue, Jun 24, 2008 at 11:01 PM, mike segv <[EMAIL PROTECTED]> wrote:
>
> I do want to import all documents. My understanding of the way things work,
> correct me if I'm wrong, is that there can be a certain number of documents
> included in a single atomic update. Instead of having all my 16 million
> documents be part of a single update (which could more easily fail, being so
> big), I was thinking that it would be better to be able to stipulate how many
> docs are part of an update, so my 16 million doc import would consist of
> 16M/100 updates.
>
>
> Shalin Shekhar Mangar wrote:
> >
> > Hi Mike,
> >
> > Just curious to know the use-case here. Why do you want to limit updates
> > to 100 instead of importing all documents?
> >
> > On Tue, Jun 24, 2008 at 10:23 AM, mike segv <[EMAIL PROTECTED]> wrote:
> >
> >> That fixed it.
> >>
> >> If I'm inserting millions of documents, how do I control docs/update?
> >> E.g. if there are 50K docs per file, I'm thinking that I should probably
> >> code up my own DataSource that allows me to stipulate docs/update, like
> >> say 100 instead of 50K. Does this make sense?
> >>
> >> Mike
> >>
> >> Noble Paul നോബിള് नोब्ळ् wrote:
> >> >
> >> > Hi,
> >> > You have not registered any datasources. The second entity needs a
> >> > datasource. Remove the dataSource="null" and add a name for the second
> >> > entity (good practice). There is no need for a baseDir attribute on the
> >> > second entity. See the modified xml below.
> >> > --Noble
> >> >
> >> > <dataConfig>
> >> >   <dataSource type="FileDataSource"/>
> >> >   <document>
> >> >     <entity name="f" processor="FileListEntityProcessor" fileName=".*xml"
> >> >             newerThan="'NOW-10DAYS'" recursive="true" rootEntity="false"
> >> >             dataSource="null" baseDir="/san/tomcat-services/solr-medline">
> >> >       <entity name="x" processor="XPathEntityProcessor"
> >> >               forEach="/MedlineCitation"
> >> >               url="${f.fileAbsolutePath}">
> >> >         <field column="pmid" xpath="/MedlineCitation/PMID"/>
> >> >       </entity>
> >> >     </entity>
> >> >   </document>
> >> > </dataConfig>
> >> >
> >> > On Tue, Jun 24, 2008 at 6:39 AM, mike segv <[EMAIL PROTECTED]> wrote:
> >> >>
> >> >> I'm trying to use the FileListEntityProcessor to add some xml documents
> >> >> to a solr index. I'm running a nightly version of solr-1.3 with SOLR-469
> >> >> and SOLR-563. I've been able to successfully run the slashdot
> >> >> httpDataSource example. My data-config.xml file loads without errors.
> >> >> When I attempt the full-import command I get the exception below.
> >> >> Thanks for any help.
> >> >>
> >> >> Mike
> >> >>
> >> >> WARNING: No lockType configured for
> >> >> /san/tomcat-services/solr-medline/solr/data/index/ assuming 'simple'
> >> >> Jun 23, 2008 7:59:49 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
> >> >> SEVERE: Full Import failed
> >> >> java.lang.RuntimeException: java.lang.NullPointerException
> >> >>         at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:97)
> >> >>         at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:212)
> >> >>         at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:166)
> >> >>         at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:149)
> >> >>         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:286)
> >> >>         at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:312)
> >> >>         at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:179)
> >> >>         at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:140)
> >> >>         at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:335)
> >> >>         at org.apache.solr.handler.dataimport.DataImporter.rumCmd(DataImporter.java:386)
> >> >>         at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
> >> >> Caused by: java.lang.NullPointerException
> >> >>         at java.io.Reader.<init>(Reader.java:61)
> >> >>         at java.io.BufferedReader.<init>(BufferedReader.java:76)
> >> >>         at com.bea.xml.stream.MXParser.checkForXMLDecl(MXParser.java:775)
> >> >>         at com.bea.xml.stream.MXParser.setInput(MXParser.java:806)
> >> >>         at com.bea.xml.stream.MXParserFactory.createXMLStreamReader(MXParserFactory.java:261)
> >> >>         at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:93)
> >> >>         ... 10 more
> >> >>
> >> >> Here is my data-config:
> >> >>
> >> >> <dataConfig>
> >> >>   <document>
> >> >>     <entity name="f" processor="FileListEntityProcessor" fileName=".*xml"
> >> >>             newerThan="'NOW-10DAYS'" recursive="true" rootEntity="false"
> >> >>             dataSource="null" baseDir="/san/tomcat-services/solr-medline">
> >> >>       <entity processor="XPathEntityProcessor" forEach="/MedlineCitation"
> >> >>               url="${f.fileAbsolutePath}" dataSource="null">
> >> >>         <field column="pmid" xpath="/MedlineCitation/PMID"/>
> >> >>       </entity>
> >> >>     </entity>
> >> >>   </document>
> >> >> </dataConfig>
> >> >>
> >> >> And a snippet from an xml file:
> >> >>
> >> >> <MedlineCitation Owner="PIP" Status="MEDLINE">
> >> >>   <PMID>12236137</PMID>
> >> >>   <DateCreated>
> >> >>     <Year>1980</Year>
> >> >>     <Month>01</Month>
> >> >>     <Day>03</Day>
> >> >>   </DateCreated>
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >> http://www.nabble.com/Attempting-dataimport-using-FileListEntityProcessor-tp18081671p18081671.html
> >> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >> >>
> >> >
> >> >
> >> > --
> >> > --Noble Paul
> >> >
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/Attempting-dataimport-using-FileListEntityProcessor-tp18081671p18083747.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>
> --
> View this message in context:
> http://www.nabble.com/Attempting-dataimport-using-FileListEntityProcessor-tp18081671p18095951.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

--
Regards,
Shalin Shekhar Mangar.