Hello, I have some questions about DataImportHandler and Solr statistics...
1.) I'm using the DataImportHandler to create my Lucene index from XML files:

###
$ cat data-config.xml
<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <entity name="xmlFile"
            processor="FileListEntityProcessor"
            baseDir="/tmp/files"
            fileName="myDoc_.*\.xml"
            newerThan="'NOW-30DAYS'"
            recursive="false"
            rootEntity="false"
            dataSource="null">
      <entity name="myDoc"
              url="${xmlFile.fileAbsolutePath}"
              processor="XPathEntityProcessor"
              forEach="/myDoc">
        ...
</dataConfig>
###

No problems with this configuration. Everything works fine for full-imports, but...

===> What do 'rootEntity="false"' and 'dataSource="null"' mean?

2.) The DataImportHandler documentation describes the index update process for SQL databases only...

My scenario:
- My application creates, deletes and modifies files in /tmp/files every night.
- delta-import / DataImportHandler should "mirror" _all_ these changes to my Lucene index (=> create, delete, update documents).

===> Is this possible with delta-import / DataImportHandler?
===> If not: Do you have any suggestions on how to do this?

3.) My scenario:
- /tmp/files contains 682 'myDoc_.*\.xml' XML files.
- Each XML file contains 12 XML elements (e.g. <title>foo</title>).
- DataImportHandler transfers only 5 of these 12 elements to the Lucene index.

I don't understand the output of 'solr/dataimport' (=> status):

###
<response>
...
<lst name="statusMessages">
  <str name="Total Requests made to DataSource">0</str>
  <str name="Total Rows Fetched">1363</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2008-10-24 13:19:03</str>
  <str name="">
    Indexing completed. Added/Updated: 681 documents. Deleted 0 documents.
  </str>
  <str name="Committed">2008-10-24 13:19:05</str>
  <str name="Optimized">2008-10-24 13:19:05</str>
  <str name="Time taken ">0:0:2.648</str>
</lst>
...
</response>
###

===> What is "Total Rows Fetched", or rather, what is a "row" in an XML file? An element? Why 1363?
===> Why does the "Added/Updated" counter show 681 and not 682?

4.) And my last questions, about Solr statistics/information...

===> Is it possible to get information (number of indexed documents, stored values of documents etc.) about the current Lucene index?
===> The admin web interface shows 'numDocs' and 'maxDoc' in 'statistics/core'. Is 'numDocs' the number of indexed documents? What does 'maxDoc' mean?

Thanks a lot!
gisto