Hello,

I have some questions about DataImportHandler and Solr statistics...


1.)
I'm using the DataImportHandler for creating my Lucene index from XML files:

###
$ cat data-config.xml 
<dataConfig>
 <dataSource type="FileDataSource" />
  <document>
   <entity name="xmlFile"
        processor="FileListEntityProcessor"
        baseDir="/tmp/files"
        fileName="myDoc_.*\.xml"
        newerThan="'NOW-30DAYS'"
        recursive="false"
        rootEntity="false"
        dataSource="null">
    <entity name="myDoc"
          url="${xmlFile.fileAbsolutePath}"
          processor="XPathEntityProcessor"
          forEach="/myDoc">
          ...
</dataConfig>
###

This configuration works without problems for full-imports, but...

===> What do 'rootEntity="false"' and 'dataSource="null"' mean?



2.)
The DataImportHandler documentation describes the index update process for 
SQL databases only...

My scenario:
- My application creates, deletes and modifies files in /tmp/files every 
night.
- delta-import / DataImportHandler should "mirror" _all_ of these changes to my 
Lucene index (=> create, delete, update documents).

===> Is this possible with delta-import / DataImportHandler?
===> If not: Do you have any suggestions on how to do this?
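
To make "mirror" concrete, here is a minimal sketch of the change detection I 
have in mind, written in Python and running outside of Solr. It is only an 
illustration of the scenario, not something DataImportHandler provides: the 
function names snapshot and diff_snapshots are hypothetical, and comparing 
modification times between two nightly runs is my own assumption about how 
changes could be detected.

```python
import os
import re

# Assumption: intended to match the same files as fileName in data-config.xml.
FILE_PATTERN = re.compile(r"myDoc_.*\.xml")


def snapshot(base_dir):
    """Map each matching file name in base_dir to its modification time."""
    result = {}
    for name in os.listdir(base_dir):
        path = os.path.join(base_dir, name)
        if FILE_PATTERN.fullmatch(name) and os.path.isfile(path):
            result[name] = os.path.getmtime(path)
    return result


def diff_snapshots(old, new):
    """Return (created, modified, deleted) file names between two snapshots."""
    created = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    # A file counts as modified if it exists in both snapshots
    # but its modification time differs.
    modified = sorted(n for n in new if n in old and new[n] != old[n])
    return created, modified, deleted
```

The "created" and "modified" files would have to be (re)indexed and the 
"deleted" ones removed from the index. Whether delta-import can do this for 
files is exactly what I am asking.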



3.)
My scenario:
- /tmp/files contains 682 'myDoc_.*\.xml' XML files. 
- Each XML file contains 12 XML elements (e.g. <title>foo</title>).
- DataImportHandler transfers only 5 of these 12 elements to the Lucene index. 


I don't understand the output of 'solr/dataimport' (=> status):

###
<response>
 ...
 <lst name="statusMessages">
  <str name="Total Requests made to DataSource">0</str>
  <str name="Total Rows Fetched">1363</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2008-10-24 13:19:03</str>
  <str name="">
    Indexing completed. Added/Updated: 681 documents. Deleted 0 documents.
  </str>
  <str name="Committed">2008-10-24 13:19:05</str>
  <str name="Optimized">2008-10-24 13:19:05</str>
  <str name="Time taken ">0:0:2.648</str>
  </lst>
...
</response>
###

===> What is "Total Rows Fetched", or rather, what is a "row" in an XML file? An 
element? Why 1363?
===> Why does the "Added/Updated" counter show 681 and not 682?



4.)
And my last questions, about Solr statistics/information...

===> Is it possible to get information (number of indexed documents, stored 
values of documents, etc.) from the current Lucene index?
===> The admin web interface shows 'numDocs' and 'maxDoc' in 'statistics/core'. 
Is 'numDocs' = number of indexed documents? What does 'maxDoc' mean?


Thanks a lot!
gisto