Thanks for your very fast response :-)

> > 2.)
> > The DataImportHandler documentation describes the index update
> > process for SQL databases only...
> >
> > My scenario:
> > - My application creates, deletes and modifies files in /tmp/files
> >   every night.
> > - delta-import / DataImportHandler should "mirror" _all_ these changes to
> >   my Lucene index (=> create, delete, update documents).
> The only EntityProcessor that supports delta is SqlEntityProcessor.
> XPathEntityProcessor does not implement it, because we do not know
> of a consistent way of finding deltas for XML. So, unfortunately,
> there is no delta support for XML. That said, you can implement
> those methods in XPathEntityProcessor; they are explained in
> EntityProcessor.java. If you have questions specific to this, I can
> help. We could probably contribute it back.
> >
> > ===> Is this possible with delta-import / DataImportHandler?
> > ===> If not: Do you have any suggestions on how to do this?
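For the file-based case, the "finding deltas" part could at least be done outside Solr. Below is a minimal sketch, assuming file modification times are reliable and that the previous run's snapshot is kept somewhere between nightly runs. take_snapshot and diff_snapshots are hypothetical helpers, not part of DataImportHandler, and this does not implement the EntityProcessor delta methods.

```python
from pathlib import Path
from typing import Dict, Set, Tuple

# Hypothetical snapshot format: file name -> modification time (st_mtime).
Snapshot = Dict[str, float]


def take_snapshot(directory: str, pattern: str = "myDoc_*.xml") -> Snapshot:
    """Record the modification time of every file in `directory` matching `pattern`."""
    return {p.name: p.stat().st_mtime for p in Path(directory).glob(pattern)}


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (created, modified, deleted) file names between two snapshots."""
    created = set(new) - set(old)
    deleted = set(old) - set(new)
    # A file counts as modified if it exists in both snapshots with a different mtime.
    modified = {name for name in set(old) & set(new) if new[name] != old[name]}
    return created, modified, deleted
```

The old snapshot would have to be persisted (e.g. as JSON) after each run. Files in `created | modified` would then need re-indexing and `deleted` ones removing from the index; how that would map onto EntityProcessor's delta methods is left open here.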

OK, so at the moment I have to do a full-import to update my index. What
happens to (user) queries while a full-import is running? Does Solr block these
queries until the import is finished? Which configuration options control this
behavior?



> > My scenario:
> > - /tmp/files contains 682 'myDoc_.*\.xml' XML files.
> > - Each XML file contains 12 XML elements (e.g. <title>foo</title>).
> > - DataImportHandler transfers only 5 of these 12 elements to the Lucene
> >   index.
> >
> >
> > I don't understand the output from 'solr/dataimport' (=> status):
> >
> > ###
> > <response>
> >  ...
> >  <lst name="statusMessages">
> >  <str name="Total Requests made to DataSource">0</str>
> >  <str name="Total Rows Fetched">1363</str>
> >  <str name="Total Documents Skipped">0</str>
> >  <str name="Full Dump Started">2008-10-24 13:19:03</str>
> >  <str name="">
> >    Indexing completed. Added/Updated: 681 documents. Deleted 0 documents.
> >  </str>
> >  <str name="Committed">2008-10-24 13:19:05</str>
> >  <str name="Optimized">2008-10-24 13:19:05</str>
> >  <str name="Time taken ">0:0:2.648</str>
> >  </lst>
> > ...
> > </response>
> >
> > ===> Why does the "Added/Updated" counter show 681 and not 682?
> 
> Added/Updated is the number of docs. How do you know the number is not
> accurate?
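To compare that counter with the file count programmatically, the status response could be parsed. A minimal sketch, assuming the raw XML from 'solr/dataimport' (status) without the "..." elisions of the quote above, and assuming the summary message keeps the "Added/Updated: N documents" wording shown there; both function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict


def parse_status_messages(response_xml: str) -> Dict[str, str]:
    """Collect the <str> children of <lst name="statusMessages"> into a dict."""
    root = ET.fromstring(response_xml)
    lst = root.find(".//lst[@name='statusMessages']")
    if lst is None:
        return {}
    # Keys are the name attributes (the summary message uses name=""); text is stripped.
    return {s.get("name", ""): (s.text or "").strip() for s in lst.findall("str")}


def added_updated_count(messages: Dict[str, str]) -> int:
    """Extract N from an 'Added/Updated: N documents' message."""
    for text in messages.values():
        match = re.search(r"Added/Updated:\s*(\d+)", text)
        if match:
            return int(match.group(1))
    raise ValueError("no Added/Updated count in status messages")
```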


/tmp/files$ ls myDoc_*.xml | wc -l
682

But "Added/Updated" shows 681. Does this mean that one file has an XML error?
But the statistics say "Total Documents Skipped" = 0?!
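One way to rule out (or find) a broken file is to test-parse each one. A minimal sketch using Python's xml.etree.ElementTree, which only checks well-formedness; find_malformed_xml is a hypothetical helper. The glob mirrors the `ls` command above and is not necessarily identical to the 'myDoc_.*\.xml' regex DataImportHandler was configured with, and passing this check does not rule out other causes of the 681/682 difference.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List, Tuple


def find_malformed_xml(directory: str, pattern: str = "myDoc_*.xml") -> Tuple[int, List[Tuple[str, str]]]:
    """Return (number of matching files, [(file name, parse error message), ...])."""
    files = sorted(Path(directory).glob(pattern))
    bad = []
    for path in files:
        try:
            ET.parse(path)  # raises ParseError if the file is not well-formed XML
        except ET.ParseError as err:
            bad.append((path.name, str(err)))
    return len(files), bad


# Example: total, bad = find_malformed_xml("/tmp/files")
```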

 

> > 4.)
> > And my last questions, about Solr statistics/information...
> >
> > ===> Is it possible to get information (number of indexed documents,
> > stored values of documents, etc.) from the current Lucene index?
> > ===> The admin web interface shows 'numDocs' and 'maxDoc' in
> > 'statistics/core'. Is 'numDocs' the number of indexed documents? What
> > does 'maxDoc' mean?

Do you have answers to these questions too?

Bye,
Simon
