On 6/10/2014 2:11 AM, usmanZahid wrote:
> I am working with the Solr search engine and am at a stage where I have to
> make some implementation decisions.
>
> I have a large file directory (1TB) and while indexing for the first time we
> need to maintain a history of our indexing position so that we can run the
> indexer for 10 hours every day until all the documents are indexed.
>
> How can we keep track of where Solr left off last time so that indexing
> resumes from the right place next time?
>
> How do we track changes in documents which are already indexed, or detect
> new files? (Do I have to start the index again from scratch in this case?)
>
> How can I increase performance while indexing?

All this depends on how you are indexing and what information needs to
be tracked.  If you are using the dataimport handler and the information
you need to track is the timestamp when the last import started, then
the dataimport handler automatically tracks this information and it will
be available for a delta-import.
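
As a rough illustration, a delta-import is started by sending a
"command=delta-import" request to the dataimport handler.  The sketch
below only builds that request URL.  The "/dataimport" handler path and
the host and core name in the usage note are assumptions; use whatever
is configured in your solrconfig.xml.

```python
from urllib.parse import urlencode


def dih_url(core_url, command, handler="/dataimport"):
    """Build a dataimport handler request URL for the given command.

    `handler` is an assumed path; it must match the requestHandler
    name registered in solrconfig.xml.
    """
    query = urlencode({"command": command})
    return core_url.rstrip("/") + handler + "?" + query
```

For example, dih_url("http://localhost:8983/solr/collection1",
"delta-import") gives the URL to request for a delta-import, and passing
"status" instead gives the URL that reports on a running import.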

If the information that needs to be tracked is different (an identifier,
a filename, an autoincrement value, etc.) or you are not using the
dataimport handler, then you must keep track of the information
yourself, in any way that makes sense for your program.
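
One possible way to do that for a directory of files is sketched below.
It is only an illustration, not something Solr provides: the program
keeps a small JSON checkpoint mapping each file path to the modification
time it had when it was last indexed, and each run stops when its time
budget is used up.  Files that are new or whose modification time has
changed get picked up on the next run, so nothing has to be reindexed
from scratch.  The file name "index_state.json" and the index_one
callback (whatever code sends one file to Solr) are hypothetical, and
deleted files are not handled.

```python
import json
import os
import time

STATE_FILE = "index_state.json"  # hypothetical checkpoint file; keep it outside `root`


def load_state(path=STATE_FILE):
    """Return {file path: mtime when last indexed}, or {} on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def save_state(state, path=STATE_FILE):
    """Write the checkpoint via a temp file so a crash cannot truncate it."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def pending_files(root, state):
    """Yield files that are new or modified since they were last indexed."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if state.get(full) != os.path.getmtime(full):
                yield full


def run_session(root, index_one, budget_seconds, state_path=STATE_FILE):
    """Index pending files until the time budget runs out; return the count."""
    state = load_state(state_path)
    deadline = time.monotonic() + budget_seconds
    done = 0
    for path in pending_files(root, state):
        if time.monotonic() >= deadline:
            break
        mtime = os.path.getmtime(path)
        index_one(path)  # caller-supplied: sends this file to Solr
        state[path] = mtime
        done += 1
        if done % 1000 == 0:  # checkpoint periodically, not per file
            save_state(state, state_path)
    save_state(state, state_path)
    return done
```

Running run_session(root, index_one, 10 * 3600) once a day would match
the 10-hour window you describe.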

To increase indexing speed, have your indexing program send documents
from multiple threads simultaneously.  Solr can handle many update
requests arriving at the same time.
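
A minimal sketch of that idea follows, assuming the documents are
already parsed into dictionaries and are posted to the update handler as
JSON.  The URL, core name, worker count and batch size are all
assumptions that you would need to adjust and tune for your own setup.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed URL; substitute your own host and core name.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def post_batch(docs, url=SOLR_UPDATE_URL):
    """POST one batch of documents to Solr as a JSON array; return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def batches(docs, size):
    """Split a list of documents into consecutive batches of at most `size`."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def index_parallel(docs, workers=4, batch_size=100, send=post_batch):
    """Send batches from `workers` threads at once; return the number of batches sent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces every batch to finish and re-raises any worker error
        results = list(pool.map(send, batches(docs, batch_size)))
    return len(results)
```

The sketch does not send a commit.  It is usually better to commit once
at the end of a run, or to rely on autoCommit, than to commit after
every batch.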

Thanks,
Shawn
