On 6/10/2014 2:11 AM, usmanZahid wrote:
> I am working with the Solr search engine and am at a stage where I have to
> make some implementation decisions.
>
> I have a large file directory (1TB), and while indexing for the first time we
> need to maintain a history of our indexing position so that we can run the
> indexer for 10 hours every day until all the documents are indexed.
>
> How can we keep track of where Solr left off indexing last time, for an
> accurate indexing start next time?
>
> How do we track changes in the documents which are already indexed, or new
> files? (Do I have to start the index again from scratch in this case?)
>
> How can I increase performance while indexing?

All this depends on how you are indexing and what information needs to be tracked.

If you are using the dataimport handler and the information you need to track is the timestamp when the last import started, then the dataimport handler automatically tracks this information and it will be available for a delta-import.

If the information that needs to be tracked is different (an identifier, a filename, autoincrement value, etc) or you are not using the dataimport handler, then you must keep track of the information yourself, in any way that makes sense for your program.

To increase indexing speed, use multiple threads in your indexing program that are indexing documents simultaneously. Solr can handle many threads all receiving update requests at the same time.

Thanks,
Shawn
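
A rough sketch of the "keep track of it yourself" and multiple-threads points above, in Python. None of this is something Solr provides; it is one possible layout under some assumptions: the checkpoint is a JSON file mapping each file path to the modification time it had when it was indexed, `index_file` is a hypothetical function you would write to send one file to Solr, and the names `run_indexer`, `batch_size`, and `max_seconds` are made up for illustration.

```python
import json
import os
import time
from concurrent.futures import ThreadPoolExecutor


def load_checkpoint(path):
    """Return {file_path: mtime} recorded by earlier runs, or {} on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a crash cannot leave a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def pending_files(root, state):
    """Yield (path, mtime) for files that are new or whose mtime differs from the checkpoint."""
    for dirpath, _dirs, names in os.walk(root):
        for name in sorted(names):
            full = os.path.join(dirpath, name)
            mtime = os.path.getmtime(full)
            if state.get(full) != mtime:
                yield full, mtime


def batches(items, size):
    """Group an iterable into lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_indexer(root, checkpoint_path, index_file, max_seconds, workers=4, batch_size=100):
    """Index pending files with a thread pool until none are left or time runs out.

    index_file is a hypothetical callable (assumption) that sends one file to Solr.
    The checkpoint is saved after every batch, so an interrupted run loses at most
    one batch of progress. If index_file raises, the exception propagates and the
    current batch is not recorded. Returns the number of files indexed in this run.
    """
    state = load_checkpoint(checkpoint_path)
    deadline = time.monotonic() + max_seconds
    indexed = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batches(pending_files(root, state), batch_size):
            if time.monotonic() >= deadline:
                break  # time budget spent; the next run resumes from the checkpoint
            # Index the files of this batch concurrently and wait for all of them.
            list(pool.map(index_file, [path for path, _ in batch]))
            for path, mtime in batch:
                state[path] = mtime
            save_checkpoint(checkpoint_path, state)
            indexed += len(batch)
    return indexed
```

A daily 10-hour run would then be something like `run_indexer("/data/docs", "/var/lib/indexer/checkpoint.json", index_file, max_seconds=10 * 3600)`, where both paths are placeholders. Keep the checkpoint file outside the directory being indexed. Because changed and new files are picked up by comparing modification times, there is no need to rebuild the index from scratch; files that were deleted are not handled by this sketch.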