On 9/12/2013 11:04 AM, phanichaitanya wrote:
> So, now I want to know when that document becomes searchable or when it is
> committed. I've the following scenario:
> 
> 1) Indexing starts at say 9:00 AM - with the above additions to the
> schema.xml I'll know the indexed time of each document I send to Solr via
> the update handler. Say 9:01, 9:02 and so on ... let's say I send a document
> for every second between 9 - 9:30 AM and it makes it 30*60 = 1800 docs
> 2) Now at 9:30 AM, I issue a hard commit and now I'll be able to search
> these 1800 documents which is fine.
> 3) Now I want to know that I can search these 1800 documents only at >=9:30
> AM but not < 9:30 AM as I did not do a hard commit before 9:30 AM. 
> 
> In order to know that, is there a way in Solr rather than some application
> keeping track of the documents it sends to Solr between any two commits. The
> reason I'm asking is, if there are say two parallel processes indexing to
> the same index and one process issues a commit - then whatever documents
> process two indexed until that point of time would also be committed, right?
> Now if I keep track of commit times in each process it doesn't reflect the
> true commit times as they are intertwined.

From what I understand, if you use the default of NOW for a field in
your schema, then all documents indexed in that request will have the
timestamp of the time that indexing started.
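
For illustration, a field with that default might be declared in
schema.xml roughly as below. The field name "indexed_at" and the "date"
field type are assumptions, since the actual schema additions are not
shown in this message:

```xml
<!-- Hypothetical field name. Assumes a "date" fieldType
     (for example solr.TrieDateField) is defined in this schema. -->
<field name="indexed_at" type="date" indexed="true" stored="true"
       default="NOW" multiValued="false"/>
```

With a definition like this, a document that does not supply its own
value gets one filled in when it is indexed, not when it is committed.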

Assuming what I understand is the way it actually works, if you want the
time to reflect anything even close to commit time, then you will need
to send very small batches and you will need to commit after every
batch.  If you are indexing very quickly, you'll probably want those
commits to be soft commits.

You'll also want to have an autoCommit set up to do hard commits less
frequently with openSearcher=false, or you'll run into the problem
described at the link below.  There is a good autoCommit example there:

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup
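
A minimal sketch of that kind of hard-commit setup, placed inside the
updateHandler section of solrconfig.xml, could look like the following.
The interval is a placeholder chosen for illustration, not a number
taken from the wiki page, and should be tuned to the indexing load:

```xml
<!-- Hard commit: flushes changes to disk on a timer but does not
     open a new searcher, so it does not change what is searchable. -->
<autoCommit>
  <maxTime>300000</maxTime> <!-- placeholder: 5 minutes, in milliseconds -->
  <openSearcher>false</openSearcher>
</autoCommit>
```

With openSearcher=false here, document visibility comes only from the
soft commits (or explicit commits) issued after each batch.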

I've heard (but have not tested) that with the NOW default, large
imports with the DataImportHandler (DIH) will all have the timestamp of
when the DIH request started, no matter what you do with autoCommit or
autoSoftCommit.

Thanks,
Shawn
