Any chance you are indexing to a Master, then syncing to a Slave, and you 
aren't seeing those last 20 on the Slave?

There is an issue with syncing between Master and Slave that we've 
experienced. If the last commit is very small (20 documents sounds possible!), 
it can complete in the same clock second as the previous commit on that 
machine. The Master will see both commits and its index will show the data 
fine. However, the Slave cannot see a second commit made in the same clock 
second, so it will be missing the last 20 after the sync between the two.

It's an edge case, but we ran into it recently.
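
To illustrate the same-clock-second collision described above, here is a minimal sketch. It assumes the sync step tells index versions apart by a timestamp with one-second resolution; that is my reading of the symptom, not something confirmed from the Solr sources. The class name, the secondStamp helper, and the sample timestamps are all made up for the example.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class SameSecondCommits {

    // Hypothetical helper: label a commit with a second-resolution timestamp.
    // Assumption: the sync mechanism distinguishes index versions this coarsely.
    static String secondStamp(long epochMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long bigCommit = 1255972020100L;      // a large batch commit finishes
        long smallCommit = bigCommit + 400L;  // tiny final commit, 400 ms later
        long laterCommit = bigCommit + 1500L; // a commit 1.5 s later

        // Same clock second: both commits get the same label, so a Slave that
        // compares labels cannot tell that a second commit happened.
        System.out.println(secondStamp(bigCommit).equals(secondStamp(smallCommit))); // true

        // Different clock second: the labels differ and the change is visible.
        System.out.println(secondStamp(bigCommit).equals(secondStamp(laterCommit))); // false
    }
}
```

If this is what is happening, spacing the final commit more than a second after the previous one should make the difference visible to the Slave, though that is an inference from the sketch rather than a tested fix.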

-Todd

-----Original Message-----
From: SharmilaR [mailto:sranganat...@library.rochester.edu] 
Sent: Monday, October 19, 2009 1:07 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr commits before documents are added


Solr version is 1.3.
I am indexing a total of 1.4 million documents. Yes, I commit
(waitFlush="true" waitSearcher="true") every 100k documents and then once at
the end. I have a counter next to the addDoc(SolrDocument) statement to keep
track of the number of documents added. When I query Solr after the commit,
the total number of documents returned does not match the number of documents
added. This happens only when I index millions of documents, not when I index,
say, 500 documents. In this case, I know it's the last 20 documents that are
not committed because each document has a field 'RECORD_ID', which is assigned
a sequential number (in Java code). When I query Solr using the Solr admin
interface, the documents with the last 20 RECORD_IDs are missing (for example,
the last ID is 999,980 instead of 1,000,000).
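
The commit pattern and the count check described above can be sketched as follows. This is only an illustration: IndexClient and InMemoryIndex are hypothetical stand-ins for the SolrJ client and the Solr index, and the totals are scaled down so it runs instantly.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedCommitSketch {

    /** Hypothetical stand-in for the SolrJ client; only add and commit are modeled. */
    interface IndexClient {
        void add(String recordId);
        void commit();
    }

    /** In-memory fake: documents become visible to queries only after commit(). */
    static class InMemoryIndex implements IndexClient {
        private final List<String> pending = new ArrayList<String>();
        private final List<String> visible = new ArrayList<String>();

        public void add(String recordId) {
            pending.add(recordId);
        }

        public void commit() {
            visible.addAll(pending);
            pending.clear();
        }

        int visibleCount() {
            return visible.size();
        }

        String lastVisibleId() {
            return visible.isEmpty() ? null : visible.get(visible.size() - 1);
        }
    }

    /** Adds ids 1..total, committing every batchSize adds and once at the end. */
    static int indexAll(IndexClient client, int total, int batchSize) {
        int added = 0;
        for (int id = 1; id <= total; id++) {
            client.add(String.valueOf(id)); // sequential RECORD_ID
            added++;                        // counter next to the add call
            if (added % batchSize == 0) {
                client.commit();            // periodic commit
            }
        }
        client.commit();                    // final commit covers the tail
        return added;
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        int added = indexAll(index, 1020, 100); // 20 documents left for the final commit
        System.out.println("added=" + added
                + " visible=" + index.visibleCount()
                + " lastId=" + index.lastVisibleId());
        // prints: added=1020 visible=1020 lastId=1020
    }
}
```

In this model the last 20 documents only become visible through the final commit, so if the counts disagree on a real index, that final commit (or how it reaches the machine being queried) is the place to look.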

- Sharmila


Feak, Todd wrote:
> 
> A few questions to help the troubleshooting.
> 
> Solr version #?
> 
> Is there just 1 commit through Solrj for the millions of documents? 
> 
> Or do you do it on a regular interval (every 100k documents for example)
> and then one at the end to be sure?
> 
> How are you observing that the last few didn't make it in? Are you looking
> at a slave or master?
> 
> -Todd
> 
> 
-----Original Message-----
From: Ranganathan, Sharmila [mailto:sranganat...@library.rochester.edu] 
Sent: Monday, October 19, 2009 9:19 AM
To: solr-user@lucene.apache.org
Subject: Solr commits before documents are added

Hi,

My application indexes a huge number of documents (in the millions). Below
is a snippet of my code, where I add all documents to Solr and then issue
a commit command at the end. I use Solrj. I find that the last few
documents are not committed to Solr. Is this because adding documents
to Solr took longer and it reached the commit command before it
finished adding documents? Is there a way to ensure that Solr waits
for all documents to be added and then commits? Please advise me how to
solve this issue.

    // docs: the collection of documents to index (name assumed for illustration)
    for (SolrInputDocument doc : docs) {
        solrServer.add(doc);   // Add document to Solr
    }

    solrServer.commit();       // Commit to Solr

Thanks,

Sharmila





-- 
View this message in context: 
http://www.nabble.com/Solr-commits-before-documents-are-added-tp25961191p25964770.html
Sent from the Solr - User mailing list archive at Nabble.com.