Hello,
I have a quasi-realtime indexing application where documents are grouped
into collections and documents can be added or removed from collections.
The document has an id and multiple collection id (collid) fields
reflecting the collections that contain that document. The collid field
is used in a filter query to limit a search to a given collection. When a
document is added, a new version of that document is constructed and
indexed. The new version has the collid fields that match the
collections containing that document. The id / collids relationship is
stored in a database table. The indexing is run from a cron job.
I need to show a user the state (searchable/not indexed yet) of a document
he's added to one of his collections. I think a document becomes
searchable exactly when the commit is finished. I want to record that
document state in the database. So I have to understand how Solr
serializes commits, adds and segment merges. I understand that a commit
blocks adds. But what if the commit times out?
Suppose I commit a number of adds but the adds have caused a lengthy
segment merge and the commit blocks and times out. How can I tell when
all my adds are searchable? Do I have to query Solr for the given
collid field value for each document in the user's document list view?
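The per-document query I have in mind would look roughly like the sketch below. It assumes the standard /select handler with the q, fq, rows and wt parameters and the usual JSON response shape with response.numFound; the base URL, the function names, and the assumption that ids and collids contain no double quotes are all mine, for illustration only.

```python
import json
from urllib.parse import urlencode


def build_check_url(base_url, doc_id, collid):
    """Build a /select URL asking whether doc_id is searchable in collid.

    base_url is a hypothetical Solr base URL, e.g. http://localhost:8983/solr.
    Assumes doc_id and collid contain no double quotes.
    """
    params = {
        "q": 'id:"%s"' % doc_id,         # match the document itself
        "fq": 'collid:"%s"' % collid,    # restrict to the given collection
        "rows": 0,                       # only the hit count is needed
        "wt": "json",
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def is_searchable(response_body):
    """Return True if the JSON response reports at least one hit."""
    data = json.loads(response_body)
    return data["response"]["numFound"] > 0


if __name__ == "__main__":
    print(build_check_url("http://localhost:8983/solr", "doc42", 7))
    # The response body would normally be fetched from the URL above;
    # a canned body is used here so the sketch runs without a server.
    sample = '{"response": {"numFound": 1, "start": 0, "docs": []}}'
    print(is_searchable(sample))
```

A hit means the current version of the document, including that collid value, is visible to searchers, so the database row could be marked searchable at that point regardless of what happened to the commit request.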
In another case, can a commit sneak ahead of a list of adds or is the
commit blocked until the adds complete? If the commit is not blocked, I
can't use the commit to mark a document ready to be searched because
its collid fields may not be indexed yet.
If the commit is blocked on the adds but the adds take a long time to
process, the commit could time out. Again, the commit is then not a
reliable indicator of when the document is ready to be searched.
Is it true that commits, when timeouts enter the picture, will not work
to determine state and I need to query instead? Or am I missing something?
Thanks,
Phil
- Solr queuing behavior Phillip Farber