That sounds very similar to my use case, too. (Mentioned in the recent thread "Updating a solr record"). So +1 on allowing updates!
Jason Rutherglen wrote:
Don,
I started work on fixing this a while back, and I plan to
resume soon. Basically one would be able to update fields
to a parallel index, without reindexing the entire document.
There are other use cases I've seen for this such as caching.

-J

On Fri, Aug 28, 2009 at 8:49 AM, Don Werve <d...@madwombat.com> wrote:
Short version:

Is there a way to either do partial updates to documents (update/add one or
two fields only), or to search across multiple documents grouped by a
(non-unique) key stored in a field?

Long version:

I've run into an issue with the way I'm indexing documents for a new
product, and figure that somebody else has run into the same problem.  In a
nutshell, we're building a system that deals with a lot of incoming and
outgoing text documents (email, word docs, short comments, etc), grouped
together by some common factor (basically, email threads), and want to do
full-text search across those threads.

We've settled on Solr, of course. :)

Right now, I'm adding each new incoming/outgoing message as a new document,
and can search just fine, unless I want to look for multiple terms that span
documents.  So, "foo" is in the first document, "bar" is in the second, and
although they both have a 'thread_id' field identifying them as belonging to
the same group, searching for "+foo +bar" doesn't yield results (which is
not surprising).
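
To make the mismatch concrete, here is a toy sketch in plain Python. It does
not call Solr at all; the MESSAGES list, tokens(), match_documents() and
match_threads() are hypothetical names made up for illustration, and the
whitespace tokenizer is a stand-in for real analysis. It only shows why a
per-document "+foo +bar" finds nothing while a per-thread match would.

```python
from collections import defaultdict

# Toy in-memory stand-in for the index: one entry per message (assumed data).
MESSAGES = [
    {"id": "m1", "thread_id": "t1", "body": "foo appears here"},
    {"id": "m2", "thread_id": "t1", "body": "bar appears here"},
    {"id": "m3", "thread_id": "t2", "body": "only foo in this thread"},
]


def tokens(text):
    """Naive whitespace tokenizer; a real analyzer does far more."""
    return set(text.lower().split())


def match_documents(messages, required_terms):
    """Ids of messages whose own body contains every required term."""
    required = set(required_terms)
    return [m["id"] for m in messages if required <= tokens(m["body"])]


def match_threads(messages, required_terms):
    """Thread ids where every required term occurs in at least one message."""
    terms_by_thread = defaultdict(set)
    for m in messages:
        terms_by_thread[m["thread_id"]] |= tokens(m["body"])
    required = set(required_terms)
    return sorted(t for t, seen in terms_by_thread.items() if required <= seen)


print(match_documents(MESSAGES, ["foo", "bar"]))  # []
print(match_threads(MESSAGES, ["foo", "bar"]))    # ['t1']
```

The second function is the behaviour I am after: the terms are required per
group, not per message.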

Now, I can modify the code to store one document for each group of messages
without too much work.  But as I understand it, this means that for every
new message coming in, I need to hand an aggregate of all previous messages
to the indexer, because Solr will re-create the document (which indexes the
entire group of messages) when I do update/add.  Since there can be some
fairly large files sitting in there (50-100M in some cases), I'd rather not
have to shove that down Solr's pipe every time something changes.
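
A rough back-of-the-envelope sketch of that cost, again in plain Python with
no Solr involved. The function name bytes_resent() and the message sizes are
assumptions chosen for illustration (one large attachment of about 50 MB
followed by two short replies), not measurements.

```python
def bytes_resent(message_sizes):
    """Total bytes handed to the indexer if each new message forces the
    whole thread so far to be sent again."""
    total = 0
    thread_size = 0
    for size in message_sizes:
        thread_size += size   # the thread grows by the new message
        total += thread_size  # and the full aggregate is re-sent
    return total


# Assumed sizes in bytes: one large attachment, then two short replies.
sizes = [50_000_000, 1_000, 1_000]

print(bytes_resent(sizes))  # 150003000 bytes sent with one document per thread
print(sum(sizes))           # 50002000 bytes sent with one document per message
```

With these assumed sizes, two tiny replies triple the data sent, because the
large file goes down the pipe again each time.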

So, first question: is what I think I know about update/add correct?

Second, if so, is there a way that I can update single-valued fields and
append new multivalued fields, without having to re-index the whole
document?

Third, am I just totally wrong about the way I'm trying to do this, and is
there a better way?

Thanks-in-advance!


