Re: Partial update vs full update performance

Jack Krupansky Wed, 12 Jun 2013 09:18:20 -0700

Yes, you need to have all the fields stored to do a partial update.
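
To illustrate why every field has to be stored, here is a toy Python model of the read, merge, delete and re-add cycle described in this thread. It is only a sketch, not Solr code: the STORED schema, the field names and both helper functions are hypothetical.

```python
# Toy model, not Solr code. The schema and field names are hypothetical.
STORED = {"id": True, "title": True, "body": False}


def add_document(index, doc):
    """Index every field, but keep the original value only for stored fields."""
    index[doc["id"]] = {
        "indexed": set(doc),  # names of the fields that are searchable
        "stored": {name: value for name, value in doc.items() if STORED[name]},
    }


def partial_update(index, doc_id, field, value):
    """Rebuild the document from stored values, change one field, delete, re-add."""
    rebuilt = dict(index[doc_id]["stored"])  # only stored values can be read back
    rebuilt[field] = value
    del index[doc_id]
    add_document(index, rebuilt)


index = {}
add_document(index, {"id": "1", "title": "old title", "body": "long text"})
partial_update(index, "1", "title", "new title")
print(sorted(index["1"]["indexed"]))  # ['id', 'title']
```

In this model the non-stored "body" field is no longer indexed after the update, because its value could not be read back when the document was rebuilt.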

Generally, not storing field values causes all sorts of headaches that far outweigh the modest benefit in memory savings.

Generally, make everything stored, unless you have a specific and VERY COMPELLING need not to. Back in the early days of Lucene and Solr, memory use was a much more compelling concern. Now, not so much. And even if memory is an issue, the downside of not storing all values seems much more likely to overwhelm the benefits.

Sure, there are some apps where you may not want to store much if anything besides the key (I recall one presentation at Lucene Revolution in San Diego, and DataStax Enterprise does this because all the data is stored in Cassandra already), but generally apps would be better off biting the bullet and throwing memory at the problem.


And DocValues are an alternative if heap space is a critical issue.

2. Large field values are simply a potential issue since they are a lot of bytes to be retrieved and then re-stored.


-- Jack Krupansky

-----Original Message-----
From: adfel70
Sent: Wednesday, June 12, 2013 11:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Partial update vs full update performance

1. To support partial updates, I must have all the fields stored (most of
which I don't need stored).
Wouldn't I suffer in query performance if I store all these fields?

2. Can you elaborate on the large fields issue?
Why does it matter if the fields are large in the context of partial
updates?
One way or another, Lucene will index the field.

Jack Krupansky-2 wrote

Correct.

Generally, I think most apps will benefit from partial update, especially
if they have a lot of fields. Otherwise, they will have two round trip
requests rather than one. Solr does the reading of existing document
values more efficiently, under the hood, with no need to format for the
response and parse the incoming (redundant) values.
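
As a rough sketch of the difference on the client side, the Python below builds the two JSON request bodies: a partial (atomic) update that names only the changed field with the "set" modifier, and a full update that resends the whole document. The document ID, field names and helper functions are hypothetical, and sending the body to Solr's update handler is left out.

```python
import json


def atomic_update_body(doc_id, field, value):
    """Build a JSON body that changes one field via the "set" modifier."""
    return json.dumps([{"id": doc_id, field: {"set": value}}])


def full_update_body(document):
    """Build a JSON body that resends the complete document."""
    return json.dumps([document])


# Hypothetical document: only "status" actually changes.
partial = atomic_update_body("doc-1", "status", "archived")
full = full_update_body({
    "id": "doc-1",
    "status": "archived",
    "title": "Quarterly report",
    "body": "Full text that the client had to fetch first.",
})

print(partial)
print(len(partial) < len(full))
```

With the partial form the client never needs to fetch the other field values, which is where the saved round trip comes from.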

OTOH, if the client has all the data anyway (maybe because it wants to
display the data before update), it may be easier to do a full update.

You could do an actual performance test, but I would suggest that
(generally) partial update will be more efficient than a full update.

And Lucene can do add and delete rather quickly, so that should not be a
concern for modest to medium size documents, but clearly would be an issue
for large and very large documents (hundreds of fields or large field
values.)

-- Jack Krupansky

-----Original Message-----
From: adfel70
Sent: Wednesday, June 12, 2013 10:40 AM
To: solr-user@.apache
Subject: Partial update vs full update performance

Hi
As I understand it, even if I use partial update, Lucene can't really update
documents. Solr will use the stored fields to pass the values to Lucene,
and delete and add operations will still be performed.

If this is the case, is there a performance issue when comparing partial
update to full update?

My documents have dozens of fields, most of them are not stored.
I sometimes need to go through a portion of the documents and modify a
single field.
What I do right now is delete the portion I want to update and re-add
those documents with the updated field.
This of course takes a lot of time (I'm talking about tens of millions of
documents).

Should I move to using partial update? Will it improve the indexing time
at all? Will it improve the indexing time to such an extent that I would
be better off storing the fields I don't need stored, just for the partial
update feature?

thanks






--
View this message in context:
http://lucene.472066.n3.nabble.com/Partial-update-vs-full-update-performance-tp4069948.html
Sent from the Solr - User mailing list archive at Nabble.com.

--

View this message in context:
http://lucene.472066.n3.nabble.com/Partial-update-vs-full-update-performance-tp4069948p4069973.html
Sent from the Solr - User mailing list archive at Nabble.com.
