My question would be, why are you updating 10m documents? Is it because of denormalised fields? E.g. one system I have needs to reindex all data for a publication when that publication switches between active and inactive.
If this is the case, you can perhaps achieve the same using joins. Store the publications, and their status, in another core. Then, a query to find documents for active publications could be:

q=harry potter&fq={!join fromIndex=pubs from=pubID to=pubID}status:active

This would find documents containing the terms 'harry potter' that are associated with active publications. Changing the status of a publication would then require changing a single document in the 'pubs' core, rather than re-indexing all the documents for that publication.

Does this achieve what you are trying to do?

Upayavira

On Wed, Jun 12, 2013, at 03:51 PM, Jack Krupansky wrote:
> Correct.
>
> Generally, I think most apps will benefit from partial update, especially
> if they have a lot of fields. Otherwise, they will have two round-trip
> requests rather than one. Solr reads the existing document values more
> efficiently under the hood, with no need to format them for the response
> and parse the incoming (redundant) values.
>
> OTOH, if the client has all the data anyway (maybe because it wants to
> display the data before the update), it may be easier to do a full update.
>
> You could do an actual performance test, but I would suggest that
> (generally) partial update will be more efficient than a full update.
>
> And Lucene can do add and delete rather quickly, so that should not be a
> concern for modest to medium-size documents, but it clearly would be an
> issue for large and very large documents (hundreds of fields or large
> field values).
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: adfel70
> Sent: Wednesday, June 12, 2013 10:40 AM
> To: solr-user@lucene.apache.org
> Subject: Partial update vs full update performance
>
> Hi
> As I understand it, even if I use partial update, Lucene can't really
> update documents. Solr will use the stored fields to pass the values to
> Lucene, and delete and add operations will still be performed.
>
> If this is the case, is there a performance issue when comparing partial
> update to full update?
>
> My documents have dozens of fields, most of which are not stored.
> I sometimes need to go through a portion of the documents and modify a
> single field.
> What I do right now is delete the portion I want to update and add
> them back with the updated field.
> This of course takes a lot of time (I'm talking about tens of millions of
> documents).
>
> Should I move to using partial update? Will it improve the indexing time
> at all? Will it improve the indexing time to such an extent that I would
> be better off storing the fields I don't need stored, just for the
> partial update feature?
>
> Thanks
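A minimal sketch of the join approach suggested above, in Python, assuming a main core (hypothetically named "docs") and the 'pubs' core, with pubID as the uniqueKey of 'pubs'. It only builds the request parameters for the join query and a JSON atomic ("partial") update body that flips one publication's status; it sends nothing, and the host, core name "docs" and ID "pub42" are placeholders. The atomic update relies on the other fields of the 'pubs' documents being stored (and, in Solr 4.x, on the update log being enabled), which is cheap here because those documents are tiny.

```python
import json
from urllib.parse import urlencode


def join_query_params(terms, status="active"):
    """Params for the main core: match `terms`, keeping only documents whose
    pubID matches a document in the 'pubs' core with the given status."""
    fq = "{!join fromIndex=pubs from=pubID to=pubID}status:" + status
    return {"q": terms, "fq": fq}


def status_update_body(pub_id, status):
    """JSON body for an atomic update of one 'pubs' document.

    Assumes pubID is the uniqueKey of the 'pubs' core. The "set" modifier
    replaces only the status field; Solr rebuilds the rest of the document
    from its stored fields.
    """
    return json.dumps([{"pubID": pub_id, "status": {"set": status}}])


# Query string for e.g. GET http://localhost:8983/solr/docs/select?<query_string>
# (host and core name "docs" are placeholders)
query_string = urlencode(join_query_params("harry potter"))

# Body for e.g. POST http://localhost:8983/solr/pubs/update?commit=true
# with Content-Type: application/json ("pub42" is a made-up ID)
payload = status_update_body("pub42", "inactive")

print(query_string)
print(payload)
```

Deactivating a publication is then one small update to 'pubs' (plus a commit) instead of re-indexing every document carrying that pubID; the trade-off is that the join is resolved at query time, so each filtered query does a bit more work.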