Re: how to update billions of docs

2016-03-24 Thread Mohsin Beg Beg
An update on how I ended up implementing the requirement, in case it helps others. There is a lot of other code I did not include, but the general logic is below. While performance is still not great, it is 10x faster than atomic updates (because RealTimeGetComponent.getInputDocument() is not n

Re: how to update billions of docs

2016-03-20 Thread Ishan Chattopadhyaya
Hi Mohsin, There's some work in progress for in-place updates to docValued fields: https://issues.apache.org/jira/browse/SOLR-5944. Can you try the latest patch there (or ping me if you need a git branch)? It would be nice to know how fast the updates go for your use case with that patch. Please not

Re: how to update billions of docs

2016-03-19 Thread Toke Eskildsen
Mohsin Beg Beg wrote:
> I have a requirement to replace a value of a field in 100B's of docs
> in 100's of cores. The field is multiValued=false docValues=true
> type=StrField stored=true indexed=true.

If this is just a simple one-time search-replace, then don't update the value in the index. In

Re: how to update billions of docs

2016-03-19 Thread sudsport s
I think there are no in-place updates in Solr, which means an update behaves like an insert plus marking the old version as deleted, so the behavior should be the same as indexing billions of docs.

On Wed, Mar 16, 2016 at 3:52 PM, Mohsin Beg Beg wrote:
> Hi,
>
> I have a requirement to replace a value of a field in
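
To make the point above concrete, here is a toy Python model of "update = insert the new version + mark the old version deleted". It is only a sketch of the idea, not how Lucene or Solr store segments; the class ToyIndex and its methods are made up for illustration.

```python
class ToyIndex:
    """Toy append-only index (hypothetical, not a Lucene/Solr API).

    Documents are never modified in place: a changed document is
    appended as a new version and the old position is marked deleted.
    """

    def __init__(self):
        self.docs = []        # append-only list of (key, fields)
        self.deleted = set()  # positions of superseded versions
        self.live = {}        # key -> position of the current version

    def add(self, key, fields):
        old_pos = self.live.get(key)
        if old_pos is not None:
            # The old version stays in storage; it is only flagged.
            self.deleted.add(old_pos)
        self.docs.append((key, dict(fields)))
        self.live[key] = len(self.docs) - 1

    def update_field(self, key, field, value):
        # Changing one field still rewrites the whole document.
        fields = dict(self.docs[self.live[key]][1])
        fields[field] = value
        self.add(key, fields)

    def get(self, key):
        return self.docs[self.live[key]][1]


idx = ToyIndex()
idx.add("1", {"a": "x", "b": "y"})
idx.update_field("1", "a", "z")
```

After the single-field update the toy index holds two stored versions of the document, one of them flagged deleted, which is why the cost looks like re-indexing.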

Re: how to update billions of docs

2016-03-19 Thread Jack Krupansky
It would be nice to have a wiki/doc for "Bulk Field Update" that listed all of these techniques and tricks. And, of course, it would be so much better to have an explicit Lucene feature for this. It could work in the background like merge and process one segment at a time as efficiently as possibl

Re: how to update billions of docs

2016-03-19 Thread Jack Krupansky
That's another great example of a mode that Bulk Field Update (my mythical feature) needs: switch a list of fields from stored to docValues. And maybe even the opposite, since there are scenarios in which docValues is worse than stored and you would only find that out after indexing... billions of

RE: how to update billions of docs

2016-03-19 Thread Ken Krugler
As others noted, currently updating a field means deleting and inserting the entire document. Depending on how you use the field, you might be able to create another core/container with that one field (plus the key field), and use join support. Note that https://issues.apache.org/jira/browse/LU
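
For anyone wanting to try the side-core-plus-join idea mentioned above, a query using Solr's join query parser could be built roughly like this. This is a sketch under assumptions: the core name values_core and the field names doc_id, id and value are hypothetical, and the helper build_join_query is not part of any Solr client library.

```python
def build_join_query(from_field, to_field, from_index, inner_query):
    """Build a Solr join query string using {!join} local-params syntax.

    from_index names the side core holding only the key field and the
    frequently changing field; inner_query is evaluated against that core.
    All argument values used below are illustrative assumptions.
    """
    return "{!join from=%s to=%s fromIndex=%s}%s" % (
        from_field, to_field, from_index, inner_query)


# Find documents in the main core whose id matches a doc_id in the
# (hypothetical) side core where value is "new".
q = build_join_query("doc_id", "id", "values_core", 'value:"new"')
print(q)
```

With this layout only the small side core needs rewriting when the value changes, at the cost of a join at query time.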