Re: Atomic updates and indexed fields

Bram Van Dam Mon, 08 Jul 2013 07:25:37 -0700

see: https://issues.apache.org/jira/browse/LUCENE-4258
I'm sure the people working on this would gladly get all
the help they can. WARNING: I suspect (although I haven't
looked myself) that this is very hairy code <G>.

Ah excellent! Thanks! Exactly what I was looking for. Looks like this has been in the pipeline for a good while now. I'll have a look over the patches, and if it's not too hairy I'll see what I can do.

I'll challenge this statement a bit, knowing full well that I don't
understand your problem space just by saying I've seen
some pretty big, high-throughput installations go ahead and
store all the fields and use them for atomic updates. As in
billions of documents. And note that "index size" as it relates
to storing content is orthogonal to searching. By that I mean
the index bloat you get when storing fields doesn't
really impact search memory requirements much, the stored
data is kept in separate files and only assembled for docs
as you return them (i.e. a page worth).

Without going into too much detail about this, I'll say that we have billions of documents with ~50 indexed fields, fewer than 5 of which need to be updated, though some documents have to be updated 10 times in a reasonably short timespan. All the while maintaining an indexing throughput of ~4k messages/second. Near real time. On COTS hardware. Every IO operation we can spare is a major win for us.

The impact on index size is about 15% in my tests. I will need a little more time to measure the impact on throughput and querying, but my gut instinct tells me that it won't be pretty.
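
For anyone following along, here is a rough sketch of the kind of atomic update request being discussed, written in Python. It only builds the JSON body that would be sent to Solr's update handler and does not talk to a server. The document id and the field names (status, retries) are made up for illustration, and the helper build_atomic_update is hypothetical, not part of any Solr client library. Solr applies such an update by rebuilding the whole document from its stored fields, which is why every field has to be stored for this to work.

```python
import json


def build_atomic_update(doc_id, changes, id_field="id"):
    """Build one Solr atomic-update document that touches only `changes`.

    Hypothetical helper for illustration; not part of any Solr client.
    """
    doc = {id_field: doc_id}
    for field, value in changes.items():
        # "set" replaces the field's value; Solr also accepts "add" and "inc".
        doc[field] = {"set": value}
    return doc


# Made-up document id and field names, for illustration only.
update = build_atomic_update("msg-42", {"status": "delivered", "retries": 3})

# The update handler expects a JSON list of documents.
payload = json.dumps([update])
print(payload)
```

Only the changed fields travel over the wire, but the server still reads and rewrites the full stored document, so this does not reduce the IO on the indexing side.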


 - Bram
