I can't disagree. You bring up some of the points that make me _extremely_ reluctant to try to get this into 5.x, though. 6.0 at the earliest, I should think.
And who knows? Java may get a GC process that's geared to modern amounts of memory and gets past the current pain....

Best,
Erick

On Sat, Jan 3, 2015 at 1:00 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> Erick Erickson [erickerick...@gmail.com] wrote:
> > Of course I wouldn't be doing the work so I really don't have much of
> > a vote, but it's not clear to me at all that enough people would actually
> > have a use-case for 2b+ docs in a single shard to make it
> > worthwhile. At that scale GC potentially becomes really unpleasant, for
> > instance....
>
> Over the last few years we have seen a few use cases here on the mailing list.
> I would be very surprised if the number of such cases does not keep rising.
> Currently the work for a complete overhaul does not measure up to the
> rewards, but that is slowly changing. At the very least I find it prudent
> not to limit new Lucene/Solr interfaces to ints.
>
> As for GC: Right now a lot of structures are single-array oriented (for
> example, using a long array to represent the bits of a bitset), which might not
> work well with current garbage collectors. A change to higher limits also
> means re-thinking such approaches: if the garbage collector prefers objects
> below a certain size, then split the arrays into chunks of that size. Likewise,
> iterations over structures linear in the size of the index could be threaded.
> These are issues even with the current 2b limitation.
>
> - Toke Eskildsen
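The array-splitting idea Toke describes — replacing one huge long[] with fixed-size chunks so that no single allocation is enormous, while still addressing more than 2^31 bits — could be sketched roughly as below. This is a hypothetical illustration, not Lucene code; the class name, chunk size, and lazy-allocation choice are all assumptions.

```java
// Sketch of a "paged" bitset: bits are stored in fixed-size long[] chunks
// instead of a single giant array. Indices are longs, so the structure can
// address more than 2^31 bits. Names and the 2^20-bit chunk size are
// illustrative assumptions, not taken from Lucene.
public class PagedBitSet {
    private static final int CHUNK_BITS = 1 << 20;            // bits per chunk (assumption)
    private static final int WORDS_PER_CHUNK = CHUNK_BITS / 64;

    private final long[][] chunks;                            // chunks allocated lazily

    public PagedBitSet(long numBits) {
        int numChunks = (int) ((numBits + CHUNK_BITS - 1) / CHUNK_BITS);
        chunks = new long[numChunks][];
    }

    public void set(long index) {
        int chunk = (int) (index / CHUNK_BITS);
        int bit = (int) (index % CHUNK_BITS);
        if (chunks[chunk] == null) {
            chunks[chunk] = new long[WORDS_PER_CHUNK];        // allocate on first write
        }
        chunks[chunk][bit >>> 6] |= 1L << (bit & 63);
    }

    public boolean get(long index) {
        int chunk = (int) (index / CHUNK_BITS);
        int bit = (int) (index % CHUNK_BITS);
        long[] words = chunks[chunk];
        return words != null && (words[bit >>> 6] & (1L << (bit & 63))) != 0;
    }

    public static void main(String[] args) {
        // Address a bit beyond the 2^31 limit of a single int-indexed array.
        PagedBitSet bits = new PagedBitSet(3_000_000_000L);
        bits.set(2_500_000_000L);
        if (!bits.get(2_500_000_000L) || bits.get(2_500_000_001L)) {
            throw new AssertionError("unexpected bit state");
        }
        System.out.println("ok");
    }
}
```

Each chunk here is a modest ~128 KB allocation, so a collector never has to find one contiguous multi-hundred-megabyte region, and untouched chunks cost only a null slot.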