Erick Erickson [erickerick...@gmail.com] wrote:
> I guess my $0.02 is that you'd have to have strong evidence that
> extending Lucene to 64 bit is even useful. Or more generally, useful
> enough to pay the penalty. All the structures that allocate maxDoc id
> arrays would suddenly require twice the memory for instance,

Are there any such structures? It was my impression that ID-structures
in Solr were either bitmaps, hashmaps or queues. Anyway, if the number
of places with full-size ID-arrays is low, there could be dual
implementations selected by maxDoc, along the lines of the sketch
below.
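To make that concrete, here is a minimal sketch of what I mean. All
names are made up for illustration, none of this is existing
Lucene/Solr API: a factory hands out an int-backed structure whenever
maxDoc fits in an int, so current setups pay no memory penalty, and
only falls back to a paged long-backed variant above the 2b limit.

public interface DocIds {
  long get(long index);
  void set(long index, long docId);

  // Pick the cheap implementation when the ID space fits in an int
  static DocIds create(long maxDoc, long capacity) {
    return maxDoc <= Integer.MAX_VALUE
        ? new IntDocIds((int) capacity)
        : new LongDocIds(capacity);
  }
}

final class IntDocIds implements DocIds {
  private final int[] ids;
  IntDocIds(int capacity) { ids = new int[capacity]; }
  @Override public long get(long index) { return ids[(int) index]; }
  @Override public void set(long index, long docId) {
    ids[(int) index] = (int) docId;
  }
}

// Java arrays themselves max out near 2b entries, so the long variant
// pages its values into a long[][]
final class LongDocIds implements DocIds {
  private static final int PAGE_BITS = 20;          // 1M entries/page
  private static final int PAGE_MASK = (1 << PAGE_BITS) - 1;
  private final long[][] pages;

  LongDocIds(long capacity) {
    // Slightly over-allocates the last page; fine for a sketch
    int pageCount = (int) ((capacity >>> PAGE_BITS) + 1);
    pages = new long[pageCount][1 << PAGE_BITS];
  }
  @Override public long get(long index) {
    return pages[(int) (index >>> PAGE_BITS)][(int) (index & PAGE_MASK)];
  }
  @Override public void set(long index, long docId) {
    pages[(int) (index >>> PAGE_BITS)][(int) (index & PAGE_MASK)] = docId;
  }
}

A shard with maxDoc = 100m gets a plain int[] exactly as today; only a
shard past the 2b limit pays for long storage.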
> plus all the coding effort that could be spent doing other things.

Very true. I agree that at the current stage, going beyond 2b
documents/shard is still a bit too special to spend a lot of effort on
it.

However, 2b is just the hard limit. As has been discussed before,
single shards work best in the lower end of the hundreds of millions
of documents. One reason is that many parts of Lucene work
single-threaded on structures that scale linearly with document count.
Having some hundreds of millions of documents (log analysis being the
typical case) is not uncommon these days.

A gradual shift to more multi-thread oriented processing would fit
well with current trends in hardware as well as use cases. As opposed
to the int->long switch, there would be little to no penalty for
setups with low maxDoc (they would just use 1 thread). See the PS for
a sketch.

- Toke Eskildsen
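PS: A rough sketch of the kind of thread scaling I have in mind. The
helper and the 10m docs/thread threshold are made up for illustration,
not existing Lucene code; the point is only that the thread count is
derived from maxDoc, so a small index runs the same single-threaded
loop as today.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

public final class PerDocRunner {
  private static final int DOCS_PER_THREAD = 10_000_000; // assumed

  // Applies perDoc to every doc ID in [0, maxDoc). perDoc must be
  // thread-safe when maxDoc is large enough to trigger multiple threads.
  public static void forEachDoc(int maxDoc, IntConsumer perDoc)
      throws InterruptedException, ExecutionException {
    int threads = Math.min(Runtime.getRuntime().availableProcessors(),
                           Math.max(1, maxDoc / DOCS_PER_THREAD));
    if (threads == 1) {        // low maxDoc: today's behaviour, no pool
      for (int docId = 0; docId < maxDoc; docId++) {
        perDoc.accept(docId);
      }
      return;
    }
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // Split the doc-ID space into contiguous slices, one per thread
      int slice = (maxDoc + threads - 1) / threads;
      List<Future<?>> futures = new ArrayList<>();
      for (int t = 0; t < threads; t++) {
        final int from = t * slice;
        final int to = Math.min(maxDoc, from + slice);
        futures.add(pool.submit(() -> {
          for (int docId = from; docId < to; docId++) {
            perDoc.accept(docId);
          }
        }));
      }
      for (Future<?> f : futures) {
        f.get();               // propagate failures from the workers
      }
    } finally {
      pool.shutdown();
    }
  }
}

With maxDoc = 1m this runs the plain loop in the calling thread; with
maxDoc = 500m it fans out to up to 50 threads, capped by core count.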