[ 
https://issues.apache.org/jira/browse/LUCENE-8321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17037035#comment-17037035
 ] 

Erick Erickson commented on LUCENE-8321:
----------------------------------------

Part of the rabbit hole would be the number of segments. TieredMergePolicy 
(TMP), for instance, has a default maximum merged segment size of 5 GB. We 
could certainly raise that or create a new merge policy for indexes with lots 
of docs...
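
To put a rough number on the segment count, here is a back-of-the-envelope 
sketch. It assumes every segment sits exactly at the 5 GB cap, which real 
indexes never do, so the result is only a lower bound. The class and method 
names are made up for illustration and are not Lucene APIs.

```java
public final class SegmentCountEstimate {
    // 5 GB cap on a merged segment, as mentioned above for TMP's default.
    static final long DEFAULT_MAX_SEGMENT_BYTES = 5L * 1024 * 1024 * 1024;

    /**
     * Lower bound on the number of segments, under the (unrealistic)
     * assumption that every segment is exactly at the size cap.
     */
    static long minSegments(long indexBytes, long maxSegmentBytes) {
        if (indexBytes < 0 || maxSegmentBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Ceiling division without floating point.
        return (indexBytes + maxSegmentBytes - 1) / maxSegmentBytes;
    }

    public static void main(String[] args) {
        long oneTb = 1L << 40; // hypothetical index sizes, for illustration only
        System.out.println(minSegments(oneTb, DEFAULT_MAX_SEGMENT_BYTES));
        System.out.println(minSegments(8 * oneTb, DEFAULT_MAX_SEGMENT_BYTES));
    }
}
```

In practice the count would be higher, since the merge policy also keeps 
several smaller segments per tier.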

On a separate note, I've seen terabyte-scale indexes on disk. Allowing those 
to grow by a factor of 8 would be another part of the rabbit hole.

That said, I'm not against the idea at all. I'm pretty sure operational issues 
would pop out, but that's progress...

 

> Allow composite readers to have more than 2B documents
> ------------------------------------------------------
>
>                 Key: LUCENE-8321
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8321
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> I would like to start discussing removing the limit of ~2B documents that we 
> have for indices, while still enforcing it at the segment level for practical 
> reasons.
> Postings, stored fields, and all other codec APIs would keep working on 
> integers to represent doc ids. Only top-level doc ids and numbers of 
> documents would need to move to a long. I say "only" because we now mostly 
> consume indices per-segment, but there are still a number of places where we 
> identify documents by their top-level doc ID, like {{IndexReader#document}}, 
> top-docs collectors, etc.
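
The split described in the issue (long doc IDs at the top level, int doc IDs 
inside each segment) could look roughly like the sketch below. This is not 
Lucene code: `LongDocIdResolver` and its methods are hypothetical names, and 
the sketch assumes every segment holds at least one document. It keeps a 
long-valued doc base per segment and binary-searches it to turn a top-level 
ID into a segment index plus an int segment-local ID.

```java
import java.util.Arrays;

public final class LongDocIdResolver {
    // docBases[i] = total number of docs in segments 0..i-1 (strictly increasing).
    private final long[] docBases;
    private final long totalDocs;

    LongDocIdResolver(int[] segmentMaxDocs) {
        docBases = new long[segmentMaxDocs.length];
        long base = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            if (segmentMaxDocs[i] <= 0) {
                throw new IllegalArgumentException("segment " + i + " must hold at least one doc");
            }
            docBases[i] = base;
            base += segmentMaxDocs[i]; // int widened to long, so the sum cannot overflow int
        }
        totalDocs = base;
    }

    /** Top-level document count; may exceed Integer.MAX_VALUE. */
    long totalDocs() {
        return totalDocs;
    }

    /** Index of the segment that contains the given top-level doc ID. */
    int segmentIndex(long topLevelDocId) {
        if (topLevelDocId < 0 || topLevelDocId >= totalDocs) {
            throw new IndexOutOfBoundsException("doc ID out of range: " + topLevelDocId);
        }
        int idx = Arrays.binarySearch(docBases, topLevelDocId);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the containing
        // segment is the one just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }

    /** Segment-local doc ID, which still fits in an int. */
    int localDocId(long topLevelDocId) {
        return (int) (topLevelDocId - docBases[segmentIndex(topLevelDocId)]);
    }

    public static void main(String[] args) {
        // Three hypothetical segments whose combined size exceeds 2^31 - 1.
        LongDocIdResolver r = new LongDocIdResolver(new int[] {2_000_000_000, 2_000_000_000, 10});
        long doc = 4_000_000_005L;
        System.out.println(r.totalDocs() + " docs; doc " + doc
            + " -> segment " + r.segmentIndex(doc) + ", local " + r.localDocId(doc));
    }
}
```

Codec-level code would only ever see the int returned by `localDocId`, 
while callers that address documents at the top level would carry the long.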



