jtibshirani commented on a change in pull request #1948:
URL: https://github.com/apache/lucene-solr/pull/1948#discussion_r504899193
########## File path: lucene/core/src/java/org/apache/lucene/index/OrdinalMap.java ##########
@@ -271,13 +273,26 @@ protected boolean lessThan(TermsEnumIndex a, TermsEnumIndex b) {
       globalOrd++;
     }
 
-    this.firstSegments = firstSegments.build();
-    this.globalOrdDeltas = globalOrdDeltas.build();
+    long ramBytesUsed = BASE_RAM_BYTES_USED + segmentMap.ramBytesUsed();
+    this.valueCount = globalOrd;
+
+    // If the first segment contains all of the global ords, then we can apply a small optimization
+    // and hardcode the first segments and global ord deltas as all zeroes.
+    if (ordDeltaBits.length > 0 && ordDeltaBits[0] == 0L && ordDeltas[0].size() == this.valueCount) {

Review comment:

> Do we (somewhere, couldn't find it here) pre-sort all segments by the cardinality descending?

We do, in fact: the segments are sorted by 'weight', which at every call site corresponds to the number of unique terms. This was added in [LUCENE-5782](https://issues.apache.org/jira/browse/LUCENE-5782).

> Does our PackedLongValues.monotonicBuilder already optimize for the case where it is all 0s, for the case where another segment (not the first) has all the global values as well?

It does look like it. When constructing the individual `PackedInts.Reader` instances, we detect the all-zeros case and use the lightweight `PackedInts.NullReader` instead. It's great that we already optimize that case, but it does mean this PR doesn't make an enormous space difference.
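To make the optimization concrete, here is a toy sketch in Java. It is hypothetical and far simpler than Lucene's `OrdinalMap` (no `PackedLongValues`, no `TermsEnum` merging): it assumes each segment is a sorted array of unique terms, that a term's segment ord is its array index, and that a segment's 'weight' is its number of unique terms, as at the call sites described above. The accessor names are borrowed from `OrdinalMap` only for readability. Because segments are sorted largest first, when one segment contains every term, each global ord's first segment is sorted position 0 and its delta is 0, so neither per-ord array needs to be stored.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeMap;

/**
 * Hypothetical toy ordinal map, NOT Lucene's implementation. Assumption: each
 * segment is a sorted array of unique terms; a term's ord is its array index.
 */
class ToyOrdinalMap {
  private final int[] newToOld;          // sorted position -> original segment index
  private final boolean allInFirst;      // largest segment holds every global term
  private final int[] firstSegments;     // global ord -> sorted position (null if allInFirst)
  private final long[] globalOrdDeltas;  // global ord -> globalOrd - segmentOrd (null if allInFirst)
  private final long valueCount;

  ToyOrdinalMap(String[][] segments) {
    // Sort segments by weight (here: number of unique terms), largest first.
    // Arrays.sort on objects is stable, so ties keep their original order.
    Integer[] order = new Integer[segments.length];
    for (int i = 0; i < order.length; i++) order[i] = i;
    Arrays.sort(order, Comparator.comparingInt((Integer i) -> segments[i].length).reversed());
    newToOld = new int[segments.length];
    for (int i = 0; i < order.length; i++) newToOld[i] = order[i];

    // For every distinct term, remember the first segment (in sorted order) that
    // contains it and that segment's ord. TreeMap iterates terms in sorted order,
    // so the iteration index is the term's global ord.
    TreeMap<String, int[]> first = new TreeMap<>();
    for (int pos = 0; pos < order.length; pos++) {
      String[] terms = segments[order[pos]];
      for (int ord = 0; ord < terms.length; ord++) {
        first.putIfAbsent(terms[ord], new int[] {pos, ord});
      }
    }
    valueCount = first.size();

    int[] segs = new int[first.size()];
    long[] deltas = new long[first.size()];
    int globalOrd = 0;
    for (int[] e : first.values()) {
      segs[globalOrd] = e[0];
      deltas[globalOrd] = globalOrd - e[1];
      globalOrd++;
    }

    // If the largest segment already holds every global term, every delta is 0 and
    // every first segment is position 0, so there is nothing to store.
    allInFirst = segments.length > 0 && segments[order[0]].length == valueCount;
    firstSegments = allInFirst ? null : segs;
    globalOrdDeltas = allInFirst ? null : deltas;
  }

  long getValueCount() {
    return valueCount;
  }

  boolean isAllInFirstSegment() {
    return allInFirst;
  }

  /** Original index of the first segment that contains the term with this global ord. */
  int getFirstSegmentNumber(long globalOrd) {
    int pos = allInFirst ? 0 : firstSegments[(int) globalOrd];
    return newToOld[pos];
  }

  /** That segment's ord for the term: the global ord minus the stored delta. */
  long getFirstSegmentOrd(long globalOrd) {
    return allInFirst ? globalOrd : globalOrd - globalOrdDeltas[(int) globalOrd];
  }

  public static void main(String[] args) {
    ToyOrdinalMap map = new ToyOrdinalMap(new String[][] {{"b"}, {"a", "b", "c"}, {"c"}});
    System.out.println(map.isAllInFirstSegment());    // true: segment 1 holds a, b, c
    System.out.println(map.getFirstSegmentNumber(2)); // 1 (original index of the largest segment)
    System.out.println(map.getFirstSegmentOrd(2));    // 2
  }
}
```

With `{"b"}, {"a", "b", "c"}, {"c"}` the largest segment (original index 1) holds all three terms, so lookups need no per-ord storage. With `{"a", "b"}, {"c"}` no single segment does, so the toy keeps both arrays, and for example global ord 2 (`"c"`) maps to segment 1, ord 0.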