[GitHub] [lucene] jpountz commented on a change in pull request #264: LUCENE-10062: Switch to numeric doc values for encoding taxonomy ordinals (instead of custom binary format)

GitBox Thu, 26 Aug 2021 05:24:11 -0700


jpountz commented on a change in pull request #264:
URL: https://github.com/apache/lucene/pull/264#discussion_r696569831




##########
File path: lucene/facet/src/java/org/apache/lucene/facet/FacetsConfig.java
##########
@@ -410,7 +411,16 @@ private void processFacetFields(
 
       // Facet counts:
       // DocValues are considered stored fields:
-      doc.add(new BinaryDocValuesField(indexFieldName, 
dedupAndEncode(ordinals.get())));
+      IntsRef o = ordinals.get();
+      Arrays.sort(o.ints, o.offset, o.length);

Review comment:
       The last parameter of Arrays#sort is the end index, not the length.
   
   ```suggestion
         Arrays.sort(o.ints, o.offset, o.offset+o.length);
   ```

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/DocValuesOrdinalsReader.java
##########
@@ -59,16 +53,21 @@ public void get(int docID, IntsRef ordinals) throws 
IOException {
               "docs out of order: lastDocID=" + lastDocID + " vs docID=" + 
docID);
         }
         lastDocID = docID;
-        if (docID > values.docID()) {
-          values.advance(docID);
-        }
-        final BytesRef bytes;
-        if (values.docID() == docID) {
-          bytes = values.binaryValue();
-        } else {
-          bytes = new BytesRef(BytesRef.EMPTY_BYTES);
+
+        ordinals.offset = 0;
+        ordinals.length = 0;
+
+        if (dv.advanceExact(docID)) {

Review comment:
       this change probably contributes to the speedup, we had seen 
non-negligible speedups when moving from advance to advanceExact

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/DocValuesOrdinalsReader.java
##########
@@ -41,12 +40,7 @@ public DocValuesOrdinalsReader(String field) {
 
   @Override
   public OrdinalsSegmentReader getReader(LeafReaderContext context) throws 
IOException {

Review comment:
       Do we actually need the OrdinalsSegmentReader or could we use 
`SortedNumericDocValues` instead? Both seem to be about fetching some sorted 
integers for a given doc ID. This might help get some more performance as we 
would no longer have to copy these ints into an IntsRef.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] jpountz commented on a change in pull request #264: LUCENE-10062: Switch to numeric doc values for encoding taxonomy ordinals (instead of custom binary format)

Reply via email to