gsmiller commented on a change in pull request #443:
URL: https://github.com/apache/lucene/pull/443#discussion_r751462508



##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/OrdinalMappingLeafReader.java
##########
@@ -107,6 +113,64 @@ public BytesRef binaryValue() {
     }
   }
 
+  private class OrdinalMappingSortedNumericDocValues extends FilterSortedNumericDocValues {
+    private final IntArrayList currentValues;
+    private int currIndex;
+
+    OrdinalMappingSortedNumericDocValues(SortedNumericDocValues in) {
+      super(in);
+      currentValues = new IntArrayList(32);
+    }
+
+    @Override
+    public boolean advanceExact(int target) throws IOException {
+      boolean result = in.advanceExact(target);
+      if (result) {
+        reloadValues();
+      }
+      return result;
+    }
+
+    @Override
+    public int advance(int target) throws IOException {
+      int result = in.advance(target);
+      if (result != DocIdSetIterator.NO_MORE_DOCS) {
+        reloadValues();
+      }
+      return result;
+    }
+
+    @Override
+    public int nextDoc() throws IOException {
+      int result = in.nextDoc();
+      if (result != DocIdSetIterator.NO_MORE_DOCS) {
+        reloadValues();
+      }
+      return result;
+    }
+
+    @Override
+    public int docValueCount() {
+      return currentValues.elementsCount;
+    }
+
+    private void reloadValues() throws IOException {
+      currIndex = 0;
+      currentValues.clear();
+      for (int i = 0; i < in.docValueCount(); i++) {
+        currentValues.add(ordinalMap[(int) in.nextValue()]);
+      }
+      Arrays.sort(currentValues.buffer, 0, currentValues.elementsCount);

Review comment:
       The ordinal map gets created inside the `DirectoryTaxonomyWriter` merge logic, so I'm relying on that producing correct (1:1) mappings. If the map ever did produce duplicate ordinals, we'd at least have parity between the "old" and "new" code paths right now, in the sense that both would end up storing the duplicate values. Do you think there are additional checks or error handling we should include here? We could explicitly check for dups after mapping/sorting, but I'm not sure what error handling we'd want to put in place at that point to gracefully handle the problem.
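       For illustration, here's a minimal sketch of the kind of post-sort duplicate check being discussed. Since the buffer is sorted, duplicates would show up as adjacent equal values, so a single linear scan suffices. The class and method names here are hypothetical, and throwing `IllegalStateException` is just one possible choice of error handling:

```java
import java.util.Arrays;

public class DupCheckSketch {

  // Hypothetical helper: after remapping ordinals through the ordinal map
  // and sorting, a non-1:1 map would leave adjacent equal values in the
  // sorted buffer. This scans the sorted range once and flags them.
  static void assertNoDuplicates(int[] buffer, int count) {
    for (int i = 1; i < count; i++) {
      if (buffer[i] == buffer[i - 1]) {
        throw new IllegalStateException("duplicate mapped ordinal: " + buffer[i]);
      }
    }
  }

  public static void main(String[] args) {
    // Hypothetical remapped ordinals containing a duplicate (1 appears twice).
    int[] mapped = {5, 1, 3, 1};
    Arrays.sort(mapped, 0, mapped.length);
    try {
      assertNoDuplicates(mapped, mapped.length);
      System.out.println("no duplicates");
    } catch (IllegalStateException e) {
      System.out.println("duplicate detected");
    }
  }
}
```

       The scan itself is cheap (O(n) on an already-sorted buffer), so the open question is really what to do on failure, not the cost of detection.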




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@lucene.apache.org