[GitHub] [lucene] mikemccand commented on a change in pull request #442: LUCENE-10122 Use NumericDocValue to store taxonomy parent array

GitBox Wed, 17 Nov 2021 07:24:23 -0800


mikemccand commented on a change in pull request #442:
URL: https://github.com/apache/lucene/pull/442#discussion_r751350351




##########
File path: lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/TaxonomyIndexArrays.java
##########
@@ -130,40 +125,82 @@ private void initParents(IndexReader reader, int first) throws IOException {
       return;
     }
 
+    if (tryLoadParentUsingTermPosition(reader, first)) {
+      return;
+    }
+
+    for (LeafReaderContext leafContext : reader.leaves()) {
+      int leafDocNum = leafContext.reader().maxDoc();
+      if (leafContext.docBase + leafDocNum <= first) {
+        // skip this leaf if it does not contain new categories
+        continue;
+      }
+      NumericDocValues parentValues =
+          leafContext.reader().getNumericDocValues(Consts.FIELD_PARENT_ORDINAL_NDV);
+      if (parentValues == null) {
+        throw new CorruptIndexException(
+            "Parent data field " + Consts.FIELD_PARENT_ORDINAL_NDV + "not exists",
+            leafContext.reader().toString());
+      }
+
+      for (int doc = Math.max(first - leafContext.docBase, 0); doc < leafDocNum; doc++) {
+        if (parentValues.advanceExact(doc) == false) {
+          throw new CorruptIndexException(
+              "Missing parent data for category " + (doc + leafContext.docBase), reader.toString());
+        }
+        // we're putting an int and converting it back so it should be safe
+        parents[doc + leafContext.docBase] = Math.toIntExact(parentValues.longValue());
+      }
+    }
+  }
+
+  /**
+   * Try loading the old way of storing parent ordinal first, return true if the parent array is
+   * loaded Or false if not, and we will try loading using NumericDocValues
+   */
+  // TODO: Remove in Lucene 10, this is only for back-compatibility
+  private boolean tryLoadParentUsingTermPosition(IndexReader reader, int first) throws IOException {
     // it's ok to use MultiTerms because we only iterate on one posting list.
     // breaking it to loop over the leaves() only complicates code for no
     // apparent gain.
     PostingsEnum positions =
         MultiTerms.getTermPostingsEnum(
             reader, Consts.FIELD_PAYLOADS, Consts.PAYLOAD_PARENT_BYTES_REF, PostingsEnum.PAYLOADS);
 
+    if (positions == null) {
+      // try using NumericDocValues then
+      return false;
+    }
+
     // shouldn't really happen, if it does, something's wrong
-    if (positions == null || positions.advance(first) == DocIdSetIterator.NO_MORE_DOCS) {
+    if (positions.advance(first) == DocIdSetIterator.NO_MORE_DOCS) {
       throw new CorruptIndexException(
-          "Missing parent data for category " + first, reader.toString());
+          "[Lucene 8]Missing parent data for category " + first, reader.toString());

Review comment:
       Could we add a space between the `[Lucene 8]` and the message? E.g. `[Lucene 8] Missing ...`?
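
For readers following the diff, the per-leaf offset arithmetic in the new `initParents` loop can be sketched without Lucene. This is only an illustration under stated assumptions: `LeafOffsetSketch`, `fillParents`, and the `long[][] leafValues` stand-in for per-leaf `NumericDocValues` are hypothetical names, not part of the pull request or of Lucene. Each inner array plays the role of one leaf, and `docBase` is accumulated the way `LeafReaderContext.docBase` would be.

```java
import java.util.Arrays;

public class LeafOffsetSketch {

  /**
   * Fills parents[first..] from per-leaf value arrays, mirroring the docBase
   * arithmetic in the diff. leafValues[i][d] is a hypothetical stand-in for
   * the doc value stored for local doc d of leaf i.
   */
  static void fillParents(int[] parents, long[][] leafValues, int first) {
    int docBase = 0; // global doc ID of the first doc in the current leaf
    for (long[] leaf : leafValues) {
      int leafDocNum = leaf.length;
      // Only visit leaves that contain at least one doc at or after `first`.
      if (docBase + leafDocNum > first) {
        // Start at the local doc matching `first`, or at 0 for later leaves.
        for (int doc = Math.max(first - docBase, 0); doc < leafDocNum; doc++) {
          // Values were written as ints, so narrowing should not overflow;
          // toIntExact throws ArithmeticException if it ever does.
          parents[doc + docBase] = Math.toIntExact(leaf[doc]);
        }
      }
      docBase += leafDocNum;
    }
  }

  public static void main(String[] args) {
    long[][] leaves = {{-1, 0, 0}, {1, 1}, {2, 4, 4}};
    int[] parents = new int[8];
    Arrays.fill(parents, -2); // sentinel marking slots that were not touched
    fillParents(parents, leaves, 4);
    System.out.println(Arrays.toString(parents));
  }
}
```

In this sketch the first leaf is skipped entirely, the second leaf is entered at local doc 1, and the third leaf is read from local doc 0, which is the same behavior the `continue` and `Math.max(first - leafContext.docBase, 0)` lines in the diff aim for.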




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


