gautamworah96 commented on a change in pull request #179:
URL: https://github.com/apache/lucene/pull/179#discussion_r661074967
##########
File path:
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java
##########
@@ -351,12 +349,139 @@ public FacetLabel getPath(int ordinal) throws IOException {
}
synchronized (categoryCache) {
- categoryCache.put(catIDInteger, ret);
+ categoryCache.put(ordinal, ret);
}
return ret;
}
+ private FacetLabel getPathFromCache(int ordinal) {
+ // TODO: can we use an int-based hash impl, such as IntToObjectMap,
+ // wrapped as LRU?
+ synchronized (categoryCache) {
+ return categoryCache.get(ordinal);
+ }
+ }
+
+ private void checkOrdinalBounds(int ordinal, int indexReaderMaxDoc)
+ throws IllegalArgumentException {
+ if (ordinal < 0 || ordinal >= indexReaderMaxDoc) {
+ throw new IllegalArgumentException(
+ "ordinal "
+ + ordinal
+ + " is out of the range of the indexReader "
+ + indexReader.toString());
+ }
+ }
+
+ /**
+ * Returns an array of FacetLabels for a given array of ordinals.
+ *
+ * <p>This API is generally faster than iteratively calling {@link #getPath(int)} over an array of
+ * ordinals. It uses the {@link #getPath(int)} method iteratively when it detects that the index
+ * was created using StoredFields (with no performance gains) and uses DocValues based iteration
+ * when the index is based on DocValues.
+ *
+ * @param ordinals Array of ordinals that are assigned to categories inserted into the taxonomy
+ * index
+ */
+ public FacetLabel[] getBulkPath(int... ordinals) throws IOException {
+ ensureOpen();
+
+ int ordinalsLength = ordinals.length;
+ FacetLabel[] bulkPath = new FacetLabel[ordinalsLength];
+ // remember the original positions of ordinals before they are sorted
+ int[] originalPosition = new int[ordinalsLength];
+ Arrays.setAll(originalPosition, IntUnaryOperator.identity());
+ int indexReaderMaxDoc = indexReader.maxDoc();
+
+ for (int i = 0; i < ordinalsLength; i++) {
+ // check whether the ordinal is valid before accessing the cache
+ checkOrdinalBounds(ordinals[i], indexReaderMaxDoc);
+ // check the cache before trying to find it in the index
+ FacetLabel ordinalPath = getPathFromCache(ordinals[i]);
+ if (ordinalPath != null) {
+ bulkPath[i] = ordinalPath;
+ }
+ }
+
+ /* parallel sort the ordinals and originalPosition array based on the values in the ordinals array */
+ new InPlaceMergeSorter() {
+ @Override
+ protected void swap(int i, int j) {
+ int x = ordinals[i];
+ ordinals[i] = ordinals[j];
+ ordinals[j] = x;
+
+ x = originalPosition[i];
+ originalPosition[i] = originalPosition[j];
+ originalPosition[j] = x;
+ }
+ ;
+
+ @Override
+ public int compare(int i, int j) {
+ return Integer.compare(ordinals[i], ordinals[j]);
+ }
+ }.sort(0, ordinalsLength);
+
+ int readerIndex;
+ int leafReaderMaxDoc = 0;
+ int leafReaderDocBase = 0;
+ LeafReader leafReader;
+ LeafReaderContext leafReaderContext;
+ BinaryDocValues values = null;
+
+ for (int i = 0; i < ordinalsLength; i++) {
+ if (bulkPath[originalPosition[i]] == null) {
+ /*
+ If ordinals[i] >= leafReaderMaxDoc then we find the next leaf that contains our ordinal
+ */
+ if (values == null || ordinals[i] >= leafReaderMaxDoc) {
Review comment:
This was a big miss. Thanks for catching this.
Here is why the test was passing:
When the ordinals were all in the first leaf, each ordinal was <
leafReaderMaxDoc, so the wrong code worked correctly.
When the ordinals started spilling into the second leaf, the wrong code
would recalculate the leaf each time and then find the correct `Label`. Thus
we were not actually making use of the increasing values of the ordinals (and
were recalculating the `leafContext` each time).
This would go on for all the other leaves as well.
I'll rerun the benchmarks.
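
To illustrate the behaviour described above, here is a minimal sketch that does not use Lucene at all. It assumes the miss was comparing a global ordinal against a leaf-local `maxDoc` rather than `docBase + maxDoc`; the class `LeafLookupSketch`, its methods, and the leaf layout are hypothetical stand-ins, not code from this PR.

```java
import java.util.Arrays;

/**
 * Hypothetical model of a segmented index: leaf i covers the global doc IDs
 * [docBases[i], docBases[i] + maxDocs[i]). Not part of Lucene or of this PR.
 */
final class LeafLookupSketch {

  /** Returns the index of the leaf containing docId (assumes no empty leaves). */
  static int subIndex(int docId, int[] docBases) {
    int pos = Arrays.binarySearch(docBases, docId);
    // When not found, binarySearch returns -(insertionPoint) - 1;
    // the containing leaf is the one just before the insertion point.
    return pos >= 0 ? pos : -pos - 2;
  }

  /**
   * Walks ordinals in increasing order and counts how often the leaf is resolved.
   * addDocBase == false models the assumed bug (leaf-local bound);
   * addDocBase == true models the assumed fix (global bound).
   */
  static int countLeafLookups(
      int[] sortedOrdinals, int[] docBases, int[] maxDocs, boolean addDocBase) {
    int lookups = 0;
    int leafDocBase = 0;
    int leafMaxDoc = 0;
    boolean haveLeaf = false;
    for (int ordinal : sortedOrdinals) {
      int upperBound = addDocBase ? leafDocBase + leafMaxDoc : leafMaxDoc;
      if (!haveLeaf || ordinal >= upperBound) {
        int leaf = subIndex(ordinal, docBases);
        leafDocBase = docBases[leaf];
        leafMaxDoc = maxDocs[leaf];
        haveLeaf = true;
        lookups++;
      }
    }
    return lookups;
  }

  public static void main(String[] args) {
    int[] docBases = {0, 10, 20}; // three leaves of ten docs each
    int[] maxDocs = {10, 10, 10};
    int[] ordinals = {1, 3, 12, 14, 17, 21, 25}; // already sorted
    System.out.println(
        "leaf-local bound: " + countLeafLookups(ordinals, docBases, maxDocs, false));
    System.out.println(
        "global bound:     " + countLeafLookups(ordinals, docBases, maxDocs, true));
  }
}
```

With this layout the leaf-local bound resolves the leaf six times, once for every ordinal past the first leaf, while the global bound resolves it three times, once per leaf. Both variants end up in the correct leaf for every ordinal, which is consistent with the test passing despite the extra work.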
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]