mikemccand commented on a change in pull request #179:
URL: https://github.com/apache/lucene/pull/179#discussion_r697880899
##########
File path:
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/TestTaxonomyCombined.java
##########
@@ -366,9 +366,9 @@ public void testReaderBasic() throws Exception {
}
}
// (also test invalid ordinals:)
- assertNull(tr.getPath(-1));
- assertNull(tr.getPath(tr.getSize()));
- assertNull(tr.getPath(TaxonomyReader.INVALID_ORDINAL));
+ expectThrows(IllegalArgumentException.class, () -> tr.getPath(-1));
Review comment:
I guess this is a small API break? Previously this API leniently
returned `null` but now it throws exception. I think that's OK, but could you
advertise this in the `CHANGES.txt` entry?
##########
File path:
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java
##########
@@ -351,12 +348,142 @@ public FacetLabel getPath(int ordinal) throws
IOException {
}
synchronized (categoryCache) {
- categoryCache.put(catIDInteger, ret);
+ categoryCache.put(ordinal, ret);
}
return ret;
}
+ private FacetLabel[] getPathFromCache(int... ordinals) {
+ FacetLabel[] facetLabels = new FacetLabel[ordinals.length];
+ // TODO LUCENE-10068: can we use an int-based hash impl, such as
IntToObjectMap,
+ // wrapped as LRU?
+ synchronized (categoryCache) {
+ for (int i = 0; i < ordinals.length; i++) {
+ facetLabels[i] = categoryCache.get(ordinals[i]);
+ }
+ }
+ return facetLabels;
+ }
+
+ /**
+ * Checks if the ordinals in the array are >=0 and < {@code
+ * DirectoryTaxonomyReader#indexReader.maxDoc()}
+ *
+ * @param ordinals Integer array of ordinals
+ * @throws IllegalArgumentException Throw an IllegalArgumentException if one
of the ordinals is
+ * out of bounds
+ */
+ private void checkOrdinalBounds(int... ordinals) throws
IllegalArgumentException {
+ for (int ordinal : ordinals) {
+ if (ordinal < 0 || ordinal >= indexReader.maxDoc()) {
+ throw new IllegalArgumentException(
+ "ordinal "
+ + ordinal
+ + " is out of the range of the indexReader "
+ + indexReader.toString()
+ + ". The maximum possible ordinal number is "
+ + (indexReader.maxDoc() - 1));
+ }
+ }
+ }
+
+ /**
+ * Returns an array of FacetLabels for a given array of ordinals.
+ *
+ * <p>This API is generally faster than iteratively calling {@link
#getPath(int)} over an array of
+ * ordinals. It uses the {@link #getPath(int)} method iteratively when it
detects that the index
+ * was created using StoredFields (with no performance gains) and uses
DocValues based iteration
+ * when the index is based on BinaryDocValues. Lucene switched to
BinaryDocValues in version 9.0
+ *
+ * @param ordinals Array of ordinals that are assigned to categories
inserted into the taxonomy
+ * index
+ */
+ @Override
+ public FacetLabel[] getBulkPath(int... ordinals) throws IOException {
+ ensureOpen();
+ checkOrdinalBounds(ordinals);
+
+ int ordinalsLength = ordinals.length;
+ FacetLabel[] bulkPath = new FacetLabel[ordinalsLength];
+ // remember the original positions of ordinals before they are sorted
+ int[] originalPosition = new int[ordinalsLength];
+ Arrays.setAll(originalPosition, IntUnaryOperator.identity());
+
+ getPathFromCache(ordinals);
+
+ /* parallel sort the ordinals and originalPosition array based on the
values in the ordinals array */
+ new InPlaceMergeSorter() {
+ @Override
+ protected void swap(int i, int j) {
+ int x = ordinals[i];
+ ordinals[i] = ordinals[j];
+ ordinals[j] = x;
+
+ x = originalPosition[i];
+ originalPosition[i] = originalPosition[j];
+ originalPosition[j] = x;
+ }
+
+ @Override
+ public int compare(int i, int j) {
+ return Integer.compare(ordinals[i], ordinals[j]);
+ }
+ }.sort(0, ordinalsLength);
+
+ int readerIndex;
+ int leafReaderMaxDoc = 0;
+ int leafReaderDocBase = 0;
+ LeafReader leafReader;
+ LeafReaderContext leafReaderContext;
+ BinaryDocValues values = null;
+ List<Integer> uncachedOrdinalPositions = new ArrayList<>();
+
+ for (int i = 0; i < ordinalsLength; i++) {
+ if (bulkPath[originalPosition[i]] == null) {
+ /*
+ If ordinals[i] >= leafReaderDocBase + leafReaderMaxDoc then we find
the next leaf that contains our ordinal.
+ Remember: ordinals[i] operates in the global ordinal space and hence
we add leafReaderDocBase to the leafReaderMaxDoc
+ (which is the maxDoc of the specific leaf)
+ */
+ if (values == null || ordinals[i] >= leafReaderDocBase +
leafReaderMaxDoc) {
+
+ readerIndex = ReaderUtil.subIndex(ordinals[i], indexReader.leaves());
+ leafReaderContext = indexReader.leaves().get(readerIndex);
+ leafReader = leafReaderContext.reader();
+ leafReaderMaxDoc = leafReader.maxDoc();
+ leafReaderDocBase = leafReaderContext.docBase;
+ values = leafReader.getBinaryDocValues(Consts.FULL);
+
+ /*
+ If the index is constructed with the older StoredFields it will not
have any BinaryDocValues field and will return null
+ */
+ if (values == null) {
Review comment:
It's sort of weird to do this check per-segment, since we can know up
front, based on indexed created version, whether it's stored fields or doc
values? I.e. this is not a segment by segment decision. But let's do it in
follow-on issue. I think this one is ready!
##########
File path:
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/directory/TestBackwardsCompatibility.java
##########
@@ -89,6 +89,28 @@ private void createNewTaxonomyIndex(String dirName) throws
IOException {
dir.close();
}
+ // Opens up a pre-existing index and tries to run getBulkPath on it
+ public void testGetBulkPathOnOlderCodec() throws Exception {
+ Path indexDir = createTempDir(oldTaxonomyIndexName);
+ TestUtil.unzip(getDataInputStream(oldTaxonomyIndexName + ".zip"),
indexDir);
+ Directory dir = newFSDirectory(indexDir);
+
+ DirectoryTaxonomyWriter writer = new DirectoryTaxonomyWriter(dir);
+ DirectoryTaxonomyReader reader = new DirectoryTaxonomyReader(writer);
+
+ FacetLabel[] facetLabels = {
+ new FacetLabel("a"), new FacetLabel("b"),
+ };
+ int[] ordinals =
+ new int[] {reader.getOrdinal(facetLabels[0]),
reader.getOrdinal(facetLabels[1])};
+ // The zip file already contains a category "a" and a category "b" stored
with the older
Review comment:
Maybe the test should assert that `ordinals[0] != -1` and same for
`ordinals[1]`?
##########
File path:
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyReader.java
##########
@@ -351,12 +348,140 @@ public FacetLabel getPath(int ordinal) throws
IOException {
}
synchronized (categoryCache) {
- categoryCache.put(catIDInteger, ret);
+ categoryCache.put(ordinal, ret);
}
return ret;
}
+ private FacetLabel[] getPathFromCache(int... ordinals) {
+ FacetLabel[] facetLabels = new FacetLabel[ordinals.length];
+ // TODO LUCENE-10068: can we use an int-based hash impl, such as
IntToObjectMap,
+ // wrapped as LRU?
+ synchronized (categoryCache) {
+ for (int i = 0; i < ordinals.length; i++) {
+ facetLabels[i] = categoryCache.get(ordinals[i]);
+ }
+ }
+ return facetLabels;
+ }
+
+ /**
+ * Checks if the ordinals in the array are >=0 and < {@code
+ * DirectoryTaxonomyReader#indexReader.maxDoc()}
+ *
+ * @param ordinals Integer array of ordinals
+ * @throws IllegalArgumentException Throw an IllegalArgumentException if one
of the ordinals is
+ * out of bounds
+ */
+ private void checkOrdinalBounds(int... ordinals) throws
IllegalArgumentException {
+ for (int ordinal : ordinals) {
+ if (ordinal < 0 || ordinal >= indexReader.maxDoc()) {
+ throw new IllegalArgumentException(
+ "ordinal "
+ + ordinal
+ + " is out of the range of the indexReader "
+ + indexReader.toString()
+ + ". The maximum possible ordinal number is "
+ + (indexReader.maxDoc() - 1));
+ }
+ }
+ }
+
+ /**
+ * Returns an array of FacetLabels for a given array of ordinals.
+ *
+ * <p>This API is generally faster than iteratively calling {@link
#getPath(int)} over an array of
+ * ordinals. It uses the {@link #getPath(int)} method iteratively when it
detects that the index
+ * was created using StoredFields (with no performance gains) and uses
DocValues based iteration
+ * when the index is based on BinaryDocValues. Lucene switched to
BinaryDocValues in version 9.0
+ *
+ * @param ordinals Array of ordinals that are assigned to categories
inserted into the taxonomy
+ * index
+ */
+ @Override
+ public FacetLabel[] getBulkPath(int... ordinals) throws IOException {
+ ensureOpen();
+ checkOrdinalBounds(ordinals);
+
+ int ordinalsLength = ordinals.length;
+ FacetLabel[] bulkPath = new FacetLabel[ordinalsLength];
+ // remember the original positions of ordinals before they are sorted
+ int[] originalPosition = new int[ordinalsLength];
+ Arrays.setAll(originalPosition, IntUnaryOperator.identity());
+
+ getPathFromCache(ordinals);
Review comment:
Yeah good point ...
Though I do think it's weird that we set `categoryCache = null` in `doClose`
because this means if the application is (illegally) still trying to do
faceting in one query thread, while another thread closes, then the query
threads will hit `NullPointerException`. I think it'd be better to make
`categoryCache` `final` and throw `AlreadyClosedException`. But let's not do
that here/now! Can you open a followon issue?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]