gsmiller commented on a change in pull request #282:
URL: https://github.com/apache/lucene/pull/282#discussion_r702274019



##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/ConcurrentSortedSetDocValuesFacetCounts.java
##########
@@ -159,13 +160,15 @@ private FacetResult getDim(String dim, OrdRange ordRange, 
int topN) throws IOExc
     final MatchingDocs hits;
     final OrdinalMap ordinalMap;
     final int segOrd;
+    final Bits liveDocs;
 
     public CountOneSegment(
         LeafReader leafReader, MatchingDocs hits, OrdinalMap ordinalMap, int 
segOrd) {
       this.leafReader = leafReader;
       this.hits = hits;
       this.ordinalMap = ordinalMap;
       this.segOrd = segOrd;
+      this.liveDocs = (leafReader != null) ? leafReader.getLiveDocs() : null;

Review comment:
       I wonder if it makes sense to just `assert leafReader != null` at the 
top of this ctor? If it's `null`, when `call()` gets invoked it will throw an 
NPE anyway (down in `DocValues.getSortedSet`), so might be nice to fail early 
on this. What do you think?

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -152,7 +153,8 @@ private FacetResult getDim(String dim, OrdRange ordRange, 
int topN) throws IOExc
   }
 
   private void countOneSegment(
-      OrdinalMap ordinalMap, LeafReader reader, int segOrd, MatchingDocs hits) 
throws IOException {
+      OrdinalMap ordinalMap, LeafReader reader, int segOrd, MatchingDocs hits, 
Bits liveDocs)

Review comment:
       I left a similar comment in the concurrent version of this, but I think 
we should specialize the count-all case so we're not checking `liveDocs` in the 
common case where `hits` is non-null (since `hits` will already do that check 
essentially, and should only be providing live docs).

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/ConcurrentSortedSetDocValuesFacetCounts.java
##########
@@ -207,11 +210,17 @@ public Void call() throws IOException {
           // Remap every ord to global ord as we iterate:
           if (singleValues != null) {
             for (int doc = it.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; 
doc = it.nextDoc()) {
+              if (liveDocs != null && liveDocs.get(doc) == false) {

Review comment:
       Hmm, this is a little tricky. We only need to check `liveDocs` if `hits` 
is `null` (i.e., we're counting "all" docs). If `hits` is non-null, then we 
know it's only providing "live" docs already, so we don't need to re-check. 
Other `Facets` implementations actually provide a separate implementation / 
code-path for the "count all" case to specialize, and I wonder if we should do 
the same here. The code will end up being fairly duplicated (like elsewhere), 
but it would be more efficient for the common case where `hits` is non-null.

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/StringValueFacetCounts.java
##########
@@ -272,7 +273,8 @@ private void count(FacetsCollector facetsCollector) throws 
IOException {
       // Assuming the state is valid, ordinalMap should be null since we have 
one segment:
       assert ordinalMap == null;
 
-      countOneSegment(docValues, hits.context.ord, hits);
+      // hits contain live documents only, no need to pass live docs bitset 
explicitly
+      countOneSegment(docValues, hits.context.ord, hits, null);

Review comment:
       Similar comment to what I left in 
`ConcurrentSortedSetDocValueFacetCounts` about whether-or-not it makes sense to 
split out a separate implementation for the count-all case. I think we should 
even though there might be significant code duplication. Checking liveness 
isn't necessary when `hits` is non-null, and will just add overhead to the 
common case. Thoughts?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to