Re: [PR] Compute facets while collecting [lucene]

via GitHub Mon, 29 Jul 2024 07:11:28 -0700


gsmiller commented on code in PR #13568:
URL: https://github.com/apache/lucene/pull/13568#discussion_r1695312400



##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/recorders/FacetRecorder.java:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.sandbox.facet.recorders;
+
+import java.io.IOException;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.sandbox.facet.cutters.LeafFacetCutter;
+import org.apache.lucene.sandbox.facet.misc.FacetRollup;
+import org.apache.lucene.sandbox.facet.ordinals.OrdinalIterator;
+
+/**
+ * Record data for each facet of each doc.
+ *
+ * <p>TODO: In the next iteration we can add an extra layer between 
FacetRecorder and
+ * LeafFacetRecorder, e.g. SliceFacetRecorder. The new layer will be created 
per {@link
+ * org.apache.lucene.search.Collector}, which means that collecting of 
multiple leafs (segments)
+ * within a slice is sequential and can be done to a single non-sync map to 
improve performance and
+ * reduce memory consumption. We already tried that, but didn't see any 
performance improvement.
+ * Given that it also makes lazy leaf recorder init in {@link
+ * org.apache.lucene.sandbox.facet.FacetFieldCollector} trickier, it was 
decided to rollback the
+ * initial attempt and try again later, in the next iteration.
+ */
+public interface FacetRecorder {
+  /** Get leaf recorder. */
+  LeafFacetRecorder getLeafRecorder(LeafReaderContext context) throws 
IOException;
+
+  /** Return next collected ordinal, or {@link LeafFacetCutter#NO_MORE_ORDS} */
+  OrdinalIterator recordedOrds();
+
+  /** True if there are no records */
+  boolean isEmpty();

Review Comment:
   Got it. OK cool. If there's a use-case I have no problem keeping it. I'd 
rather have `isEmpty` than `size` since `isEmpty` should generally be easier to 
implement while `size` could put more restrictions on future implementations 
(potentially). Said differently, if you can implement "size" you can most 
certainly implement "is empty," but maybe not vice versa. Let's keep it but 
eventually incorporate it into demo code so it's clear how it can be useful.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Compute facets while collecting [lucene]

Reply via email to