Re: [PR] Adding profiling support for concurrent segment search [lucene]

via GitHub Fri, 13 Jun 2025 08:24:20 -0700


jpountz commented on code in PR #14413:
URL: https://github.com/apache/lucene/pull/14413#discussion_r2145271196



##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/QueryLeafProfilerAggregator.java:
##########
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.sandbox.search;
+
+import java.util.List;
+
+interface QueryLeafProfilerAggregator {

Review Comment:
   I wonder if we actually need this interface since it seems to have a single 
implementation?



##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/QueryLeafProfilerBreakdown.java:
##########
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.sandbox.search;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Map;
+import org.apache.lucene.util.CollectionUtil;
+
+/**
+ * A record of timings for the various operations that may happen during query 
execution. A node's
+ * time may be composed of several internal attributes (rewriting, weighting, 
scoring, etc).
+ */
+public class QueryLeafProfilerBreakdown {
+  private static final Collection<QueryProfilerTimingType> 
LEAF_LEVEL_TIMING_TYPE =
+      Arrays.stream(QueryProfilerTimingType.values()).filter(t -> 
t.isLeafLevel()).toList();
+
+  /** The accumulated timings for this query node */
+  private final QueryProfilerTimer[] timers;
+
+  /** Sole constructor. */
+  public QueryLeafProfilerBreakdown() {
+    timers = new QueryProfilerTimer[LEAF_LEVEL_TIMING_TYPE.size()];
+    for (int i = 0; i < timers.length; ++i) {
+      timers[i] = new QueryProfilerTimer();
+    }
+  }
+
+  public QueryProfilerTimer getTimer(QueryProfilerTimingType type) {
+    return timers[type.ordinal()];
+  }
+
+  /** Build a timing count breakdown. */
+  public final Map<String, Long> toBreakdownMap() {
+    Map<String, Long> map = CollectionUtil.newHashMap(timers.length * 2);
+    for (QueryProfilerTimingType type : LEAF_LEVEL_TIMING_TYPE) {
+      map.put(type.toString(), timers[type.ordinal()].getApproximateTiming());
+      map.put(type.toString() + "_count", timers[type.ordinal()].getCount());
+    }
+    return Collections.unmodifiableMap(map);
+  }
+
+  public final AggregatedQueryLeafProfilerResult getSliceProfilerResult(long 
sliceId) {
+    final Map<String, Long> map = 
CollectionUtil.newHashMap(LEAF_LEVEL_TIMING_TYPE.size() * 2);
+    long sliceStartTime = Long.MAX_VALUE;
+    long sliceEndTime = Long.MIN_VALUE;
+    for (QueryProfilerTimingType type : LEAF_LEVEL_TIMING_TYPE) {
+      final QueryProfilerTimer timer = timers[type.ordinal()];
+      // Consider timer for updating start/total time only
+      // if it was used
+      if (timer.getCount() > 0) {
+        sliceStartTime = Math.min(sliceStartTime, 
timer.getEarliestTimerStartTime());
+        sliceEndTime =
+            Math.max(
+                sliceEndTime, timer.getEarliestTimerStartTime() + 
timer.getApproximateTiming());

Review Comment:
   It doesn't feel right the compute the end time as the start time plus the 
approximate timing, since operations will often be interleaved. What about 
reporting the sum of the approximate timings across operation types instead, 
ie. the value of `toTotalTime()`?



##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/QueryLeafProfilerBreakdown.java:
##########
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.sandbox.search;
+
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Map;
+import org.apache.lucene.util.CollectionUtil;
+
+/**
+ * A record of timings for the various operations that may happen during query 
execution. A node's
+ * time may be composed of several internal attributes (rewriting, weighting, 
scoring, etc).
+ */
+public class QueryLeafProfilerBreakdown {
+  private static final Collection<QueryProfilerTimingType> 
LEAF_LEVEL_TIMING_TYPE =
+      Arrays.stream(QueryProfilerTimingType.values()).filter(t -> 
t.isLeafLevel()).toList();
+
+  /** The accumulated timings for this query node */
+  private final QueryProfilerTimer[] timers;

Review Comment:
   What about using an `EnumMap`, which will be implemented by an array like 
this one under the hood?



##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/QueryProfilerTimingType.java:
##########
@@ -21,16 +21,29 @@
 
 /** This enum breaks down the query into different sections to describe what 
was timed. */
 public enum QueryProfilerTimingType {
-  CREATE_WEIGHT,
-  COUNT,
-  BUILD_SCORER,
-  NEXT_DOC,
-  ADVANCE,
-  MATCH,
-  SCORE,
-  SHALLOW_ADVANCE,
-  COMPUTE_MAX_SCORE,
-  SET_MIN_COMPETITIVE_SCORE;
+  COUNT(true),
+  BUILD_SCORER(true),
+  NEXT_DOC(true),
+  ADVANCE(true),
+  MATCH(true),
+  SCORE(true),
+  SHALLOW_ADVANCE(true),
+  COMPUTE_MAX_SCORE(true),
+  SET_MIN_COMPETITIVE_SCORE(true),
+
+  // IMPORTANT: Global timer types must be defined after all the
+  // slice level timers to preserve the contiguous enum ordinals
+  CREATE_WEIGHT(false);
+
+  private boolean leafLevel;
+
+  private QueryProfilerTimingType(boolean leafLevel) {
+    this.leafLevel = leafLevel;
+  }
+
+  public boolean isLeafLevel() {

Review Comment:
   Add some javadocs? I believe that this means that this operation runs on a 
LeafReader as opposed to the top-level IndexReader?



##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/AggregatedQueryLeafProfilerResult.java:
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.sandbox.search;
+
+import java.util.Collections;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * This class is the internal representation of a profiled Query, 
corresponding to a single node in
+ * the query tree. It is built after the query has finished executing and is 
merely a structured
+ * representation, rather than the entity that collects the timing profile.
+ */
+public class AggregatedQueryLeafProfilerResult {
+  private final long id;
+  private final Map<String, Long> breakdown;
+  private final long startTime;
+  private final long totalTime;
+
+  public AggregatedQueryLeafProfilerResult(
+      long id, Map<String, Long> breakdown, long startTime, long totalTime) {
+    this.id = id;
+    this.breakdown = Objects.requireNonNull(breakdown, "required breakdown 
argument missing");
+    this.startTime = startTime;
+    this.totalTime = totalTime;
+  }
+
+  /** Retrieve the lucene description of this query (e.g. the "explain" text) 
*/
+  public long getId() {

Review Comment:
   There seems to be a mismatch between javadocs and the method signature.



##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/AggregatedQueryLeafProfilerResult.java:
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.sandbox.search;
+
+import java.util.Collections;
+import java.util.Map;
+import java.util.Objects;
+
+/**
+ * This class is the internal representation of a profiled Query, 
corresponding to a single node in
+ * the query tree. It is built after the query has finished executing and is 
merely a structured
+ * representation, rather than the entity that collects the timing profile.
+ */
+public class AggregatedQueryLeafProfilerResult {
+  private final long id;
+  private final Map<String, Long> breakdown;
+  private final long startTime;
+  private final long totalTime;
+
+  public AggregatedQueryLeafProfilerResult(
+      long id, Map<String, Long> breakdown, long startTime, long totalTime) {
+    this.id = id;
+    this.breakdown = Objects.requireNonNull(breakdown, "required breakdown 
argument missing");
+    this.startTime = startTime;
+    this.totalTime = totalTime;
+  }
+
+  /** Retrieve the lucene description of this query (e.g. the "explain" text) 
*/
+  public long getId() {
+    return id;
+  }
+
+  /** The timing breakdown for this node. */
+  public Map<String, Long> getTimeBreakdown() {
+    return Collections.unmodifiableMap(breakdown);

Review Comment:
   You could do the wrapping only once in the constructor?



##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/QueryProfilerResult.java:
##########
@@ -31,39 +31,69 @@
  * queries if applicable
  */
 public class QueryProfilerResult {
+  enum AggregationType {
+    // Leaf level is most verbose, practically no aggregation
+    LEAF,
+    // Aggregate leaf level breakdowns based on slice
+    SLICE,
+    // Aggregate leaf level breakdowns based on thread execution
+    THREAD
+  }

Review Comment:
   Is this actually used? I think that from previous discussions we agreed that 
`SLICE` wouldn't work since Lucene doesn't tell us what slice it's in. And I 
don't see LEAF being used in this `PR`, only `THREAD`?



##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/QueryLeafProfilerThreadAggregator.java:
##########
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.sandbox.search;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+
+/** Implementation of QueryLeafProfilerAggregator that aggregates leaf 
breakdowns at thread level */
+class QueryLeafProfilerThreadAggregator implements QueryLeafProfilerAggregator 
{
+  private final ConcurrentMap<Long, QueryLeafProfilerBreakdown> 
queryThreadBreakdowns;
+  private long queryStartTime = Long.MAX_VALUE;
+  private long queryEndTime = Long.MIN_VALUE;
+
+  public QueryLeafProfilerThreadAggregator() {
+    queryThreadBreakdowns = new ConcurrentHashMap<>();
+  }
+
+  @Override
+  public QueryProfilerResult.AggregationType getAggregationType() {
+    return QueryProfilerResult.AggregationType.THREAD;
+  }
+
+  @Override
+  public long getQueryStartTime() {
+    return queryStartTime;
+  }
+
+  @Override
+  public long getQueryEndTime() {
+    return queryEndTime;
+  }
+
+  private QueryLeafProfilerBreakdown getQuerySliceProfilerBreakdown() {

Review Comment:
   Let's replace `slice` by `thread` in this method name (and a few other 
places) to better reflect what it actually does?



##########
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/QueryLeafProfilerThreadAggregator.java:
##########
@@ -0,0 +1,87 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.sandbox.search;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+
+/** Implementation of QueryLeafProfilerAggregator that aggregates leaf 
breakdowns at thread level */
+class QueryLeafProfilerThreadAggregator implements QueryLeafProfilerAggregator 
{
+  private final ConcurrentMap<Long, QueryLeafProfilerBreakdown> 
queryThreadBreakdowns;

Review Comment:
   I wonder if this should use the actual `Thread` object as a key and then 
include it in the result profile, so that users can better know what thread 
specifically ran some operations.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Adding profiling support for concurrent segment search [lucene]

Reply via email to