[pinot] branch release-1.0-rc updated: [maven-release-plugin] prepare release release-1.0-rc0

2023-08-10 Thread saurabhd336
This is an automated email from the ASF dual-hosted git repository.

saurabhd336 pushed a commit to branch release-1.0-rc
in repository https://gitbox.apache.org/repos/asf/pinot.git


The following commit(s) were added to refs/heads/release-1.0-rc by this push:
 new e39591750b [maven-release-plugin] prepare release release-1.0-rc0
e39591750b is described below

commit e39591750b1c989b5dbd76f3b323f071f3093834
Author: Saurabh Dubey 
AuthorDate: Thu Aug 10 12:53:56 2023 +0530

[maven-release-plugin] prepare release release-1.0-rc0
---
 contrib/pinot-fmpp-maven-plugin/pom.xml|  6 +--
 pinot-broker/pom.xml   |  5 +--
 pinot-clients/pinot-java-client/pom.xml|  5 +--
 pinot-clients/pinot-jdbc-client/pom.xml|  8 ++--
 pinot-clients/pom.xml  |  6 +--
 pinot-common/pom.xml   | 43 +++---
 pinot-compatibility-verifier/pom.xml   |  5 +--
 pinot-connectors/pinot-flink-connector/pom.xml |  8 ++--
 pinot-connectors/pinot-spark-2-connector/pom.xml   |  5 +--
 pinot-connectors/pinot-spark-3-connector/pom.xml   |  5 +--
 pinot-connectors/pinot-spark-common/pom.xml|  5 +--
 pinot-connectors/pom.xml   |  6 +--
 pinot-controller/pom.xml   |  5 +--
 pinot-core/pom.xml |  5 +--
 pinot-distribution/pom.xml |  7 ++--
 pinot-integration-test-base/pom.xml|  5 +--
 pinot-integration-tests/pom.xml|  5 +--
 pinot-minion/pom.xml   |  5 +--
 pinot-perf/pom.xml |  5 +--
 .../pinot-batch-ingestion-common/pom.xml   |  6 +--
 .../pinot-batch-ingestion-hadoop/pom.xml   |  6 +--
 .../pinot-batch-ingestion-spark-2.4/pom.xml|  6 +--
 .../pinot-batch-ingestion-spark-3.2/pom.xml|  6 +--
 .../pinot-batch-ingestion-standalone/pom.xml   |  6 +--
 pinot-plugins/pinot-batch-ingestion/pom.xml|  6 +--
 .../pinot-environment/pinot-azure/pom.xml  |  6 +--
 pinot-plugins/pinot-environment/pom.xml|  6 +--
 pinot-plugins/pinot-file-system/pinot-adls/pom.xml |  5 +--
 pinot-plugins/pinot-file-system/pinot-gcs/pom.xml  |  6 +--
 pinot-plugins/pinot-file-system/pinot-hdfs/pom.xml |  5 +--
 pinot-plugins/pinot-file-system/pinot-s3/pom.xml   |  9 ++---
 pinot-plugins/pinot-file-system/pom.xml|  6 +--
 .../pinot-input-format/pinot-avro-base/pom.xml |  5 +--
 .../pinot-input-format/pinot-avro/pom.xml  |  5 +--
 .../pinot-input-format/pinot-clp-log/pom.xml   |  5 +--
 .../pinot-confluent-avro/pom.xml   |  5 +--
 pinot-plugins/pinot-input-format/pinot-csv/pom.xml |  5 +--
 .../pinot-input-format/pinot-json/pom.xml  |  5 +--
 pinot-plugins/pinot-input-format/pinot-orc/pom.xml |  6 +--
 .../pinot-input-format/pinot-parquet/pom.xml   |  5 +--
 .../pinot-input-format/pinot-protobuf/pom.xml  |  5 +--
 .../pinot-input-format/pinot-thrift/pom.xml|  5 +--
 pinot-plugins/pinot-input-format/pom.xml   |  6 +--
 .../pinot-metrics/pinot-dropwizard/pom.xml |  5 +--
 pinot-plugins/pinot-metrics/pinot-yammer/pom.xml   |  5 +--
 pinot-plugins/pinot-metrics/pom.xml|  6 +--
 .../pinot-minion-builtin-tasks/pom.xml |  5 +--
 pinot-plugins/pinot-minion-tasks/pom.xml   |  6 +--
 .../pinot-segment-uploader-default/pom.xml |  6 +--
 pinot-plugins/pinot-segment-uploader/pom.xml   |  6 +--
 .../pinot-segment-writer-file-based/pom.xml|  6 +--
 pinot-plugins/pinot-segment-writer/pom.xml |  6 +--
 .../pinot-stream-ingestion/pinot-kafka-2.0/pom.xml |  6 +--
 .../pinot-kafka-base/pom.xml   |  6 +--
 .../pinot-stream-ingestion/pinot-kinesis/pom.xml   | 12 ++
 .../pinot-stream-ingestion/pinot-pulsar/pom.xml|  6 +--
 pinot-plugins/pinot-stream-ingestion/pom.xml   |  6 +--
 pinot-plugins/pom.xml  | 11 ++
 pinot-query-planner/pom.xml|  5 +--
 pinot-query-runtime/pom.xml|  5 +--
 pinot-segment-local/pom.xml|  5 +--
 pinot-segment-spi/pom.xml  |  5 +--
 pinot-server/pom.xml   |  5 +--
 pinot-spi/pom.xml  |  5 +--
 pinot-tools/pom.xml|  5 +--
 pom.xml| 13 +++
 66 files changed, 163 insertions(+), 263 deletions(-)

diff --git a/contrib/pinot-fmpp-maven-plugin/pom.xml 
b/contrib/pinot-fmpp-maven-plugin/pom.xml
index 7ed1a2dab6..5c1c7cc3ac 100644
--- a/contrib/pinot-fmpp-maven-plugin/pom.xml
+++ b/contrib/pinot-fmpp-maven-plugin/pom.xml
@@ -19,14 +19,12 @@
 under the License.
 
 -->
-http://maven.apache.org/POM/4.0.0";
- xmlns:xsi="http

[pinot] annotated tag release-1.0-rc0 updated (e39591750b -> 6da0ff3571)

2023-08-10 Thread saurabhd336
This is an automated email from the ASF dual-hosted git repository.

saurabhd336 pushed a change to annotated tag release-1.0-rc0
in repository https://gitbox.apache.org/repos/asf/pinot.git


*** WARNING: tag release-1.0-rc0 was modified! ***

from e39591750b (commit)
  to 6da0ff3571 (tag)
 tagging e39591750b1c989b5dbd76f3b323f071f3093834 (commit)
  by Saurabh Dubey
  on Thu Aug 10 12:54:11 2023 +0530

- Log -
[maven-release-plugin] copy for tag release-1.0-rc0
---


No new revisions were added by this update.

Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] branch release-1.0-rc updated: [maven-release-plugin] prepare for next development iteration

2023-08-10 Thread saurabhd336
This is an automated email from the ASF dual-hosted git repository.

saurabhd336 pushed a commit to branch release-1.0-rc
in repository https://gitbox.apache.org/repos/asf/pinot.git


The following commit(s) were added to refs/heads/release-1.0-rc by this push:
 new 49fcd8c128 [maven-release-plugin] prepare for next development 
iteration
49fcd8c128 is described below

commit 49fcd8c12894ac5d7bc438b2ce79a96c9f41a68d
Author: Saurabh Dubey 
AuthorDate: Thu Aug 10 12:54:16 2023 +0530

[maven-release-plugin] prepare for next development iteration
---
 contrib/pinot-fmpp-maven-plugin/pom.xml   | 2 +-
 pinot-broker/pom.xml  | 2 +-
 pinot-clients/pinot-java-client/pom.xml   | 2 +-
 pinot-clients/pinot-jdbc-client/pom.xml   | 2 +-
 pinot-clients/pom.xml | 2 +-
 pinot-common/pom.xml  | 2 +-
 pinot-compatibility-verifier/pom.xml  | 2 +-
 pinot-connectors/pinot-flink-connector/pom.xml| 2 +-
 pinot-connectors/pinot-spark-2-connector/pom.xml  | 2 +-
 pinot-connectors/pinot-spark-3-connector/pom.xml  | 2 +-
 pinot-connectors/pinot-spark-common/pom.xml   | 2 +-
 pinot-connectors/pom.xml  | 2 +-
 pinot-controller/pom.xml  | 2 +-
 pinot-core/pom.xml| 2 +-
 pinot-distribution/pom.xml| 2 +-
 pinot-integration-test-base/pom.xml   | 2 +-
 pinot-integration-tests/pom.xml   | 2 +-
 pinot-minion/pom.xml  | 2 +-
 pinot-perf/pom.xml| 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-common/pom.xml| 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-hadoop/pom.xml| 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-spark-2.4/pom.xml | 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-spark-3.2/pom.xml | 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-standalone/pom.xml| 2 +-
 pinot-plugins/pinot-batch-ingestion/pom.xml   | 2 +-
 pinot-plugins/pinot-environment/pinot-azure/pom.xml   | 2 +-
 pinot-plugins/pinot-environment/pom.xml   | 2 +-
 pinot-plugins/pinot-file-system/pinot-adls/pom.xml| 2 +-
 pinot-plugins/pinot-file-system/pinot-gcs/pom.xml | 2 +-
 pinot-plugins/pinot-file-system/pinot-hdfs/pom.xml| 2 +-
 pinot-plugins/pinot-file-system/pinot-s3/pom.xml  | 2 +-
 pinot-plugins/pinot-file-system/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pinot-avro-base/pom.xml  | 2 +-
 pinot-plugins/pinot-input-format/pinot-avro/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pinot-clp-log/pom.xml| 2 +-
 pinot-plugins/pinot-input-format/pinot-confluent-avro/pom.xml | 2 +-
 pinot-plugins/pinot-input-format/pinot-csv/pom.xml| 2 +-
 pinot-plugins/pinot-input-format/pinot-json/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pinot-orc/pom.xml| 2 +-
 pinot-plugins/pinot-input-format/pinot-parquet/pom.xml| 2 +-
 pinot-plugins/pinot-input-format/pinot-protobuf/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pinot-thrift/pom.xml | 2 +-
 pinot-plugins/pinot-input-format/pom.xml  | 2 +-
 pinot-plugins/pinot-metrics/pinot-dropwizard/pom.xml  | 2 +-
 pinot-plugins/pinot-metrics/pinot-yammer/pom.xml  | 2 +-
 pinot-plugins/pinot-metrics/pom.xml   | 2 +-
 pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/pom.xml   | 2 +-
 pinot-plugins/pinot-minion-tasks/pom.xml  | 2 +-
 .../pinot-segment-uploader/pinot-segment-uploader-default/pom.xml | 2 +-
 pinot-plugins/pinot-segment-uploader/pom.xml  | 2 +-
 .../pinot-segment-writer/pinot-segment-writer-file-based/pom.xml  | 2 +-
 pinot-plugins/pinot-segment-writer/pom.xml| 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-kafka-2.0/pom.xml  | 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-kafka-base/pom.xml | 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-kinesis/pom.xml| 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-pulsar/pom.xml | 2 +-
 pinot-plugins/pinot-stream-ingestion/pom.xml 

[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11262: WIP: Vector data type in Pinot

2023-08-10 Thread via GitHub


xiangfu0 commented on code in PR #11262:
URL: https://github.com/apache/pinot/pull/11262#discussion_r1289738443


##
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/Vector.java:
##
@@ -0,0 +1,261 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.data.readers;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import java.util.Arrays;
+
+
+public class Vector implements Comparable {
+  public enum VectorType {
+FLOAT, INT
+  }
+
+  private int _dimension;
+  private float[] _floatValues;
+  private int[] _intValues;
+  private VectorType _type;
+
+  public Vector(int dimension, float[] values) {
+_dimension = dimension;
+_floatValues = values;
+_type = VectorType.FLOAT;
+  }
+
+  public Vector(int dimension, int[] values) {
+_dimension = dimension;
+_intValues = values;
+_type = VectorType.INT;
+  }
+
+  public int getDimension() {
+return _dimension;
+  }
+
+  public void setDimension(int dimension) {
+_dimension = dimension;
+  }
+
+  @JsonIgnore
+  public float[] getFloatValues() {

Review Comment:
   Let's not mix vector and index.
   
   Vector is just float/int array.
   
   Index is based on different algorithms, distance measure and threshold etc, 
meaning one vector could also have multiple indexes for different use cases.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11262: WIP: Vector data type in Pinot

2023-08-10 Thread via GitHub


xiangfu0 commented on code in PR #11262:
URL: https://github.com/apache/pinot/pull/11262#discussion_r1289754840


##
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/Vector.java:
##
@@ -0,0 +1,261 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.data.readers;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import java.util.Arrays;
+
+
+public class Vector implements Comparable {
+  public enum VectorType {
+FLOAT, INT

Review Comment:
   Do you think FLOAT is enough? Where INT could be used?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1289769475


##
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/executor/ExecutorServiceUtils.java:
##
@@ -0,0 +1,57 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.query.runtime.executor;
+
+import java.util.List;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import org.apache.pinot.common.utils.NamedThreadFactory;
+import org.apache.pinot.spi.env.PinotConfiguration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+public class ExecutorServiceUtils {
+  private static final Logger LOGGER = 
LoggerFactory.getLogger(ExecutorServiceUtils.class);
+
+  private ExecutorServiceUtils() {
+  }
+
+  public static ExecutorService createDefault(String baseName) {
+return Executors.newCachedThreadPool(new NamedThreadFactory(baseName));
+  }
+
+  public static ExecutorService create(PinotConfiguration conf, String 
confPrefix, String baseName) {
+//TODO: make this configurable
+return Executors.newCachedThreadPool(new NamedThreadFactory(baseName));
+  }

Review Comment:
   The idea here is that in Java 21 we may want to use a virtual pool here. 
   
   That can be disabled by default (therefore we would need to set a property 
in order to use loom) or enabled by default (in which case we would use a 
property to do not use loom). Given that these executors are created in 
different parts of the code, this util will be used to read the properties in a 
single place.
   
   One thing I don't like about the current approach is that `create` and 
`createDefault` are quite generic. Should be something like `createUnbound` 
because the idea is that we should call this method whenever we want to create 
a unbound thread pool executor



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] Jackie-Jiang opened a new pull request, #11315: Enhance Minion task management

2023-08-10 Thread via GitHub


Jackie-Jiang opened a new pull request, #11315:
URL: https://github.com/apache/pinot/pull/11315

   Fix #11282 
   
   - Remove the unnecessary wait after scheduling a task
   - Enhance other task status APIs to first read workflow/job config to 
determine whether the task exists because when task is just created, the 
context might not be created yet
   
   No new test is added because we already have good coverage on these APIs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1289819281


##
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/executor/ExecutorServiceUtils.java:
##
@@ -0,0 +1,57 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.query.runtime.executor;
+
+import java.util.List;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import org.apache.pinot.common.utils.NamedThreadFactory;
+import org.apache.pinot.spi.env.PinotConfiguration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+public class ExecutorServiceUtils {
+  private static final Logger LOGGER = 
LoggerFactory.getLogger(ExecutorServiceUtils.class);
+
+  private ExecutorServiceUtils() {
+  }
+
+  public static ExecutorService createDefault(String baseName) {
+return Executors.newCachedThreadPool(new NamedThreadFactory(baseName));
+  }
+
+  public static ExecutorService create(PinotConfiguration conf, String 
confPrefix, String baseName) {
+//TODO: make this configurable
+return Executors.newCachedThreadPool(new NamedThreadFactory(baseName));
+  }

Review Comment:
   Here you have an example on how to make executors customizable: 
https://github.com/gortiz/pinot/pull/6
   
   We can merge it here if you want, although it may be better to do that in 
another PR to keep this one simpler



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] t0mpere commented on issue #11282: Minion Batch ingestion scheduling bottleneck

2023-08-10 Thread via GitHub


t0mpere commented on issue #11282:
URL: https://github.com/apache/pinot/issues/11282#issuecomment-1672859111

   Thanks this will help a lot 🚀 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1289828568


##
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/executor/ExecutorServiceUtils.java:
##
@@ -0,0 +1,57 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.query.runtime.executor;
+
+import java.util.List;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import org.apache.pinot.common.utils.NamedThreadFactory;
+import org.apache.pinot.spi.env.PinotConfiguration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+public class ExecutorServiceUtils {
+  private static final Logger LOGGER = 
LoggerFactory.getLogger(ExecutorServiceUtils.class);
+
+  private ExecutorServiceUtils() {
+  }
+
+  public static ExecutorService createDefault(String baseName) {
+return Executors.newCachedThreadPool(new NamedThreadFactory(baseName));
+  }
+
+  public static ExecutorService create(PinotConfiguration conf, String 
confPrefix, String baseName) {
+//TODO: make this configurable
+return Executors.newCachedThreadPool(new NamedThreadFactory(baseName));
+  }
+
+  public static void close(ExecutorService executorService) {

Review Comment:
   Done in d09a481a3df7728cb07286918bc6fb7f599f0e90



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1289833588


##
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/blocks/TransferableBlock.java:
##
@@ -188,4 +193,10 @@ private boolean isType(MetadataBlock.MetadataBlockType 
type) {
 MetadataBlock metadata = (MetadataBlock) _dataBlock;
 return metadata.getType() == type;
   }
+
+  @Override
+  public String toString() {
+String blockType = isErrorBlock() ? "error" : 
isSuccessfulEndOfStreamBlock() ? "eos" : "data";
+return "TransferableBlock{blockType=" + blockType + ", _numRows=" + 
_numRows + ", _container=" + _container + '}';

Review Comment:
   Removed in 55d9513640904c5651be88b06bc7a99db5a57c62



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1289857744


##
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/utils/BlockingMultiStreamConsumer.java:
##
@@ -0,0 +1,246 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.query.runtime.operator.utils;
+
+import java.util.List;
+import java.util.concurrent.ArrayBlockingQueue;
+import java.util.concurrent.TimeUnit;
+import javax.annotation.Nullable;
+import org.apache.pinot.common.exception.QueryException;
+import org.apache.pinot.query.runtime.blocks.TransferableBlock;
+import org.apache.pinot.query.runtime.blocks.TransferableBlockUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+public abstract class BlockingMultiStreamConsumer implements AutoCloseable {
+  private static final Logger LOGGER = 
LoggerFactory.getLogger(BlockingMultiStreamConsumer.class);
+  private final Object _id;
+  protected final List> _mailboxes;
+  protected final ArrayBlockingQueue _newDataReady = new 
ArrayBlockingQueue<>(1);
+  private final long _deadlineMs;
+  /**
+   * An index that used to calculate where do we are going to start reading.
+   * The invariant is that we are always going to start reading from {@code 
_lastRead + 1}.
+   * Therefore {@link #_lastRead} must be in the range {@code [-1, 
mailbox.size() - 1]}
+   */
+  protected int _lastRead;
+  private E _errorBlock = null;
+
+  public BlockingMultiStreamConsumer(Object id, long deadlineMs, List> asyncProducers) {
+_id = id;
+_deadlineMs = deadlineMs;
+AsyncStream.OnNewData onNewData = this::onData;
+_mailboxes = asyncProducers;
+_mailboxes.forEach(blockProducer -> 
blockProducer.addOnNewDataListener(onNewData));
+_lastRead = _mailboxes.size() - 1;
+  }
+
+  protected abstract boolean isError(E element);
+
+  protected abstract boolean isEos(E element);
+
+  protected abstract E onTimeout();
+
+  protected abstract E onException(Exception e);
+
+  protected abstract E onEos();
+
+  @Override
+  public void close() {
+cancelRemainingMailboxes();
+  }
+
+  public void cancel(Throwable t) {
+cancelRemainingMailboxes();
+  }
+
+  protected void cancelRemainingMailboxes() {
+for (AsyncStream mailbox : _mailboxes) {
+  mailbox.cancel();
+}
+  }
+
+  public void onData() {
+if (_newDataReady.offer(Boolean.TRUE)) {
+  if (LOGGER.isTraceEnabled()) {
+LOGGER.trace("New data notification delivered on " + _id + ". " + 
System.identityHashCode(_newDataReady));
+  }
+} else if (LOGGER.isTraceEnabled()) {
+  LOGGER.trace("New data notification ignored on " + _id + ". " + 
System.identityHashCode(_newDataReady));
+}
+  }
+
+  /**
+   * Reads the next block for any ready mailbox or blocks until some of them 
is ready.
+   *
+   * The method implements a sequential read semantic. Meaning that:
+   * 
+   *   EOS is only returned when all mailboxes already emitted EOS or 
there are no mailboxes
+   *   If an error is read from a mailbox, the error is returned
+   *   If data is read from a mailbox, that data block is returned
+   *   If no mailbox is ready, the calling thread is blocked
+   * 
+   *
+   * Right now the implementation tries to be fair. If one call returned the 
block from mailbox {@code i}, then next
+   * call will look for mailbox {@code i+1}, {@code i+2}... in a circular 
manner.
+   *
+   * In order to unblock a thread blocked here, {@link #onData()} should be 
called.   *
+   */
+  public E readBlockBlocking() {
+if (LOGGER.isTraceEnabled()) {
+  LOGGER.trace("==[RECEIVE]== Enter getNextBlock from: " + _id + " 
mailboxSize: " + _mailboxes.size());
+}
+// Standard optimistic execution. First we try to read without acquiring 
the lock.
+E block = readDroppingSuccessEos();
+long timeoutMs = _deadlineMs - System.currentTimeMillis();
+boolean timeout = false;
+try {
+  while (block == null) { // we didn't find a mailbox ready to read, so we 
need to be pessimistic

Review Comment:
   > Once we reach the timeout we should immediately return timeout error.
   
   We do, although the code may not be 100% clea

[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1289862143


##
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/utils/BlockingMultiStreamConsumer.java:
##
@@ -0,0 +1,246 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.query.runtime.operator.utils;
+
+import java.util.List;
+import java.util.concurrent.ArrayBlockingQueue;
+import java.util.concurrent.TimeUnit;
+import javax.annotation.Nullable;
+import org.apache.pinot.common.exception.QueryException;
+import org.apache.pinot.query.runtime.blocks.TransferableBlock;
+import org.apache.pinot.query.runtime.blocks.TransferableBlockUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+public abstract class BlockingMultiStreamConsumer implements AutoCloseable {
+  private static final Logger LOGGER = 
LoggerFactory.getLogger(BlockingMultiStreamConsumer.class);
+  private final Object _id;
+  protected final List> _mailboxes;
+  protected final ArrayBlockingQueue _newDataReady = new 
ArrayBlockingQueue<>(1);
+  private final long _deadlineMs;
+  /**
+   * An index that used to calculate where do we are going to start reading.
+   * The invariant is that we are always going to start reading from {@code 
_lastRead + 1}.
+   * Therefore {@link #_lastRead} must be in the range {@code [-1, 
mailbox.size() - 1]}
+   */
+  protected int _lastRead;
+  private E _errorBlock = null;
+
+  public BlockingMultiStreamConsumer(Object id, long deadlineMs, List> asyncProducers) {
+_id = id;
+_deadlineMs = deadlineMs;
+AsyncStream.OnNewData onNewData = this::onData;
+_mailboxes = asyncProducers;
+_mailboxes.forEach(blockProducer -> 
blockProducer.addOnNewDataListener(onNewData));
+_lastRead = _mailboxes.size() - 1;
+  }
+
+  protected abstract boolean isError(E element);
+
+  protected abstract boolean isEos(E element);
+
+  protected abstract E onTimeout();
+
+  protected abstract E onException(Exception e);
+
+  protected abstract E onEos();
+
+  @Override
+  public void close() {
+cancelRemainingMailboxes();
+  }
+
+  public void cancel(Throwable t) {
+cancelRemainingMailboxes();
+  }
+
+  protected void cancelRemainingMailboxes() {
+for (AsyncStream mailbox : _mailboxes) {
+  mailbox.cancel();
+}
+  }
+
+  public void onData() {
+if (_newDataReady.offer(Boolean.TRUE)) {
+  if (LOGGER.isTraceEnabled()) {
+LOGGER.trace("New data notification delivered on " + _id + ". " + 
System.identityHashCode(_newDataReady));
+  }
+} else if (LOGGER.isTraceEnabled()) {
+  LOGGER.trace("New data notification ignored on " + _id + ". " + 
System.identityHashCode(_newDataReady));
+}
+  }
+
+  /**
+   * Reads the next block for any ready mailbox or blocks until some of them 
is ready.
+   *
+   * The method implements a sequential read semantic. Meaning that:
+   * 
+   *   EOS is only returned when all mailboxes already emitted EOS or 
there are no mailboxes
+   *   If an error is read from a mailbox, the error is returned
+   *   If data is read from a mailbox, that data block is returned
+   *   If no mailbox is ready, the calling thread is blocked
+   * 
+   *
+   * Right now the implementation tries to be fair. If one call returned the 
block from mailbox {@code i}, then next
+   * call will look for mailbox {@code i+1}, {@code i+2}... in a circular 
manner.
+   *
+   * In order to unblock a thread blocked here, {@link #onData()} should be 
called.   *
+   */
+  public E readBlockBlocking() {
+if (LOGGER.isTraceEnabled()) {
+  LOGGER.trace("==[RECEIVE]== Enter getNextBlock from: " + _id + " 
mailboxSize: " + _mailboxes.size());
+}
+// Standard optimistic execution. First we try to read without acquiring 
the lock.
+E block = readDroppingSuccessEos();
+long timeoutMs = _deadlineMs - System.currentTimeMillis();
+boolean timeout = false;
+try {
+  while (block == null) { // we didn't find a mailbox ready to read, so we 
need to be pessimistic
+if (LOGGER.isDebugEnabled()) {
+  LOGGER.debug("==[RECEIVE]== Blocked on : " + _id + ". " + 
System.identityHashCode(_newDataRe

[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1289865538


##
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/TransformOperator.java:
##
@@ -86,13 +86,14 @@ protected TransferableBlock getNextBlock() {
 try {
   TransferableBlock block = _upstreamOperator.nextBlock();
   return transform(block);
-} catch (Exception e) {
+} catch (RuntimeException e) {

Review Comment:
   I don't actually know what we want or should do here. I'm keeping the old 
behavior but being more specific.
   
   Previous code was catching `Exception` while no checked exception was thrown 
here. So I assumed that even older code was throwing a checked exception. Now 
that only runtime exceptions are thrown I think it is better to explicitly only 
catch `RuntimeException`s.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1289872541


##
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/MultiStageOperator.java:
##
@@ -44,6 +45,7 @@ public MultiStageOperator(OpChainExecutionContext context) {
 _opChainStats = _context.getStats();
   }
 
+  @Nonnull

Review Comment:
   I don't know if I told you, but I was used to work in kotlin, when 
nullability is checked at compile time. That is why I prefer to have these 
warnings and deal with them at compile time.
   
   But I understand that is not something we would like to add here. I've just 
add the annotations trying to get some help from the IDE to solve some NPE I've 
found while doing the PR.
   
   Removing them



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1289872541


##
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/MultiStageOperator.java:
##
@@ -44,6 +45,7 @@ public MultiStageOperator(OpChainExecutionContext context) {
 _opChainStats = _context.getStats();
   }
 
+  @Nonnull

Review Comment:
   I don't know if I told you, but I was used to work in kotlin, when 
nullability is checked at compile time. That is why I prefer to have these 
warnings and deal with them at compile time.
   
   But I understand that is not something we would like to add here. I've just 
add the annotations trying to get some help from the IDE to solve some NPE I've 
found while doing the PR.
   
   Removing them in a26c79a7edb3a0d67da0979de548d695f810e1f0



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1289877958


##
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/MailboxSendOperator.java:
##
@@ -118,46 +118,52 @@ public String toExplainString() {
 
   @Override
   protected TransferableBlock getNextBlock() {
-boolean canContinue = true;
 TransferableBlock transferableBlock;
+if (LOGGER.isDebugEnabled()) {
+  LOGGER.debug("==[SEND]== Enter getNextBlock from: " + _context.getId());
+}
 try {
   transferableBlock = _sourceOperator.nextBlock();
-  if (transferableBlock.isNoOpBlock()) {
-return transferableBlock;
-  } else if (transferableBlock.isEndOfStreamBlock()) {
-if (transferableBlock.isSuccessfulEndOfStreamBlock()) {
-  // Stats need to be populated here because the block is being sent 
to the mailbox
-  // and the receiving opChain will not be able to access the stats 
from the previous opChain
-  TransferableBlock eosBlockWithStats = 
TransferableBlockUtils.getEndOfStreamTransferableBlock(
-  
OperatorUtils.getMetadataFromOperatorStats(_opChainStats.getOperatorStatsMap()));
-  sendTransferableBlock(eosBlockWithStats);
-} else {
-  sendTransferableBlock(transferableBlock);
-}
-  } else { // normal blocks
-// check whether we should continue depending on exchange queue 
condition.
-canContinue = sendTransferableBlock(transferableBlock);
+  if (transferableBlock.isSuccessfulEndOfStreamBlock()) {
+// Stats need to be populated here because the block is being sent to 
the mailbox
+// and the receiving opChain will not be able to access the stats from 
the previous opChain
+TransferableBlock eosBlockWithStats = 
TransferableBlockUtils.getEndOfStreamTransferableBlock(
+
OperatorUtils.getMetadataFromOperatorStats(_opChainStats.getOperatorStatsMap()));
+sendTransferableBlock(eosBlockWithStats, false);
+  } else {
+sendTransferableBlock(transferableBlock, true);
   }
 } catch (Exception e) {
   transferableBlock = TransferableBlockUtils.getErrorTransferableBlock(e);
   try {
 LOGGER.error("Exception while transferring data on opChain: " + 
_context.getId(), e);
-sendTransferableBlock(transferableBlock);
+sendTransferableBlock(transferableBlock, false);
   } catch (Exception e2) {
 LOGGER.error("Exception while sending error block.", e2);
   }
 }
-// yield if we cannot continue to put transferable block into the sending 
queue
-return canContinue ? transferableBlock : 
TransferableBlockUtils.getNoOpTransferableBlock();
+return transferableBlock;
   }
 
-  private boolean sendTransferableBlock(TransferableBlock block)
+  private void sendTransferableBlock(TransferableBlock block, boolean 
throwIfTimeout)

Review Comment:
   The idea here is that sometimes we don't want to do that. Specially when we 
found an exception in the happy path. There we send the exception downstream as 
an error block. In case that error blocks times out... what do we want to do?
   
   If we decide to throw a timeout there we would hide the original exception. 
IIRC @walterddr's original code is the one that introduce this _I don't want to 
throw timeout_ parameter and I though it was a good idea. But we can discuss 
about that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] codecov-commenter commented on pull request #11315: Enhance Minion task management

2023-08-10 Thread via GitHub


codecov-commenter commented on PR #11315:
URL: https://github.com/apache/pinot/pull/11315#issuecomment-1672924075

   ## 
[Codecov](https://app.codecov.io/gh/apache/pinot/pull/11315?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   > Merging 
[#11315](https://app.codecov.io/gh/apache/pinot/pull/11315?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 (04e6892) into 
[master](https://app.codecov.io/gh/apache/pinot/commit/ee4044c54bbe0613ff5d52b218577ce19865fbf8?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 (ee4044c) will **decrease** coverage by `0.12%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@Coverage Diff @@
   ##   master   #11315  +/-   ##
   ==
   - Coverage0.11%0.00%   -0.12% 
   ==
 Files2231 2215  -16 
 Lines  120139   119711 -428 
 Branches1821818165  -53 
   ==
   - Hits  1370 -137 
   + Misses 119982   119711 -271 
   + Partials   200  -20 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1temurin11 | `?` | |
   | integration1temurin17 | `?` | |
   | integration1temurin20 | `0.00% <0.00%> (ø)` | |
   | integration2temurin11 | `?` | |
   | integration2temurin17 | `?` | |
   | integration2temurin20 | `?` | |
   | unittests1temurin11 | `?` | |
   | unittests1temurin17 | `?` | |
   | unittests2temurin11 | `?` | |
   | unittests2temurin17 | `?` | |
   | unittests2temurin20 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Files 
Changed](https://app.codecov.io/gh/apache/pinot/pull/11315?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[...lix/core/minion/PinotHelixTaskResourceManager.java](https://app.codecov.io/gh/apache/pinot/pull/11315?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9QaW5vdEhlbGl4VGFza1Jlc291cmNlTWFuYWdlci5qYXZh)
 | `0.00% <0.00%> (ø)` | |
   | 
[...elix/core/minion/generator/TaskGeneratorUtils.java](https://app.codecov.io/gh/apache/pinot/pull/11315?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9nZW5lcmF0b3IvVGFza0dlbmVyYXRvclV0aWxzLmphdmE=)
 | `0.00% <0.00%> (ø)` | |
   
   ... and [18 files with indirect coverage 
changes](https://app.codecov.io/gh/apache/pinot/pull/11315/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD 
build times. [Learn 
more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1289886813


##
pinot-query-runtime/src/main/java/org/apache/pinot/query/mailbox/GrpcSendingMailbox.java:
##
@@ -61,6 +61,9 @@ public GrpcSendingMailbox(String id, ChannelManager 
channelManager, String hostn
   @Override
   public void send(TransferableBlock block)
   throws IOException {
+if (LOGGER.isDebugEnabled()) {
+  LOGGER.debug("==[GRPC SEND]== sending data to: " + _id);

Review Comment:
   There are several ways to do the logging. Usually the `{}` is the best in 
terms of readability vs performance. But in terms of performance, what I'm 
doing here is better. Specifically, order in performance (from higher to lower) 
is:
   1. add if ward and use + in the log. We don't use this option that much 
because it is the worse in terms of readibility
   2. use `{}`. This requires to call a `LOGGER.debug(String, Object...)` 
method, so it creates an array. In case `debug` is disabled, the array is not 
used, but it is very probable that the JIT will detect that and do not allocate 
the array. In that case the performance of this option should be as good as 1, 
but it is not guaranteed that the JIT will be smart enough. In case debug is 
enabled, the array will always be created and therefore performance will be 
better in 1.
   4. use concatenation everywhere. This is the worse if debug is disabled 
because we need to call toString in all objects and we need to do the 
concatenation... just to discover that we don't actually need the String. JIT 
may get rid of this, but seems very unlikely given how large the code may be 
and the implications toString may have.
   
   We can also add the if and use `{}` inside. It would probably be faster than 
2, but not so much and legibility won't be great.
   
   In this case @walterddr was worried about the performance implications of 
the extra logs he added, so I tried to use the solution that give us more 
performance in order to reduce that impact to the minimum



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1289954302


##
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/executor/OpChainSchedulerService.java:
##
@@ -31,151 +32,73 @@
 import org.slf4j.LoggerFactory;
 
 
-/**
- * This class provides the implementation for scheduling multistage queries on 
a single node based
- * on the {@link OpChainScheduler} logic that is passed in. Multistage queries 
support partial execution
- * and will return a NOOP metadata block as a "yield" signal, indicating that 
the next operator
- * chain ({@link OpChainScheduler#next} will be requested.
- */
-@SuppressWarnings("UnstableApiUsage")
-public class OpChainSchedulerService extends AbstractExecutionThreadService {
+public class OpChainSchedulerService implements SchedulerService {
   private static final Logger LOGGER = 
LoggerFactory.getLogger(OpChainSchedulerService.class);
-  /**
-   * Default time scheduler is allowed to wait for a runnable OpChain to be 
available.
-   */
-  private static final long DEFAULT_SCHEDULER_NEXT_WAIT_MS = 100;
-  /**
-   * Default cancel signal retention, this should be set to several times 
larger than
-   * {@link 
org.apache.pinot.query.service.QueryConfig#DEFAULT_SCHEDULER_RELEASE_TIMEOUT_MS}.
-   */
-  private static final long SCHEDULER_CANCELLATION_SIGNAL_RETENTION_MS = 
60_000L;
 
-  private final OpChainScheduler _scheduler;
-  private final ExecutorService _workerPool;
-  private final Cache _cancelledRequests = 
CacheBuilder.newBuilder()
-  .expireAfterWrite(SCHEDULER_CANCELLATION_SIGNAL_RETENTION_MS, 
TimeUnit.MILLISECONDS).build();
+  private final ExecutorService _executorService;
+  private final ConcurrentHashMap> _submittedOpChainMap;
 
-  public OpChainSchedulerService(OpChainScheduler scheduler, ExecutorService 
workerPool) {
-_scheduler = scheduler;
-_workerPool = workerPool;
+  public OpChainSchedulerService(ExecutorService executorService) {
+_executorService = executorService;
+_submittedOpChainMap = new ConcurrentHashMap<>();
   }
 
   @Override
-  protected void triggerShutdown() {
-// TODO: Figure out shutdown lifecycle with graceful shutdown in mind.
-LOGGER.info("Triggered shutdown on OpChainScheduler...");
-  }
-
-  @Override
-  protected void run()
-  throws Exception {
-while (isRunning()) {
-  OpChain operatorChain = _scheduler.next(DEFAULT_SCHEDULER_NEXT_WAIT_MS, 
TimeUnit.MILLISECONDS);
-  if (operatorChain == null) {
-continue;
-  }
-  LOGGER.trace("({}): Scheduling", operatorChain);
-  _workerPool.submit(new TraceRunnable() {
-@Override
-public void runJob() {
-  boolean isFinished = false;
-  boolean returnedErrorBlock = false;
-  Throwable thrown = null;
-  try {
-LOGGER.trace("({}): Executing", operatorChain);
-// throw if the operatorChain is cancelled.
-if 
(_cancelledRequests.asMap().containsKey(operatorChain.getId().getRequestId())) {
-  throw new InterruptedException("Query was cancelled!");
-}
-operatorChain.getStats().executing();
-// so long as there's work to be done, keep getting the next block
-// when the operator chain returns a NOOP block, then yield the 
execution
-// of this to another worker
-TransferableBlock result = operatorChain.getRoot().nextBlock();
-while (!result.isNoOpBlock() && !result.isEndOfStreamBlock()) {
-  result = operatorChain.getRoot().nextBlock();
-}
-
-if (result.isNoOpBlock()) {
-  // TODO: There should be a waiting-for-data state in 
OpChainStats.
-  operatorChain.getStats().queued();
-  _scheduler.yield(operatorChain);
-} else {
-  isFinished = true;
-  if (result.isErrorBlock()) {
-returnedErrorBlock = true;
-LOGGER.error("({}): Completed erroneously {} {}", 
operatorChain, operatorChain.getStats(),
-result.getDataBlock().getExceptions());
-  } else {
-LOGGER.debug("({}): Completed {}", operatorChain, 
operatorChain.getStats());
-  }
-}
-  } catch (Exception e) {
-LOGGER.error("({}): Failed to execute operator chain! {}", 
operatorChain, operatorChain.getStats(), e);
-thrown = e;
-  } finally {
-if (returnedErrorBlock || thrown != null) {
-  cancelOpChain(operatorChain, thrown);
-} else if (isFinished) {
-  closeOpChain(operatorChain);
+  public void register(OpChain operatorChain) {
+Future scheduledFuture = _executorService.submit(new TraceRunnable() {
+  @Override
+  public void runJob() {
+boolean isFinished = false;
+TransferableBlock returnedErrorBlock = null;

[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1289956471


##
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/MultiStageBrokerRequestHandler.java:
##
@@ -113,14 +113,11 @@ public MultiStageBrokerRequestHandler(PinotConfiguration 
config, String brokerId
 new WorkerManager(_reducerHostname, _reducerPort, routingManager), 
_tableCache);
 _queryDispatcher = new QueryDispatcher();
 
-long releaseMs = 
config.getProperty(QueryConfig.KEY_OF_SCHEDULER_RELEASE_TIMEOUT_MS,
-QueryConfig.DEFAULT_SCHEDULER_RELEASE_TIMEOUT_MS);
-_reducerScheduler = new OpChainSchedulerService(new 
RoundRobinScheduler(releaseMs),
-Executors.newCachedThreadPool(new 
NamedThreadFactory("query_broker_reducer_" + _reducerPort + "_port")));
-_mailboxService = new MailboxService(_reducerHostname, _reducerPort, 
config, _reducerScheduler::onDataAvailable);
+_opChainExecutor = ExecutorServiceUtils.create(config, "multiStage", 
"multiStage_on_" + _reducerPort + "port");

Review Comment:
   Old name recovered  in 93f929f378b9ff74f39bcec6d3f6e2db8827ce79



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1289833588


##
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/blocks/TransferableBlock.java:
##
@@ -188,4 +193,10 @@ private boolean isType(MetadataBlock.MetadataBlockType 
type) {
 MetadataBlock metadata = (MetadataBlock) _dataBlock;
 return metadata.getType() == type;
   }
+
+  @Override
+  public String toString() {
+String blockType = isErrorBlock() ? "error" : 
isSuccessfulEndOfStreamBlock() ? "eos" : "data";
+return "TransferableBlock{blockType=" + blockType + ", _numRows=" + 
_numRows + ", _container=" + _container + '}';

Review Comment:
   Removed in 55d9513640904c5651be88b06bc7a99db5a57c62 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1289961342


##
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/SetOperator.java:
##
@@ -112,15 +112,14 @@ protected void constructRightBlockSet() {
 
   protected TransferableBlock constructResultBlockSet(TransferableBlock 
leftBlock) {
 List rows = new ArrayList<>();
+// TODO: Other operators keep the first erroneous block, while this keep 
the last.
+//  We should decide what is what we want to do and be consistent with 
that.

Review Comment:
   I think once we read an error from an operator, we don't read that operator 
again. But in case we do, what should we return? The first error or a new one? 
Right now some operators keep the first error emitted and always return the 
same error afterwards, but others create a new error each time they are being 
called.
   
   My TODO is to try to homogenize that to either always return the first error 
or always create a new one. Alternative, we can define that we cannot call 
`getNextBlock()` (or read from mailboxes) once the first error is emitted. In 
that case we should throw an exception.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1289961773


##
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/executor/OpChainSchedulerServiceTest.java:
##
@@ -203,84 +143,72 @@ public void 
shouldCallCancelOnOperatorsThatReturnErrorBlock()
   return null;
 }).when(_operatorA).cancel(Mockito.any());
 
-schedulerService.startAsync().awaitRunning();
 schedulerService.register(opChain);
 
 Assert.assertTrue(latch.await(10, TimeUnit.SECONDS), "expected await to be 
called in less than 10 seconds");
-schedulerService.stopAsync().awaitTerminated();
   }
 
-  @Test
-  public void shouldCallCancelOnOpChainsWhenItIsCancelledByDispatch()
-  throws InterruptedException {
-initExecutor(1);
-OpChain opChain = getChain(_operatorA);
-Mockito.when(_scheduler.next(Mockito.anyLong(), 
Mockito.any())).thenAnswer((Answer) invocation -> {
-  Thread.sleep(100);
-  return opChain;
-});
-OpChainSchedulerService schedulerService = new 
OpChainSchedulerService(_scheduler, _executor);
-
-
Mockito.when(_operatorA.nextBlock()).thenReturn(TransferableBlockUtils.getNoOpTransferableBlock());
-
-CountDownLatch cancelLatch = new CountDownLatch(1);
-Mockito.doAnswer(inv -> {
-  cancelLatch.countDown();
-  return null;
-}).when(_operatorA).cancel(Mockito.any());
-CountDownLatch deregisterLatch = new CountDownLatch(1);
-Mockito.doAnswer(inv -> {
-  deregisterLatch.countDown();
-  return null;
-}).when(_scheduler).deregister(Mockito.same(opChain));
-CountDownLatch awaitLatch = new CountDownLatch(1);
-Mockito.doAnswer(inv -> {
-  awaitLatch.countDown();
-  return null;
-}).when(_scheduler).yield(Mockito.any());
-
-schedulerService.startAsync().awaitRunning();
-schedulerService.register(opChain);
-
-Assert.assertTrue(awaitLatch.await(10, TimeUnit.SECONDS), "expected await 
to be called in less than 10 seconds");
-
-// now cancel the request.
-schedulerService.cancel(123);
-
-Assert.assertTrue(cancelLatch.await(10, TimeUnit.SECONDS), "expected 
OpChain to be cancelled");
-Assert.assertTrue(deregisterLatch.await(10, TimeUnit.SECONDS), "expected 
OpChain to be deregistered");
-Mockito.verify(_operatorA, Mockito.times(1)).cancel(Mockito.any());
-Mockito.verify(_scheduler, Mockito.times(1)).deregister(Mockito.any());
-schedulerService.stopAsync().awaitTerminated();
-  }
+//  @Test
+//  public void shouldCallCancelOnOpChainsWhenItIsCancelledByDispatch()

Review Comment:
   What do you think we should do there?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1289962057


##
pinot-query-runtime/src/test/java/org/apache/pinot/query/runtime/operator/MailboxReceiveOperatorTest.java:
##
@@ -139,21 +138,11 @@ public void shouldTimeoutOnExtraLongSleep()
   MetadataBlock errorBlock = (MetadataBlock) mailbox.getDataBlock();
   
assertTrue(errorBlock.getExceptions().containsKey(QueryException.EXECUTION_TIMEOUT_ERROR_CODE));
 }
-
-// Longer timeout or default timeout (10s) doesn't result in timeout
-context =
-OperatorTestUtil.getOpChainContext(_mailboxService, RECEIVER_ADDRESS, 
System.currentTimeMillis() + 10_000L,
-_stageMetadata1);
-try (MailboxReceiveOperator receiveOp = new 
MailboxReceiveOperator(context, RelDistribution.Type.SINGLETON, 1)) {
-  Thread.sleep(100L);
-  TransferableBlock mailbox = receiveOp.nextBlock();

Review Comment:
   @Jackie-Jiang I guess we should just remove this tests?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] gortiz commented on a diff in pull request #11205: Remove noop

2023-08-10 Thread via GitHub


gortiz commented on code in PR #11205:
URL: https://github.com/apache/pinot/pull/11205#discussion_r1290051875


##
pinot-query-runtime/src/main/java/org/apache/pinot/query/runtime/executor/SchedulerService.java:
##
@@ -0,0 +1,29 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.query.runtime.executor;
+
+import org.apache.pinot.query.runtime.operator.OpChain;
+
+
+public interface SchedulerService {

Review Comment:
   Removed in ace781b6a7a1f500bd51fc7bcf7879d39b4b6439



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] walterddr commented on a diff in pull request #11151: [multistage] Derive SUM return type to be PostgreSQL compatible

2023-08-10 Thread via GitHub


walterddr commented on code in PR #11151:
URL: https://github.com/apache/pinot/pull/11151#discussion_r1290098860


##
pinot-query-planner/src/test/resources/queries/AggregatePlans.json:
##
@@ -21,7 +21,7 @@
 "sql": "EXPLAIN PLAN FOR SELECT AVG(a.col4) as avg, SUM(a.col4) as 
sum, MAX(a.col4) as max FROM a WHERE a.col3 >= 0 AND a.col2 = 'pink floyd'",
 "output": [
   "Execution Plan",
-  "\nLogicalProject(avg=[/(CASE(=($1, 0), null:DECIMAL(1000, 0), $0), 
$1)], sum=[CASE(=($1, 0), null:DECIMAL(1000, 0), $0)], max=[$2])",
+  "\nLogicalProject(avg=[CAST(/(CASE(=($1, 0), null:DECIMAL(1000, 0), 
$0), $1)):DECIMAL(1000, 0)], sum=[CASE(=($1, 0), null:DECIMAL(1000, 0), $0)], 
max=[$2])",

Review Comment:
   the `cast`s are actually being done as part of the transform operation. it 
might not be ideal but i think for now we should be good (let's add a comment?)



##
pinot-query-planner/src/main/java/org/apache/pinot/query/type/TypeSystem.java:
##
@@ -32,6 +32,9 @@ public class TypeSystem extends RelDataTypeSystemImpl {
   private static final int MAX_DECIMAL_SCALE_DIGIT = 1000;
   private static final int MAX_DECIMAL_PRECISION_DIGIT = 1000;
 
+  private static final int DERIVED_DECIMAL_PRECISION = 19;
+  private static final int DERIVED_DECIMAL_SCALE = 1;

Review Comment:
   add some comments regarding these 2 constant? 
   (also nit can you remove the `_DIGIT` in the previous 2 constants?)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] abhioncbr commented on issue #11300: Comma Join Failing

2023-08-10 Thread via GitHub


abhioncbr commented on issue #11300:
URL: https://github.com/apache/pinot/issues/11300#issuecomment-1673212186

   Fixed, closing it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] abhioncbr closed issue #11300: Comma Join Failing

2023-08-10 Thread via GitHub


abhioncbr closed issue #11300: Comma Join Failing 
URL: https://github.com/apache/pinot/issues/11300


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] swaminathanmanish commented on a diff in pull request #11315: Enhance Minion task management

2023-08-10 Thread via GitHub


swaminathanmanish commented on code in PR #11315:
URL: https://github.com/apache/pinot/pull/11315#discussion_r1290290403


##
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotHelixTaskResourceManager.java:
##
@@ -118,25 +117,19 @@ public synchronized Set getTaskTypes() {
*
* @param taskType Task type
*/
-  public void ensureTaskQueueExists(String taskType) {
+  public synchronized void ensureTaskQueueExists(String taskType) {
 String helixJobQueueName = getHelixJobQueueName(taskType);
 WorkflowConfig workflowConfig = 
_taskDriver.getWorkflowConfig(helixJobQueueName);
-
 if (workflowConfig == null) {
   // Task queue does not exist
   LOGGER.info("Creating task queue: {} for task type: {}", 
helixJobQueueName, taskType);
 
   // Set full parallelism
   // Don't allow overlap job assignment so that we can control number of 
concurrent tasks per instance
-  JobQueue jobQueue = new JobQueue.Builder(helixJobQueueName)
-  .setWorkflowConfig(new 
WorkflowConfig.Builder().setParallelJobs(Integer.MAX_VALUE).build()).build();
+  JobQueue jobQueue = new 
JobQueue.Builder(helixJobQueueName).setWorkflowConfig(
+  new 
WorkflowConfig.Builder().setParallelJobs(Integer.MAX_VALUE).build()).build();
   _taskDriver.createQueue(jobQueue);
 }
-
-// Wait until task queue context shows up
-while (_taskDriver.getWorkflowContext(helixJobQueueName) == null) {

Review Comment:
   Could you provide context on why this was added and why it can be removed 
now ?  Same comment for the other while loop in  submitTask



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] kirkrodrigues commented on a diff in pull request #11210: Add JsonLogTransformer to transform semi-structured log events to fit a table's schema without dropping fields.

2023-08-10 Thread via GitHub


kirkrodrigues commented on code in PR #11210:
URL: https://github.com/apache/pinot/pull/11210#discussion_r1290341303


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/JsonLogTransformer.java:
##
@@ -0,0 +1,534 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.recordtransformer;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.google.common.base.Preconditions;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.config.table.ingestion.JsonLogTransformerConfig;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.stream.StreamDataDecoderImpl;
+import org.apache.pinot.spi.utils.JsonUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+/**
+ * This transformer transforms a record representing a JSON log event such 
that it can be stored in a table. JSON log
+ * events typically have a user-defined schema, so it is impractical to store 
each field in its own table column. At the
+ * same time, most (if not all) fields are important to the user, so we should 
not drop any field unnecessarily. Thus,
+ * this transformer primarily takes record-fields that don't exist in the 
schema and stores them in a type of catchall
+ * field.
+ * 
+ * For example, consider this log event:
+ * 
+ * {
+ *   "timestamp": 1687786535928,
+ *   "hostname": "host1",
+ *   "level": "INFO",
+ *   "message": "Started processing job1",
+ *   "tags": {
+ * "platform": "data",
+ * "service": "serializer",
+ * "params": {
+ *   "queueLength": 5,
+ *   "timeout": 299,
+ *   "userData_noIndex": {
+ * "nth": 99
+ *   }
+ * }
+ *   }
+ * }
+ * 
+ * And let's say the table's schema contains these fields:
+ * 
+ *   timestamp
+ *   hostname
+ *   level
+ *   message
+ *   tags.platform
+ *   tags.service
+ *   indexableExtras
+ *   unindexableExtras
+ * 
+ * 
+ * Without this transformer, the entire "tags" field would be dropped when 
storing the record in the table. However,
+ * with this transformer, the record would be transformed into the following:
+ * 
+ * {
+ *   "timestamp": 1687786535928,
+ *   "hostname": "host1",
+ *   "level": "INFO",
+ *   "message": "Started processing job1",
+ *   "tags.platform": "data",
+ *   "tags.service": "serializer",
+ *   "indexableExtras": {
+ * "tags": {
+ *   "params": {
+ * "queueLength": 5,
+ * "timeout": 299
+ *   }
+ * }
+ *   },
+ *   "unindexableExtras": {
+ * "tags": {
+ *   "userData_noIndex": {
+ * "nth": 99
+ *   }
+ * }
+ *   }
+ * }
+ * 
+ * Notice that the transformer:
+ * 
+ *   Flattens nested fields which exist in the schema, like 
"tags.platform"
+ *   Moves fields which don't exist in the schema into the 
"indexableExtras" field
+ *   Moves fields which don't exist in the schema and have the suffix 
"_noIndex" into the "unindexableExtras"
+ *   field
+ * 
+ * 
+ * The "unindexableExtras" field allows the transformer to separate fields 
which don't need indexing (because they are
+ * only retrieved, not searched) from those that do. The transformer also has 
other configuration options specified in
+ * {@link JsonLogTransformerConfig}.
+ * 
+ * One notable complication that this class handles is adding nested fields to 
the "extras" fields. E.g., consider

Review Comment:
   Good point. Moved this section to `processFields`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [pinot] kirkrodrigues commented on a diff in pull request #11210: Add JsonLogTransformer to transform semi-structured log events to fit a table's schema without dropping fields.

2023-08-10 Thread via GitHub


kirkrodrigues commented on code in PR #11210:
URL: https://github.com/apache/pinot/pull/11210#discussion_r1290341638


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/JsonLogTransformer.java:
##
@@ -0,0 +1,534 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.recordtransformer;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.google.common.base.Preconditions;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.config.table.ingestion.JsonLogTransformerConfig;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.stream.StreamDataDecoderImpl;
+import org.apache.pinot.spi.utils.JsonUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+/**
+ * This transformer transforms a record representing a JSON log event such 
that it can be stored in a table. JSON log
+ * events typically have a user-defined schema, so it is impractical to store 
each field in its own table column. At the
+ * same time, most (if not all) fields are important to the user, so we should 
not drop any field unnecessarily. Thus,
+ * this transformer primarily takes record-fields that don't exist in the 
schema and stores them in a type of catchall
+ * field.
+ * 
+ * For example, consider this log event:
+ * 
+ * {
+ *   "timestamp": 1687786535928,
+ *   "hostname": "host1",
+ *   "level": "INFO",
+ *   "message": "Started processing job1",
+ *   "tags": {
+ * "platform": "data",
+ * "service": "serializer",
+ * "params": {
+ *   "queueLength": 5,
+ *   "timeout": 299,
+ *   "userData_noIndex": {
+ * "nth": 99
+ *   }
+ * }
+ *   }
+ * }
+ * 
+ * And let's say the table's schema contains these fields:
+ * 
+ *   timestamp
+ *   hostname
+ *   level
+ *   message
+ *   tags.platform
+ *   tags.service
+ *   indexableExtras
+ *   unindexableExtras
+ * 
+ * 
+ * Without this transformer, the entire "tags" field would be dropped when 
storing the record in the table. However,
+ * with this transformer, the record would be transformed into the following:
+ * 
+ * {
+ *   "timestamp": 1687786535928,
+ *   "hostname": "host1",
+ *   "level": "INFO",
+ *   "message": "Started processing job1",
+ *   "tags.platform": "data",
+ *   "tags.service": "serializer",
+ *   "indexableExtras": {
+ * "tags": {
+ *   "params": {
+ * "queueLength": 5,
+ * "timeout": 299
+ *   }
+ * }
+ *   },
+ *   "unindexableExtras": {
+ * "tags": {
+ *   "userData_noIndex": {
+ * "nth": 99
+ *   }
+ * }
+ *   }
+ * }
+ * 
+ * Notice that the transformer:
+ * 
+ *   Flattens nested fields which exist in the schema, like 
"tags.platform"
+ *   Moves fields which don't exist in the schema into the 
"indexableExtras" field
+ *   Moves fields which don't exist in the schema and have the suffix 
"_noIndex" into the "unindexableExtras"
+ *   field
+ * 
+ * 
+ * The "unindexableExtras" field allows the transformer to separate fields 
which don't need indexing (because they are
+ * only retrieved, not searched) from those that do. The transformer also has 
other configuration options specified in

Review Comment:
   Good catch!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] kirkrodrigues commented on a diff in pull request #11210: Add JsonLogTransformer to transform semi-structured log events to fit a table's schema without dropping fields.

2023-08-10 Thread via GitHub


kirkrodrigues commented on code in PR #11210:
URL: https://github.com/apache/pinot/pull/11210#discussion_r1290342559


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/JsonLogTransformer.java:
##
@@ -0,0 +1,534 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.recordtransformer;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.google.common.base.Preconditions;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.config.table.ingestion.JsonLogTransformerConfig;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.stream.StreamDataDecoderImpl;
+import org.apache.pinot.spi.utils.JsonUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+/**
+ * This transformer transforms a record representing a JSON log event such 
that it can be stored in a table. JSON log
+ * events typically have a user-defined schema, so it is impractical to store 
each field in its own table column. At the
+ * same time, most (if not all) fields are important to the user, so we should 
not drop any field unnecessarily. Thus,
+ * this transformer primarily takes record-fields that don't exist in the 
schema and stores them in a type of catchall
+ * field.
+ * 
+ * For example, consider this log event:
+ * 
+ * {
+ *   "timestamp": 1687786535928,
+ *   "hostname": "host1",
+ *   "level": "INFO",
+ *   "message": "Started processing job1",
+ *   "tags": {
+ * "platform": "data",
+ * "service": "serializer",
+ * "params": {
+ *   "queueLength": 5,
+ *   "timeout": 299,
+ *   "userData_noIndex": {
+ * "nth": 99
+ *   }
+ * }
+ *   }
+ * }
+ * 
+ * And let's say the table's schema contains these fields:
+ * 
+ *   timestamp
+ *   hostname
+ *   level
+ *   message
+ *   tags.platform
+ *   tags.service
+ *   indexableExtras
+ *   unindexableExtras
+ * 
+ * 
+ * Without this transformer, the entire "tags" field would be dropped when 
storing the record in the table. However,
+ * with this transformer, the record would be transformed into the following:
+ * 
+ * {
+ *   "timestamp": 1687786535928,
+ *   "hostname": "host1",
+ *   "level": "INFO",
+ *   "message": "Started processing job1",
+ *   "tags.platform": "data",
+ *   "tags.service": "serializer",
+ *   "indexableExtras": {
+ * "tags": {
+ *   "params": {
+ * "queueLength": 5,
+ * "timeout": 299
+ *   }
+ * }
+ *   },
+ *   "unindexableExtras": {
+ * "tags": {
+ *   "userData_noIndex": {
+ * "nth": 99
+ *   }
+ * }
+ *   }
+ * }
+ * 
+ * Notice that the transformer:
+ * 
+ *   Flattens nested fields which exist in the schema, like 
"tags.platform"
+ *   Moves fields which don't exist in the schema into the 
"indexableExtras" field
+ *   Moves fields which don't exist in the schema and have the suffix 
"_noIndex" into the "unindexableExtras"

Review Comment:
   Oh it's already configurable through 
[JsonLogTransformerConfig](https://github.com/apache/pinot/pull/11210/files#diff-8a155cf3e668ba08fe3d5063d0ff76e54b83abace4f5fefbab93ce3bf2edf93cR38).
 I added some text to the Javadoc indicating it's configurable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] kirkrodrigues commented on a diff in pull request #11210: Add JsonLogTransformer to transform semi-structured log events to fit a table's schema without dropping fields.

2023-08-10 Thread via GitHub


kirkrodrigues commented on code in PR #11210:
URL: https://github.com/apache/pinot/pull/11210#discussion_r1290343214


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/recordtransformer/JsonLogTransformer.java:
##
@@ -0,0 +1,534 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.recordtransformer;
+
+import com.fasterxml.jackson.core.JsonProcessingException;
+import com.google.common.base.Preconditions;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.config.table.ingestion.JsonLogTransformerConfig;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.stream.StreamDataDecoderImpl;
+import org.apache.pinot.spi.utils.JsonUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+/**
+ * This transformer transforms a record representing a JSON log event such 
that it can be stored in a table. JSON log
+ * events typically have a user-defined schema, so it is impractical to store 
each field in its own table column. At the
+ * same time, most (if not all) fields are important to the user, so we should 
not drop any field unnecessarily. Thus,
+ * this transformer primarily takes record-fields that don't exist in the 
schema and stores them in a type of catchall
+ * field.
+ * 
+ * For example, consider this log event:
+ * 
+ * {
+ *   "timestamp": 1687786535928,
+ *   "hostname": "host1",
+ *   "level": "INFO",
+ *   "message": "Started processing job1",
+ *   "tags": {
+ * "platform": "data",
+ * "service": "serializer",
+ * "params": {
+ *   "queueLength": 5,
+ *   "timeout": 299,
+ *   "userData_noIndex": {
+ * "nth": 99
+ *   }
+ * }
+ *   }
+ * }
+ * 
+ * And let's say the table's schema contains these fields:
+ * 
+ *   timestamp
+ *   hostname
+ *   level
+ *   message
+ *   tags.platform
+ *   tags.service
+ *   indexableExtras
+ *   unindexableExtras
+ * 
+ * 
+ * Without this transformer, the entire "tags" field would be dropped when 
storing the record in the table. However,
+ * with this transformer, the record would be transformed into the following:
+ * 
+ * {
+ *   "timestamp": 1687786535928,
+ *   "hostname": "host1",
+ *   "level": "INFO",
+ *   "message": "Started processing job1",
+ *   "tags.platform": "data",
+ *   "tags.service": "serializer",
+ *   "indexableExtras": {
+ * "tags": {
+ *   "params": {
+ * "queueLength": 5,
+ * "timeout": 299
+ *   }
+ * }
+ *   },
+ *   "unindexableExtras": {
+ * "tags": {
+ *   "userData_noIndex": {
+ * "nth": 99
+ *   }
+ * }
+ *   }
+ * }
+ * 
+ * Notice that the transformer:
+ * 
+ *   Flattens nested fields which exist in the schema, like 
"tags.platform"
+ *   Moves fields which don't exist in the schema into the 
"indexableExtras" field
+ *   Moves fields which don't exist in the schema and have the suffix 
"_noIndex" into the "unindexableExtras"
+ *   field
+ * 
+ * 
+ * The "unindexableExtras" field allows the transformer to separate fields 
which don't need indexing (because they are
+ * only retrieved, not searched) from those that do. The transformer also has 
other configuration options specified in
+ * {@link JsonLogTransformerConfig}.
+ * 
+ * One notable complication that this class handles is adding nested fields to 
the "extras" fields. E.g., consider
+ * this record
+ * 
+ * {
+ *   a: {
+ * b: {
+ *   c: 0,
+ *   d: 1
+ * }
+ *   }
+ * }
+ * 
+ * Assume "$.a.b.c" exists in the schema but "$.a.b.d" doesn't. This class 
processes the record recursively from the
+ * root node to the children, so it would only know that "$.a.b.d" doesn't 
exist when it gets to "d". At this point we
+ * need to add "d" and all of its parents to the indexableExtrasField. To do 

[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11262: WIP: Vector data type in Pinot

2023-08-10 Thread via GitHub


xiangfu0 commented on code in PR #11262:
URL: https://github.com/apache/pinot/pull/11262#discussion_r1290417171


##
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/Vector.java:
##
@@ -0,0 +1,261 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.data.readers;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import java.util.Arrays;
+
+
+public class Vector implements Comparable {
+  public enum VectorType {
+FLOAT, INT

Review Comment:
   Let's also add BYTE



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] sajjad-moradi commented on issue #11276: Support for storage quota for realtime tables

2023-08-10 Thread via GitHub


sajjad-moradi commented on issue #11276:
URL: https://github.com/apache/pinot/issues/11276#issuecomment-1673575018

   > Support for quota for ingestion is best done through ingestion rate quota, 
which is already supported. Coupled with retention, this config can help 
determine how much data is stored for a table.
   > 
   > Storage quota cannot be really enforced if the ingestion rate starts to 
increase.
   
   I think this is very use case dependent. If the queries issued to the 
real-time table need to scan data over a fixed period (e.g., the last three 
months), it's imperative that the time retention of the table to exceed that 
window. This would make storage-based retention configurations less applicable 
for such a table. On the other hand, if a table doesn't have strict query-time 
requirements, a storage-based quota could indeed be a good fit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11262: WIP: Vector data type in Pinot

2023-08-10 Thread via GitHub


xiangfu0 commented on code in PR #11262:
URL: https://github.com/apache/pinot/pull/11262#discussion_r1290436300


##
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/Vector.java:
##
@@ -0,0 +1,261 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.data.readers;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import java.util.Arrays;
+
+
+public class Vector implements Comparable {
+  public enum VectorType {
+FLOAT, INT
+  }
+
+  private int _dimension;
+  private float[] _floatValues;
+  private int[] _intValues;
+  private VectorType _type;
+
+  public Vector(int dimension, float[] values) {
+_dimension = dimension;
+_floatValues = values;
+_type = VectorType.FLOAT;
+  }
+
+  public Vector(int dimension, int[] values) {
+_dimension = dimension;
+_intValues = values;
+_type = VectorType.INT;
+  }
+
+  public int getDimension() {
+return _dimension;
+  }
+
+  public void setDimension(int dimension) {
+_dimension = dimension;
+  }
+
+  @JsonIgnore
+  public float[] getFloatValues() {
+if (_type != VectorType.FLOAT) {
+  throw new IllegalStateException("Vector type is not FLOAT");
+}
+return _floatValues;
+  }
+
+  @JsonIgnore
+  public int[] getIntValues() {
+if (_type != VectorType.INT) {
+  throw new IllegalStateException("Vector type is not INT");
+}
+return _intValues;
+  }
+
+  public void setValues(float[] values) {
+_floatValues = values;
+_type = VectorType.FLOAT;
+  }
+
+  public void setValues(int[] values) {
+_intValues = values;
+_type = VectorType.INT;
+  }
+
+  public VectorType getType() {
+return _type;
+  }
+
+  public static Vector fromString(String value) {
+String[] tokens = value.split(",");
+VectorType vectorType = VectorType.valueOf(tokens[0].toUpperCase());
+int dimension = Integer.parseInt(tokens[1]);
+switch (vectorType) {
+  case FLOAT: {
+float[] result = new float[tokens.length - 2];
+for (int i = 2; i < tokens.length; i++) {
+  result[i - 2] = Float.parseFloat(tokens[i]);
+}
+return new Vector(dimension, result);
+  }
+  case INT: {
+int[] result = new int[tokens.length - 2];
+for (int i = 2; i < tokens.length; i++) {
+  result[i - 2] = Integer.parseInt(tokens[i]);
+}
+return new Vector(dimension, result);
+  }
+  default:
+throw new IllegalArgumentException("Unsupported vector type: " + 
vectorType);
+}
+  }
+
+  public String toString() {
+StringBuilder sb = new StringBuilder();
+sb.append(_type);
+sb.append(",");
+sb.append(_dimension);
+sb.append(",");
+switch (_type) {
+  case FLOAT:
+for (int i = 0; i < _floatValues.length; i++) {
+  sb.append(_floatValues[i]);
+  if (i < _floatValues.length - 1) {
+sb.append(",");
+  }
+}
+break;
+  case INT:
+for (int i = 0; i < _intValues.length; i++) {
+  sb.append(_intValues[i]);
+  if (i < _intValues.length - 1) {
+sb.append(",");
+  }
+}
+break;
+  default:
+throw new IllegalArgumentException("Unsupported vector type: " + 
_type);
+}
+return sb.toString();
+  }
+
+  public byte[] toBytes() {
+int size = _type == VectorType.FLOAT ? _floatValues.length : 
_intValues.length;
+byte[] result = new byte[5 + 4 * size]; // 1 byte for type, 4 for dimension
+int offset = 0;
+result[offset++] = (byte) (_type == VectorType.FLOAT ? 0 : 1);
+int intBits = _dimension;

Review Comment:
   this can be var length



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org


[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11262: WIP: Vector data type in Pinot

2023-08-10 Thread via GitHub


xiangfu0 commented on code in PR #11262:
URL: https://github.com/apache/pinot/pull/11262#discussion_r1290436300


##
pinot-spi/src/main/java/org/apache/pinot/spi/data/readers/Vector.java:
##
@@ -0,0 +1,261 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.data.readers;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+import java.util.Arrays;
+
+
+public class Vector implements Comparable {
+  public enum VectorType {
+FLOAT, INT
+  }
+
+  private int _dimension;
+  private float[] _floatValues;
+  private int[] _intValues;
+  private VectorType _type;
+
+  public Vector(int dimension, float[] values) {
+_dimension = dimension;
+_floatValues = values;
+_type = VectorType.FLOAT;
+  }
+
+  public Vector(int dimension, int[] values) {
+_dimension = dimension;
+_intValues = values;
+_type = VectorType.INT;
+  }
+
+  public int getDimension() {
+return _dimension;
+  }
+
+  public void setDimension(int dimension) {
+_dimension = dimension;
+  }
+
+  @JsonIgnore
+  public float[] getFloatValues() {
+if (_type != VectorType.FLOAT) {
+  throw new IllegalStateException("Vector type is not FLOAT");
+}
+return _floatValues;
+  }
+
+  @JsonIgnore
+  public int[] getIntValues() {
+if (_type != VectorType.INT) {
+  throw new IllegalStateException("Vector type is not INT");
+}
+return _intValues;
+  }
+
+  public void setValues(float[] values) {
+_floatValues = values;
+_type = VectorType.FLOAT;
+  }
+
+  public void setValues(int[] values) {
+_intValues = values;
+_type = VectorType.INT;
+  }
+
+  public VectorType getType() {
+return _type;
+  }
+
+  public static Vector fromString(String value) {
+String[] tokens = value.split(",");
+VectorType vectorType = VectorType.valueOf(tokens[0].toUpperCase());
+int dimension = Integer.parseInt(tokens[1]);
+switch (vectorType) {
+  case FLOAT: {
+float[] result = new float[tokens.length - 2];
+for (int i = 2; i < tokens.length; i++) {
+  result[i - 2] = Float.parseFloat(tokens[i]);
+}
+return new Vector(dimension, result);
+  }
+  case INT: {
+int[] result = new int[tokens.length - 2];
+for (int i = 2; i < tokens.length; i++) {
+  result[i - 2] = Integer.parseInt(tokens[i]);
+}
+return new Vector(dimension, result);
+  }
+  default:
+throw new IllegalArgumentException("Unsupported vector type: " + 
vectorType);
+}
+  }
+
+  public String toString() {
+StringBuilder sb = new StringBuilder();
+sb.append(_type);
+sb.append(",");
+sb.append(_dimension);
+sb.append(",");
+switch (_type) {
+  case FLOAT:
+for (int i = 0; i < _floatValues.length; i++) {
+  sb.append(_floatValues[i]);
+  if (i < _floatValues.length - 1) {
+sb.append(",");
+  }
+}
+break;
+  case INT:
+for (int i = 0; i < _intValues.length; i++) {
+  sb.append(_intValues[i]);
+  if (i < _intValues.length - 1) {
+sb.append(",");
+  }
+}
+break;
+  default:
+throw new IllegalArgumentException("Unsupported vector type: " + 
_type);
+}
+return sb.toString();
+  }
+
+  public byte[] toBytes() {
+int size = _type == VectorType.FLOAT ? _floatValues.length : 
_intValues.length;
+byte[] result = new byte[5 + 4 * size]; // 1 byte for type, 4 for dimension
+int offset = 0;
+result[offset++] = (byte) (_type == VectorType.FLOAT ? 0 : 1);
+int intBits = _dimension;

Review Comment:
   this can be var lenght



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org


[pinot] annotated tag release-1.0.0-rc0 updated (2b5f2575e0 -> 638646e1dd)

2023-08-10 Thread saurabhd336
This is an automated email from the ASF dual-hosted git repository.

saurabhd336 pushed a change to annotated tag release-1.0.0-rc0
in repository https://gitbox.apache.org/repos/asf/pinot.git


*** WARNING: tag release-1.0.0-rc0 was modified! ***

from 2b5f2575e0 (commit)
  to 638646e1dd (tag)
 tagging 2b5f2575e0f46fcfe914223214ac63bddbcb728a (commit)
  by Saurabh Dubey
  on Thu Aug 10 22:58:32 2023 +0530

- Log -
[maven-release-plugin] copy for tag release-1.0.0-rc0
---


No new revisions were added by this update.

Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] branch release-1.0.0-rc updated: [maven-release-plugin] prepare for next development iteration

2023-08-10 Thread saurabhd336
This is an automated email from the ASF dual-hosted git repository.

saurabhd336 pushed a commit to branch release-1.0.0-rc
in repository https://gitbox.apache.org/repos/asf/pinot.git


The following commit(s) were added to refs/heads/release-1.0.0-rc by this push:
 new 598eeb740d [maven-release-plugin] prepare for next development 
iteration
598eeb740d is described below

commit 598eeb740d118b397da4aa1d3ead45335bc05957
Author: Saurabh Dubey 
AuthorDate: Thu Aug 10 22:58:37 2023 +0530

[maven-release-plugin] prepare for next development iteration
---
 contrib/pinot-fmpp-maven-plugin/pom.xml   | 2 +-
 pinot-broker/pom.xml  | 2 +-
 pinot-clients/pinot-java-client/pom.xml   | 2 +-
 pinot-clients/pinot-jdbc-client/pom.xml   | 2 +-
 pinot-clients/pom.xml | 2 +-
 pinot-common/pom.xml  | 2 +-
 pinot-compatibility-verifier/pom.xml  | 2 +-
 pinot-connectors/pinot-flink-connector/pom.xml| 2 +-
 pinot-connectors/pinot-spark-2-connector/pom.xml  | 2 +-
 pinot-connectors/pinot-spark-3-connector/pom.xml  | 2 +-
 pinot-connectors/pinot-spark-common/pom.xml   | 2 +-
 pinot-connectors/pom.xml  | 2 +-
 pinot-controller/pom.xml  | 2 +-
 pinot-core/pom.xml| 2 +-
 pinot-distribution/pom.xml| 2 +-
 pinot-integration-test-base/pom.xml   | 2 +-
 pinot-integration-tests/pom.xml   | 2 +-
 pinot-minion/pom.xml  | 2 +-
 pinot-perf/pom.xml| 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-common/pom.xml| 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-hadoop/pom.xml| 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-spark-2.4/pom.xml | 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-spark-3.2/pom.xml | 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-standalone/pom.xml| 2 +-
 pinot-plugins/pinot-batch-ingestion/pom.xml   | 2 +-
 pinot-plugins/pinot-environment/pinot-azure/pom.xml   | 2 +-
 pinot-plugins/pinot-environment/pom.xml   | 2 +-
 pinot-plugins/pinot-file-system/pinot-adls/pom.xml| 2 +-
 pinot-plugins/pinot-file-system/pinot-gcs/pom.xml | 2 +-
 pinot-plugins/pinot-file-system/pinot-hdfs/pom.xml| 2 +-
 pinot-plugins/pinot-file-system/pinot-s3/pom.xml  | 2 +-
 pinot-plugins/pinot-file-system/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pinot-avro-base/pom.xml  | 2 +-
 pinot-plugins/pinot-input-format/pinot-avro/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pinot-clp-log/pom.xml| 2 +-
 pinot-plugins/pinot-input-format/pinot-confluent-avro/pom.xml | 2 +-
 pinot-plugins/pinot-input-format/pinot-csv/pom.xml| 2 +-
 pinot-plugins/pinot-input-format/pinot-json/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pinot-orc/pom.xml| 2 +-
 pinot-plugins/pinot-input-format/pinot-parquet/pom.xml| 2 +-
 pinot-plugins/pinot-input-format/pinot-protobuf/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pinot-thrift/pom.xml | 2 +-
 pinot-plugins/pinot-input-format/pom.xml  | 2 +-
 pinot-plugins/pinot-metrics/pinot-dropwizard/pom.xml  | 2 +-
 pinot-plugins/pinot-metrics/pinot-yammer/pom.xml  | 2 +-
 pinot-plugins/pinot-metrics/pom.xml   | 2 +-
 pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/pom.xml   | 2 +-
 pinot-plugins/pinot-minion-tasks/pom.xml  | 2 +-
 .../pinot-segment-uploader/pinot-segment-uploader-default/pom.xml | 2 +-
 pinot-plugins/pinot-segment-uploader/pom.xml  | 2 +-
 .../pinot-segment-writer/pinot-segment-writer-file-based/pom.xml  | 2 +-
 pinot-plugins/pinot-segment-writer/pom.xml| 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-kafka-2.0/pom.xml  | 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-kafka-base/pom.xml | 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-kinesis/pom.xml| 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-pulsar/pom.xml | 2 +-
 pinot-plugins/pinot-stream-ingestion/pom.xml 

[GitHub] [pinot] zhtaoxiang commented on a diff in pull request #11315: Enhance Minion task management

2023-08-10 Thread via GitHub


zhtaoxiang commented on code in PR #11315:
URL: https://github.com/apache/pinot/pull/11315#discussion_r1290464365


##
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotHelixTaskResourceManager.java:
##
@@ -326,27 +324,35 @@ public synchronized Set getTasks(String taskType) 
{
 
   /**
* Get all task states for the given task type.
+   * NOTE: For tasks just submitted without the context created, count them as 
NOT_STARTED.
*
* @param taskType Task type
* @return Map from task name to task state
*/
   public synchronized Map getTaskStates(String taskType) {
-Map helixJobStates = new HashMap<>();
+WorkflowConfig workflowConfig = 
_taskDriver.getWorkflowConfig(getHelixJobQueueName(taskType));
+if (workflowConfig == null) {
+  return Collections.emptyMap();
+}
+Set helixJobs = workflowConfig.getJobDag().getAllNodes();
+if (helixJobs.isEmpty()) {
+  return Collections.emptyMap();
+}
 WorkflowContext workflowContext = 
_taskDriver.getWorkflowContext(getHelixJobQueueName(taskType));
-
 if (workflowContext == null) {
-  return helixJobStates;
-}
-helixJobStates = workflowContext.getJobStates();
-Map taskStates = new HashMap<>(helixJobStates.size());
-for (Map.Entry entry : helixJobStates.entrySet()) {
-  taskStates.put(getPinotTaskName(entry.getKey()), entry.getValue());
+  return helixJobs.stream()
+  
.collect(Collectors.toMap(PinotHelixTaskResourceManager::getPinotTaskName, 
ignored -> TaskState.NOT_STARTED));
+} else {
+  Map helixJobStates = workflowContext.getJobStates();
+  return 
helixJobs.stream().collect(Collectors.toMap(PinotHelixTaskResourceManager::getPinotTaskName,
+  helixJobName -> helixJobStates.getOrDefault(helixJobName, 
TaskState.NOT_STARTED)));
 }
-return taskStates;
   }
 
   /**
* This method returns a count of sub-tasks in various states, given the 
top-level task name.
+   * TODO: It doesn't count tasks just submitted without the context created.

Review Comment:
   It seems to me this PR is a breaking change without fixing this TODO and 
other TODOs.
   
   If a user relies on those APIs to determine whether tasks are successfully 
submitted and scheduled, they may get a wrong signal and decide to resubmit the 
task. Although we can make built-in minion tasks to be idempotent, but we don't 
know if the user is using a customized minion task and whether they are 
idempotent or not.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] branch release-1.0.0-rc created (now 2b5f2575e0)

2023-08-10 Thread saurabhd336
This is an automated email from the ASF dual-hosted git repository.

saurabhd336 pushed a change to branch release-1.0.0-rc
in repository https://gitbox.apache.org/repos/asf/pinot.git


  at 2b5f2575e0 [maven-release-plugin] prepare release release-1.0.0-rc0

No new revisions were added by this update.


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] branch masterPomBump created (now 873ecd4647)

2023-08-10 Thread saurabhd336
This is an automated email from the ASF dual-hosted git repository.

saurabhd336 pushed a change to branch masterPomBump
in repository https://gitbox.apache.org/repos/asf/pinot.git


  at 873ecd4647 [maven-release-plugin] prepare for next development 
iteration

This branch includes the following new commits:

 new 873ecd4647 [maven-release-plugin] prepare for next development 
iteration

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.



-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] 01/01: [maven-release-plugin] prepare for next development iteration

2023-08-10 Thread saurabhd336
This is an automated email from the ASF dual-hosted git repository.

saurabhd336 pushed a commit to branch masterPomBump
in repository https://gitbox.apache.org/repos/asf/pinot.git

commit 873ecd46478374bd32440fef19c50298e94c4d75
Author: Saurabh Dubey 
AuthorDate: Thu Aug 10 22:58:37 2023 +0530

[maven-release-plugin] prepare for next development iteration
---
 contrib/pinot-fmpp-maven-plugin/pom.xml   | 2 +-
 pinot-broker/pom.xml  | 2 +-
 pinot-clients/pinot-java-client/pom.xml   | 4 
 pinot-clients/pinot-jdbc-client/pom.xml   | 4 
 pinot-clients/pom.xml | 2 +-
 pinot-common/pom.xml  | 2 +-
 pinot-compatibility-verifier/pom.xml  | 2 +-
 pinot-connectors/pinot-flink-connector/pom.xml| 2 +-
 pinot-connectors/pinot-spark-2-connector/pom.xml  | 2 +-
 pinot-connectors/pinot-spark-3-connector/pom.xml  | 2 +-
 pinot-connectors/pinot-spark-common/pom.xml   | 2 +-
 pinot-connectors/pom.xml  | 2 +-
 pinot-controller/pom.xml  | 2 +-
 pinot-core/pom.xml| 2 +-
 pinot-distribution/pom.xml| 2 +-
 pinot-integration-test-base/pom.xml   | 2 +-
 pinot-integration-tests/pom.xml   | 2 +-
 pinot-minion/pom.xml  | 2 +-
 pinot-perf/pom.xml| 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-common/pom.xml| 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-hadoop/pom.xml| 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-spark-2.4/pom.xml | 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-spark-3.2/pom.xml | 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-standalone/pom.xml| 2 +-
 pinot-plugins/pinot-batch-ingestion/pom.xml   | 2 +-
 pinot-plugins/pinot-environment/pinot-azure/pom.xml   | 2 +-
 pinot-plugins/pinot-environment/pom.xml   | 2 +-
 pinot-plugins/pinot-file-system/pinot-adls/pom.xml| 2 +-
 pinot-plugins/pinot-file-system/pinot-gcs/pom.xml | 2 +-
 pinot-plugins/pinot-file-system/pinot-hdfs/pom.xml| 2 +-
 pinot-plugins/pinot-file-system/pinot-s3/pom.xml  | 2 +-
 pinot-plugins/pinot-file-system/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pinot-avro-base/pom.xml  | 2 +-
 pinot-plugins/pinot-input-format/pinot-avro/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pinot-clp-log/pom.xml| 2 +-
 pinot-plugins/pinot-input-format/pinot-confluent-avro/pom.xml | 2 +-
 pinot-plugins/pinot-input-format/pinot-csv/pom.xml| 2 +-
 pinot-plugins/pinot-input-format/pinot-json/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pinot-orc/pom.xml| 2 +-
 pinot-plugins/pinot-input-format/pinot-parquet/pom.xml| 2 +-
 pinot-plugins/pinot-input-format/pinot-protobuf/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pinot-thrift/pom.xml | 2 +-
 pinot-plugins/pinot-input-format/pom.xml  | 2 +-
 pinot-plugins/pinot-metrics/pinot-dropwizard/pom.xml  | 2 +-
 pinot-plugins/pinot-metrics/pinot-yammer/pom.xml  | 2 +-
 pinot-plugins/pinot-metrics/pom.xml   | 2 +-
 pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/pom.xml   | 2 +-
 pinot-plugins/pinot-minion-tasks/pom.xml  | 2 +-
 .../pinot-segment-uploader/pinot-segment-uploader-default/pom.xml | 2 +-
 pinot-plugins/pinot-segment-uploader/pom.xml  | 2 +-
 .../pinot-segment-writer/pinot-segment-writer-file-based/pom.xml  | 2 +-
 pinot-plugins/pinot-segment-writer/pom.xml| 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-kafka-2.0/pom.xml  | 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-kafka-base/pom.xml | 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-kinesis/pom.xml| 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-pulsar/pom.xml | 2 +-
 pinot-plugins/pinot-stream-ingestion/pom.xml  | 2 +-
 pinot-plugins/pom.xml | 2 +-
 pinot-query-planner/pom.xml   | 2 +-
 pinot-query-runtime/pom.x

[pinot] 01/01: [maven-release-plugin] prepare for next development iteration

2023-08-10 Thread saurabhd336
This is an automated email from the ASF dual-hosted git repository.

saurabhd336 pushed a commit to branch masterPomBump
in repository https://gitbox.apache.org/repos/asf/pinot.git

commit a1ffd912f838b96af9a54bbf65fb1bd982b05505
Author: Saurabh Dubey 
AuthorDate: Thu Aug 10 22:58:37 2023 +0530

[maven-release-plugin] prepare for next development iteration
---
 contrib/pinot-fmpp-maven-plugin/pom.xml | 2 +-
 pinot-broker/pom.xml| 2 +-
 pinot-clients/pinot-java-client/pom.xml | 2 +-
 pinot-clients/pinot-jdbc-client/pom.xml | 2 +-
 pinot-clients/pom.xml   | 2 +-
 pinot-common/pom.xml| 2 +-
 pinot-compatibility-verifier/pom.xml| 2 +-
 pinot-connectors/pinot-flink-connector/pom.xml  | 2 +-
 pinot-connectors/pinot-spark-2-connector/pom.xml| 2 +-
 pinot-connectors/pinot-spark-3-connector/pom.xml| 2 +-
 pinot-connectors/pinot-spark-common/pom.xml | 2 +-
 pinot-connectors/pom.xml| 2 +-
 pinot-controller/pom.xml| 2 +-
 pinot-core/pom.xml  | 2 +-
 pinot-distribution/pom.xml  | 2 +-
 pinot-integration-test-base/pom.xml | 2 +-
 pinot-integration-tests/pom.xml | 2 +-
 pinot-minion/pom.xml| 2 +-
 pinot-perf/pom.xml  | 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-common/pom.xml  | 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-hadoop/pom.xml  | 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-spark-2.4/pom.xml   | 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-spark-3.2/pom.xml   | 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-standalone/pom.xml  | 2 +-
 pinot-plugins/pinot-batch-ingestion/pom.xml | 2 +-
 pinot-plugins/pinot-environment/pinot-azure/pom.xml | 2 +-
 pinot-plugins/pinot-environment/pom.xml | 2 +-
 pinot-plugins/pinot-file-system/pinot-adls/pom.xml  | 2 +-
 pinot-plugins/pinot-file-system/pinot-gcs/pom.xml   | 2 +-
 pinot-plugins/pinot-file-system/pinot-hdfs/pom.xml  | 2 +-
 pinot-plugins/pinot-file-system/pinot-s3/pom.xml| 2 +-
 pinot-plugins/pinot-file-system/pom.xml | 2 +-
 pinot-plugins/pinot-input-format/pinot-avro-base/pom.xml| 2 +-
 pinot-plugins/pinot-input-format/pinot-avro/pom.xml | 2 +-
 pinot-plugins/pinot-input-format/pinot-clp-log/pom.xml  | 2 +-
 pinot-plugins/pinot-input-format/pinot-confluent-avro/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pinot-csv/pom.xml  | 2 +-
 pinot-plugins/pinot-input-format/pinot-json/pom.xml | 2 +-
 pinot-plugins/pinot-input-format/pinot-orc/pom.xml  | 2 +-
 pinot-plugins/pinot-input-format/pinot-parquet/pom.xml  | 2 +-
 pinot-plugins/pinot-input-format/pinot-protobuf/pom.xml | 2 +-
 pinot-plugins/pinot-input-format/pinot-thrift/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pom.xml| 2 +-
 pinot-plugins/pinot-metrics/pinot-dropwizard/pom.xml| 2 +-
 pinot-plugins/pinot-metrics/pinot-yammer/pom.xml| 2 +-
 pinot-plugins/pinot-metrics/pom.xml | 2 +-
 pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/pom.xml | 2 +-
 pinot-plugins/pinot-minion-tasks/pom.xml| 2 +-
 .../pinot-segment-uploader/pinot-segment-uploader-default/pom.xml   | 2 +-
 pinot-plugins/pinot-segment-uploader/pom.xml| 2 +-
 .../pinot-segment-writer/pinot-segment-writer-file-based/pom.xml| 2 +-
 pinot-plugins/pinot-segment-writer/pom.xml  | 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-kafka-2.0/pom.xml| 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-kafka-base/pom.xml   | 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-kinesis/pom.xml  | 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-pulsar/pom.xml   | 2 +-
 pinot-plugins/pinot-stream-ingestion/pom.xml| 2 +-
 pinot-plugins/pom.xml  

[pinot] branch masterPomBump updated (873ecd4647 -> a1ffd912f8)

2023-08-10 Thread saurabhd336
This is an automated email from the ASF dual-hosted git repository.

saurabhd336 pushed a change to branch masterPomBump
in repository https://gitbox.apache.org/repos/asf/pinot.git


 discard 873ecd4647 [maven-release-plugin] prepare for next development 
iteration
 new a1ffd912f8 [maven-release-plugin] prepare for next development 
iteration

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (873ecd4647)
\
 N -- N -- N   refs/heads/masterPomBump (a1ffd912f8)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 pinot-clients/pinot-java-client/pom.xml | 4 
 pinot-clients/pinot-jdbc-client/pom.xml | 4 
 2 files changed, 8 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] branch master updated (ee4044c54b -> 0addf8138e)

2023-08-10 Thread siddteotia
This is an automated email from the ASF dual-hosted git repository.

siddteotia pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/pinot.git


from ee4044c54b Added support for COMMA join (#11312)
 add 0addf8138e Add instrumentation for dcount merge. Release bitmaps for 
and/or doc id set to save memory. `performance` (#11200)

No new revisions were added by this update.

Summary of changes:
 .../pinot/core/operator/docidsets/AndDocIdSet.java | 26 +-
 .../pinot/core/operator/docidsets/OrDocIdSet.java  | 26 +-
 .../BaseDistinctAggregateAggregationFunction.java  |  4 
 3 files changed, 44 insertions(+), 12 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] branch release-1.0.0-rc updated (598eeb740d -> 6fa426886e)

2023-08-10 Thread saurabhd336
This is an automated email from the ASF dual-hosted git repository.

saurabhd336 pushed a change to branch release-1.0.0-rc
in repository https://gitbox.apache.org/repos/asf/pinot.git


 discard 598eeb740d [maven-release-plugin] prepare for next development 
iteration
 discard 2b5f2575e0 [maven-release-plugin] prepare release release-1.0.0-rc0

This update removed existing revisions from the reference, leaving the
reference pointing at a previous point in the repository history.

 * -- * -- N   refs/heads/release-1.0.0-rc (6fa426886e)
\
 O -- O -- O   (598eeb740d)

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

No new revisions were added by this update.

Summary of changes:
 contrib/pinot-fmpp-maven-plugin/pom.xml|  6 ++-
 pinot-broker/pom.xml   |  5 ++-
 pinot-clients/pinot-java-client/pom.xml|  5 ++-
 pinot-clients/pinot-jdbc-client/pom.xml|  8 ++--
 pinot-clients/pom.xml  |  6 ++-
 pinot-common/pom.xml   | 43 +++---
 pinot-compatibility-verifier/pom.xml   |  5 ++-
 pinot-connectors/pinot-flink-connector/pom.xml |  8 ++--
 pinot-connectors/pinot-spark-2-connector/pom.xml   |  5 ++-
 pinot-connectors/pinot-spark-3-connector/pom.xml   |  5 ++-
 pinot-connectors/pinot-spark-common/pom.xml|  5 ++-
 pinot-connectors/pom.xml   |  6 ++-
 pinot-controller/pom.xml   |  5 ++-
 pinot-core/pom.xml |  5 ++-
 pinot-distribution/pom.xml |  7 ++--
 pinot-integration-test-base/pom.xml|  5 ++-
 pinot-integration-tests/pom.xml|  5 ++-
 pinot-minion/pom.xml   |  5 ++-
 pinot-perf/pom.xml |  5 ++-
 .../pinot-batch-ingestion-common/pom.xml   |  6 ++-
 .../pinot-batch-ingestion-hadoop/pom.xml   |  6 ++-
 .../pinot-batch-ingestion-spark-2.4/pom.xml|  6 ++-
 .../pinot-batch-ingestion-spark-3.2/pom.xml|  6 ++-
 .../pinot-batch-ingestion-standalone/pom.xml   |  6 ++-
 pinot-plugins/pinot-batch-ingestion/pom.xml|  6 ++-
 .../pinot-environment/pinot-azure/pom.xml  |  6 ++-
 pinot-plugins/pinot-environment/pom.xml|  6 ++-
 pinot-plugins/pinot-file-system/pinot-adls/pom.xml |  5 ++-
 pinot-plugins/pinot-file-system/pinot-gcs/pom.xml  |  6 ++-
 pinot-plugins/pinot-file-system/pinot-hdfs/pom.xml |  5 ++-
 pinot-plugins/pinot-file-system/pinot-s3/pom.xml   |  9 +++--
 pinot-plugins/pinot-file-system/pom.xml|  6 ++-
 .../pinot-input-format/pinot-avro-base/pom.xml |  5 ++-
 .../pinot-input-format/pinot-avro/pom.xml  |  5 ++-
 .../pinot-input-format/pinot-clp-log/pom.xml   |  5 ++-
 .../pinot-confluent-avro/pom.xml   |  5 ++-
 pinot-plugins/pinot-input-format/pinot-csv/pom.xml |  5 ++-
 .../pinot-input-format/pinot-json/pom.xml  |  5 ++-
 pinot-plugins/pinot-input-format/pinot-orc/pom.xml |  6 ++-
 .../pinot-input-format/pinot-parquet/pom.xml   |  5 ++-
 .../pinot-input-format/pinot-protobuf/pom.xml  |  5 ++-
 .../pinot-input-format/pinot-thrift/pom.xml|  5 ++-
 pinot-plugins/pinot-input-format/pom.xml   |  6 ++-
 .../pinot-metrics/pinot-dropwizard/pom.xml |  5 ++-
 pinot-plugins/pinot-metrics/pinot-yammer/pom.xml   |  5 ++-
 pinot-plugins/pinot-metrics/pom.xml|  6 ++-
 .../pinot-minion-builtin-tasks/pom.xml |  5 ++-
 pinot-plugins/pinot-minion-tasks/pom.xml   |  6 ++-
 .../pinot-segment-uploader-default/pom.xml |  6 ++-
 pinot-plugins/pinot-segment-uploader/pom.xml   |  6 ++-
 .../pinot-segment-writer-file-based/pom.xml|  6 ++-
 pinot-plugins/pinot-segment-writer/pom.xml |  6 ++-
 .../pinot-stream-ingestion/pinot-kafka-2.0/pom.xml |  6 ++-
 .../pinot-kafka-base/pom.xml   |  6 ++-
 .../pinot-stream-ingestion/pinot-kinesis/pom.xml   | 12 --
 .../pinot-stream-ingestion/pinot-pulsar/pom.xml|  6 ++-
 pinot-plugins/pinot-stream-ingestion/pom.xml   |  6 ++-
 pinot-plugins/pom.xml  | 11 --
 pinot-query-planner/pom.xml|  5 ++-
 pinot-query-runtime/pom.xml|  5 ++-
 pinot-segment-local/pom.xml|  5 ++-
 pinot-segment-spi/pom.xml  |  5 ++-
 pinot-server/pom.xml   |  5 ++-
 pinot-spi/pom.xml  |  5 ++-
 pinot-tools/pom.xml|  5 ++-
 pom.xml| 11 +++---
 66 files changed, 262 insertions(+), 162 deletions(-)


-
To unsubscribe, e-mai

[pinot] annotated tag release-1.0.0-rc0 deleted (was 638646e1dd)

2023-08-10 Thread saurabhd336
This is an automated email from the ASF dual-hosted git repository.

saurabhd336 pushed a change to annotated tag release-1.0.0-rc0
in repository https://gitbox.apache.org/repos/asf/pinot.git


*** WARNING: tag release-1.0.0-rc0 was deleted! ***

   tag was  638646e1dd

This change permanently discards the following revisions:

 discard 2b5f2575e0 [maven-release-plugin] prepare release release-1.0.0-rc0


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] branch release-1.0.0-rc updated (6fa426886e -> 91497773ea)

2023-08-10 Thread saurabhd336
This is an automated email from the ASF dual-hosted git repository.

saurabhd336 pushed a change to branch release-1.0.0-rc
in repository https://gitbox.apache.org/repos/asf/pinot.git


from 6fa426886e Add projection operator utils to unify and allow 
plugability (#11291)
 add 91497773ea [maven-release-plugin] prepare release release-1.0.0-rc0

No new revisions were added by this update.

Summary of changes:
 contrib/pinot-fmpp-maven-plugin/pom.xml|  6 +--
 pinot-broker/pom.xml   |  5 +--
 pinot-clients/pinot-java-client/pom.xml|  5 +--
 pinot-clients/pinot-jdbc-client/pom.xml|  8 ++--
 pinot-clients/pom.xml  |  6 +--
 pinot-common/pom.xml   | 43 +++---
 pinot-compatibility-verifier/pom.xml   |  5 +--
 pinot-connectors/pinot-flink-connector/pom.xml |  8 ++--
 pinot-connectors/pinot-spark-2-connector/pom.xml   |  5 +--
 pinot-connectors/pinot-spark-3-connector/pom.xml   |  5 +--
 pinot-connectors/pinot-spark-common/pom.xml|  5 +--
 pinot-connectors/pom.xml   |  6 +--
 pinot-controller/pom.xml   |  5 +--
 pinot-core/pom.xml |  5 +--
 pinot-distribution/pom.xml |  7 ++--
 pinot-integration-test-base/pom.xml|  5 +--
 pinot-integration-tests/pom.xml|  5 +--
 pinot-minion/pom.xml   |  5 +--
 pinot-perf/pom.xml |  5 +--
 .../pinot-batch-ingestion-common/pom.xml   |  6 +--
 .../pinot-batch-ingestion-hadoop/pom.xml   |  6 +--
 .../pinot-batch-ingestion-spark-2.4/pom.xml|  6 +--
 .../pinot-batch-ingestion-spark-3.2/pom.xml|  6 +--
 .../pinot-batch-ingestion-standalone/pom.xml   |  6 +--
 pinot-plugins/pinot-batch-ingestion/pom.xml|  6 +--
 .../pinot-environment/pinot-azure/pom.xml  |  6 +--
 pinot-plugins/pinot-environment/pom.xml|  6 +--
 pinot-plugins/pinot-file-system/pinot-adls/pom.xml |  5 +--
 pinot-plugins/pinot-file-system/pinot-gcs/pom.xml  |  6 +--
 pinot-plugins/pinot-file-system/pinot-hdfs/pom.xml |  5 +--
 pinot-plugins/pinot-file-system/pinot-s3/pom.xml   |  9 ++---
 pinot-plugins/pinot-file-system/pom.xml|  6 +--
 .../pinot-input-format/pinot-avro-base/pom.xml |  5 +--
 .../pinot-input-format/pinot-avro/pom.xml  |  5 +--
 .../pinot-input-format/pinot-clp-log/pom.xml   |  5 +--
 .../pinot-confluent-avro/pom.xml   |  5 +--
 pinot-plugins/pinot-input-format/pinot-csv/pom.xml |  5 +--
 .../pinot-input-format/pinot-json/pom.xml  |  5 +--
 pinot-plugins/pinot-input-format/pinot-orc/pom.xml |  6 +--
 .../pinot-input-format/pinot-parquet/pom.xml   |  5 +--
 .../pinot-input-format/pinot-protobuf/pom.xml  |  5 +--
 .../pinot-input-format/pinot-thrift/pom.xml|  5 +--
 pinot-plugins/pinot-input-format/pom.xml   |  6 +--
 .../pinot-metrics/pinot-dropwizard/pom.xml |  5 +--
 pinot-plugins/pinot-metrics/pinot-yammer/pom.xml   |  5 +--
 pinot-plugins/pinot-metrics/pom.xml|  6 +--
 .../pinot-minion-builtin-tasks/pom.xml |  5 +--
 pinot-plugins/pinot-minion-tasks/pom.xml   |  6 +--
 .../pinot-segment-uploader-default/pom.xml |  6 +--
 pinot-plugins/pinot-segment-uploader/pom.xml   |  6 +--
 .../pinot-segment-writer-file-based/pom.xml|  6 +--
 pinot-plugins/pinot-segment-writer/pom.xml |  6 +--
 .../pinot-stream-ingestion/pinot-kafka-2.0/pom.xml |  6 +--
 .../pinot-kafka-base/pom.xml   |  6 +--
 .../pinot-stream-ingestion/pinot-kinesis/pom.xml   | 12 ++
 .../pinot-stream-ingestion/pinot-pulsar/pom.xml|  6 +--
 pinot-plugins/pinot-stream-ingestion/pom.xml   |  6 +--
 pinot-plugins/pom.xml  | 11 ++
 pinot-query-planner/pom.xml|  5 +--
 pinot-query-runtime/pom.xml|  5 +--
 pinot-segment-local/pom.xml|  5 +--
 pinot-segment-spi/pom.xml  |  5 +--
 pinot-server/pom.xml   |  5 +--
 pinot-spi/pom.xml  |  5 +--
 pinot-tools/pom.xml|  5 +--
 pom.xml| 13 +++
 66 files changed, 163 insertions(+), 263 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] branch release-1.0.0-rc updated: [maven-release-plugin] prepare for next development iteration

2023-08-10 Thread saurabhd336
This is an automated email from the ASF dual-hosted git repository.

saurabhd336 pushed a commit to branch release-1.0.0-rc
in repository https://gitbox.apache.org/repos/asf/pinot.git


The following commit(s) were added to refs/heads/release-1.0.0-rc by this push:
 new f12930da8b [maven-release-plugin] prepare for next development 
iteration
f12930da8b is described below

commit f12930da8b5a05808036e71f783a519f25e166f3
Author: Saurabh Dubey 
AuthorDate: Fri Aug 11 00:10:54 2023 +0530

[maven-release-plugin] prepare for next development iteration
---
 contrib/pinot-fmpp-maven-plugin/pom.xml   | 2 +-
 pinot-broker/pom.xml  | 2 +-
 pinot-clients/pinot-java-client/pom.xml   | 2 +-
 pinot-clients/pinot-jdbc-client/pom.xml   | 2 +-
 pinot-clients/pom.xml | 2 +-
 pinot-common/pom.xml  | 2 +-
 pinot-compatibility-verifier/pom.xml  | 2 +-
 pinot-connectors/pinot-flink-connector/pom.xml| 2 +-
 pinot-connectors/pinot-spark-2-connector/pom.xml  | 2 +-
 pinot-connectors/pinot-spark-3-connector/pom.xml  | 2 +-
 pinot-connectors/pinot-spark-common/pom.xml   | 2 +-
 pinot-connectors/pom.xml  | 2 +-
 pinot-controller/pom.xml  | 2 +-
 pinot-core/pom.xml| 2 +-
 pinot-distribution/pom.xml| 2 +-
 pinot-integration-test-base/pom.xml   | 2 +-
 pinot-integration-tests/pom.xml   | 2 +-
 pinot-minion/pom.xml  | 2 +-
 pinot-perf/pom.xml| 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-common/pom.xml| 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-hadoop/pom.xml| 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-spark-2.4/pom.xml | 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-spark-3.2/pom.xml | 2 +-
 .../pinot-batch-ingestion/pinot-batch-ingestion-standalone/pom.xml| 2 +-
 pinot-plugins/pinot-batch-ingestion/pom.xml   | 2 +-
 pinot-plugins/pinot-environment/pinot-azure/pom.xml   | 2 +-
 pinot-plugins/pinot-environment/pom.xml   | 2 +-
 pinot-plugins/pinot-file-system/pinot-adls/pom.xml| 2 +-
 pinot-plugins/pinot-file-system/pinot-gcs/pom.xml | 2 +-
 pinot-plugins/pinot-file-system/pinot-hdfs/pom.xml| 2 +-
 pinot-plugins/pinot-file-system/pinot-s3/pom.xml  | 2 +-
 pinot-plugins/pinot-file-system/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pinot-avro-base/pom.xml  | 2 +-
 pinot-plugins/pinot-input-format/pinot-avro/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pinot-clp-log/pom.xml| 2 +-
 pinot-plugins/pinot-input-format/pinot-confluent-avro/pom.xml | 2 +-
 pinot-plugins/pinot-input-format/pinot-csv/pom.xml| 2 +-
 pinot-plugins/pinot-input-format/pinot-json/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pinot-orc/pom.xml| 2 +-
 pinot-plugins/pinot-input-format/pinot-parquet/pom.xml| 2 +-
 pinot-plugins/pinot-input-format/pinot-protobuf/pom.xml   | 2 +-
 pinot-plugins/pinot-input-format/pinot-thrift/pom.xml | 2 +-
 pinot-plugins/pinot-input-format/pom.xml  | 2 +-
 pinot-plugins/pinot-metrics/pinot-dropwizard/pom.xml  | 2 +-
 pinot-plugins/pinot-metrics/pinot-yammer/pom.xml  | 2 +-
 pinot-plugins/pinot-metrics/pom.xml   | 2 +-
 pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/pom.xml   | 2 +-
 pinot-plugins/pinot-minion-tasks/pom.xml  | 2 +-
 .../pinot-segment-uploader/pinot-segment-uploader-default/pom.xml | 2 +-
 pinot-plugins/pinot-segment-uploader/pom.xml  | 2 +-
 .../pinot-segment-writer/pinot-segment-writer-file-based/pom.xml  | 2 +-
 pinot-plugins/pinot-segment-writer/pom.xml| 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-kafka-2.0/pom.xml  | 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-kafka-base/pom.xml | 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-kinesis/pom.xml| 2 +-
 pinot-plugins/pinot-stream-ingestion/pinot-pulsar/pom.xml | 2 +-
 pinot-plugins/pinot-stream-ingestion/pom.xml 

[pinot] annotated tag release-1.0-rc0 deleted (was 6da0ff3571)

2023-08-10 Thread saurabhd336
This is an automated email from the ASF dual-hosted git repository.

saurabhd336 pushed a change to annotated tag release-1.0-rc0
in repository https://gitbox.apache.org/repos/asf/pinot.git


*** WARNING: tag release-1.0-rc0 was deleted! ***

   tag was  6da0ff3571

The revisions that were on this annotated tag are still contained in
other references; therefore, this change does not discard any commits
from the repository.


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] branch dependabot/maven/org.codehaus.mojo-versions-maven-plugin-2.16.0 deleted (was 049fc89079)

2023-08-10 Thread jackie
This is an automated email from the ASF dual-hosted git repository.

jackie pushed a change to branch 
dependabot/maven/org.codehaus.mojo-versions-maven-plugin-2.16.0
in repository https://gitbox.apache.org/repos/asf/pinot.git


 was 049fc89079 Bump org.codehaus.mojo:versions-maven-plugin from 2.13.0 to 
2.16.0

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] annotated tag release-1.0.0-rc0 updated (91497773ea -> f8248d8f23)

2023-08-10 Thread saurabhd336
This is an automated email from the ASF dual-hosted git repository.

saurabhd336 pushed a change to annotated tag release-1.0.0-rc0
in repository https://gitbox.apache.org/repos/asf/pinot.git


*** WARNING: tag release-1.0.0-rc0 was modified! ***

from 91497773ea (commit)
  to f8248d8f23 (tag)
 tagging 91497773ea7666ebe24755914b541c381f50c0dd (commit)
  by Saurabh Dubey
  on Fri Aug 11 00:10:49 2023 +0530

- Log -
[maven-release-plugin] copy for tag release-1.0.0-rc0
---


No new revisions were added by this update.

Summary of changes:


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] branch master updated (0addf8138e -> 99a6a04587)

2023-08-10 Thread jackie
This is an automated email from the ASF dual-hosted git repository.

jackie pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/pinot.git


from 0addf8138e Add instrumentation for dcount merge. Release bitmaps for 
and/or doc id set to save memory. `performance` (#11200)
 add 99a6a04587 Bump org.codehaus.mojo:versions-maven-plugin from 2.13.0 to 
2.16.0 (#11304)

No new revisions were added by this update.

Summary of changes:
 pom.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] Jackie-Jiang merged pull request #11304: Bump org.codehaus.mojo:versions-maven-plugin from 2.13.0 to 2.16.0

2023-08-10 Thread via GitHub


Jackie-Jiang merged PR #11304:
URL: https://github.com/apache/pinot/pull/11304


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] Jackie-Jiang closed pull request #11305: Bump org.scala-lang:scala-library from 2.11.11 to 2.13.11

2023-08-10 Thread via GitHub


Jackie-Jiang closed pull request #11305: Bump org.scala-lang:scala-library from 
2.11.11 to 2.13.11
URL: https://github.com/apache/pinot/pull/11305


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] branch dependabot/maven/org.javassist-javassist-3.29.2-GA deleted (was 5bfa4d7549)

2023-08-10 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch 
dependabot/maven/org.javassist-javassist-3.29.2-GA
in repository https://gitbox.apache.org/repos/asf/pinot.git


 was 5bfa4d7549 Bump org.javassist:javassist from 3.19.0-GA to 3.29.2-GA

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] Jackie-Jiang closed pull request #11306: Bump org.javassist:javassist from 3.19.0-GA to 3.29.2-GA

2023-08-10 Thread via GitHub


Jackie-Jiang closed pull request #11306: Bump org.javassist:javassist from 
3.19.0-GA to 3.29.2-GA
URL: https://github.com/apache/pinot/pull/11306


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] dependabot[bot] commented on pull request #11305: Bump org.scala-lang:scala-library from 2.11.11 to 2.13.11

2023-08-10 Thread via GitHub


dependabot[bot] commented on PR #11305:
URL: https://github.com/apache/pinot/pull/11305#issuecomment-1673747594

   OK, I won't notify you again about this release, but will get in touch when 
a new version is available. If you'd rather skip all updates until the next 
major or minor version, let me know by commenting `@dependabot ignore this 
major version` or `@dependabot ignore this minor version`. You can also ignore 
all major, minor, or patch releases for a dependency by adding an [`ignore` 
condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore)
 with the desired `update_types` to your config file.
   
   If you change your mind, just re-open this PR and I'll resolve any conflicts 
on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] branch dependabot/maven/org.scala-lang-scala-library-2.13.11 deleted (was c5242f31e9)

2023-08-10 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch 
dependabot/maven/org.scala-lang-scala-library-2.13.11
in repository https://gitbox.apache.org/repos/asf/pinot.git


 was c5242f31e9 Bump org.scala-lang:scala-library from 2.11.11 to 2.13.11

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11307: Adding query function override for distinct functions of multi valued column

2023-08-10 Thread via GitHub


Jackie-Jiang commented on code in PR #11307:
URL: https://github.com/apache/pinot/pull/11307#discussion_r1290569887


##
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java:
##
@@ -1075,6 +1107,48 @@ private static void 
handleDistinctCountBitmapOverride(PinotQuery pinotQuery) {
 }
   }
 
+  /**
+   * Rewrites selected 'Distinct' prefixed function to 'DistinctMV' 
function for the field of multivalued type.
+   */
+  @VisibleForTesting
+  static void handleDistinctMultiValuedOverride(PinotQuery pinotQuery, 
Set multiValuedDimColumns) {
+for (Expression expression : pinotQuery.getSelectList()) {
+  handleDistinctMultiValuedOverride(expression, multiValuedDimColumns);
+}
+List orderByExpressions = pinotQuery.getOrderByList();
+if (orderByExpressions != null) {
+  for (Expression expression : orderByExpressions) {
+// NOTE: Order-by is always a Function with the ordering of the 
Expression
+
handleDistinctMultiValuedOverride(expression.getFunctionCall().getOperands().get(0),
 multiValuedDimColumns);
+  }
+}
+Expression havingExpression = pinotQuery.getHavingExpression();
+if (havingExpression != null) {
+  handleDistinctMultiValuedOverride(havingExpression, 
multiValuedDimColumns);
+}
+  }
+
+  /**
+   * Rewrites selected 'Distinct' prefixed function to 'DistinctMV' 
function for the field of multivalued type.
+   */
+  private static void handleDistinctMultiValuedOverride(Expression expression, 
Set multiValuedDimColumns) {
+Function function = expression.getFunctionCall();
+if (function == null) {
+  return;
+}
+if (function.getOperator() != null && 
DISTINCT_MV_COL_FUNCTION_OVERRIDE_MAP.containsKey(function.getOperator())) {

Review Comment:
   (nit) operator can never be null



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11307: Adding query function override for distinct functions of multi valued column

2023-08-10 Thread via GitHub


Jackie-Jiang commented on code in PR #11307:
URL: https://github.com/apache/pinot/pull/11307#discussion_r1290573005


##
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java:
##
@@ -1075,6 +1107,48 @@ private static void 
handleDistinctCountBitmapOverride(PinotQuery pinotQuery) {
 }
   }
 
+  /**
+   * Rewrites selected 'Distinct' prefixed function to 'DistinctMV' 
function for the field of multivalued type.
+   */
+  @VisibleForTesting
+  static void handleDistinctMultiValuedOverride(PinotQuery pinotQuery, 
Set multiValuedDimColumns) {
+for (Expression expression : pinotQuery.getSelectList()) {
+  handleDistinctMultiValuedOverride(expression, multiValuedDimColumns);
+}
+List orderByExpressions = pinotQuery.getOrderByList();
+if (orderByExpressions != null) {
+  for (Expression expression : orderByExpressions) {
+// NOTE: Order-by is always a Function with the ordering of the 
Expression
+
handleDistinctMultiValuedOverride(expression.getFunctionCall().getOperands().get(0),
 multiValuedDimColumns);
+  }
+}
+Expression havingExpression = pinotQuery.getHavingExpression();
+if (havingExpression != null) {
+  handleDistinctMultiValuedOverride(havingExpression, 
multiValuedDimColumns);
+}
+  }
+
+  /**
+   * Rewrites selected 'Distinct' prefixed function to 'DistinctMV' 
function for the field of multivalued type.
+   */
+  private static void handleDistinctMultiValuedOverride(Expression expression, 
Set multiValuedDimColumns) {
+Function function = expression.getFunctionCall();
+if (function == null) {
+  return;
+}
+if (function.getOperator() != null && 
DISTINCT_MV_COL_FUNCTION_OVERRIDE_MAP.containsKey(function.getOperator())) {

Review Comment:
   (minor) We can save one map lookup by directly call `get()` and check if the 
result is `null`



##
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java:
##
@@ -942,6 +950,30 @@ private static Set 
getSegmentPartitionedColumns(@Nullable TableConfig ta
 return segmentPartitionedColumns;
   }
 
+  /**
+   * Retrieve multivalued columns for a table.
+   * From the table Schema , we get the multi valued columns of dimension 
fields.
+   *
+   * @param tableCache
+   * @param tableName
+   * @return multivalued columns of the table .
+   */
+  private static Set getMultiValuedColumns(TableCache tableCache, 
String tableName) {

Review Comment:
   This is not efficient. We can just look up on demand for each column in the 
aggregation



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11307: Adding query function override for distinct functions of multi valued column

2023-08-10 Thread via GitHub


Jackie-Jiang commented on code in PR #11307:
URL: https://github.com/apache/pinot/pull/11307#discussion_r1290566949


##
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java:
##
@@ -378,6 +385,7 @@ private BrokerResponseNative handleRequest(long requestId, 
String query,
   if (_enableDistinctCountBitmapOverride) {
 handleDistinctCountBitmapOverride(serverPinotQuery);
   }
+  handleDistinctMultiValuedOverride(serverPinotQuery, 
getMultiValuedColumns(_tableCache, tableName));

Review Comment:
   We can get schema first (move line 490 here) and pass schema into it if 
available
   ```
 Schema schema = _tableCache.getSchema(rawTableName);
 if (schema != null) {
   handleAggregationMVOverride(serverPinotQuery, schema);
 }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] Jackie-Jiang commented on issue #11278: Support configurable segment balancer strategy

2023-08-10 Thread via GitHub


Jackie-Jiang commented on issue #11278:
URL: https://github.com/apache/pinot/issues/11278#issuecomment-1673770702

   cc @chenboat who is also interested in this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11307: Adding query function override for distinct functions of multi valued column

2023-08-10 Thread via GitHub


Jackie-Jiang commented on code in PR #11307:
URL: https://github.com/apache/pinot/pull/11307#discussion_r1290572394


##
pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/BaseBrokerRequestHandler.java:
##
@@ -1075,6 +1107,48 @@ private static void 
handleDistinctCountBitmapOverride(PinotQuery pinotQuery) {
 }
   }
 
+  /**
+   * Rewrites selected 'Distinct' prefixed function to 'DistinctMV' 
function for the field of multivalued type.
+   */
+  @VisibleForTesting
+  static void handleDistinctMultiValuedOverride(PinotQuery pinotQuery, 
Set multiValuedDimColumns) {
+for (Expression expression : pinotQuery.getSelectList()) {
+  handleDistinctMultiValuedOverride(expression, multiValuedDimColumns);
+}
+List orderByExpressions = pinotQuery.getOrderByList();
+if (orderByExpressions != null) {
+  for (Expression expression : orderByExpressions) {
+// NOTE: Order-by is always a Function with the ordering of the 
Expression
+
handleDistinctMultiValuedOverride(expression.getFunctionCall().getOperands().get(0),
 multiValuedDimColumns);
+  }
+}
+Expression havingExpression = pinotQuery.getHavingExpression();
+if (havingExpression != null) {
+  handleDistinctMultiValuedOverride(havingExpression, 
multiValuedDimColumns);
+}
+  }
+
+  /**
+   * Rewrites selected 'Distinct' prefixed function to 'DistinctMV' 
function for the field of multivalued type.
+   */
+  private static void handleDistinctMultiValuedOverride(Expression expression, 
Set multiValuedDimColumns) {
+Function function = expression.getFunctionCall();
+if (function == null) {
+  return;
+}
+if (function.getOperator() != null && 
DISTINCT_MV_COL_FUNCTION_OVERRIDE_MAP.containsKey(function.getOperator())) {
+  List operands = function.getOperands();
+  if (operands.size() == 1 && operands.get(0).isSetIdentifier() && 
multiValuedDimColumns

Review Comment:
   We can allow multiple operands (e.g. `DistinctCountHLL` allows multiple 
operands). We can check if the size is larger than 1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] dependabot[bot] commented on pull request #11306: Bump org.javassist:javassist from 3.19.0-GA to 3.29.2-GA

2023-08-10 Thread via GitHub


dependabot[bot] commented on PR #11306:
URL: https://github.com/apache/pinot/pull/11306#issuecomment-1673742593

   OK, I won't notify you again about this release, but will get in touch when 
a new version is available. If you'd rather skip all updates until the next 
major or minor version, let me know by commenting `@dependabot ignore this 
major version` or `@dependabot ignore this minor version`. You can also ignore 
all major, minor, or patch releases for a dependency by adding an [`ignore` 
condition](https://docs.github.com/en/code-security/supply-chain-security/configuration-options-for-dependency-updates#ignore)
 with the desired `update_types` to your config file.
   
   If you change your mind, just re-open this PR and I'll resolve any conflicts 
on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] Jackie-Jiang merged pull request #11308: Fix GREATEST and LEAST transform functions.

2023-08-10 Thread via GitHub


Jackie-Jiang merged PR #11308:
URL: https://github.com/apache/pinot/pull/11308


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11151: [multistage] Derive SUM return type to be PostgreSQL compatible

2023-08-10 Thread via GitHub


xiangfu0 commented on code in PR #11151:
URL: https://github.com/apache/pinot/pull/11151#discussion_r1290579479


##
pinot-query-planner/src/test/resources/queries/AggregatePlans.json:
##
@@ -21,7 +21,7 @@
 "sql": "EXPLAIN PLAN FOR SELECT AVG(a.col4) as avg, SUM(a.col4) as 
sum, MAX(a.col4) as max FROM a WHERE a.col3 >= 0 AND a.col2 = 'pink floyd'",
 "output": [
   "Execution Plan",
-  "\nLogicalProject(avg=[/(CASE(=($1, 0), null:DECIMAL(1000, 0), $0), 
$1)], sum=[CASE(=($1, 0), null:DECIMAL(1000, 0), $0)], max=[$2])",
+  "\nLogicalProject(avg=[CAST(/(CASE(=($1, 0), null:DECIMAL(1000, 0), 
$0), $1)):DECIMAL(1000, 0)], sum=[CASE(=($1, 0), null:DECIMAL(1000, 0), $0)], 
max=[$2])",

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] xiangfu0 commented on a diff in pull request #11151: [multistage] Derive SUM return type to be PostgreSQL compatible

2023-08-10 Thread via GitHub


xiangfu0 commented on code in PR #11151:
URL: https://github.com/apache/pinot/pull/11151#discussion_r1290579686


##
pinot-query-planner/src/main/java/org/apache/pinot/query/type/TypeSystem.java:
##
@@ -32,6 +32,9 @@ public class TypeSystem extends RelDataTypeSystemImpl {
   private static final int MAX_DECIMAL_SCALE_DIGIT = 1000;
   private static final int MAX_DECIMAL_PRECISION_DIGIT = 1000;
 
+  private static final int DERIVED_DECIMAL_PRECISION = 19;
+  private static final int DERIVED_DECIMAL_SCALE = 1;

Review Comment:
   done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] branch master updated (99a6a04587 -> bf721212bf)

2023-08-10 Thread jackie
This is an automated email from the ASF dual-hosted git repository.

jackie pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/pinot.git


from 99a6a04587 Bump org.codehaus.mojo:versions-maven-plugin from 2.13.0 to 
2.16.0 (#11304)
 add bf721212bf Fix GREATEST and LEAST transform functions. (#11308)

No new revisions were added by this update.

Summary of changes:
 .../SelectTupleElementTransformFunction.java   | 215 +
 .../TupleSelectionTransformFunctionsTest.java  |  30 ++-
 .../queries/NullHandlingEnabledQueriesTest.java|   2 +-
 3 files changed, 22 insertions(+), 225 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] xiangfu0 merged pull request #11151: [multistage] Derive SUM return type to be PostgreSQL compatible

2023-08-10 Thread via GitHub


xiangfu0 merged PR #11151:
URL: https://github.com/apache/pinot/pull/11151


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] branch master updated (bf721212bf -> 3bdc0e1c86)

2023-08-10 Thread xiangfu
This is an automated email from the ASF dual-hosted git repository.

xiangfu pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/pinot.git


from bf721212bf Fix GREATEST and LEAST transform functions. (#11308)
 add 3bdc0e1c86 [multistage] Derive SUM return type to be PostgreSQL 
compatible (#11151)

No new revisions were added by this update.

Summary of changes:
 .../tests/BaseClusterIntegrationTestSet.java   |  9 +++
 .../org/apache/pinot/query/QueryEnvironment.java   |  7 +-
 .../org/apache/pinot/query/type/TypeSystem.java| 80 --
 .../src/test/resources/queries/AggregatePlans.json | 20 +++---
 .../src/test/resources/queries/JoinPlans.json  | 58 +++-
 5 files changed, 158 insertions(+), 16 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] Jackie-Jiang merged pull request #11243: Make task MaxAttemptsPerTask configurable

2023-08-10 Thread via GitHub


Jackie-Jiang merged PR #11243:
URL: https://github.com/apache/pinot/pull/11243


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] branch master updated (3bdc0e1c86 -> 3fd8c36de0)

2023-08-10 Thread jackie
This is an automated email from the ASF dual-hosted git repository.

jackie pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/pinot.git


from 3bdc0e1c86 [multistage] Derive SUM return type to be PostgreSQL 
compatible (#11151)
 add 3fd8c36de0 Make task MaxAttemptsPerTask configurable (#11243)

No new revisions were added by this update.

Summary of changes:
 .../core/minion/PinotHelixTaskResourceManager.java| 18 +++---
 .../helix/core/minion/PinotTaskManager.java   |  5 +++--
 .../core/minion/generator/BaseTaskGenerator.java  | 19 +--
 .../core/minion/generator/PinotTaskGenerator.java |  8 
 .../org/apache/pinot/core/common/MinionConstants.java |  6 ++
 .../tests/SimpleMinionClusterIntegrationTest.java | 16 ++--
 6 files changed, 59 insertions(+), 13 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] shenyu0127 opened a new pull request, #11316: Remove the redundant OrderByExpressionContext::isDesc.

2023-08-10 Thread via GitHub


shenyu0127 opened a new pull request, #11316:
URL: https://github.com/apache/pinot/pull/11316

   This API is only used in precondition checks and tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] s0nskar closed pull request #11317: In ForwardIndexHandler, recollect column stats when converting raw to dict encoding

2023-08-10 Thread via GitHub


s0nskar closed pull request #11317: In ForwardIndexHandler, recollect column 
stats when converting raw to dict encoding
URL: https://github.com/apache/pinot/pull/11317


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] s0nskar opened a new pull request, #11317: In ForwardIndexHandler, recollect column stats when converting raw to dict encoding

2023-08-10 Thread via GitHub


s0nskar opened a new pull request, #11317:
URL: https://github.com/apache/pinot/pull/11317

   Fixes #10508 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] s0nskar opened a new pull request, #11318: In ForwardIndexHandler, recollect column stats when converting raw to…

2023-08-10 Thread via GitHub


s0nskar opened a new pull request, #11318:
URL: https://github.com/apache/pinot/pull/11318

   Fixes #10508 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] codecov-commenter commented on pull request #11316: Remove the redundant OrderByExpressionContext::isDesc.

2023-08-10 Thread via GitHub


codecov-commenter commented on PR #11316:
URL: https://github.com/apache/pinot/pull/11316#issuecomment-1673953729

   ## 
[Codecov](https://app.codecov.io/gh/apache/pinot/pull/11316?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   > Merging 
[#11316](https://app.codecov.io/gh/apache/pinot/pull/11316?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 (95db5ee) into 
[master](https://app.codecov.io/gh/apache/pinot/commit/3fd8c36de07a49778e97db863bdadf886c464a90?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 (3fd8c36) will **increase** coverage by `0.00%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@Coverage Diff@@
   ##   master   #11316 +/-   ##
   =
 Coverage0.11%0.11% 
   =
 Files2231 2157 -74 
 Lines  120078   116920   -3158 
 Branches1819417748-446 
   =
 Hits  137  137 
   + Misses 119921   116763   -3158 
 Partials   20   20 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1temurin11 | `?` | |
   | integration1temurin17 | `?` | |
   | integration1temurin20 | `?` | |
   | integration2temurin11 | `?` | |
   | integration2temurin17 | `?` | |
   | unittests1temurin11 | `?` | |
   | unittests1temurin17 | `?` | |
   | unittests1temurin20 | `?` | |
   | unittests2temurin11 | `?` | |
   | unittests2temurin17 | `?` | |
   | unittests2temurin20 | `0.11% <0.00%> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Files 
Changed](https://app.codecov.io/gh/apache/pinot/pull/11316?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[...mmon/request/context/OrderByExpressionContext.java](https://app.codecov.io/gh/apache/pinot/pull/11316?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vcmVxdWVzdC9jb250ZXh0L09yZGVyQnlFeHByZXNzaW9uQ29udGV4dC5qYXZh)
 | `0.00% <ø> (ø)` | |
   | 
[...uery/SelectionPartiallyOrderedByDescOperation.java](https://app.codecov.io/gh/apache/pinot/pull/11316?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9xdWVyeS9TZWxlY3Rpb25QYXJ0aWFsbHlPcmRlcmVkQnlEZXNjT3BlcmF0aW9uLmphdmE=)
 | `0.00% <0.00%> (ø)` | |
   
   ... and [76 files with indirect coverage 
changes](https://app.codecov.io/gh/apache/pinot/pull/11316/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD 
build times. [Learn 
more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] codecov-commenter commented on pull request #11318: In ForwardIndexHandler, recollect column stats when converting raw to…

2023-08-10 Thread via GitHub


codecov-commenter commented on PR #11318:
URL: https://github.com/apache/pinot/pull/11318#issuecomment-1673971758

   ## 
[Codecov](https://app.codecov.io/gh/apache/pinot/pull/11318?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   > Merging 
[#11318](https://app.codecov.io/gh/apache/pinot/pull/11318?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 (f553293) into 
[master](https://app.codecov.io/gh/apache/pinot/commit/3fd8c36de07a49778e97db863bdadf886c464a90?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 (3fd8c36) will **increase** coverage by `0.00%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@Coverage Diff@@
   ##   master   #11318 +/-   ##
   =
 Coverage0.11%0.11% 
   =
 Files2231 2157 -74 
 Lines  120078   116921   -3157 
 Branches1819417748-446 
   =
 Hits  137  137 
   + Misses 119921   116764   -3157 
 Partials   20   20 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1temurin11 | `?` | |
   | integration1temurin17 | `?` | |
   | integration1temurin20 | `?` | |
   | integration2temurin11 | `?` | |
   | integration2temurin17 | `?` | |
   | unittests1temurin11 | `?` | |
   | unittests1temurin17 | `?` | |
   | unittests1temurin20 | `?` | |
   | unittests2temurin11 | `0.11% <0.00%> (ø)` | |
   | unittests2temurin17 | `?` | |
   | unittests2temurin20 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Files 
Changed](https://app.codecov.io/gh/apache/pinot/pull/11318?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[...ocal/segment/index/loader/ForwardIndexHandler.java](https://app.codecov.io/gh/apache/pinot/pull/11318?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9Gb3J3YXJkSW5kZXhIYW5kbGVyLmphdmE=)
 | `0.00% <0.00%> (ø)` | |
   
   ... and [76 files with indirect coverage 
changes](https://app.codecov.io/gh/apache/pinot/pull/11318/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD 
build times. [Learn 
more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10927: [refactor] improve disk read for partial upsert handler

2023-08-10 Thread via GitHub


Jackie-Jiang commented on code in PR #10927:
URL: https://github.com/apache/pinot/pull/10927#discussion_r1290767908


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/PartialUpsertHandler.java:
##
@@ -60,29 +72,57 @@ public PartialUpsertHandler(Schema schema, Map pa
* For example, overwrite merger will only override the prev value if the 
new value is not null.
* Null values will override existing values if not configured. They can be 
ignored by using ignoreMerger.
*
-   * @param previousRecord the last derived full record during ingestion.
+   * @param indexSegment the segment of the last derived full record during 
ingestion.
+   * @param docId the docId of the last derived full record during ingestion 
in the segment.
* @param newRecord the new consumed record.
-   * @return a new row after merge
*/
-  public GenericRow merge(GenericRow previousRecord, GenericRow newRecord) {
-for (String column : previousRecord.getFieldToValueMap().keySet()) {
+  public void merge(IndexSegment indexSegment, int docId, GenericRow 
newRecord) {
+for (String column: newRecord.getFieldToValueMap().keySet()) {

Review Comment:
   I think we should still go over the fields within the segment (schema) 
instead of the new record



##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/PartialUpsertHandler.java:
##
@@ -60,29 +72,57 @@ public PartialUpsertHandler(Schema schema, Map pa
* For example, overwrite merger will only override the prev value if the 
new value is not null.
* Null values will override existing values if not configured. They can be 
ignored by using ignoreMerger.
*
-   * @param previousRecord the last derived full record during ingestion.
+   * @param indexSegment the segment of the last derived full record during 
ingestion.
+   * @param docId the docId of the last derived full record during ingestion 
in the segment.
* @param newRecord the new consumed record.
-   * @return a new row after merge
*/
-  public GenericRow merge(GenericRow previousRecord, GenericRow newRecord) {
-for (String column : previousRecord.getFieldToValueMap().keySet()) {
+  public void merge(IndexSegment indexSegment, int docId, GenericRow 
newRecord) {
+for (String column: newRecord.getFieldToValueMap().keySet()) {
   if (!_primaryKeyColumns.contains(column)) {
-if (!previousRecord.isNullValue(column)) {
+// Non-overwrite mergers
+// (1) If the value of the previous is null value, skip merging and 
use the new value
+// (2) Else If the value of new value is null, use the previous value 
(even for comparison columns).
+// (3) Else If the column is not a comparison column, we applied the 
merged value to it.
+if (!(getMergerForColumn(column) instanceof OverwriteMerger)) {

Review Comment:
   (minor) Cache the merger



##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/PartialUpsertHandler.java:
##
@@ -37,6 +43,7 @@ public class PartialUpsertHandler {
   private final PartialUpsertMerger _defaultPartialUpsertMerger;
   private final List _comparisonColumns;
   private final List _primaryKeyColumns;
+  private final Logger _logger;

Review Comment:
   Suggest not adding per-schema logger. To match the existing behavior, we can 
directly throw `RuntimeException` if any exception happens



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11315: Enhance Minion task management

2023-08-10 Thread via GitHub


Jackie-Jiang commented on code in PR #11315:
URL: https://github.com/apache/pinot/pull/11315#discussion_r1290771548


##
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotHelixTaskResourceManager.java:
##
@@ -118,25 +117,19 @@ public synchronized Set getTaskTypes() {
*
* @param taskType Task type
*/
-  public void ensureTaskQueueExists(String taskType) {
+  public synchronized void ensureTaskQueueExists(String taskType) {
 String helixJobQueueName = getHelixJobQueueName(taskType);
 WorkflowConfig workflowConfig = 
_taskDriver.getWorkflowConfig(helixJobQueueName);
-
 if (workflowConfig == null) {
   // Task queue does not exist
   LOGGER.info("Creating task queue: {} for task type: {}", 
helixJobQueueName, taskType);
 
   // Set full parallelism
   // Don't allow overlap job assignment so that we can control number of 
concurrent tasks per instance
-  JobQueue jobQueue = new JobQueue.Builder(helixJobQueueName)
-  .setWorkflowConfig(new 
WorkflowConfig.Builder().setParallelJobs(Integer.MAX_VALUE).build()).build();
+  JobQueue jobQueue = new 
JobQueue.Builder(helixJobQueueName).setWorkflowConfig(
+  new 
WorkflowConfig.Builder().setParallelJobs(Integer.MAX_VALUE).build()).build();
   _taskDriver.createQueue(jobQueue);
 }
-
-// Wait until task queue context shows up
-while (_taskDriver.getWorkflowContext(helixJobQueueName) == null) {

Review Comment:
   This was added in #1894 as a workaround to a Helix bug. Since we upgraded to 
latest Helix version, this workaround is no longer needed. Other changes were 
already removed, but this one was not removed because we need to fix the task 
status API as well



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #11315: Enhance Minion task management

2023-08-10 Thread via GitHub


Jackie-Jiang commented on code in PR #11315:
URL: https://github.com/apache/pinot/pull/11315#discussion_r1290771989


##
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotHelixTaskResourceManager.java:
##
@@ -326,27 +324,35 @@ public synchronized Set getTasks(String taskType) 
{
 
   /**
* Get all task states for the given task type.
+   * NOTE: For tasks just submitted without the context created, count them as 
NOT_STARTED.
*
* @param taskType Task type
* @return Map from task name to task state
*/
   public synchronized Map getTaskStates(String taskType) {
-Map helixJobStates = new HashMap<>();
+WorkflowConfig workflowConfig = 
_taskDriver.getWorkflowConfig(getHelixJobQueueName(taskType));
+if (workflowConfig == null) {
+  return Collections.emptyMap();
+}
+Set helixJobs = workflowConfig.getJobDag().getAllNodes();
+if (helixJobs.isEmpty()) {
+  return Collections.emptyMap();
+}
 WorkflowContext workflowContext = 
_taskDriver.getWorkflowContext(getHelixJobQueueName(taskType));
-
 if (workflowContext == null) {
-  return helixJobStates;
-}
-helixJobStates = workflowContext.getJobStates();
-Map taskStates = new HashMap<>(helixJobStates.size());
-for (Map.Entry entry : helixJobStates.entrySet()) {
-  taskStates.put(getPinotTaskName(entry.getKey()), entry.getValue());
+  return helixJobs.stream()
+  
.collect(Collectors.toMap(PinotHelixTaskResourceManager::getPinotTaskName, 
ignored -> TaskState.NOT_STARTED));
+} else {
+  Map helixJobStates = workflowContext.getJobStates();
+  return 
helixJobs.stream().collect(Collectors.toMap(PinotHelixTaskResourceManager::getPinotTaskName,
+  helixJobName -> helixJobStates.getOrDefault(helixJobName, 
TaskState.NOT_STARTED)));
 }
-return taskStates;
   }
 
   /**
* This method returns a count of sub-tasks in various states, given the 
top-level task name.
+   * TODO: It doesn't count tasks just submitted without the context created.

Review Comment:
   Good point. Let me fix all of them



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] shenyu0127 commented on pull request #11316: Remove the redundant OrderByExpressionContext::isDesc.

2023-08-10 Thread via GitHub


shenyu0127 commented on PR #11316:
URL: https://github.com/apache/pinot/pull/11316#issuecomment-1674053396

   > I feel we can just keep it even though it is not used frequently. 
Basically it is `!isAsc()`
   
   I prefer to remove `isDesc` because
   - It is redundant
   - We have`isNullsLast` and we don't have `isNullsFirst`, removing `isDesc` 
makes the methods consistent.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] deemoliu commented on a diff in pull request #10927: [refactor] improve disk read for partial upsert handler

2023-08-10 Thread via GitHub


deemoliu commented on code in PR #10927:
URL: https://github.com/apache/pinot/pull/10927#discussion_r1290791520


##
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/upsert/PartialUpsertHandler.java:
##
@@ -37,6 +43,7 @@ public class PartialUpsertHandler {
   private final PartialUpsertMerger _defaultPartialUpsertMerger;
   private final List _comparisonColumns;
   private final List _primaryKeyColumns;
+  private final Logger _logger;

Review Comment:
   gotcha, thanks for the suggestion. Now i recall this comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] xiangfu0 opened a new pull request, #11319: Consolidate per function/data type integration test

2023-08-10 Thread via GitHub


xiangfu0 opened a new pull request, #11319:
URL: https://github.com/apache/pinot/pull/11319

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] codecov-commenter commented on pull request #11319: Consolidate per function/data type integration test

2023-08-10 Thread via GitHub


codecov-commenter commented on PR #11319:
URL: https://github.com/apache/pinot/pull/11319#issuecomment-1674110334

   ## 
[Codecov](https://app.codecov.io/gh/apache/pinot/pull/11319?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   > Merging 
[#11319](https://app.codecov.io/gh/apache/pinot/pull/11319?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 (96f0f4c) into 
[master](https://app.codecov.io/gh/apache/pinot/commit/bf721212bf9d5849ce5221ec0b5dd2e6aff41dde?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 (bf72121) will **increase** coverage by `0.00%`.
   > Report is 2 commits behind head on master.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@Coverage Diff@@
   ##   master   #11319 +/-   ##
   =
 Coverage0.11%0.11% 
   =
 Files2231 2157 -74 
 Lines  120039   116921   -3118 
 Branches1818417748-436 
   =
 Hits  137  137 
   + Misses 119882   116764   -3118 
 Partials   20   20 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1temurin11 | `?` | |
   | integration1temurin17 | `?` | |
   | integration1temurin20 | `?` | |
   | integration2temurin11 | `?` | |
   | integration2temurin17 | `?` | |
   | integration2temurin20 | `?` | |
   | unittests1temurin11 | `?` | |
   | unittests1temurin17 | `?` | |
   | unittests1temurin20 | `?` | |
   | unittests2temurin11 | `?` | |
   | unittests2temurin17 | `0.11% <0.00%> (-0.01%)` | :arrow_down: |
   | unittests2temurin20 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Files 
Changed](https://app.codecov.io/gh/apache/pinot/pull/11319?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[.../main/java/org/apache/pinot/client/Connection.java](https://app.codecov.io/gh/apache/pinot/pull/11319?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[.../java/org/apache/pinot/query/QueryEnvironment.java](https://app.codecov.io/gh/apache/pinot/pull/11319?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtcXVlcnktcGxhbm5lci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvcXVlcnkvUXVlcnlFbnZpcm9ubWVudC5qYXZh)
 | `0.00% <0.00%> (ø)` | |
   | 
[...n/java/org/apache/pinot/query/type/TypeSystem.java](https://app.codecov.io/gh/apache/pinot/pull/11319?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtcXVlcnktcGxhbm5lci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvcXVlcnkvdHlwZS9UeXBlU3lzdGVtLmphdmE=)
 | `0.00% <0.00%> (ø)` | |
   
   ... and [81 files with indirect coverage 
changes](https://app.codecov.io/gh/apache/pinot/pull/11319/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD 
build times. [Learn 
more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] shenyu0127 opened a new pull request, #11320: Fix a bug in NULL ordering using ordinal.

2023-08-10 Thread via GitHub


shenyu0127 opened a new pull request, #11320:
URL: https://github.com/apache/pinot/pull/11320

   After https://github.com/apache/pinot/pull/10805, there can be either one 
(ASC/DESC) or two (NULLS FIRST/NULLS LAST + ASC/DESC) levels of functions 
wrapping the ORDER BY expression. The `OrdinalsUpdater` still assumed only one 
level of function.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] codecov-commenter commented on pull request #11320: Fix a bug in NULL ordering using ordinal.

2023-08-10 Thread via GitHub


codecov-commenter commented on PR #11320:
URL: https://github.com/apache/pinot/pull/11320#issuecomment-1674146697

   ## 
[Codecov](https://app.codecov.io/gh/apache/pinot/pull/11320?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   > Merging 
[#11320](https://app.codecov.io/gh/apache/pinot/pull/11320?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 (d59221d) into 
[master](https://app.codecov.io/gh/apache/pinot/commit/3fd8c36de07a49778e97db863bdadf886c464a90?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 (3fd8c36) will **increase** coverage by `0.00%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@Coverage Diff@@
   ##   master   #11320 +/-   ##
   =
 Coverage0.11%0.11% 
   =
 Files2231 2157 -74 
 Lines  120078   116925   -3153 
 Branches1819417749-445 
   =
 Hits  137  137 
   + Misses 119921   116768   -3153 
 Partials   20   20 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1temurin11 | `?` | |
   | integration1temurin17 | `?` | |
   | integration1temurin20 | `?` | |
   | integration2temurin11 | `?` | |
   | integration2temurin17 | `?` | |
   | unittests1temurin11 | `?` | |
   | unittests1temurin17 | `?` | |
   | unittests1temurin20 | `?` | |
   | unittests2temurin11 | `?` | |
   | unittests2temurin17 | `0.11% <0.00%> (ø)` | |
   | unittests2temurin20 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   | [Files 
Changed](https://app.codecov.io/gh/apache/pinot/pull/11320?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 | Coverage Δ | |
   |---|---|---|
   | 
[...org/apache/pinot/sql/parsers/CalciteSqlParser.java](https://app.codecov.io/gh/apache/pinot/pull/11320?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcWwvcGFyc2Vycy9DYWxjaXRlU3FsUGFyc2VyLmphdmE=)
 | `0.00% <0.00%> (ø)` | |
   | 
[...he/pinot/sql/parsers/rewriter/OrdinalsUpdater.java](https://app.codecov.io/gh/apache/pinot/pull/11320?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcWwvcGFyc2Vycy9yZXdyaXRlci9PcmRpbmFsc1VwZGF0ZXIuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   | 
[...uest/context/utils/QueryContextConverterUtils.java](https://app.codecov.io/gh/apache/pinot/pull/11320?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9yZXF1ZXN0L2NvbnRleHQvdXRpbHMvUXVlcnlDb250ZXh0Q29udmVydGVyVXRpbHMuamF2YQ==)
 | `0.00% <0.00%> (ø)` | |
   
   ... and [76 files with indirect coverage 
changes](https://app.codecov.io/gh/apache/pinot/pull/11320/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD 
build times. [Learn 
more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] cbalci commented on a diff in pull request #11098: Introduce FrequentStringsSketch and FrequentLongsSketch aggregation functions

2023-08-10 Thread via GitHub


cbalci commented on code in PR #11098:
URL: https://github.com/apache/pinot/pull/11098#discussion_r1290848973


##
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/FrequentLongsSketchAggregationFunction.java:
##
@@ -0,0 +1,251 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.query.aggregation.function;
+
+import com.google.common.base.Preconditions;
+import java.util.List;
+import java.util.Map;
+import org.apache.datasketches.frequencies.LongsSketch;
+import org.apache.datasketches.memory.Memory;
+import org.apache.pinot.common.request.context.ExpressionContext;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.core.common.BlockValSet;
+import org.apache.pinot.core.query.aggregation.AggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.ObjectAggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.GroupByResultHolder;
+import 
org.apache.pinot.core.query.aggregation.groupby.ObjectGroupByResultHolder;
+import 
org.apache.pinot.segment.local.customobject.SerializedFrequentLongsSketch;
+import org.apache.pinot.segment.spi.AggregationFunctionType;
+import org.apache.pinot.spi.data.FieldSpec;
+
+
+/**
+ * 
+ *  {@code FrequentLongsSketchAggregationFunction} provides an approximate 
FrequentItems aggregation function based on
+ *  https://datasketches.apache.org/docs/Frequency/FrequentItemsOverview.html";>Apache
 DataSketches library.
+ *  It is memory efficient compared to exact counting.
+ * 
+ * 
+ *   The function takes an INT or LONG column as input and returns a Base64 
encoded sketch object which can be
+ *   deserialized and used to estimate the frequency of items in the dataset 
(how many times they appear).
+ * 
+ * FREQUENT_STRINGS_SKETCH(col, maxMapSize=256)
+ * E.g.:
+ * 
+ *   FREQUENT_LONGS_SKETCH(col)
+ *   FREQUENT_LONGS_SKETCH(col, 1024)
+ * 
+ *
+ * 
+ *   If the column type is BYTES, the aggregation function will assume it is a 
serialized FrequentItems data sketch
+ *   of type `LongsSketch`and will attempt to deserialize it for merging with 
other sketch objects.
+ * 
+ *
+ * 
+ *   There is a variation of the function (FREQUENT_STRINGS_SKETCH) 
which accepts STRING type input columns.
+ * 
+ */
+public class FrequentLongsSketchAggregationFunction
+extends BaseSingleInputAggregationFunction> {
+  protected static final int DEFAULT_MAX_MAP_SIZE = 256;
+
+  protected int _maxMapSize;
+
+  public FrequentLongsSketchAggregationFunction(List 
arguments) {
+super(arguments.get(0));
+int numArguments = arguments.size();
+Preconditions.checkArgument(numArguments == 1 || numArguments == 2,
+"Expecting 1 or 2 arguments for FrequentLongsSketch function: 
FREQUENTITEMSSKETCH(column, maxMapSize");
+_maxMapSize = numArguments == 2 ? 
arguments.get(1).getLiteral().getIntValue() : DEFAULT_MAX_MAP_SIZE;
+  }
+
+  @Override
+  public AggregationFunctionType getType() {
+return AggregationFunctionType.FREQUENTLONGSSKETCH;
+  }
+
+  @Override
+  public AggregationResultHolder createAggregationResultHolder() {
+return new ObjectAggregationResultHolder();
+  }
+
+  @Override
+  public GroupByResultHolder createGroupByResultHolder(int initialCapacity, 
int maxCapacity) {
+return new ObjectGroupByResultHolder(initialCapacity, maxCapacity);
+  }
+
+  @Override
+  public void aggregate(int length, AggregationResultHolder 
aggregationResultHolder,
+  Map blockValSetMap) {
+BlockValSet valueSet = blockValSetMap.get(_expression);
+FieldSpec.DataType valueType = valueSet.getValueType();
+
+LongsSketch sketch = getOrCreateSketch(aggregationResultHolder);
+
+if (valueType == FieldSpec.DataType.BYTES) {
+  // Assuming the column contains serialized data sketch
+  LongsSketch[] deserializedSketches =
+  
deserializeSketches(blockValSetMap.get(_expression).getBytesValuesSV());
+  sketch = getOrCreateSketch(aggregationResultHolder);
+
+  for (int i = 0; i < length; i++) {

Review Comment:
   Done



##
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/FrequentLongsSketchAggregationFunction.java:

[GitHub] [pinot] cbalci commented on a diff in pull request #11098: Introduce FrequentStringsSketch and FrequentLongsSketch aggregation functions

2023-08-10 Thread via GitHub


cbalci commented on code in PR #11098:
URL: https://github.com/apache/pinot/pull/11098#discussion_r1290848886


##
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/FrequentLongsSketchAggregationFunction.java:
##
@@ -0,0 +1,251 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.query.aggregation.function;
+
+import com.google.common.base.Preconditions;
+import java.util.List;
+import java.util.Map;
+import org.apache.datasketches.frequencies.LongsSketch;
+import org.apache.datasketches.memory.Memory;
+import org.apache.pinot.common.request.context.ExpressionContext;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.core.common.BlockValSet;
+import org.apache.pinot.core.query.aggregation.AggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.ObjectAggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.GroupByResultHolder;
+import 
org.apache.pinot.core.query.aggregation.groupby.ObjectGroupByResultHolder;
+import 
org.apache.pinot.segment.local.customobject.SerializedFrequentLongsSketch;
+import org.apache.pinot.segment.spi.AggregationFunctionType;
+import org.apache.pinot.spi.data.FieldSpec;
+
+
+/**
+ * 
+ *  {@code FrequentLongsSketchAggregationFunction} provides an approximate 
FrequentItems aggregation function based on
+ *  https://datasketches.apache.org/docs/Frequency/FrequentItemsOverview.html";>Apache
 DataSketches library.
+ *  It is memory efficient compared to exact counting.
+ * 
+ * 
+ *   The function takes an INT or LONG column as input and returns a Base64 
encoded sketch object which can be
+ *   deserialized and used to estimate the frequency of items in the dataset 
(how many times they appear).
+ * 
+ * FREQUENT_STRINGS_SKETCH(col, maxMapSize=256)

Review Comment:
   Good point. Added some basic explanation to the Javadoc. I'll also follow up 
with a more detailed description for the function and how to use it in Pinot 
Docs repo.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] cbalci commented on a diff in pull request #11098: Introduce FrequentStringsSketch and FrequentLongsSketch aggregation functions

2023-08-10 Thread via GitHub


cbalci commented on code in PR #11098:
URL: https://github.com/apache/pinot/pull/11098#discussion_r1290848886


##
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/function/FrequentLongsSketchAggregationFunction.java:
##
@@ -0,0 +1,251 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.query.aggregation.function;
+
+import com.google.common.base.Preconditions;
+import java.util.List;
+import java.util.Map;
+import org.apache.datasketches.frequencies.LongsSketch;
+import org.apache.datasketches.memory.Memory;
+import org.apache.pinot.common.request.context.ExpressionContext;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.core.common.BlockValSet;
+import org.apache.pinot.core.query.aggregation.AggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.ObjectAggregationResultHolder;
+import org.apache.pinot.core.query.aggregation.groupby.GroupByResultHolder;
+import 
org.apache.pinot.core.query.aggregation.groupby.ObjectGroupByResultHolder;
+import 
org.apache.pinot.segment.local.customobject.SerializedFrequentLongsSketch;
+import org.apache.pinot.segment.spi.AggregationFunctionType;
+import org.apache.pinot.spi.data.FieldSpec;
+
+
+/**
+ * 
+ *  {@code FrequentLongsSketchAggregationFunction} provides an approximate 
FrequentItems aggregation function based on
+ *  https://datasketches.apache.org/docs/Frequency/FrequentItemsOverview.html";>Apache
 DataSketches library.
+ *  It is memory efficient compared to exact counting.
+ * 
+ * 
+ *   The function takes an INT or LONG column as input and returns a Base64 
encoded sketch object which can be
+ *   deserialized and used to estimate the frequency of items in the dataset 
(how many times they appear).
+ * 
+ * FREQUENT_STRINGS_SKETCH(col, maxMapSize=256)

Review Comment:
   Good point. I added some basic explanation to the Javadoc:
   
   > Second argument, maxMapsSize, refers to the size of the physical length of 
the hashmap which stores counts. It influences the accuracy of the sketch and 
should be a power of 2.
   
I'll also follow up with a more detailed description for the function and 
how to use it in Pinot Docs repo.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] cbalci commented on pull request #11098: Introduce FrequentStringsSketch and FrequentLongsSketch aggregation functions

2023-08-10 Thread via GitHub


cbalci commented on PR #11098:
URL: https://github.com/apache/pinot/pull/11098#issuecomment-1674156926

   Thanks for the review @chenboat. Addressed the comments, please take another 
look when you get a chance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] saurabhd336 opened a new pull request, #11321: Include package-lock.json in generated sources

2023-08-10 Thread via GitHub


saurabhd336 opened a new pull request, #11321:
URL: https://github.com/apache/pinot/pull/11321

   Include package-lock.json in generated sources


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] codecov-commenter commented on pull request #11321: Include package-lock.json in generated sources

2023-08-10 Thread via GitHub


codecov-commenter commented on PR #11321:
URL: https://github.com/apache/pinot/pull/11321#issuecomment-1674234428

   ## 
[Codecov](https://app.codecov.io/gh/apache/pinot/pull/11321?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 Report
   > Merging 
[#11321](https://app.codecov.io/gh/apache/pinot/pull/11321?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 (6594f36) into 
[master](https://app.codecov.io/gh/apache/pinot/commit/3fd8c36de07a49778e97db863bdadf886c464a90?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
 (3fd8c36) will **increase** coverage by `0.00%`.
   > The diff coverage is `n/a`.
   
   ```diff
   @@Coverage Diff@@
   ##   master   #11321 +/-   ##
   =
 Coverage0.11%0.11% 
   =
 Files2231 2157 -74 
 Lines  120078   116921   -3157 
 Branches1819417748-446 
   =
 Hits  137  137 
   + Misses 119921   116764   -3157 
 Partials   20   20 
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1temurin11 | `?` | |
   | integration1temurin17 | `?` | |
   | integration1temurin20 | `?` | |
   | integration2temurin11 | `?` | |
   | integration2temurin17 | `?` | |
   | unittests1temurin11 | `?` | |
   | unittests1temurin17 | `?` | |
   | unittests1temurin20 | `?` | |
   | unittests2temurin11 | `?` | |
   | unittests2temurin17 | `?` | |
   | unittests2temurin20 | `0.11% <ø> (ø)` | |
   
   Flags with carried forward coverage won't be shown. [Click 
here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment)
 to find out more.
   
   [see 76 files with indirect coverage 
changes](https://app.codecov.io/gh/apache/pinot/pull/11321/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD 
build times. [Learn 
more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[GitHub] [pinot] snleee merged pull request #11321: Include package-lock.json in generated sources

2023-08-10 Thread via GitHub


snleee merged PR #11321:
URL: https://github.com/apache/pinot/pull/11321


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org



[pinot] branch master updated: Include package-lock.json in generated sources (#11321)

2023-08-10 Thread snlee
This is an automated email from the ASF dual-hosted git repository.

snlee pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pinot.git


The following commit(s) were added to refs/heads/master by this push:
 new 527eb4fddb Include package-lock.json in generated sources (#11321)
527eb4fddb is described below

commit 527eb4fddb70c3760ef81674182dddf4656c755b
Author: Saurabh Dubey 
AuthorDate: Fri Aug 11 11:34:58 2023 +0530

Include package-lock.json in generated sources (#11321)

Co-authored-by: Saurabh Dubey 
---
 pinot-distribution/pinot-source-assembly.xml | 1 -
 1 file changed, 1 deletion(-)

diff --git a/pinot-distribution/pinot-source-assembly.xml 
b/pinot-distribution/pinot-source-assembly.xml
index 6e9c45099e..4ab49843af 100644
--- a/pinot-distribution/pinot-source-assembly.xml
+++ b/pinot-distribution/pinot-source-assembly.xml
@@ -44,7 +44,6 @@
 
 pinot-controller/src/main/resources/node_modules/**
 pinot-controller/src/main/resources/dist/**
-
pinot-controller/src/main/resources/package-lock.json
 
 
 **/*.releaseBackup


-
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org