[incubator-pinot] branch skip-deploy-in-pinot-tools created (now 90cd580)

2021-03-04 Thread jlli
This is an automated email from the ASF dual-hosted git repository. jlli pushed a change to branch skip-deploy-in-pinot-tools in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git. at 90cd580 Skip mvn deploy in pinot-tools module This branch includes the following new com

[incubator-pinot] 01/01: Skip mvn deploy in pinot-tools module

2021-03-04 Thread jlli
This is an automated email from the ASF dual-hosted git repository. jlli pushed a commit to branch skip-deploy-in-pinot-tools in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git commit 90cd580aec5a8ab9d2ed73ee199db3292d90b25f Author: Jack Li(Analytics Engineering) AuthorDate: T

[GitHub] [incubator-pinot] KKcorps commented on a change in pull request #6518: Kinesis Connector

2021-03-04 Thread GitBox
KKcorps commented on a change in pull request #6518: URL: https://github.com/apache/incubator-pinot/pull/6518#discussion_r588050566 ## File path: pinot-spi/src/main/java/org/apache/pinot/spi/stream/Checkpoint.java ## @@ -0,0 +1,29 @@ +/** + * Licensed to the Apache Software Fou

[GitHub] [incubator-pinot] KKcorps commented on pull request #6197: Parallelize PinotFS file operations

2021-03-04 Thread GitBox
KKcorps commented on pull request #6197: URL: https://github.com/apache/incubator-pinot/pull/6197#issuecomment-791174352 @elonazoulay How are you handling the case where sequential operations are must. e.g. Delete directory and then create a new directory and then copy contents to this ne

[GitHub] [incubator-pinot] icefury71 commented on a change in pull request #6485: [WIP] add schema validation mechanism for table config

2021-03-04 Thread GitBox
icefury71 commented on a change in pull request #6485: URL: https://github.com/apache/incubator-pinot/pull/6485#discussion_r588042899 ## File path: pinot-core/src/main/java/org/apache/pinot/core/util/validator/PinotConfigValidator.java ## @@ -0,0 +1,33 @@ +/** + * Licensed to

[GitHub] [incubator-pinot] icefury71 commented on pull request #6641: Adding a check for multi-value column in star tree indexing config

2021-03-04 Thread GitBox
icefury71 commented on pull request #6641: URL: https://github.com/apache/incubator-pinot/pull/6641#issuecomment-791163748 Aah - good point. I can make that change. The map for the most part is just for better error message - the validation logic will still work. However, that will bre

[GitHub] [incubator-pinot] codecov-io commented on pull request #6649: Skip loading columns not in the schema

2021-03-04 Thread GitBox
codecov-io commented on pull request #6649: URL: https://github.com/apache/incubator-pinot/pull/6649#issuecomment-791119791 # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6649?src=pr&el=h1) Report > Merging [#6649](https://codecov.io/gh/apache/incubator-pinot/pull/6649?s

[GitHub] [incubator-pinot] dubin555 commented on issue #6648: Support nested json schema in realtime ingest

2021-03-04 Thread GitBox
dubin555 commented on issue #6648: URL: https://github.com/apache/incubator-pinot/issues/6648#issuecomment-791101882 > Do you want the column name to be a.b and value xxx? yes, sir This is an automated message from th

[GitHub] [incubator-pinot] Jackie-Jiang opened a new pull request #6649: Skip loading columns not in the schema

2021-03-04 Thread GitBox
Jackie-Jiang opened a new pull request #6649: URL: https://github.com/apache/incubator-pinot/pull/6649 ## Description Currently, when a column is added to the schema, server will generate default columns if the segment does not have the new column. But when a column is removed from the

[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6647: Recover the segment from controller when LLC table cannot load it

2021-03-04 Thread GitBox
Jackie-Jiang commented on a change in pull request #6647: URL: https://github.com/apache/incubator-pinot/pull/6647#discussion_r587965796 ## File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeTableDataManager.java ## @@ -268,56 +269,69 @@ pu

[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #6646: Support downloading of segment from controller on load error in Realtime table

2021-03-04 Thread GitBox
Jackie-Jiang commented on issue #6646: URL: https://github.com/apache/incubator-pinot/issues/6646#issuecomment-791082493 @mcvsubbu For real-time table, when server finds a local copy of the segment, but cannot load it, server will throw exception. The proposal here is to download a new c

[GitHub] [incubator-pinot] mcvsubbu commented on issue #6645: Support multiple levels of config validation

2021-03-04 Thread GitBox
mcvsubbu commented on issue #6645: URL: https://github.com/apache/incubator-pinot/issues/6645#issuecomment-791078659 Perhaps so, but we need to stop making incompatible changes at some point. The cost of retrofitting all tables to conditions that are not really applicable to that setup is

[GitHub] [incubator-pinot] kishoreg commented on issue #6648: Support nested json schema in realtime ingest

2021-03-04 Thread GitBox
kishoreg commented on issue #6648: URL: https://github.com/apache/incubator-pinot/issues/6648#issuecomment-791076731 Do you want the column name to be a.b and value xxx? This is an automated message from the Apache Git Servi

[GitHub] [incubator-pinot] mcvsubbu commented on issue #6637: Kafka ingestion: Allow users to reset the offset to consume

2021-03-04 Thread GitBox
mcvsubbu commented on issue #6637: URL: https://github.com/apache/incubator-pinot/issues/6637#issuecomment-791076286 A CONSUMING segment can go into ERROR state only during the OFFLINE to CONSUMING state transition. Once it has transitioned, it can never go into ERROR state. It will turn

[GitHub] [incubator-pinot] dubin555 opened a new issue #6648: Support nested json schema in realtime ingest

2021-03-04 Thread GitBox
dubin555 opened a new issue #6648: URL: https://github.com/apache/incubator-pinot/issues/6648 Hi forks, Does the streaming ingest function support the nested json schema, e.g. `{ "a": { "b": "xxx" } }` Is there a way like this in the schema statement: `"a.b" is a

[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6647: Recover the segment from controller when LLC table cannot load it

2021-03-04 Thread GitBox
mcvsubbu commented on a change in pull request #6647: URL: https://github.com/apache/incubator-pinot/pull/6647#discussion_r587956134 ## File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeTableDataManager.java ## @@ -268,56 +269,69 @@ public

[GitHub] [incubator-pinot] mcvsubbu commented on issue #6646: Support downloading of segment from controller on load error in Realtime table

2021-03-04 Thread GitBox
mcvsubbu commented on issue #6646: URL: https://github.com/apache/incubator-pinot/issues/6646#issuecomment-791072612 The segment download code is the same on the server, with retry logic -- whether realtime or offline This

[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #6646: Support downloading of segment from controller on load error in Realtime table

2021-03-04 Thread GitBox
Jackie-Jiang commented on issue #6646: URL: https://github.com/apache/incubator-pinot/issues/6646#issuecomment-791061593 #6647 This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [incubator-pinot] Jackie-Jiang opened a new pull request #6647: Recover the segment from controller when LLC table cannot load it

2021-03-04 Thread GitBox
Jackie-Jiang opened a new pull request #6647: URL: https://github.com/apache/incubator-pinot/pull/6647 ## Description For LLC real-time table, when the server failed to load a local segment (e.g. segment is collapsed or default column got removed), try to download a new copy from the co

[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6632: Do not log warning when gRPC or admin port is not configured for an instance

2021-03-04 Thread GitBox
Jackie-Jiang commented on a change in pull request #6632: URL: https://github.com/apache/incubator-pinot/pull/6632#discussion_r587931997 ## File path: pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotInstanceRestletResource.java ## @@ -114,25 +109,

[GitHub] [incubator-pinot] fx19880617 commented on issue #6637: Kafka ingestion: Allow users to reset the offset to consume

2021-03-04 Thread GitBox
fx19880617 commented on issue #6637: URL: https://github.com/apache/incubator-pinot/issues/6637#issuecomment-791044746 > I suggest creating a new segment, with new metadata etc. having the new offset. Set it consuming state, and let the ball roll. I see. Then how about the current c

[GitHub] [incubator-pinot] icefury71 opened a new issue #6646: Support downloading of segment from controller on load error in Realtime table

2021-03-04 Thread GitBox
icefury71 opened a new issue #6646: URL: https://github.com/apache/incubator-pinot/issues/6646 Currently, for realtime Pinot table, a segment load error (during server restart for eg) will result in the segment going into an ERROR state. This might happen if the segment was not sealed prop

[GitHub] [incubator-pinot] fx19880617 commented on issue #6642: Merge multi-valued field support from DistinctCountHHLMV into DistinctCountHLL

2021-03-04 Thread GitBox
fx19880617 commented on issue #6642: URL: https://github.com/apache/incubator-pinot/issues/6642#issuecomment-791031667 Right, from the implementation side, we will eventually merge the behavior of `` and `MV`. Then we can rewrite all `MV` to ``. -

[GitHub] [incubator-pinot] fx19880617 commented on a change in pull request #6632: Do not log warning when gRPC or admin port is not configured for an instance

2021-03-04 Thread GitBox
fx19880617 commented on a change in pull request #6632: URL: https://github.com/apache/incubator-pinot/pull/6632#discussion_r587910406 ## File path: pinot-controller/src/main/java/org/apache/pinot/controller/api/resources/PinotInstanceRestletResource.java ## @@ -114,25 +109,27

[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #6635: In PinotDataType, trim the string before converting to other types

2021-03-04 Thread GitBox
Jackie-Jiang commented on pull request #6635: URL: https://github.com/apache/incubator-pinot/pull/6635#issuecomment-791026200 @fx19880617 @mayankshriv Please take another look, thanks This is an automated message from the Apa

[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #6642: Merge multi-valued field support from DistinctCountHHLMV into DistinctCountHLL

2021-03-04 Thread GitBox
Jackie-Jiang commented on issue #6642: URL: https://github.com/apache/incubator-pinot/issues/6642#issuecomment-791025513 @fx19880617 We don't really need to rewrite. We have access to the column metadata during aggregation, and knows whether it is SV or MV ---

[GitHub] [incubator-pinot] icefury71 commented on issue #6645: Support multiple levels of config validation

2021-03-04 Thread GitBox
icefury71 commented on issue #6645: URL: https://github.com/apache/incubator-pinot/issues/6645#issuecomment-791013173 I feel if we leave the validation levels unbounded - the validation code will be rendered unreadable. I prefer to keep it to a few levels only (eg: strict, medium, basic).

[GitHub] [incubator-pinot] mcvsubbu commented on issue #6645: Support multiple levels of config validation

2021-03-04 Thread GitBox
mcvsubbu commented on issue #6645: URL: https://github.com/apache/incubator-pinot/issues/6645#issuecomment-790998054 A nice way of doing this (from user standpoint) is to start with validation set to level 0. Each time you add new validations, bump it up by 1. Let folks set the validatio

[GitHub] [incubator-pinot] icefury71 commented on issue #6644: Adding validation check to prevent creating two columns with different cases in schema

2021-03-04 Thread GitBox
icefury71 commented on issue #6644: URL: https://github.com/apache/incubator-pinot/issues/6644#issuecomment-790985354 @mcvsubbu created #6645 which should be done before addressing this. This is an automated message from the

[GitHub] [incubator-pinot] icefury71 opened a new issue #6645: Support multiple levels of config validation

2021-03-04 Thread GitBox
icefury71 opened a new issue #6645: URL: https://github.com/apache/incubator-pinot/issues/6645 Ideally, we wish to have levels of validation such as: * strict * basic (default) In this case, strict validation will imply all the stringent checks are done (eg: #6644, #6300 ). bas

[GitHub] [incubator-pinot] icefury71 commented on issue #6644: Adding validation check to prevent creating two columns with different cases in schema

2021-03-04 Thread GitBox
icefury71 commented on issue #6644: URL: https://github.com/apache/incubator-pinot/issues/6644#issuecomment-790983268 Yes agreed - this one in particular seems to be backwards incompatible. Currently, we don't provide a mechanism to enforce different levels of validation. We should probab

[GitHub] [incubator-pinot] mcvsubbu commented on issue #6644: Adding validation check to prevent creating two columns with different cases in schema

2021-03-04 Thread GitBox
mcvsubbu commented on issue #6644: URL: https://github.com/apache/incubator-pinot/issues/6644#issuecomment-790976508 I am fine for adding all sorts of validations, but I thought we agreed (perhaps verbally) on doing things in a compatible manner by introducing the concept of a `level`. If

[GitHub] [incubator-pinot] fx19880617 commented on issue #6642: Merge multi-valued field support from DistinctCountHHLMV into DistinctCountHLL

2021-03-04 Thread GitBox
fx19880617 commented on issue #6642: URL: https://github.com/apache/incubator-pinot/issues/6642#issuecomment-790975586 right, we should be able to infer the type with the table schema during query parsing phase and then rewrite the query if it's a multi-value column.

[GitHub] [incubator-pinot] fx19880617 opened a new issue #6644: Adding validation check to prevent creating two columns with different cases in schema

2021-03-04 Thread GitBox
fx19880617 opened a new issue #6644: URL: https://github.com/apache/incubator-pinot/issues/6644 Although pinot columns can be case-sensitive, I think we should still prevent the scenarios like this. This easily breaks the external case-insensitive integrations like Presto. E.g. in

[GitHub] [incubator-pinot] mqliang opened a new pull request #6643: remove duplicate code

2021-03-04 Thread GitBox
mqliang opened a new pull request #6643: URL: https://github.com/apache/incubator-pinot/pull/6643 ## Description This change remove duplicate code: 1. move GENERATION_NUMBER_PLACEHOLDER to BaseOP 2. make ClusterTest.postSqlQuery as public so that it can be imported from comp-te

[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6611: SegmentWriter interface

2021-03-04 Thread GitBox
jackjlli commented on a change in pull request #6611: URL: https://github.com/apache/incubator-pinot/pull/6611#discussion_r58738 ## File path: pinot-spi/src/main/java/org/apache/pinot/spi/ingestion/segment/writer/SegmentWriter.java ## @@ -0,0 +1,59 @@ +/** + * Licensed to

[incubator-pinot] branch master updated (df767c0 -> b2d716d)

2021-03-04 Thread mcvsubbu
This is an automated email from the ASF dual-hosted git repository. mcvsubbu pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git. from df767c0 Removing redundant check for column with RAW encoding and noDictionary config (#6636) add b

[GitHub] [incubator-pinot] mcvsubbu merged pull request #6638: Add generation number to data and queries

2021-03-04 Thread GitBox
mcvsubbu merged pull request #6638: URL: https://github.com/apache/incubator-pinot/pull/6638 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [incubator-pinot] snleee commented on pull request #6611: SegmentWriter interface

2021-03-04 Thread GitBox
snleee commented on pull request #6611: URL: https://github.com/apache/incubator-pinot/pull/6611#issuecomment-790876594 @amrishlal Why do you need to get metadata information from `SegmentWriter`? I think that those stats will be used internal implementation and as long as the imple

[GitHub] [incubator-pinot] kkrugler commented on issue #6642: Merge multi-valued field support from DistinctCountHHLMV into DistinctCountHLL

2021-03-04 Thread GitBox
kkrugler commented on issue #6642: URL: https://github.com/apache/incubator-pinot/issues/6642#issuecomment-790857926 I imagine similar merging could happen for many of the other [xxxMV](https://docs.pinot.apache.org/users/user-guide-query/supported-aggregations#multi-value-column-functions

[GitHub] [incubator-pinot] kkrugler commented on issue #6642: Merge multi-valued field support from DistinctCountHHLMV into DistinctCountHLL

2021-03-04 Thread GitBox
kkrugler commented on issue #6642: URL: https://github.com/apache/incubator-pinot/issues/6642#issuecomment-790853089 Hi @Jackie-Jiang Updated issue to cover merging the two. This is an automated message from the Apache Git S

[GitHub] [incubator-pinot] Jackie-Jiang commented on issue #6642: Add multi-valued field support to DistinctCountHLL

2021-03-04 Thread GitBox
Jackie-Jiang commented on issue #6642: URL: https://github.com/apache/incubator-pinot/issues/6642#issuecomment-790838990 Currently, in order to aggregate on multi-valued fields, you need to use the MV version of the function: `DistinctCountHLLMVAggregationFunction`. In the future, we mi

[GitHub] [incubator-pinot] icefury71 commented on a change in pull request #6641: Adding a check for multi-value column in star tree indexing config

2021-03-04 Thread GitBox
icefury71 commented on a change in pull request #6641: URL: https://github.com/apache/incubator-pinot/pull/6641#discussion_r587710803 ## File path: pinot-core/src/main/java/org/apache/pinot/core/util/TableConfigUtils.java ## @@ -371,6 +371,7 @@ private static void validateInde

[GitHub] [incubator-pinot] kkrugler opened a new issue #6642: Add multi-valued field support to DistinctCountHLL

2021-03-04 Thread GitBox
kkrugler opened a new issue #6642: URL: https://github.com/apache/incubator-pinot/issues/6642 Currently the code in `DistinctCountHLLAggregationFunction.aggregate()` looks like: ``` java public void aggregate(int length, AggregationResultHolder aggregationResultHolder,

[GitHub] [incubator-pinot] mcvsubbu commented on issue #6637: Kafka ingestion: Allow users to reset the offset to consume

2021-03-04 Thread GitBox
mcvsubbu commented on issue #6637: URL: https://github.com/apache/incubator-pinot/issues/6637#issuecomment-790666889 I suggest creating a new segment, with new metadata etc. having the new offset. Set it consuming state, and let the ball roll.

[GitHub] [incubator-pinot] fx19880617 commented on a change in pull request #6641: Adding a check for multi-value column in star tree indexing config

2021-03-04 Thread GitBox
fx19880617 commented on a change in pull request #6641: URL: https://github.com/apache/incubator-pinot/pull/6641#discussion_r587269212 ## File path: pinot-core/src/main/java/org/apache/pinot/core/util/TableConfigUtils.java ## @@ -371,6 +371,7 @@ private static void validateInd