rohityadav1993 commented on code in PR #13107: URL: https://github.com/apache/pinot/pull/13107#discussion_r1603148256
##########
pinot-common/src/main/java/org/apache/pinot/common/metadata/segment/SegmentPartitionMetadata.java:
##########
```
@@ -48,6 +53,21 @@ public SegmentPartitionMetadata(
       @Nonnull @JsonProperty("columnPartitionMap") Map<String, ColumnPartitionMetadata> columnPartitionMap) {
     Preconditions.checkNotNull(columnPartitionMap);
     _columnPartitionMap = columnPartitionMap;
+    _uploadedSegmentPartitionId = -1;
+  }
+
+  /**
+   * Constructor for the class.
+   *
+   * @param columnPartitionMap Column name to ColumnPartitionMetadata map.
+   */
+  @JsonCreator
+  public SegmentPartitionMetadata(
+      @Nullable @JsonProperty("columnPartitionMap") Map<String, ColumnPartitionMetadata> columnPartitionMap,
+      @Nullable @JsonProperty(value = "uploadedSegmentPartitionId", defaultValue = "-1")
```

Review Comment:
> I might have missed it, but how to configure SegmentPartitionConfig in TableConfig for tables that allow to upload segments built and partitioned externally?

We don't need to configure the table for this, similar to how we don't need to for real-time stream ingestion on upsert tables. To give some more context on how this fits into the current design, there are two scenarios where data partitioning comes into play:

1. Query routing [[docs](https://docs.pinot.apache.org/operators/operating-pinot/tuning/routing#data-ingested-partitioned-by-some-column)]: data partitioning is not a requirement here, just a good optimization.
2. Segment assignment:
   a. If the data is partitioned on a single column with a Pinot-supported algorithm, we configure the table as:
```
...
"tableIndexConfig": {
  ...
  "segmentPartitionConfig": {
    "columnPartitionMap": {
      "memberId": {
        "functionName": "Modulo",
        "numPartitions": 3
      }
    }
  },
  ...
},
```

**Partitioning for upsert tables**:

Consuming segment assignment: The stream is always externally partitioned (either on the PK or on another field that still guarantees all records with the same PK land in the same partition) and does not need to use one of Pinot's supported algorithms. `segmentPartitionConfig` need not be set for the upsert table either. Each `LLCSegmentName` contains a partitionId substring derived from the stream's partitionId. When assigning a segment to an instance, we get the partition id by parsing the `LLCSegmentName` in `SegmentUtils.getRealtimeSegmentPartitionId`.

**Uploaded segment assignment**: Uploaded segments are not generated with the `LLCSegmentName` convention. The only way to specify partitioning info is via `segmentPartitionConfig` in the table config, which is not possible if the stream uses custom partitioning. If one wants to backfill/upload segments to such a custom-partitioned upsert table, the uploaded segment must provide the partitionId so that segment assignment can place it on the same instances as the consuming segments of the same partition.
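For readers less familiar with the upsert assignment path, here is a minimal sketch of the partition-id resolution described above. It is illustrative only: it assumes the LLC naming convention `{table}__{partitionId}__{sequence}__{creationTime}`, and the class and method names (`PartitionIdResolutionSketch`, `resolvePartitionId`) are hypothetical, not part of this PR or of `SegmentUtils`.

```java
// Hypothetical sketch, not the code in this PR: the partition id of an upsert segment comes
// either from the LLC segment name (for stream-consumed segments) or from an explicitly
// provided uploadedSegmentPartitionId (for externally built, uploaded segments).
public final class PartitionIdResolutionSketch {

  private PartitionIdResolutionSketch() {
  }

  /**
   * @param segmentName segment name, e.g. "myTable__3__42__20240101T0000Z" for an LLC segment
   * @param uploadedSegmentPartitionId partition id carried in the uploaded segment's partition
   *        metadata, or -1 when not set (mirroring the default in the new constructor)
   */
  public static int resolvePartitionId(String segmentName, int uploadedSegmentPartitionId) {
    // LLC segment names follow {table}__{partitionId}__{sequence}__{creationTime}, so for
    // consuming segments the partition id can be parsed straight out of the name.
    String[] parts = segmentName.split("__");
    if (parts.length == 4) {
      try {
        return Integer.parseInt(parts[1]);
      } catch (NumberFormatException e) {
        // Not an LLC-style name; fall through to the uploaded-segment path.
      }
    }
    // Uploaded segments do not follow the LLC naming convention, so rely on the partition id
    // stored in their segment partition metadata instead.
    if (uploadedSegmentPartitionId >= 0) {
      return uploadedSegmentPartitionId;
    }
    throw new IllegalStateException("Cannot determine partition id for segment: " + segmentName);
  }
}
```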
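And a toy illustration of why carrying the partition id matters for colocation: the modulo-based `assignServer` below is a deliberate simplification, not Pinot's actual segment assignment algorithm, used only to show that equal partition ids map consuming and uploaded segments to the same server.

```java
import java.util.List;

// Hypothetical simplification: when a consuming segment and an uploaded backfill segment
// resolve to the same partition id, a partition-aware assignment maps both to the same
// server, which is what upsert metadata management relies on.
public final class PartitionAwareAssignmentSketch {

  private PartitionAwareAssignmentSketch() {
  }

  /** Picks the server for a segment purely from its partition id (simplified). */
  public static String assignServer(int partitionId, List<String> servers) {
    return servers.get(partitionId % servers.size());
  }

  public static void main(String[] args) {
    List<String> servers = List.of("Server_0", "Server_1", "Server_2");
    // A consuming segment of stream partition 2 and an uploaded segment declaring
    // uploadedSegmentPartitionId = 2 land on the same server.
    System.out.println(assignServer(2, servers)); // Server_2
    System.out.println(assignServer(2, servers)); // Server_2
  }
}
```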