9aman commented on code in PR #17089:
URL: https://github.com/apache/pinot/pull/17089#discussion_r2471982638
##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeTableDataManager.java:
##########
@@ -350,6 +378,18 @@ public List<SegmentContext>
getSegmentContexts(List<IndexSegment> selectedSegmen
return segmentContexts;
}
+ public StreamMetadataProvider
getStreamMetadataProvider(RealtimeSegmentDataManager
realtimeSegmentDataManager) {
+ String tableStreamName = realtimeSegmentDataManager.getTableStreamName();
+ StreamConsumerFactory streamConsumerFactory =
realtimeSegmentDataManager.getStreamConsumerFactory();
+ try {
+ return _streamMetadataProviderCache.get(tableStreamName,
Review Comment:
Can we add a comment here noting that the stream metadata provider created here
is a synchronized one and hence thread-safe?
This differs from the traditional metadata provider, so calls on it might
block.
##########
pinot-plugins/pinot-stream-ingestion/pinot-kafka-2.0/src/main/java/org/apache/pinot/plugin/stream/kafka20/SynchronizedKafkaStreamMetadataProvider.java:
##########
@@ -0,0 +1,39 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.plugin.stream.kafka20;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.pinot.spi.stream.StreamConfig;
+import org.apache.pinot.spi.stream.StreamPartitionMsgOffset;
+
+
+public class SynchronizedKafkaStreamMetadataProvider extends
KafkaStreamMetadataProvider {
Review Comment:
Please add Javadoc, perhaps with a brief note on the rationale behind adding
this class and the scenarios it should be used in.
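As a rough illustration of what such Javadoc could capture, here is a minimal sketch of the synchronized-subclass pattern. `BaseMetadataProvider`, `SynchronizedMetadataProvider`, `fetchLatestOffset` and `recordOffset` are hypothetical stand-ins, not the actual Pinot or Kafka classes in this PR; the sketch only assumes that the new class wraps each inherited call in the instance lock.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a metadata client that is not safe to share across threads.
class BaseMetadataProvider {
  // A plain HashMap is not thread-safe, mirroring a client that needs external locking.
  private final Map<Integer, Long> _latestOffsets = new HashMap<>();

  long fetchLatestOffset(int partitionId) {
    return _latestOffsets.getOrDefault(partitionId, 0L);
  }

  void recordOffset(int partitionId, long offset) {
    _latestOffsets.put(partitionId, offset);
  }
}

/**
 * Thread-safe variant of {@link BaseMetadataProvider} (hypothetical).
 *
 * <p>Intended for instances that are cached and shared by several callers.
 * Every call takes the instance lock, so concurrent callers may block on each
 * other. A provider owned by a single thread can keep using the base class.
 */
class SynchronizedMetadataProvider extends BaseMetadataProvider {
  @Override
  synchronized long fetchLatestOffset(int partitionId) {
    return super.fetchLatestOffset(partitionId);
  }

  @Override
  synchronized void recordOffset(int partitionId, long offset) {
    super.recordOffset(partitionId, offset);
  }
}
```

The trade-off worth documenting is that correctness under sharing comes at the cost of serialized calls, which is why the blocking behaviour deserves an explicit mention.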
##########
pinot-server/src/main/java/org/apache/pinot/server/api/resources/DebugResource.java:
##########
@@ -209,17 +216,34 @@ private long getSegmentSize(SegmentDataManager
segmentDataManager) {
.getSegment()).getSegmentSizeBytes() : 0;
}
- private SegmentConsumerInfo getSegmentConsumerInfo(SegmentDataManager
segmentDataManager, TableType tableType) {
+ private SegmentConsumerInfo getSegmentConsumerInfo(TableDataManager
tableDataManager,
+ SegmentDataManager segmentDataManager, TableType tableType) {
SegmentConsumerInfo segmentConsumerInfo = null;
if (tableType == TableType.REALTIME) {
RealtimeSegmentDataManager realtimeSegmentDataManager =
(RealtimeSegmentDataManager) segmentDataManager;
- Map<String, ConsumerPartitionState> partitionStateMap =
realtimeSegmentDataManager.getConsumerPartitionState();
+ StreamMetadataProvider streamMetadataProvider =
Review Comment:
@KKcorps this seems to be correctly using _streamPartitionId
```
if (numStreams == 1) {
  // Single stream
  // NOTE: We skip partition id translation logic to handle cases where custom stream might return partition id
  // larger than 10000.
  _streamPartitionId = _partitionGroupId;
  _streamConfig = new StreamConfig(_tableNameWithType, streamConfigMaps.get(0));
} else {
  // Multiple streams
  _streamPartitionId = IngestionConfigUtils.getStreamPartitionIdFromPinotPartitionId(_partitionGroupId);
  int index = IngestionConfigUtils.getStreamConfigIndexFromPinotPartitionId(_partitionGroupId);
  Preconditions.checkState(numStreams > index, "Cannot find stream config of index: %s for table: %s", index,
      _tableNameWithType);
  _streamConfig = new StreamConfig(_tableNameWithType, streamConfigMaps.get(index));
}
_streamConsumerFactory = StreamConsumerFactoryProvider.create(_streamConfig);
```
The existing code also relies on _partitionGroupId, which is set based on the
segment name, to fetch the latest offset.
@noob-se7en please verify whether the existing code has any concerns as well.
@KKcorps I feel the new code behaves similarly to the previous code.
Please correct me if I am wrong.
##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeSegmentDataManager.java:
##########
@@ -1082,21 +1081,12 @@ public long getLastConsumedTimestamp() {
/**
* Returns the {@link ConsumerPartitionState} for the partition group.
*/
- public Map<String, ConsumerPartitionState> getConsumerPartitionState() {
+ public Map<String, ConsumerPartitionState> getConsumerPartitionState(
+ @Nullable StreamPartitionMsgOffset latestMsgOffset) {
String partitionGroupId = String.valueOf(_partitionGroupId);
- return Collections.singletonMap(partitionGroupId, new
ConsumerPartitionState(partitionGroupId, getCurrentOffset(),
- getLastConsumedTimestamp(), fetchLatestStreamOffset(5_000),
_lastRowMetadata));
- }
-
- /**
- * Returns the {@link PartitionLagState} for the partition group.
- */
- public Map<String, PartitionLagState> getPartitionToLagState(
Review Comment:
Oh, I see.
So we have segregated the providers based on the access pattern, i.e.
concurrent versus non-concurrent access.
We go with the normal one in the case of RealtimeSegmentDataManager and the
concurrent one otherwise?
##########
pinot-plugins/pinot-stream-ingestion/pinot-kafka-3.0/src/main/java/org/apache/pinot/plugin/stream/kafka30/SynchronizedKafkaStreamMetadataProvider.java:
##########
@@ -0,0 +1,39 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.plugin.stream.kafka30;
+
+import java.util.Map;
+import java.util.Set;
+import org.apache.pinot.spi.stream.StreamConfig;
+import org.apache.pinot.spi.stream.StreamPartitionMsgOffset;
+
+
+public class SynchronizedKafkaStreamMetadataProvider extends
KafkaStreamMetadataProvider {
Review Comment:
Same as above.
##########
pinot-core/src/main/java/org/apache/pinot/core/data/manager/realtime/RealtimeTableDataManager.java:
##########
@@ -248,6 +259,23 @@ public boolean getAsBoolean() {
}
}
+ @VisibleForTesting
+ protected Cache<String, StreamMetadataProvider>
getStreamMetadataProviderCache() {
+ return CacheBuilder.newBuilder()
+ .expireAfterAccess(STREAM_METADATA_PROVIDER_CACHE_TTL)
+ .removalListener((RemovalNotification<String, StreamMetadataProvider>
notification) -> {
+ StreamMetadataProvider provider = notification.getValue();
Review Comment:
Is there a way, similar to RealtimeSegmentDataManager, to invalidate the
cache if we run into multiple transient errors?
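To make the question concrete, here is a minimal sketch of one possible approach: count consecutive failures per key and evict (and close) the cached entry once a threshold is reached, so the next lookup rebuilds it. `ErrorAwareCache`, `reportError`, `reportSuccess` and the threshold are all hypothetical names for illustration. The sketch uses a plain `ConcurrentHashMap` rather than the Guava cache in the PR and is not how RealtimeSegmentDataManager actually handles this.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper, not part of Pinot: evicts a cached value after repeated consecutive failures.
class ErrorAwareCache<K, V extends AutoCloseable> {
  private final Map<K, V> _cache = new ConcurrentHashMap<>();
  private final Map<K, AtomicInteger> _consecutiveErrors = new ConcurrentHashMap<>();
  private final int _maxConsecutiveErrors;

  ErrorAwareCache(int maxConsecutiveErrors) {
    _maxConsecutiveErrors = maxConsecutiveErrors;
  }

  // Returns the cached value, creating it with the loader on first access.
  V get(K key, Function<K, V> loader) {
    return _cache.computeIfAbsent(key, loader);
  }

  // A successful call resets the failure streak for the key.
  void reportSuccess(K key) {
    _consecutiveErrors.remove(key);
  }

  // Records a failure; returns true if the threshold was reached and the entry was evicted.
  boolean reportError(K key) {
    int errors = _consecutiveErrors.computeIfAbsent(key, k -> new AtomicInteger()).incrementAndGet();
    if (errors < _maxConsecutiveErrors) {
      return false;
    }
    _consecutiveErrors.remove(key);
    V evicted = _cache.remove(key);
    if (evicted == null) {
      return false;
    }
    try {
      evicted.close();
    } catch (Exception e) {
      // Best effort: the entry is already gone from the cache, so a failed close is ignored here.
    }
    return true;
  }
}
```

With the Guava cache in this PR, the equivalent of the eviction step would presumably be a call to `invalidate(key)`, which would trigger the existing removal listener; whether that fits the PR's design is for the author to confirm.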
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]