jhyao opened a new issue, #11948:
URL: https://github.com/apache/pinot/issues/11948

   We found an issue on pinot release-1.0.0, when a new segment is created, 
select count(*) on the table may result in a much smaller value than the actual 
value.
   
   In my testing, I firstly ingested 1 million data to pinot with all unique 
ids ranged from 0 to 1000000, then repeat the process to send data with same 
ids from 0 to 1000000 but with different column values. At the same time, 
another process execute count(*) query every 100ms to record count number.
   
   Here is the count number result, we can see many anomalous numbers, and 
their timing is exactly when new segments were created.
   
![image](https://github.com/apache/pinot/assets/17529008/9c9d5ec4-0e4b-4308-88bf-842a7cf320c5)
   
   I found in broker logs, most issues happened between IDEAL_STATE change and 
EXTERNAL_VIEW change. So I suspect the problem occurs between these two state 
changes. Do you have any idea?
   ```
   023/11/03 17:50:06.769 INFO [QueryLogger] 
[jersey-server-managed-async-executor-7] 
requestId=621783916000003375,table=issuerrisk,timeMs=1,docs=1000000/1869645,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):8/8/4/1/0/0/0,consumingFreshnessTimeMs=1699005001041,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select
 count(*) as count from issuerrisk
   2023/11/03 17:50:06.771 INFO [CallbackHandler] 
[ZkClient-EventThread-73-localhost:2181] 73 START: CallbackHandler 0, INVOKE 
/PinotCluster/IDEALSTATES listener: 
org.apache.pinot.broker.broker.helix.ClusterChangeMediator@465fefa9 type: 
CALLBACK
   2023/11/03 17:50:06.771 INFO [ClusterChangeMediator] 
[ZkClient-EventThread-73-localhost:2181] Enqueueing IDEAL_STATE change
   2023/11/03 17:50:06.771 INFO [CallbackHandler] 
[ZkClient-EventThread-73-localhost:2181] 73 END:INVOKE CallbackHandler 0, 
/PinotCluster/IDEALSTATES listener: 
org.apache.pinot.broker.broker.helix.ClusterChangeMediator@465fefa9 type: 
CALLBACK Took: 0ms
   2023/11/03 17:50:06.771 INFO [ClusterChangeMediator] 
[ClusterChangeHandlingThread] Start processing IDEAL_STATE change
   2023/11/03 17:50:06.771 INFO [BrokerRoutingManager] 
[ClusterChangeHandlingThread] Processing segment assignment change
   2023/11/03 17:50:06.774 INFO [BaseInstanceSelector] 
[ClusterChangeHandlingThread] Got 1 new segments: 
{issuerrisk__0__8__20231103T0950Z=1699005006774} for table: issuerrisk_REALTIME 
by processing existing states, current time: 1699005006774
   2023/11/03 17:50:06.774 INFO [BrokerRoutingManager] 
[ClusterChangeHandlingThread] Processed segment assignment change in 3ms (fetch 
ideal state and external view stats for 1 tables: 0ms, update routing entry for 
1 tables ([issuerrisk_REALTIME]): 3ms)
   2023/11/03 17:50:06.774 INFO [ClusterChangeMediator] 
[ClusterChangeHandlingThread] Finish handling IDEAL_STATE change for handler: 
BrokerRoutingManager in 3ms
   2023/11/03 17:50:06.774 INFO [ClusterChangeMediator] 
[ClusterChangeHandlingThread] Finish processing IDEAL_STATE change in 3ms
   2023/11/03 17:50:06.876 INFO [QueryLogger] 
[jersey-server-managed-async-executor-7] 
requestId=621783916000003376,table=issuerrisk,timeMs=0,docs=1000000/1869645,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):8/8/4/1/0/0/0,consumingFreshnessTimeMs=1699005001041,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select
 count(*) as count from issuerrisk
   2023/11/03 17:50:06.998 INFO [QueryLogger] 
[jersey-server-managed-async-executor-7] 
requestId=621783916000003377,table=issuerrisk,timeMs=1,docs=740101/1869645,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):8/8/4/0/0/0/0,consumingFreshnessTimeMs=1699005002130,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,256,0,1,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select
 count(*) as count from issuerrisk
   2023/11/03 17:50:07.094 INFO [CallbackHandler] 
[ZkClient-EventThread-73-localhost:2181] 73 START: CallbackHandler 1, INVOKE 
/PinotCluster/EXTERNALVIEW listener: 
org.apache.pinot.broker.broker.helix.ClusterChangeMediator@465fefa9 type: 
CALLBACK
   2023/11/03 17:50:07.094 INFO [ClusterChangeMediator] 
[ZkClient-EventThread-73-localhost:2181] Enqueueing EXTERNAL_VIEW change
   2023/11/03 17:50:07.094 INFO [CallbackHandler] 
[ZkClient-EventThread-73-localhost:2181] 73 END:INVOKE CallbackHandler 1, 
/PinotCluster/EXTERNALVIEW listener: 
org.apache.pinot.broker.broker.helix.ClusterChangeMediator@465fefa9 type: 
CALLBACK Took: 0ms
   2023/11/03 17:50:07.094 INFO [ClusterChangeMediator] 
[ClusterChangeHandlingThread] Start processing EXTERNAL_VIEW change
   2023/11/03 17:50:07.094 INFO [BrokerRoutingManager] 
[ClusterChangeHandlingThread] Processing segment assignment change
   2023/11/03 17:50:07.096 INFO [BaseInstanceSelector] 
[ClusterChangeHandlingThread] Got 0 new segments: {} for table: 
issuerrisk_REALTIME by processing existing states, current time: 1699005007096
   2023/11/03 17:50:07.096 INFO [BrokerRoutingManager] 
[ClusterChangeHandlingThread] Processed segment assignment change in 2ms (fetch 
ideal state and external view stats for 1 tables: 1ms, update routing entry for 
1 tables ([issuerrisk_REALTIME]): 1ms)
   2023/11/03 17:50:07.096 INFO [ClusterChangeMediator] 
[ClusterChangeHandlingThread] Finish handling EXTERNAL_VIEW change for handler: 
BrokerRoutingManager in 2ms
   2023/11/03 17:50:07.096 INFO [HelixExternalViewBasedQueryQuotaManager] 
[ClusterChangeHandlingThread] Start processing qps quota change.
   2023/11/03 17:50:07.096 INFO [HelixExternalViewBasedQueryQuotaManager] 
[ClusterChangeHandlingThread] No qps quota change: external view for broker 
resource remains the same.
   2023/11/03 17:50:07.096 INFO [ClusterChangeMediator] 
[ClusterChangeHandlingThread] Finish handling EXTERNAL_VIEW change for handler: 
HelixExternalViewBasedQueryQuotaManager in 0ms
   2023/11/03 17:50:07.096 INFO [ClusterChangeMediator] 
[ClusterChangeHandlingThread] Finish processing EXTERNAL_VIEW change in 2ms
   2023/11/03 17:50:07.109 INFO [QueryLogger] 
[jersey-server-managed-async-executor-7] 
requestId=621783916000003378,table=issuerrisk,timeMs=2,docs=875721/1873069,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):9/9/5/1/0/0/0,consumingFreshnessTimeMs=1699005007107,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,2,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select
 count(*) as count from issuerrisk
   2023/11/03 17:50:07.218 INFO [QueryLogger] 
[jersey-server-managed-async-executor-7] 
requestId=621783916000003379,table=issuerrisk,timeMs=1,docs=999998/1879272,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):9/9/5/1/0/0/0,consumingFreshnessTimeMs=1699005007217,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select
 count(*) as count from issuerrisk
   2023/11/03 17:50:07.304 INFO [CallbackHandler] 
[ZkClient-EventThread-73-localhost:2181] 73 START: CallbackHandler 1, INVOKE 
/PinotCluster/EXTERNALVIEW listener: 
org.apache.pinot.broker.broker.helix.ClusterChangeMediator@465fefa9 type: 
CALLBACK
   2023/11/03 17:50:07.304 INFO [ClusterChangeMediator] 
[ZkClient-EventThread-73-localhost:2181] Enqueueing EXTERNAL_VIEW change
   2023/11/03 17:50:07.304 INFO [CallbackHandler] 
[ZkClient-EventThread-73-localhost:2181] 73 END:INVOKE CallbackHandler 1, 
/PinotCluster/EXTERNALVIEW listener: 
org.apache.pinot.broker.broker.helix.ClusterChangeMediator@465fefa9 type: 
CALLBACK Took: 0ms
   2023/11/03 17:50:07.304 INFO [ClusterChangeMediator] 
[ClusterChangeHandlingThread] Start processing EXTERNAL_VIEW change
   2023/11/03 17:50:07.304 INFO [BrokerRoutingManager] 
[ClusterChangeHandlingThread] Processing segment assignment change
   2023/11/03 17:50:07.309 INFO [BaseInstanceSelector] 
[ClusterChangeHandlingThread] Got 0 new segments: {} for table: 
issuerrisk_REALTIME by processing existing states, current time: 1699005007309
   2023/11/03 17:50:07.309 INFO [BrokerRoutingManager] 
[ClusterChangeHandlingThread] Processed segment assignment change in 5ms (fetch 
ideal state and external view stats for 1 tables: 2ms, update routing entry for 
1 tables ([issuerrisk_REALTIME]): 3ms)
   2023/11/03 17:50:07.309 INFO [ClusterChangeMediator] 
[ClusterChangeHandlingThread] Finish handling EXTERNAL_VIEW change for handler: 
BrokerRoutingManager in 5ms
   2023/11/03 17:50:07.309 INFO [HelixExternalViewBasedQueryQuotaManager] 
[ClusterChangeHandlingThread] Start processing qps quota change.
   2023/11/03 17:50:07.309 INFO [HelixExternalViewBasedQueryQuotaManager] 
[ClusterChangeHandlingThread] No qps quota change: external view for broker 
resource remains the same.
   2023/11/03 17:50:07.309 INFO [ClusterChangeMediator] 
[ClusterChangeHandlingThread] Finish handling EXTERNAL_VIEW change for handler: 
HelixExternalViewBasedQueryQuotaManager in 0ms
   2023/11/03 17:50:07.309 INFO [ClusterChangeMediator] 
[ClusterChangeHandlingThread] Finish processing EXTERNAL_VIEW change in 5ms
   2023/11/03 17:50:07.326 INFO [QueryLogger] 
[jersey-server-managed-async-executor-7] 
requestId=621783916000003380,table=issuerrisk,timeMs=1,docs=999998/1886050,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):9/9/5/1/0/0/0,consumingFreshnessTimeMs=1699005007326,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select
 count(*) as count from issuerrisk
   2023/11/03 17:50:07.434 INFO [QueryLogger] 
[jersey-server-managed-async-executor-7] 
requestId=621783916000003381,table=issuerrisk,timeMs=1,docs=999999/1895348,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):9/9/5/1/0/0/0,consumingFreshnessTimeMs=1699005007433,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select
 count(*) as count from issuerrisk
   2023/11/03 17:50:07.545 INFO [QueryLogger] 
[jersey-server-managed-async-executor-7] 
requestId=621783916000003382,table=issuerrisk,timeMs=1,docs=999998/1907060,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):9/9/5/1/0/0/0,consumingFreshnessTimeMs=1699005007544,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select
 count(*) as count from issuerrisk
   2023/11/03 17:50:07.652 INFO [QueryLogger] 
[jersey-server-managed-async-executor-7] 
requestId=621783916000003383,table=issuerrisk,timeMs=1,docs=1000000/1910088,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):9/9/5/1/0/0/0,consumingFreshnessTimeMs=1699005007644,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select
 count(*) as count from issuerrisk
   
   ```
   Wrong number logs are these two lines.
   ```
   2023/11/03 17:50:06.998 INFO [QueryLogger] 
[jersey-server-managed-async-executor-7] 
requestId=621783916000003377,table=issuerrisk,timeMs=1,docs=740101/1869645,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):8/8/4/0/0/0/0,consumingFreshnessTimeMs=1699005002130,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,1,256,0,1,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select
 count(*) as count from issuerrisk
   2023/11/03 17:50:07.109 INFO [QueryLogger] 
[jersey-server-managed-async-executor-7] 
requestId=621783916000003378,table=issuerrisk,timeMs=2,docs=875721/1873069,entries=0/0,segments(queried/processed/matched/consumingQueried/consumingProcessed/consumingMatched/unavailable):9/9/5/1/0/0/0,consumingFreshnessTimeMs=1699005007107,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);192.168.31.226_R=0,2,264,0,0,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,clientIp=unknown,query=select
 count(*) as count from issuerrisk
   
   ```
   
   
   Here is my table definition.
   ```json
   {
       "tableName": "issuerrisk",
       "tableType": "REALTIME",
       "segmentsConfig": {
           "schemaName": "issuerrisk",
           "timeColumnName": "UpdatedTime",
           "allowNullTimeValue": false,
           "replicasPerPartition": "1",
           "completionConfig": {
               "completionMode": "DOWNLOAD"
           }
       },
       "tableIndexConfig": {
           "invertedIndexColumns": [],
           "sortedColumn": [],
           "noDictionaryColumns": [
               "JTD1",
               "JTD2",
               "JTD3",
               "JTD4",
               "JTD5",
               "JTD6",
               "JTD7",
               "JTD8",
               "JTD9",
               "JTD10",
               "Content"
           ],
           "loadMode": "MMAP",
           "nullHandlingEnabled": false
       },
       "ingestionConfig": {
           "streamIngestionConfig": {
               "streamConfigMaps": [
                   {
                       "streamType": "kafka",
                       "stream.kafka.consumer.type": "lowlevel",
                       "stream.kafka.topic.name": "issuerrisk",
                       "stream.kafka.decoder.class.name": 
"org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
                       "stream.kafka.consumer.factory.class.name": 
"org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
                       "stream.kafka.broker.list": "localhost:9092",
                       "realtime.segment.flush.threshold.rows": "0",
                       "realtime.segment.flush.threshold.time": "24h",
                       "realtime.segment.flush.threshold.segment.size": "50M",
                       "stream.kafka.consumer.prop.auto.offset.reset": "largest"
                   }
               ]
           }
       },
       "tenants": {},
       "metadata": {},
       "upsertConfig": {
           "mode": "FULL",
           "comparisonColumn": "UpdatedTime"
       },
       "routing": {
           "instanceSelectorType": "strictReplicaGroup"
       }
   }
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to