kirkrodrigues opened a new issue, #10130:
URL: https://github.com/apache/pinot/issues/10130

   When streaming data into a no-dictionary multi-value (MV) column, the segment fails to build with the following exception:
   
   ```
   2023/01/14 03:28:20.982 ERROR [LLRealtimeSegmentDataManager_bug__0__0__20230114T0826Z] [bug__0__0__20230114T0826Z] Could not build segment
   java.nio.BufferOverflowException: null
       at java.nio.DirectByteBuffer.put(DirectByteBuffer.java:409) ~[?:?]
       at java.nio.ByteBuffer.put(ByteBuffer.java:914) ~[?:?]
       at org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter.putBytes(VarByteChunkSVForwardIndexWriter.java:118) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba
       at org.apache.pinot.segment.local.segment.creator.impl.fwd.MultiValueFixedByteRawIndexCreator.putIntMV(MultiValueFixedByteRawIndexCreator.java:119) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca0
       at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.indexRow(SegmentColumnarIndexCreator.java:677) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba074a
       at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:240) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba0
       at org.apache.pinot.segment.local.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:110) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba074af1d4d492b92
       at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentInternal(LLRealtimeSegmentDataManager.java:903) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475b
       at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentForCommit(LLRealtimeSegmentDataManager.java:814) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475
       at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:713) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475
       at java.lang.Thread.run(Thread.java:829) [?:?]
   2023/01/14 03:28:21.003 ERROR [LLRealtimeSegmentDataManager_bug__0__0__20230114T0826Z] [bug__0__0__20230114T0826Z] Could not build segment for bug__0__0__20230114T0826Z
   ```
   
   Glancing at the code, it seems like `MutableNoDictionaryColStatistics::getMaxNumberOfMultiValues` returns 0, whereas it should probably return `_dataSource.getDataSourceMetadata().getMaxNumValuesPerMVEntry()`, similar to `MutableColStatistics::getMaxNumberOfMultiValues`. If that statistic is 0, the raw forward-index writer would presumably size its buffer for zero values per MV entry, which would explain the `BufferOverflowException` once real MV rows are written. (I may be totally wrong, though; I haven't looked at all the code.)
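   For illustration, here's a minimal sketch of the suggested change in `MutableNoDictionaryColStatistics` (the method and accessor names are the ones referenced above; the surrounding class structure is my assumption, not the actual Pinot source):

   ```java
   // Hypothetical sketch: mirror what MutableColStatistics::getMaxNumberOfMultiValues
   // reportedly does, instead of returning a constant 0.
   @Override
   public int getMaxNumberOfMultiValues() {
     // Returning 0 (the apparent current behavior) would let the raw forward-index
     // writer under-size its buffer; using the real max values per MV entry from
     // the data source metadata should size it correctly.
     return _dataSource.getDataSourceMetadata().getMaxNumValuesPerMVEntry();
   }
   ```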
   
   # Version
   
   Commit `ca86ef`
   
   # Environment
   
   * OpenJDK 11.0.16
   * Ubuntu 18.04
   
   # Reproduction steps
   
   * Add the following schema to Pinot:
   
     ```json
     {
       "schemaName": "bug",
       "dimensionFieldSpecs": [
         {
           "name": "integers",
           "dataType": "INT",
           "singleValueField": false
         }
       ],
       "dateTimeFieldSpecs": [
         {
           "name": "timestamp",
           "dataType": "LONG",
           "format": "1:MILLISECONDS:EPOCH",
           "granularity": "1:MILLISECONDS"
         }
       ]
     }
     ```
   
   * Add the following table to Pinot:
   
     ```json
     {
       "tableName": "bug",
       "tableType": "REALTIME",
       "segmentsConfig": {
         "timeColumnName": "timestamp",
         "timeType": "MILLISECONDS",
         "schemaName": "bug",
         "replicasPerPartition": "1"
       },
       "tenants": {},
       "tableIndexConfig": {
         "noDictionaryColumns": [
           "integers"
         ],
         "loadMode": "MMAP",
         "streamConfigs": {
           "streamType": "kafka",
           "stream.kafka.consumer.type": "lowlevel",
           "stream.kafka.topic.name": "bug-topic",
           "stream.kafka.decoder.class.name": 
"org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
           "stream.kafka.consumer.factory.class.name": 
"org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
           "stream.kafka.broker.list": "localhost:9876",
           "realtime.segment.flush.threshold.time": "3600000",
           "realtime.segment.flush.threshold.rows": "500000",
           "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
         }
       },
       "metadata": {
         "customConfigs": {}
       }
     }
     ```
    
   * Ingest more than a segment's worth of JSON records (500K+, per `realtime.segment.flush.threshold.rows` above) containing the field `integers` into the topic, e.g., records like the sample below.
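     For reference, a record shaped like the following is enough to exercise the MV column (the field names match the schema above; the specific values are arbitrary):

      ```json
      {"timestamp": 1673663300000, "integers": [1, 2, 3]}
      ```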

