kirkrodrigues opened a new issue, #10130: URL: https://github.com/apache/pinot/issues/10130
When streaming data into a no-dictionary MV column, the segment fails to be built with the following exception:

```
2023/01/14 03:28:20.982 ERROR [LLRealtimeSegmentDataManager_bug__0__0__20230114T0826Z] [bug__0__0__20230114T0826Z] Could not build segment
java.nio.BufferOverflowException: null
	at java.nio.DirectByteBuffer.put(DirectByteBuffer.java:409) ~[?:?]
	at java.nio.ByteBuffer.put(ByteBuffer.java:914) ~[?:?]
	at org.apache.pinot.segment.local.io.writer.impl.VarByteChunkSVForwardIndexWriter.putBytes(VarByteChunkSVForwardIndexWriter.java:118) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba
	at org.apache.pinot.segment.local.segment.creator.impl.fwd.MultiValueFixedByteRawIndexCreator.putIntMV(MultiValueFixedByteRawIndexCreator.java:119) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca0
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.indexRow(SegmentColumnarIndexCreator.java:677) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba074a
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:240) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba0
	at org.apache.pinot.segment.local.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:110) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475ba074af1d4d492b92
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentInternal(LLRealtimeSegmentDataManager.java:903) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475b
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentForCommit(LLRealtimeSegmentDataManager.java:814) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:713) [pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-ca86efca006453d407475
	at java.lang.Thread.run(Thread.java:829) [?:?]
2023/01/14 03:28:21.003 ERROR [LLRealtimeSegmentDataManager_bug__0__0__20230114T0826Z] [bug__0__0__20230114T0826Z] Could not build segment for bug__0__0__20230114T0826Z
```

Glancing at the code, it seems like `MutableNoDictionaryColStatistics::getMaxNumberOfMultiValues` returns 0, whereas it should probably return `_dataSource.getDataSourceMetadata().getMaxNumValuesPerMVEntry()`, similar to `MutableColStatistics::getMaxNumberOfMultiValues`? (I may be totally wrong though; I didn't look at all the code.)

# Version

ca86ef

# Environment

* OpenJDK 11.0.16
* Ubuntu 18.04

# Reproduction steps

* Add the following schema to Pinot:

```json
{
  "schemaName": "bug",
  "dimensionFieldSpecs": [
    {
      "name": "integers",
      "dataType": "INT",
      "singleValueField": false
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "timestamp",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```

* Add the following table to Pinot:

```json
{
  "tableName": "bug",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "timestamp",
    "timeType": "MILLISECONDS",
    "schemaName": "bug",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "noDictionaryColumns": [
      "integers"
    ],
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "bug-topic",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.broker.list": "localhost:9876",
      "realtime.segment.flush.threshold.time": "3600000",
      "realtime.segment.flush.threshold.rows": "500000",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
}
```

* Ingest over a segment's worth of JSON records (500K+) containing the field `integers` into the table.
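The suspected mechanism, a chunk buffer sized from a max-multi-values statistic that is 0, can be illustrated with a self-contained sketch. Note this is not Pinot's actual writer code; `allocateChunk` and its 4-bytes-per-INT-value layout are hypothetical stand-ins for how a forward-index writer might size its buffer from the column statistics:

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class MvOverflowDemo {
    // Hypothetical: size a chunk buffer from the stats' max values per MV entry,
    // assuming 4 bytes (Integer.BYTES) per INT value per row.
    static ByteBuffer allocateChunk(int numRows, int maxNumValuesPerEntry) {
        return ByteBuffer.allocateDirect(numRows * maxNumValuesPerEntry * Integer.BYTES);
    }

    public static void main(String[] args) {
        // If getMaxNumberOfMultiValues() returns 0, the buffer has zero capacity,
        // and the very first write overflows -- analogous to the stack trace above.
        ByteBuffer chunk = allocateChunk(500_000, 0);
        try {
            chunk.putInt(42);
        } catch (BufferOverflowException e) {
            System.out.println("BufferOverflowException on first MV write");
        }
    }
}
```

With a correct statistic (e.g. the metadata's `getMaxNumValuesPerMVEntry()`), the buffer would have room for every row's values and the writes would succeed.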