mneedham opened a new pull request, #9134:
URL: https://github.com/apache/pinot/pull/9134

   This is a bug fix for an issue I found when using the timestamp index with 
streaming data. 
   
   The problem is that the schema passed into `LLRealtimeSegmentDataManager` 
(and then into `MutableSegmentImpl`) doesn't include the extra timestamp fields.
   
   This means that rows are indexed without the new fields, and when Pinot 
tries to commit the segment it throws an exception like this:
   
   ```
   java.lang.NullPointerException: null
     at org.apache.pinot.segment.spi.creator.ColumnIndexCreationInfo.getDistinctValueCount(ColumnIndexCreationInfo.java:67) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-0c1037ed90d75bb7cd95315cd6a6bdd00f34a6c2]
     at org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.init(SegmentColumnarIndexCreator.java:201) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-0c1037ed90d75bb7cd95315cd6a6bdd00f34a6c2]
     at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:216) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-0c1037ed90d75bb7cd95315cd6a6bdd00f34a6c2]
     at org.apache.pinot.segment.local.realtime.converter.RealtimeSegmentConverter.build(RealtimeSegmentConverter.java:123) ~[pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-0c1037ed90d75bb7cd95315cd6a6bdd00f34a6c2]
     at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentInternal(LLRealtimeSegmentDataManager.java:851) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-0c1037ed90d75bb7cd95315cd6a6bdd00f34a6c2]
     at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.buildSegmentForCommit(LLRealtimeSegmentDataManager.java:778) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-0c1037ed90d75bb7cd95315cd6a6bdd00f34a6c2]
     at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:677) [pinot-all-0.11.0-SNAPSHOT-jar-with-dependencies.jar:0.11.0-SNAPSHOT-0c1037ed90d75bb7cd95315cd6a6bdd00f34a6c2]
     at java.lang.Thread.run(Thread.java:829) [?:?]
   ```
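   
   To make the failure mode concrete, here is a toy model of the mismatch 
described above. It is not Pinot code: `MissingColumnSketch`, `ColumnStats`, 
`indexRows`, and the derived column name `$mtime$DAY` are all hypothetical 
stand-ins. It only illustrates how statistics collected from a schema that 
lacks the derived timestamp columns can lead to a `NullPointerException` when 
a later step looks those columns up.
   
   ```java
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   public class MissingColumnSketch {
       // Toy stand-in for per-column statistics gathered while rows are indexed.
       static final class ColumnStats {
           final int distinctValueCount;

           ColumnStats(int distinctValueCount) {
               this.distinctValueCount = distinctValueCount;
           }
       }

       // Collects stats only for the columns the ingestion-time schema knows about.
       static Map<String, ColumnStats> indexRows(List<String> ingestionSchemaColumns) {
           Map<String, ColumnStats> stats = new HashMap<>();
           for (String column : ingestionSchemaColumns) {
               stats.put(column, new ColumnStats(1));
           }
           return stats;
       }

       // Commit-time lookup: throws NullPointerException if the column was never indexed.
       static int distinctValueCount(Map<String, ColumnStats> stats, String column) {
           return stats.get(column).distinctValueCount;
       }

       public static void main(String[] args) {
           // The ingestion-time schema (assumed) has only the original column.
           Map<String, ColumnStats> stats = indexRows(List.of("mtime"));
           try {
               // The commit step (assumed) also expects the derived column.
               distinctValueCount(stats, "$mtime$DAY");
           } catch (NullPointerException e) {
               System.out.println("No stats for derived column: " + e);
           }
       }
   }
   ```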
   
   I have updated the streaming QuickStart to add the timestamp index. While 
doing that I had to change the value of `mtime` to work around another bug: 
when Pinot tries to add the extra date columns, it evaluates the following 
expression:
   
   ```
   dateTrunc('DAY', '2022-07-29 11:18:23')
   ```
   
   This doesn't work because the second parameter of this function needs to be 
a `LONG` value, which isn't yet the case because the `DataTypeTransformer` 
hasn't coerced the type at that point. I'm not sure what the proper fix for 
that issue should be, so I'm working around it for the sake of this PR. 
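   
   For illustration, the coercion that is missing at that point can be 
sketched with plain `java.time`. This is not Pinot's `dateTrunc` or 
`DataTypeTransformer` implementation; the class and method names are 
hypothetical, and the `yyyy-MM-dd HH:mm:ss` pattern and UTC time zone are 
assumptions based on the example value above.
   
   ```java
   import java.time.Instant;
   import java.time.LocalDateTime;
   import java.time.ZoneOffset;
   import java.time.format.DateTimeFormatter;
   import java.time.temporal.ChronoUnit;

   public class DateTruncSketch {
       // Assumed format of the incoming string value.
       private static final DateTimeFormatter FORMAT =
           DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

       // Coerces the string timestamp to epoch milliseconds, interpreting it as UTC (assumption).
       static long toEpochMillis(String timestamp) {
           return LocalDateTime.parse(timestamp, FORMAT)
               .toInstant(ZoneOffset.UTC)
               .toEpochMilli();
       }

       // Truncates an epoch-millisecond value to the start of its UTC day.
       static long truncateToDay(long epochMillis) {
           return Instant.ofEpochMilli(epochMillis)
               .truncatedTo(ChronoUnit.DAYS)
               .toEpochMilli();
       }

       public static void main(String[] args) {
           long millis = toEpochMillis("2022-07-29 11:18:23");
           System.out.println(millis + " -> " + truncateToDay(millis));
       }
   }
   ```
   
   The point of the sketch is the ordering: the string has to become a `LONG` 
before any truncation can be applied to it.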


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

