mneedham opened a new pull request #7776: URL: https://github.com/apache/pinot/pull/7776
I wanted to import a CSV file that contains a DateTime field. The CSV file looks like this:

```
ID,Date
10224738,09-05-2015T09:58:00
```

And then the schema file:

```
{
  "schemaName": "crimes",
  "dimensionFieldSpecs": [
    {
      "name": "ID",
      "dataType": "INT"
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "Date",
      "dataType": "STRING",
      "format": "1:SECONDS:SIMPLE_DATE_FORMAT:MM-dd-yyyy'T'HH:mm:ss",
      "granularity": "1:HOURS"
    }
  ]
}
```

But we get this error when running the ingestion job:

```
2021/11/16 11:37:50.382 ERROR [SegmentGenerationJobRunner] [pool-2-thread-1] Failed to generate Pinot segment for file - file:/data/mark.csv
java.lang.IllegalArgumentException: null
    at shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:108) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
    at org.apache.pinot.segment.spi.creator.name.SimpleSegmentNameGenerator.generateSegmentName(SimpleSegmentNameGenerator.java:53) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:268) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
    at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:258) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
    at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:119) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
    at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:263) ~[pinot-batch-ingestion-standalone-0.9.0-SNAPSHOT-shaded.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
```

The cause is that the min and max time values, which become part of the segment name, don't pass the `isValidSegmentName` check that was added to `SimpleSegmentNameGenerator` in https://github.com/apache/pinot/pull/7085. Both values are `09-05-2015T09:58:00`, and the check rejects them because they contain a `:`. We would have the same issue with other characters that may appear in date fields, such as a space or a forward slash.

This PR replaces those problematic characters inside `SimpleSegmentNameGenerator` before the `isValidSegmentName` check.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
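As a rough illustration of the approach described in this PR (sanitize the time values first, then run the validity check), here is a minimal standalone sketch. It is not the code from the PR or from Pinot: the class `SegmentNameSanitizerSketch`, the set of characters in `INVALID_CHARS`, the `_` replacement character, and the `table_min_max` name layout are all assumptions made for the example.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; names and the invalid-character set are assumptions,
// not Pinot's actual SimpleSegmentNameGenerator implementation.
final class SegmentNameSanitizerSketch {
    // Assumed set of characters that are not allowed in a segment name:
    // backslash, forward slash, colon, asterisk, question mark, double quote,
    // angle brackets, pipe, and space.
    private static final Pattern INVALID_CHARS = Pattern.compile("[\\\\/:*?\"<>| ]");

    // Returns true when the name contains none of the assumed invalid characters.
    public static boolean isValidSegmentName(String name) {
        return !INVALID_CHARS.matcher(name).find();
    }

    // Replaces every assumed invalid character with an underscore.
    public static String sanitize(String value) {
        return INVALID_CHARS.matcher(value).replaceAll("_");
    }

    // Builds a name from the table name and the sanitized min/max time values,
    // then validates the result, mirroring the "replace, then check" order.
    public static String generateSegmentName(String tableName, String minTime, String maxTime) {
        String name = String.join("_", tableName, sanitize(minTime), sanitize(maxTime));
        if (!isValidSegmentName(name)) {
            throw new IllegalArgumentException("Invalid segment name: " + name);
        }
        return name;
    }

    public static void main(String[] args) {
        // Uses the min/max value from the CSV in this PR description.
        String time = "09-05-2015T09:58:00";
        System.out.println(generateSegmentName("crimes", time, time));
        // prints "crimes_09-05-2015T09_58_00_09-05-2015T09_58_00"
    }
}
```

Sanitizing only the time components before the check means a table name that itself contains an invalid character would still be rejected in this sketch, which keeps the validity check meaningful.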