mneedham opened a new pull request #7776:
URL: https://github.com/apache/pinot/pull/7776


   I wanted to import a CSV file that contains a DateTime field. 
   
   The CSV file looks like this:
   
   ```
   ID,Date
   10224738,09-05-2015T09:58:00
   ```
   And then the schema file:
   
   ```
   {
       "schemaName": "crimes",
       "dimensionFieldSpecs": [
         {
           "name": "ID",
           "dataType": "INT"
         }
       ],
       "dateTimeFieldSpecs": [{
         "name": "Date",
         "dataType": "STRING",
         "format" : "1:SECONDS:SIMPLE_DATE_FORMAT:MM-dd-yyyy'T'HH:mm:ss",
         "granularity": "1:HOURS"
       }]
   }
   ```
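As a quick sanity check that the `format` pattern itself matches the sample data, the CSV value can be parsed with the same pattern string. This is a minimal sketch that uses `java.time` as a stand-in; Pinot's own `SIMPLE_DATE_FORMAT` handling may rely on a different parser, and the class name `DateFormatCheck` is hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Sketch only: java.time stands in for whatever parser Pinot uses internally.
class DateFormatCheck {
    // Pattern copied from the "format" field of the schema (after SIMPLE_DATE_FORMAT:).
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MM-dd-yyyy'T'HH:mm:ss");

    static LocalDateTime parse(String value) {
        return LocalDateTime.parse(value, FORMAT);
    }

    public static void main(String[] args) {
        // Value taken from the sample CSV row.
        LocalDateTime parsed = parse("09-05-2015T09:58:00");
        System.out.println(parsed); // 2015-09-05T09:58
    }
}
```

The value parses cleanly, which is consistent with the stack trace: the failure happens later, in `SimpleSegmentNameGenerator.generateSegmentName`, not while reading the date.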
     
   But we get this error when running the ingestion job:
   
   ```
   2021/11/16 11:37:50.382 ERROR [SegmentGenerationJobRunner] [pool-2-thread-1] Failed to generate Pinot segment for file - file:/data/mark.csv
   java.lang.IllegalArgumentException: null
        at shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:108) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
        at org.apache.pinot.segment.spi.creator.name.SimpleSegmentNameGenerator.generateSegmentName(SimpleSegmentNameGenerator.java:53) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
        at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:268) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
        at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:258) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
        at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:119) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
        at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:263) ~[pinot-batch-ingestion-standalone-0.9.0-SNAPSHOT-shaded.jar:0.9.0-SNAPSHOT-540e70e9e3e24bdb2a14f56b2c1264180abaeda8]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]
   ```
   
   The problem is that the min and max time values fail the `isValidSegmentName` check that was added to `SimpleSegmentNameGenerator` in https://github.com/apache/pinot/pull/7085. Both values are `09-05-2015T09:58:00`, and they are rejected because they contain `:`. Other characters that commonly appear in date fields, such as a space or a forward slash, would cause the same failure.
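To illustrate the failure mode, here is a hypothetical re-creation of a validity check on the time values. The disallowed character set (`:`, `/`, `\` and space) is an assumption made for this sketch and is not copied from the real `isValidSegmentName`; the class name `SegmentNameCheck` is also hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical validity check; the character set is an assumption,
// not the actual rule used by Pinot's isValidSegmentName.
class SegmentNameCheck {
    // Assumed disallowed characters: colon, forward slash, backslash, space.
    static final Pattern INVALID_CHARS = Pattern.compile("[:/\\\\ ]");

    // A value is "valid" if none of the assumed characters occurs anywhere in it.
    static boolean isValidSegmentName(String name) {
        return !INVALID_CHARS.matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(isValidSegmentName("09-05-2015T09:58:00")); // false: contains ':'
        System.out.println(isValidSegmentName("09/05/2015"));          // false: contains '/'
        System.out.println(isValidSegmentName("2015-09-05"));          // true
    }
}
```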
   
   This PR replaces those problematic characters in the time values inside `SimpleSegmentNameGenerator` before the `isValidSegmentName` check.
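A minimal sketch of that approach, assuming the disallowed characters are `:`, `/`, `\` and space, that each one is replaced with `_`, and that names are laid out as `prefix_min_max_sequenceId`. None of these details are taken from the actual patch; `SegmentNameSanitizer` and its methods are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sanitizer; character set, replacement and name layout are assumptions.
class SegmentNameSanitizer {
    // Assumed disallowed characters: colon, forward slash, backslash, space.
    static final Pattern INVALID_CHARS = Pattern.compile("[:/\\\\ ]");

    // Replace every disallowed character in the time value's string form with '_'.
    static String sanitize(Object timeValue) {
        return INVALID_CHARS.matcher(timeValue.toString()).replaceAll("_");
    }

    // Assumed layout: prefix_min_max_sequenceId, built from the sanitized values
    // so that a character-based validity check would no longer reject them.
    static String generateSegmentName(String prefix, Object minTime, Object maxTime, int sequenceId) {
        return String.join("_", prefix, sanitize(minTime), sanitize(maxTime),
                Integer.toString(sequenceId));
    }

    public static void main(String[] args) {
        String name = generateSegmentName("crimes",
                "09-05-2015T09:58:00", "09-05-2015T09:58:00", 0);
        System.out.println(name); // crimes_09-05-2015T09_58_00_09-05-2015T09_58_00_0
    }
}
```

One trade-off of using `_` as the replacement is that it is also the separator between name parts, so a sanitized name can no longer be split back into its parts unambiguously; a replacement character that is never used as a separator would avoid that.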

