agnes-xinyi-lu opened a new issue, #6911:
URL: https://github.com/apache/iceberg/issues/6911

   ### Apache Iceberg version
   
   None
   
   ### Query engine
   
   None
   
   ### Please describe the bug 🐞
   
   We started getting this exception in some of our unit tests after upgrading to 1.1.0. In the test, we use a string field in the partition spec and provide string partition values that look like dates but are not valid dates, e.g. "2020-20-20". The write succeeds, but the read throws an exception like this:
       java.time.format.DateTimeParseException: Text '2021-20-20' could not be parsed: Invalid value for MonthOfYear (valid values 1 - 12): 20
           at java.time.format.DateTimeFormatter.createError(DateTimeFormatter.java:1920)
           at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1855)
           at java.time.LocalDate.parse(LocalDate.java:400)
           at org.apache.iceberg.expressions.Literals$StringLiteral.to(Literals.java:495)
           at org.apache.iceberg.expressions.ExpressionUtil.sanitizeString(ExpressionUtil.java:380)
           at org.apache.iceberg.expressions.ExpressionUtil.sanitize(ExpressionUtil.java:320)
           at org.apache.iceberg.expressions.ExpressionUtil.access$300(ExpressionUtil.java:38)
           at org.apache.iceberg.expressions.ExpressionUtil$StringSanitizer.predicate(ExpressionUtil.java:269)
           at org.apache.iceberg.expressions.ExpressionUtil$StringSanitizer.predicate(ExpressionUtil.java:197)
           at org.apache.iceberg.expressions.ExpressionVisitors.visit(ExpressionVisitors.java:347)
           at org.apache.iceberg.expressions.ExpressionVisitors.visit(ExpressionVisitors.java:366)
           at org.apache.iceberg.expressions.ExpressionUtil.toSanitizedString(ExpressionUtil.java:82)
           at org.apache.iceberg.BaseTableScan.planFiles(BaseTableScan.java:142)
           at org.apache.iceberg.DataTableScan.planFiles(DataTableScan.java:27)
   
   
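   The cause is in [planFiles](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/SnapshotScan.java#:~:text=ExpressionUtil.toSanitizedString(filter()))): the info-level log statement uses `ExpressionUtil.toSanitizedString` to log the filter, and while sanitizing, the string literal gets converted to a date (`ExpressionUtil.sanitizeString` → `Literals$StringLiteral.to` → `LocalDate.parse` in the trace above). But when a user defines the field as a string type, Iceberg shouldn't assume its values follow any particular pattern, right? Even if a value is an invalid date, month, or year, the read should still work. It also doesn't feel right for a logging call to throw.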
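
To make the failure mode concrete, here is a minimal, hypothetical sketch using only the JDK. It is NOT Iceberg's `ExpressionUtil`; the class name, the `yyyy-MM-dd` shape check, and the `(date)`/`(string)` placeholders are all assumptions for illustration. The strict variant mirrors the trace above (a date-shaped string is handed to `LocalDate.parse`, which throws for month 20); the tolerant variant falls back to generic string redaction instead of throwing.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.regex.Pattern;

/** Hypothetical sanitizer sketch, not Iceberg code: redacts string literals for logging. */
final class DateLikeSanitizerSketch {
  // Assumed shape check: four digits, dash, two digits, dash, two digits.
  private static final Pattern DATE_LIKE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

  private DateLikeSanitizerSketch() {}

  // Strict variant: a date-shaped value is parsed as a date, so "2021-20-20"
  // makes LocalDate.parse throw DateTimeParseException (month 20 is invalid).
  static String sanitizeStrict(String value) {
    if (DATE_LIKE.matcher(value).matches()) {
      LocalDate.parse(value);
      return "(date)";
    }
    return "(string)";
  }

  // Tolerant variant: a string column may hold arbitrary text, so a value that
  // only looks like a date falls back to generic string redaction.
  static String sanitizeTolerant(String value) {
    if (DATE_LIKE.matcher(value).matches()) {
      try {
        LocalDate.parse(value);
        return "(date)";
      } catch (DateTimeParseException e) {
        return "(string)";
      }
    }
    return "(string)";
  }

  public static void main(String[] args) {
    System.out.println(sanitizeTolerant("2021-20-20"));
    try {
      sanitizeStrict("2021-20-20");
    } catch (DateTimeParseException e) {
      System.out.println("strict sanitizer threw: " + e.getMessage());
    }
  }
}
```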
   
   In the latest master, both SnapshotScan and BaseAllMetadataTableScan call `ExpressionUtil.toSanitizedString` when logging the filter. We could probably change them to use `ExpressionParser.toJson` instead.
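
Independently of which renderer is chosen, the log statement can be guarded so that failing to build the message never fails scan planning. The sketch below is a hypothetical, JDK-only helper (`SafeLogMessage` and `render` are illustrative names, not Iceberg APIs). In Iceberg the preferred supplier might wrap `ExpressionUtil.toSanitizedString(filter)` and the fallback something like `ExpressionParser.toJson(filter)`, as suggested above; here both are plain `Supplier<String>` placeholders.

```java
import java.util.function.Supplier;

/** Hypothetical helper, not an Iceberg API: builds a log message without failing the caller. */
final class SafeLogMessage {
  private SafeLogMessage() {}

  // Try the preferred renderer first; if it throws, use the fallback renderer
  // and note the failure, so the caller (e.g. scan planning) keeps going.
  static String render(Supplier<String> preferred, Supplier<String> fallback) {
    try {
      return preferred.get();
    } catch (RuntimeException e) {
      return fallback.get() + " (sanitizing failed: " + e.getClass().getSimpleName() + ")";
    }
  }

  public static void main(String[] args) {
    String filter = "event_date = '2021-20-20'";
    String message = render(
        () -> { throw new IllegalStateException("cannot parse date"); },
        () -> filter);
    System.out.println("Scanning table with filter " + message);
  }
}
```

Note that a fallback which prints the raw filter gives up the redaction that sanitizing exists to provide, so the choice of fallback is a trade-off rather than a free fix.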
   

