dhruv-pratap commented on code in PR #13108: URL: https://github.com/apache/iceberg/pull/13108#discussion_r2100663158
##########
parquet/src/main/java/org/apache/iceberg/parquet/ParquetReader.java:
##########
@@ -120,18 +125,27 @@ public boolean hasNext() {
 
     @Override
     public T next() {
-      if (valuesRead >= nextRowGroupStart) {
-        advance();
-      }
-
-      if (reuseContainers) {
-        this.last = model.read(last);
-      } else {
-        this.last = model.read(null);
+      try {
+        if (valuesRead >= nextRowGroupStart) {
+          advance();
+        }
+
+        if (reuseContainers) {
+          this.last = model.read(last);
+        } else {
+          this.last = model.read(null);
+        }
+        valuesRead += 1;
+
+        return last;
+      } catch (ParquetDecodingException e) {
+        if (reader != null) {
+          // Knowing the exact parquet file is essential for tracing bad nodes
+          // that produced the corrupt file, parquet lib doesn't do this today.
+          LOG.error("Error decoding Parquet file {}", reader.getFile(), e);

Review Comment:
   My bad, it indeed does. Unfortunately, though, `org.apache.parquet.hadoop.ParquetFileReader` encapsulates the `InputFile` and only exposes `getFile()` to get the Parquet file location. There is also `getPath()`, but that has been marked as deprecated.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: issues-h...@iceberg.apache.org
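
For illustration, the idea in the hunk above (catch the decoding failure and attach the file location the reader exposes) can be sketched without the Parquet dependency. Everything named here is a hypothetical stand-in and not the PR's code: `DecodingException` plays the role of `ParquetDecodingException`, `FileBackedReader` plays the role of `ParquetFileReader` with its `getFile()` accessor, and the file name is made up. The sketch also wraps and rethrows instead of logging, only so that it needs no logging library; the quoted hunk does not show what happens after the `LOG.error` call.

```java
import java.util.function.Supplier;

public class DecodeErrorContext {

  // Hypothetical stand-in for Parquet's ParquetDecodingException.
  static class DecodingException extends RuntimeException {
    DecodingException(String message) {
      super(message);
    }

    DecodingException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Hypothetical stand-in for a reader that only exposes its file location as a String.
  static class FileBackedReader {
    private final String file;

    FileBackedReader(String file) {
      this.file = file;
    }

    String getFile() {
      return file;
    }
  }

  // Runs one read step. On a decoding failure, rethrows with the file location
  // in the message and the original exception preserved as the cause.
  static <T> T readWithFileContext(FileBackedReader reader, Supplier<T> readStep) {
    try {
      return readStep.get();
    } catch (DecodingException e) {
      if (reader != null) {
        throw new DecodingException("Error decoding file " + reader.getFile(), e);
      }
      throw e;
    }
  }

  public static void main(String[] args) {
    // Made-up location, for demonstration only.
    FileBackedReader reader = new FileBackedReader("part-00000.parquet");
    try {
      readWithFileContext(reader, () -> {
        throw new DecodingException("could not read page");
      });
    } catch (DecodingException e) {
      System.out.println(e.getMessage());
      System.out.println(e.getCause().getMessage());
    }
  }
}
```

Running `main` prints the wrapped message containing the file location, followed by the original decoding message from the cause. Keeping the original exception as the cause means the low-level stack trace is still available to whoever traces the corrupt file.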