gortiz commented on code in PR #15919:
URL: https://github.com/apache/pinot/pull/15919#discussion_r2137677116

##########
pinot-common/src/main/java/org/apache/pinot/common/datablock/ZeroCopyDataBlockSerde.java:
##########
@@ -254,21 +265,37 @@ private long calculateEndOffset(DataBuffer buffer, Header header) {
     return currentOffset;
   }
+  /// Deserializes the exceptions and metadata from the stream.
   @VisibleForTesting
-  static Map<Integer, String> deserializeExceptions(PinotInputStream stream, Header header)
+  static ErrorsAndMetadata deserializeExceptions(PinotInputStream stream, Header header)
       throws IOException {
     if (header._exceptionsLength == 0) {
-      return new HashMap<>();
+      return ErrorsAndMetadata.EMPTY;
     }
+    long currentOffset = header.getExceptionsStart();
+    stream.seek(header.getExceptionsStart());
     int numExceptions = stream.readInt();
-    Map<Integer, String> exceptions = new HashMap<>(HashUtil.getHashMapCapacity(numExceptions));
+    // We reserve extra space for the fake error codes storing stageId, workerId and serverId
+    Map<Integer, String> exceptions = new HashMap<>(HashUtil.getHashMapCapacity(numExceptions + 3));
     for (int i = 0; i < numExceptions; i++) {
       int errCode = stream.readInt();
       String errMessage = stream.readInt4UTF();
       exceptions.put(errCode, errMessage);
     }
-    return exceptions;
+
+    long readOffset = stream.getCurrentOffset() - currentOffset;
+    if (readOffset >= header._exceptionsLength) {
+      return new ErrorsAndMetadata(exceptions, -1, -1, "");
+    }
+    int errorMetadataVersion = stream.readInt();
+    if (errorMetadataVersion != ERROR_METADATA_VERSION) {
+      return new ErrorsAndMetadata(exceptions, -1, -1, "");
+    }
+    int stageId = stream.readInt();
+    int workerId = stream.readInt();
+    String serverId = stream.readInt4UTF();

Review Comment:
I'm using "" here. I think that is better (it may avoid NPEs) and it simplifies the serialization code, since writing an empty string is simpler than writing a null value.
If needed, we can then turn this empty string into null when converting it to a block.

##########
pinot-common/src/main/java/org/apache/pinot/common/datablock/ZeroCopyDataBlockSerde.java:
##########
@@ -254,21 +265,37 @@ private long calculateEndOffset(DataBuffer buffer, Header header) {
     return currentOffset;
   }
+  /// Deserializes the exceptions and metadata from the stream.
   @VisibleForTesting
-  static Map<Integer, String> deserializeExceptions(PinotInputStream stream, Header header)
+  static ErrorsAndMetadata deserializeExceptions(PinotInputStream stream, Header header)
       throws IOException {
     if (header._exceptionsLength == 0) {
-      return new HashMap<>();
+      return ErrorsAndMetadata.EMPTY;
     }
+    long currentOffset = header.getExceptionsStart();
+    stream.seek(header.getExceptionsStart());
     int numExceptions = stream.readInt();
-    Map<Integer, String> exceptions = new HashMap<>(HashUtil.getHashMapCapacity(numExceptions));
+    // We reserve extra space for the fake error codes storing stageId, workerId and serverId
+    Map<Integer, String> exceptions = new HashMap<>(HashUtil.getHashMapCapacity(numExceptions + 3));
     for (int i = 0; i < numExceptions; i++) {
       int errCode = stream.readInt();
       String errMessage = stream.readInt4UTF();
       exceptions.put(errCode, errMessage);
     }
-    return exceptions;
+
+    long readOffset = stream.getCurrentOffset() - currentOffset;
+    if (readOffset >= header._exceptionsLength) {
+      return new ErrorsAndMetadata(exceptions, -1, -1, "");

Review Comment:
We cannot use EMPTY here, because exceptions may be non-empty.

##########
pinot-common/src/main/java/org/apache/pinot/common/datablock/MetadataBlock.java:
##########
@@ -69,6 +71,17 @@ public static MetadataBlock newEosWithStats(List<DataBuffer> statsByStage) {
   public MetadataBlock(List<DataBuffer> statsByStage) {

Review Comment:
> Is there scenario where stageId and workerId are unavailable?

Yes, MetadataBlock is used to serialize any EOS, including those that succeed and those that don't.
But stageId and workerId are only used on error blocks. We could include them in successful blocks as well, but that would not be very useful and would make SuccessMseBlock stateful, which is not great given that it can currently be used as a singleton.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
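
For illustration only, the empty-string convention for serverId discussed in the first review comment could look roughly like the sketch below. The class and method names (ServerIdSentinel, toWire, fromWire) are hypothetical and are not part of Pinot or of this PR; the sketch only assumes that "" is the value written when no server id is known, and that it is mapped back to null when the block is built.

```java
/** Hypothetical helper (not Pinot code): maps between a nullable server id and the "" sentinel. */
final class ServerIdSentinel {
  /** Assumed sentinel written on the wire when no server id is known. */
  static final String UNKNOWN_SERVER_ID = "";

  private ServerIdSentinel() {
  }

  /** Value to serialize: never null, so the writer needs no separate null marker. */
  static String toWire(String serverId) {
    return serverId == null ? UNKNOWN_SERVER_ID : serverId;
  }

  /** Value to expose on the block: null when the wire value is the sentinel. */
  static String fromWire(String wireValue) {
    return wireValue.isEmpty() ? null : wireValue;
  }
}
```

With this shape the serializer always writes a plain string, and only the code that converts the deserialized result into a block has to deal with null.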