[ https://issues.apache.org/jira/browse/HADOOP-17453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17275771#comment-17275771 ]
Claus Stadler commented on HADOOP-17453:
----------------------------------------
I am facing IndexOutOfBoundsExceptions with my custom RecordReader on Hadoop
Common 2.8.5, and I am also inclined to think this line is a major bug.
If I understand this codec stuff correctly (not claiming I do), reading decoded
data (e.g. text) in READ_MODE.BYBLOCK mode from an encoded stream (e.g.
bzip2) should make a read on the decoded stream return when the backing stream
hits its set boundary (typically the split end). This mechanism is referred
to as "advertise".
But before my repeated reads actually hit the split boundary, I get an
IndexOutOfBoundsException, apparently because at some point my buffer's length
is less than 2 * offset + 1. Huh?
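Where "2 * offset + 1" comes from: assuming the decoder's read(dest, offs, len) rejects any call with offs + len > dest.length (the check named in the exception message in the issue title), passing len = off + 1 turns that condition into 2 * off + 1 > dest.length. Any read that starts in the upper half of the buffer therefore fails, regardless of where the split boundary is. A minimal sketch that reproduces only that check; the class and method names are hypothetical and no Hadoop code is called:

```java
// Hypothetical stand-in that reproduces only the bounds check named in the
// exception message "offs(X) + len(X+1) > dest.length(Y)"; it does no decompression.
class AdvertiseBoundsDemo {

    /** True when a read(dest, offs, len) call would fail the bounds check. */
    static boolean failsBoundsCheck(int destLength, int offs, int len) {
        return offs + len > destLength;
    }

    /** Smallest offset at which the quoted call read(b, off, off + 1) fails, i.e. 2 * off + 1 > b.length. */
    static int firstFailingOffset(int destLength) {
        for (int off = 0; off < destLength; off++) {
            if (failsBoundsCheck(destLength, off, off + 1)) {
                return off;
            }
        }
        return -1; // no offset inside the buffer fails (only for lengths 0 and 1)
    }

    public static void main(String[] args) {
        int bufLen = 8192;
        int off = 5000;
        System.out.println("len = off + 1 fails: " + failsBoundsCheck(bufLen, off, off + 1));
        System.out.println("len = 1 fails:       " + failsBoundsCheck(bufLen, off, 1));
        System.out.println("first failing offset for " + bufLen + "-byte buffer: "
                + firstFailingOffset(bufLen));
    }
}
```

For an 8192-byte buffer the first failing offset is 4096, so the exception shows up as soon as a read starts in the second half of the buffer, well before any split boundary is reached.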
> BZip2Codec incorrectly throws IndexOutOfBoundsException: offs(X) + len(X+1) >
> dest.length(Y).
> ---------------------------------------------------------------------------------------------
>
> Key: HADOOP-17453
> URL: https://issues.apache.org/jira/browse/HADOOP-17453
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Affects Versions: 3.1.2
> Reporter: Christian Asmussen
> Priority: Major
>
> In org.apache.hadoop.io.compress.BZip2Codec$BZip2CompressionInputStream,
> the code around line 496 seems to mistakenly add the offset to the length.
> {noformat}
> if (this.posSM == POS_ADVERTISEMENT_STATE_MACHINE.ADVERTISE) {
> result = this.input.read(b, off, off + 1); // << HERE: length depends on off
> {noformat}
> Here's a reference
> [BZip2Codec.java:L496|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/BZip2Codec.java#L496]
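The quoted line passes off + 1 as if the third argument of read were an end index, whereas java.io.InputStream.read(byte[] b, int off, int len) takes a length. A minimal sketch, assuming the ADVERTISE branch is meant to read a single byte into b[off], so the presumed correction would be read(b, off, 1); this is an inference from the code, not a confirmed upstream fix. The class and method names are hypothetical, and ByteArrayInputStream is only a stand-in that follows the same read(byte[], int, int) contract as the decoded stream:

```java
import java.io.ByteArrayInputStream;

// Sketch: read(byte[] b, int off, int len) takes a LENGTH as its third argument,
// not an end index. ByteArrayInputStream stands in for the decoded stream; it is
// not the Hadoop class.
class LengthNotEndIndexDemo {

    /** Presumed intent of the ADVERTISE branch (an assumption): read exactly one byte into b[off]. */
    static int readOneByte(ByteArrayInputStream in, byte[] b, int off) {
        return in.read(b, off, 1); // len = 1, independent of off
    }

    /** Mirrors the quoted call: the "length" grows with the offset. */
    static int readBuggy(ByteArrayInputStream in, byte[] b, int off) {
        return in.read(b, off, off + 1); // throws once 2 * off + 1 > b.length
    }

    public static void main(String[] args) {
        byte[] data = {42, 43, 44, 45};
        byte[] buf = new byte[8];
        int off = 6; // upper half of the buffer: 2 * 6 + 1 = 13 > 8

        int n = readOneByte(new ByteArrayInputStream(data), buf, off);
        System.out.println("len = 1 read " + n + " byte(s); buf[" + off + "] = " + buf[off]);

        try {
            readBuggy(new ByteArrayInputStream(data), buf, off);
            System.out.println("len = off + 1 succeeded");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("len = off + 1 threw IndexOutOfBoundsException");
        }
    }
}
```

With an 8-byte buffer and off = 6, the single-byte read stores 42 in buf[6], while the off + 1 variant throws, which matches the behavior reported in the comment.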
--
This message was sent by Atlassian Jira
(v8.3.4#803005)