nandorKollar commented on PR #13880:
URL: https://github.com/apache/iceberg/pull/13880#issuecomment-3228534662
> > Why did the type of the vector change from IntVectors to BaseVarWidthVectors?
>
> The vector changes because dictionary-encoded pages are a sequence of ints, {1, 2, 3, 4}, that refer to entries in the dictionary, which maps each int to the actual column value: {1: "foo", 2: "bar", ...}. Other pages store literal representations of the values as binary: {foo, bar, bazz}. So you have to switch vector types when you alternate.
>
> > If we clear out "this.vec" if it is set, wouldn't this type change in the vector cause problems? Shouldn't we explicitly close the `this.vec` if it is not null, before setting it to a new vector?
>
> No. To be clear, the code has _always_ cleared out this.vec, and we don't have correctness issues because essentially what is happening is:
>
> 1. Reader looks to see if it can read the page
> 2. If it can't re-use the container, do an allocate for the correct container
>
> What is missing here is 2.a: if I previously had a container but it cannot be re-used, clear it.

Thanks for clarifying why the type change happens, that makes sense. So the only case where we can't reuse the vector is a switch to or from dictionary-encoded pages, right? When you say it is always cleared, do you mean that the value count is set to 0 in this block:
```
if (reuse == null
    || (!dictEncoded && readType == ReadType.DICTIONARY)
    || (dictEncoded && readType != ReadType.DICTIONARY)) {
  allocateFieldVector(dictEncoded);
  nullabilityHolder = new NullabilityHolder(batchSize);
} else {
  vec.setValueCount(0);
  nullabilityHolder.reset();
}
```
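
To make the question concrete, here is a minimal standalone sketch of the flow described above (steps 1, 2, and the missing 2.a). It is not the Iceberg reader code: `VectorReuseSketch`, `Container`, and `prepare` are hypothetical stand-ins that use no Arrow classes, and releasing the old container via `close()` is an assumption about what "clear it" would mean in step 2.a.

```java
public class VectorReuseSketch {
  enum ReadType { DICTIONARY, PLAIN }

  /** Hypothetical stand-in for a vector; only records whether it was released. */
  static final class Container implements AutoCloseable {
    final boolean dictEncoded;
    int valueCount;
    boolean closed;

    Container(boolean dictEncoded) {
      this.dictEncoded = dictEncoded;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  private Container vec;     // container from the previous batch, if any
  private ReadType readType; // how the previous batch was read

  /** Returns the container to fill for the next batch. */
  Container prepare(Container reuse, boolean dictEncoded) {
    boolean wasDict = readType == ReadType.DICTIONARY;
    // Same condition as the block above, folded into one comparison.
    if (reuse == null || dictEncoded != wasDict) {
      // Step 2.a (assumed): release the old container before replacing it.
      if (vec != null) {
        vec.close();
      }
      // Step 2: allocate a container of the correct kind.
      vec = new Container(dictEncoded);
      readType = dictEncoded ? ReadType.DICTIONARY : ReadType.PLAIN;
    } else {
      // Same encoding as before: keep the container, only reset its count.
      reuse.valueCount = 0;
      vec = reuse;
    }
    return vec;
  }

  public static void main(String[] args) {
    VectorReuseSketch reader = new VectorReuseSketch();
    Container first = reader.prepare(null, true);    // nothing to reuse: allocate
    first.valueCount = 4;
    Container second = reader.prepare(first, true);  // still dictionary: reuse
    Container third = reader.prepare(second, false); // encoding switch: reallocate
    System.out.println("reused=" + (first == second)
        + " oldClosed=" + first.closed
        + " replaced=" + (third != first));
  }
}
```

In this sketch, only the encoding switch takes the allocation branch once a reusable container exists, and that is the one place where the previous container has to be released explicitly.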