msfroh commented on code in PR #12987: URL: https://github.com/apache/lucene/pull/12987#discussion_r1449507398
##########
lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextDocValuesReader.java:
##########
@@ -329,9 +330,15 @@ public BytesRef apply(int docID) {
             } catch (ParseException pe) {
               throw new CorruptIndexException("failed to parse int length", in, pe);
             }
-            term.grow(len);
-            term.setLength(len);
-            in.readBytes(term.bytes(), 0, len);
+            termByteArray.grow(len);
+            termByteArray.setLength(len);
+            in.readBytes(termByteArray.bytes(), 0, len);
+            if (len > 2) {
+              term.copyBytes(
+                  SimpleTextUtil.fromBytesRefString(termByteArray.get().utf8ToString()));
+            } else {
+              term.setLength(0);
+            }

Review Comment:
   ~~The issue is that on the writing side, if `stringVal == null`, we write the length as `0` and don't output anything. So, `termByteArray.get().utf8ToString()` is the empty string and `fromBytesRefString` throws an exception on that. It was showing up as a unit test failure.~~

   ~~I added this condition here, but I suppose it would also be fine to output `[]` followed by `F` for missing binary values (though we'll waste a couple of bytes). Of course, now I'm wondering how we get here if the doc doesn't have the given field...~~

   Nope... after debugging it, it wasn't the `null` case (which seems to work fine and skips any doc where the "has value" is `F`).

   It turns out that `fromBytesRefString` **can't** read empty arrays. The logic extracts from within the `[]`, then tries to split by `,`. The result is the empty string, so it thinks it has 1 "part".

   I'll follow up with a fix for `fromBytesRefString` instead.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
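For illustration, the empty-array pitfall described in the review comment can be reproduced outside Lucene. The sketch below is not the actual `fromBytesRefString` implementation: the class `EmptyArrayParseDemo`, the methods `parseNaive` and `parseGuarded`, and the `,` separator (taken from the comment's wording) are all assumptions made for this example. It only relies on the standard behavior that `"".split(",")` returns a one-element array containing the empty string.

```java
import java.util.Arrays;

// Hypothetical stand-in for the parsing logic described in the comment;
// this is NOT Lucene's SimpleTextUtil.fromBytesRefString.
public class EmptyArrayParseDemo {
  // Separator assumed from the comment's description.
  private static final String SEP = ",";

  // Parses each part as a hex byte.
  private static byte[] parseParts(String inner) {
    String[] parts = inner.split(SEP);
    byte[] bytes = new byte[parts.length];
    for (int i = 0; i < parts.length; i++) {
      bytes[i] = (byte) Integer.parseInt(parts[i].trim(), 16);
    }
    return bytes;
  }

  // Naive version: strips the brackets and splits unconditionally.
  // For "[]" the inner text is "", which splits into one empty "part",
  // and parsing that part as a number throws NumberFormatException.
  static byte[] parseNaive(String s) {
    return parseParts(s.substring(1, s.length() - 1));
  }

  // Guarded version: treats empty inner text as an empty array.
  static byte[] parseGuarded(String s) {
    String inner = s.substring(1, s.length() - 1);
    return inner.isEmpty() ? new byte[0] : parseParts(inner);
  }

  public static void main(String[] args) {
    System.out.println("parts in empty string: " + "".split(SEP).length);
    System.out.println(Arrays.toString(parseGuarded("[6c,75]")));
    System.out.println(Arrays.toString(parseGuarded("[]")));
    try {
      parseNaive("[]");
    } catch (NumberFormatException e) {
      System.out.println("naive parse failed on []: " + e.getMessage());
    }
  }
}
```

The guard in `parseGuarded` is one possible shape for the follow-up fix mentioned above; the real change may differ.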
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org