msfroh commented on code in PR #12987:
URL: https://github.com/apache/lucene/pull/12987#discussion_r1449507398


##########
lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextDocValuesReader.java:
##########
@@ -329,9 +330,15 @@ public BytesRef apply(int docID) {
               } catch (ParseException pe) {
                 throw new CorruptIndexException("failed to parse int length", in, pe);
               }
-              term.grow(len);
-              term.setLength(len);
-              in.readBytes(term.bytes(), 0, len);
+              termByteArray.grow(len);
+              termByteArray.setLength(len);
+              in.readBytes(termByteArray.bytes(), 0, len);
+              if (len > 2) {
+                term.copyBytes(
+                    SimpleTextUtil.fromBytesRefString(termByteArray.get().utf8ToString()));
+              } else {
+                term.setLength(0);
+              }

Review Comment:
   ~~The issue is that on the writing side, if `stringVal == null`, we write 
the length as `0` and don't output anything. So, 
`termByteArray.get().utf8ToString()` is the empty string and 
`fromBytesRefString` throws an exception on that. It was showing up as a unit 
test failure.~~
   
   ~~I added this condition here, but I suppose it would also be fine to output 
`[]` followed by `F` for missing binary values (though we'll waste a couple of 
bytes). Of course, now I'm wondering how we get here if the doc doesn't have 
the given field...~~
   
   Nope... after debugging it, it wasn't the `null` case (which seems to work 
fine and skips any doc where the "has value" is `F`). It turns out that 
`fromBytesRefString` **can't** read empty arrays. The logic extracts the text 
between the `[` and `]`, then splits it on " ". For `[]` that text is the empty 
string, and splitting the empty string still yields one (empty) element, so it 
thinks it has 1 "part".
   
   I'll follow up with a fix for `fromBytesRefString` instead.
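
A minimal sketch of the failure mode described above, assuming the parser takes the text between the brackets and splits it on a single space. `naiveParse` and `guardedParse` are hypothetical stand-ins written for illustration, not the actual `SimpleTextUtil` source, and the hex-byte format (`[61 62]`) is likewise an assumption:

```java
import java.util.Arrays;

// Hypothetical demo class; none of these names come from Lucene.
final class EmptyArraySplitDemo {

    // Mimics the described logic: strip the brackets, split on " ",
    // and parse each part as a hex byte (assumed format).
    public static byte[] naiveParse(String s) {
        String inner = s.substring(1, s.length() - 1);
        String[] parts = inner.split(" "); // for "", this is one empty part
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Integer.parseInt("") throws NumberFormatException
            bytes[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return bytes;
    }

    // One possible guard: treat empty bracket contents as a zero-length array.
    public static byte[] guardedParse(String s) {
        String inner = s.substring(1, s.length() - 1);
        if (inner.isEmpty()) {
            return new byte[0];
        }
        return naiveParse(s);
    }

    public static void main(String[] args) {
        // Splitting the empty string yields a single empty element.
        System.out.println("".split(" ").length); // prints 1

        System.out.println(Arrays.toString(guardedParse("[61 62]")));
        System.out.println(guardedParse("[]").length);

        try {
            naiveParse("[]");
        } catch (NumberFormatException e) {
            System.out.println("naiveParse failed on []: " + e.getMessage());
        }
    }
}
```

Checking for empty contents before splitting is only one option; the eventual fix to `fromBytesRefString` may handle the case differently.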



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


