[ https://issues.apache.org/jira/browse/SOLR-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Provalov updated SOLR-14293:
---------------------------------
    Description: 
I noticed unexpected payload behavior with Solr 6.3.0, and also with 7.7.2
and 8.3.1. After writing a unit test specific to Lucene62Codec (see attached;
it can also be run against the later versions), I think there could be a bug
in which the payload of a term in one document is written into the payload of
the same term in another document (or the payload for the second document is
not read back correctly).

For comparison, I added SimpleTextCodec, which does not behave this way.

For 8.3.1, you will need to change MultiFields.getTermPositionsEnum(...) to
MultiTerms.getTermPostingsEnum(...).

Thanks to Alan Woodward, I made the necessary changes to the analyzer to
address the sharing of the TokenStreamComponents in the TestPayloads class.
I now use a non-mocked tokenizer and a new filter that creates a random
payload (see attached), so documents one and two have the same token but
different payloads.

The result is the same: SimpleTextCodec passes the test, but these codecs do
not:

Lucene50Codec;
Lucene54Codec;
Lucene62Codec;
Lucene70Codec;
Lucene80Codec;



> Payloads Are Written or Read Incorrectly - Across the Documents
> ---------------------------------------------------------------
>
>                 Key: SOLR-14293
>                 URL: https://issues.apache.org/jira/browse/SOLR-14293
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: search
>    Affects Versions: 5.1, 5.5.5, 6.3, 7.7.2, 8.3.1
>            Reporter: Ivan Provalov
>            Priority: Critical
>              Labels: codec, format, payload, postings, reader, writer
>         Attachments: TestPayloads.java
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

