Tomoko Uchida created LUCENE-10610:
--------------------------------------

             Summary: RunAutomaton#hashCode() can easily cause hash collision 
for different Automatons
                 Key: LUCENE-10610
                 URL: https://issues.apache.org/jira/browse/LUCENE-10610
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Tomoko Uchida


Current RunAutomaton#hashCode() is:
{code:java}
  @Override
  public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + alphabetSize;
    result = prime * result + points.length;
    result = prime * result + size;
    return result;
  }
{code}
Since it does not take account of the contents of the {{points}} array, this 
returns the same value for different automatons when their alphabet size and 
state size are the same.

For example, this test code passes.
{code:java}
  public void testHashCode() throws IOException {
    PrefixQuery q1 = new PrefixQuery(new Term("field", "aba"));
    PrefixQuery q2 = new PrefixQuery(new Term("field", "fee"));
    assert q1.compiled.runAutomaton.hashCode() == 
q2.compiled.runAutomaton.hashCode();
  }
{code}
I suspect this is a bug?

Note that I think it's not a serious one; all callers of this {{hashCode()}} 
take account of additional information when calculating their own hash value, 
it seems there is no substantial impact on higher-level APIs.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to