Tomoko Uchida created LUCENE-10610: -------------------------------------- Summary: RunAutomaton#hashCode() can easily cause hash collision for different Automatons Key: LUCENE-10610 URL: https://issues.apache.org/jira/browse/LUCENE-10610 Project: Lucene - Core Issue Type: Bug Reporter: Tomoko Uchida
Current RunAutomaton#hashCode() is: {code:java} @Override public int hashCode() { final int prime = 31; int result = 1; result = prime * result + alphabetSize; result = prime * result + points.length; result = prime * result + size; return result; } {code} Since it does not take account of the contents of the {{points}} array, this returns the same value for different automatons when their alphabet size and state size are the same. For example, this test code passes. {code:java} public void testHashCode() throws IOException { PrefixQuery q1 = new PrefixQuery(new Term("field", "aba")); PrefixQuery q2 = new PrefixQuery(new Term("field", "fee")); assert q1.compiled.runAutomaton.hashCode() == q2.compiled.runAutomaton.hashCode(); } {code} I suspect this is a bug? Note that I think it's not a serious one; all callers of this {{hashCode()}} take account of additional information when calculating their own hash value, it seems there is no substantial impact on higher-level APIs. -- This message was sent by Atlassian Jira (v8.20.7#820007) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org