[ https://issues.apache.org/jira/browse/LUCENE-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552844#comment-17552844 ]
Tomoko Uchida commented on LUCENE-10610: ---------------------------------------- bq. Let's please not go back to trying to implement hashcode/equals in Automaton. There are good reasons why we don't do that. Yes, I looked through Automaton class and understand it shouldn't be hash keys. bq. If possible, let's try to find/prevent anything using RunAutomaton.hashCode/equals too. RunAutomaton.hashCode() is (indirectly) used by AutomatonQuery.hashCode(). I'll check if it's possible to implement AutomatonQuery.hashCode() without depending CompiledAutomaton.hashCode() - it isn't called from anyware else. > RunAutomaton#hashCode() can easily cause hash collision for different > Automatons > -------------------------------------------------------------------------------- > > Key: LUCENE-10610 > URL: https://issues.apache.org/jira/browse/LUCENE-10610 > Project: Lucene - Core > Issue Type: Bug > Reporter: Tomoko Uchida > Priority: Minor > > Current RunAutomaton#hashCode() is: > {code:java} > @Override > public int hashCode() { > final int prime = 31; > int result = 1; > result = prime * result + alphabetSize; > result = prime * result + points.length; > result = prime * result + size; > return result; > } > {code} > Since it does not take account of the contents of the {{points}} array, this > returns the same value for different automatons when their alphabet size and > state size are the same. > For example, this test code passes. > {code:java} > public void testHashCode() throws IOException { > PrefixQuery q1 = new PrefixQuery(new Term("field", "aba")); > PrefixQuery q2 = new PrefixQuery(new Term("field", "fee")); > assert q1.compiled.runAutomaton.hashCode() == > q2.compiled.runAutomaton.hashCode(); > } > {code} > I suspect this is a bug? > Note that I think it's not a serious one; all callers of this {{hashCode()}} > take account of additional information when calculating their own hash value, > it seems there is no substantial impact on higher-level APIs. -- This message was sent by Atlassian Jira (v8.20.7#820007) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org