[ https://issues.apache.org/jira/browse/LUCENE-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552861#comment-17552861 ]
Robert Muir commented on LUCENE-10610:
--------------------------------------

Also, I honestly think the current hashcode based on points.length could be considered good enough for a hashcode. The example above involves two automatons with exactly the same shape, exactly the same number of states (same string length), and exactly the same number of unique symbols/ranges (points.length). I personally feel a collision is OK, from an automaton perspective, given how similar they are, especially since stuff like PrefixQuery doesn't even use this hashcode anyway.

But again, if you want to make it more refined, substitute {{Arrays.hashCode(points)}} for {{points.length}}. It shouldn't be much slower.

> RunAutomaton#hashCode() can easily cause hash collision for different Automatons
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-10610
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10610
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Tomoko Uchida
>            Priority: Minor
>
> Current RunAutomaton#hashCode() is:
> {code:java}
>   @Override
>   public int hashCode() {
>     final int prime = 31;
>     int result = 1;
>     result = prime * result + alphabetSize;
>     result = prime * result + points.length;
>     result = prime * result + size;
>     return result;
>   }
> {code}
> Since it does not take account of the contents of the {{points}} array, this returns the same value for different automatons when their alphabet size and state size are the same.
> For example, this test code passes.
> {code:java}
>   public void testHashCode() throws IOException {
>     PrefixQuery q1 = new PrefixQuery(new Term("field", "aba"));
>     PrefixQuery q2 = new PrefixQuery(new Term("field", "fee"));
>     assert q1.compiled.runAutomaton.hashCode() == q2.compiled.runAutomaton.hashCode();
>   }
> {code}
> I suspect this is a bug?
> Note that I think it's not a serious one; all callers of this {{hashCode()}} take account of additional information when calculating their own hash value, so it seems there is no substantial impact on higher-level APIs.
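
For illustration, the substitution suggested in the comment ({{Arrays.hashCode(points)}} in place of {{points.length}}) can be sketched outside Lucene. This is a minimal standalone sketch, not the actual RunAutomaton code: the class name {{RunAutomatonHashSketch}}, both method names, and the sample {{points}} arrays, alphabet size, and state count are all hypothetical and chosen only so that the two inputs share the same array length.

```java
import java.util.Arrays;

// Standalone sketch. This is NOT Lucene's RunAutomaton; it only reproduces
// the arithmetic of the hashCode() quoted in the issue on plain arguments.
class RunAutomatonHashSketch {

    // Same arithmetic as the quoted hashCode(): only the length of points is mixed in.
    static int lengthBasedHash(int alphabetSize, int[] points, int size) {
        final int prime = 31;
        int result = 1;
        result = prime * result + alphabetSize;
        result = prime * result + points.length;
        result = prime * result + size;
        return result;
    }

    // Variant from the comment: Arrays.hashCode(points) replaces points.length,
    // so the contents of the array contribute to the result.
    static int contentBasedHash(int alphabetSize, int[] points, int size) {
        final int prime = 31;
        int result = 1;
        result = prime * result + alphabetSize;
        result = prime * result + Arrays.hashCode(points);
        result = prime * result + size;
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: same alphabet size, same state count,
        // same number of points, but different point values.
        int alphabetSize = 256;
        int size = 4;
        int[] pointsA = {0, 97, 98, 99};
        int[] pointsB = {0, 101, 102, 103};

        boolean lengthCollides =
            lengthBasedHash(alphabetSize, pointsA, size)
                == lengthBasedHash(alphabetSize, pointsB, size);
        boolean contentCollides =
            contentBasedHash(alphabetSize, pointsA, size)
                == contentBasedHash(alphabetSize, pointsB, size);

        System.out.println("length-based hashes equal:  " + lengthCollides);
        System.out.println("content-based hashes equal: " + contentCollides);
    }
}
```

With these sample inputs the length-based variant yields equal hashes and the content-based variant does not. The extra cost is one pass over {{points}} per call, which is consistent with the remark that it should not be much slower.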