[ 
https://issues.apache.org/jira/browse/LUCENE-10610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552943#comment-17552943
 ] 

Robert Muir commented on LUCENE-10610:
--------------------------------------

it is much more complicated. I really don't think we should do equals/hashcode 
on automaton, we've been down that road before and it gets bad. The problem has 
to do with things like determinization, minimization, etc.

originally equals was implemented by "sameLanguage" but it requires very 
heavy-duty stuff to compute. Otherwise, if you arent going to define it that 
way, equals is meaningless as there are zillions of possible automata that "do 
the same thing".

It isn't about mutability. I prefer for automaton to simply implement 
Object.equals/hashcode (identity) as it makes just as much sense.

> RunAutomaton#hashCode() can easily cause hash collision for different 
> Automatons
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-10610
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10610
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Tomoko Uchida
>            Priority: Minor
>
> Current RunAutomaton#hashCode() is:
> {code:java}
>   @Override
>   public int hashCode() {
>     final int prime = 31;
>     int result = 1;
>     result = prime * result + alphabetSize;
>     result = prime * result + points.length;
>     result = prime * result + size;
>     return result;
>   }
> {code}
> Since it does not take account of the contents of the {{points}} array, this 
> returns the same value for different automatons when their alphabet size and 
> state size are the same.
> For example, this test code passes.
> {code:java}
>   public void testHashCode() throws IOException {
>     PrefixQuery q1 = new PrefixQuery(new Term("field", "aba"));
>     PrefixQuery q2 = new PrefixQuery(new Term("field", "fee"));
>     assert q1.compiled.runAutomaton.hashCode() == 
> q2.compiled.runAutomaton.hashCode();
>   }
> {code}
> I suspect this is a bug?
> Note that I think it's not a serious one; all callers of this {{hashCode()}} 
> take account of additional information when calculating their own hash value, 
> it seems there is no substantial impact on higher-level APIs.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to