mikemccand commented on pull request #157:
URL: https://github.com/apache/lucene/pull/157#issuecomment-854760044


   Maybe another way to improve the checking for correctness in the randomized 
test (or maybe in a new randomized test) would be to randomly generate a set of 
strings from a limited alphabet, create the minimal automaton matching only 
those strings (we have a nice API to do that, efficiently, already), call 
flatten, and then confirm that the resulting output graph still accepts all the 
original strings?
   
   I.e., flatten should only ever "generalize" -- accepting strings that the 
original machine did not -- and never "remove" previously accepted strings?
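
   A minimal sketch of the property being checked, in plain Java. Everything here is hypothetical and for illustration only: `randomStrings` and `lostStrings` are made-up helper names, and the `Predicate<String>` is a stand-in for "the flattened graph accepts this string". A real test would build the automaton from the strings (possibly with something like `Automata.makeStringUnion`), run flatten, and supply an acceptor backed by the resulting graph.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Sketch only: the acceptor is a stand-in, not Lucene's automaton or flatten filter.
class FlattenOnlyGeneralizesSketch {

  /** Generates up to {@code count} distinct non-empty random strings over a small alphabet. */
  static Set<String> randomStrings(Random random, int count, int maxLength, String alphabet) {
    Set<String> strings = new HashSet<>();
    for (int i = 0; i < count; i++) {
      int length = 1 + random.nextInt(maxLength); // length in [1, maxLength]
      StringBuilder sb = new StringBuilder();
      for (int j = 0; j < length; j++) {
        sb.append(alphabet.charAt(random.nextInt(alphabet.length())));
      }
      strings.add(sb.toString()); // duplicates collapse, so the set may hold fewer than count
    }
    return strings;
  }

  /**
   * Returns the original strings that the flattened acceptor rejects.
   * An empty result means flatten only generalized and removed nothing.
   */
  static Set<String> lostStrings(Set<String> originals, Predicate<String> flattenedAccepts) {
    Set<String> lost = new TreeSet<>();
    for (String s : originals) {
      if (!flattenedAccepts.test(s)) {
        lost.add(s);
      }
    }
    return lost;
  }
}
```

   The test would then assert that `lostStrings(originals, acceptor)` is empty. Strings accepted after flatten that were not in the original set are deliberately not checked, since generalizing is allowed.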
   
   But I think one missing part for such a test would be an "Automaton to 
TokenStream" converter, i.e. a "serializer" from (acyclic) Automaton to 
TokenStream.  I think such a thing would not be too difficult to build: 
basically just topo sort the input graph (and throw an exception if it has 
cycles), then emit the transitions as tokens in that order.  The `posInc` 
attribute is guaranteed to never go negative because the topo sort visits 
states in non-decreasing position order.  This would 
(separately) be a nice utility API to convert between these two things that are 
really nearly the same ;)
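
   A rough sketch of what such a serializer could look like, under stated assumptions. It uses plain Java stand-ins rather than Lucene classes: `Arc`, `Token`, and `toTokens` are hypothetical names, each transition label is treated as one whole term, a state's index in the topological order is used as its token position, and accept states are ignored. It is meant to illustrate the topo-sort idea, not to be a definitive implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch only: Arc and Token are hypothetical plain-Java stand-ins,
// not Lucene's Automaton, Transition, or TokenStream classes.
class AutomatonToTokensSketch {

  /** One labeled transition; the label is treated as a whole term (an assumption). */
  static final class Arc {
    final int from, to;
    final String term;
    Arc(int from, int to, String term) { this.from = from; this.to = to; this.term = term; }
  }

  /** One emitted token with its position increment and position length. */
  static final class Token {
    final String term;
    final int posInc, posLength;
    Token(String term, int posInc, int posLength) {
      this.term = term; this.posInc = posInc; this.posLength = posLength;
    }
    @Override public String toString() {
      return term + "(posInc=" + posInc + ",posLength=" + posLength + ")";
    }
  }

  static List<Token> toTokens(int numStates, List<Arc> arcs) {
    // Build adjacency lists and count incoming arcs per state.
    List<List<Arc>> outgoing = new ArrayList<>();
    for (int s = 0; s < numStates; s++) outgoing.add(new ArrayList<>());
    int[] inDegree = new int[numStates];
    for (Arc arc : arcs) {
      outgoing.get(arc.from).add(arc);
      inDegree[arc.to]++;
    }

    // Kahn's algorithm: a state's index in the order becomes its position.
    int[] position = new int[numStates];
    List<Integer> order = new ArrayList<>();
    Deque<Integer> ready = new ArrayDeque<>();
    for (int s = 0; s < numStates; s++) if (inDegree[s] == 0) ready.add(s);
    while (!ready.isEmpty()) {
      int state = ready.remove();
      position[state] = order.size();
      order.add(state);
      for (Arc arc : outgoing.get(state)) {
        if (--inDegree[arc.to] == 0) ready.add(arc.to);
      }
    }
    if (order.size() != numStates) {
      throw new IllegalArgumentException("automaton has cycles");
    }

    // Emit arcs grouped by source state, in topological order. Source positions
    // never decrease, so posInc >= 0; every arc points forward, so posLength >= 1.
    List<Token> tokens = new ArrayList<>();
    int lastPos = -1; // so the first token gets posInc = 1
    for (int state : order) {
      for (Arc arc : outgoing.get(state)) {
        int pos = position[state];
        tokens.add(new Token(arc.term, pos - lastPos, position[arc.to] - pos));
        lastPos = pos;
      }
    }
    return tokens;
  }
}
```

   For example, with four states and the arcs 0 -> 1 "wi", 1 -> 2 "fi", 0 -> 2 "wifi", 2 -> 3 "network", this emits `wi(posInc=1,posLength=1)`, `wifi(posInc=0,posLength=2)`, `fi(posInc=1,posLength=1)`, `network(posInc=1,posLength=1)`. An input with a cycle, such as 0 -> 1 and 1 -> 0, is rejected with an `IllegalArgumentException`.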




