Well, let me see. Your customers are telling you, in essence, "for any random input, you cannot return false positives". Which is nonsense, so I'd say you need to negotiate with your customers. I flat guarantee that, for any algorithm you try, you can write a counter-example in, oh, 15 seconds or so <G>.
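
To make the counter-example claim concrete, here is a minimal sketch, assuming the matcher is a plain edit-distance test with a tolerance of one edit. The function names `levenshtein` and `fuzzy_match` are made up for illustration and are not Solr or Lucene APIs. The same tolerance that lets the query "much" find the misspelled "móch" also lets it find "mech" (Polish for "moss"), which is a false positive.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    # Normalize so that "ó" is one code point regardless of how it was typed.
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_match(query: str, text: str, max_edits: int = 1) -> bool:
    """Hypothetical matcher: accept if at most max_edits edits apart."""
    return levenshtein(query, text) <= max_edits


if __name__ == "__main__":
    for candidate in ("móch", "mech"):
        print(candidate, fuzzy_match("much", candidate))
```

Tightening `max_edits` to zero removes the false positive, but it also drops the misspelling you wanted to catch, which is the trade-off to put in front of the customers.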
I think the best you can hope for is "reasonable results", but getting your customers to agree on what is "reasonable" is... er... often a challenge. Frequently, when confronted with "close but not perfect", customers aren't as unforgiving as their first position would indicate, since the inconvenience of the not-quite-perfect results is often much less than people think when starting out.

FuzzySearch tries to do some of this work for you, and that may be acceptable, as this is a common issue. But it'll never be perfect. You might get some joy from ngrams, but I haven't worked with them myself, just seen them recommended by people whose opinions I respect...

Best
Erick

2008/10/16 Jarek Zgoda <[EMAIL PROTECTED]>

> Hello, group.
>
> I'm trying to create a search facility for documents in "broken" Polish (by
> broken I mean "not language rules compliant"), searchable by terms in
> "broken" Polish, but broken in many other ways than documents. See this
> example:
>
> document text: "włatcy móch" (in proper Polish this would be "władcy much")
> example terms that should match: "włatcy much", "wlatcy moch", "wladcy
> much"
>
> This double brokenness ruled out any Polish stemmers currently available for
> Lucene and now I am at point 0. The search results do not have to be 100%
> accurate - some missing results are acceptable, but "false positives" are
> not. Is it at all possible using machinery provided by Solr (I do not own
> a PhD in linguistics), or should I ask the business for lowering their
> expectations?
>
> --
> We read Knuth so you don't have to. - Tim Peters
>
> Jarek Zgoda, R&D, Redefine
> [EMAIL PROTECTED]
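
On the ngram suggestion in the reply: the following is a rough sketch of what character n-gram matching could look like, assuming bigrams and a Jaccard overlap score. The helper names `char_ngrams` and `ngram_similarity` and the `_` padding character are assumptions made for illustration. This is plain Python, not Solr or Lucene configuration.

```python
import unicodedata


def char_ngrams(text: str, n: int = 2) -> set:
    """Return the set of character n-grams of text, padded at both ends."""
    text = unicodedata.normalize("NFC", text.lower())
    padded = "_" + text + "_"  # padding marks word start and end
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of the two n-gram sets, between 0.0 and 1.0."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 1.0  # both inputs produced no n-grams; treat as identical
    return len(grams_a & grams_b) / len(union)


if __name__ == "__main__":
    for candidate in ("much", "móch", "mech"):
        print(candidate, round(ngram_similarity("much", candidate), 3))
```

With these settings the misspelling "móch" and the unrelated word "mech" get the same score against "much", so a score threshold alone cannot separate them. That is in line with the point that no approach here will be perfect.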