Hello, group.

I'm trying to build a search facility for documents written in "broken" Polish (by "broken" I mean not compliant with the rules of the language). The documents should be searchable by terms that are also in "broken" Polish, but broken in different ways than the documents. See this example:

document text: "włatcy móch" (in proper Polish this would be "władcy much")
example terms that should match: "włatcy much", "wlatcy moch", "wladcy much"
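
To make the requirement concrete, here it is stated as a small Python sketch: a folding function under which the document and all three terms collapse to the same key. The fold() name and the folding rules are assumptions picked only to cover this one example. They are not something Solr or Lucene provides, and they are not linguistically justified.

```python
# Hypothetical folding rules, chosen only to cover the example above.
# Each character on the left is replaced by the one on the right.
FOLD = str.maketrans({
    "ł": "l", "ó": "u", "o": "u", "d": "t",
    "ą": "a", "ę": "e", "ć": "c", "ń": "n", "ś": "s", "ź": "z", "ż": "z",
})


def fold(text: str) -> str:
    """Collapse spelling variants into one crude comparison key."""
    return text.lower().translate(FOLD)


document = "włatcy móch"
queries = ["włatcy much", "wlatcy moch", "wladcy much"]

# Keep only the queries whose folded form equals the folded document.
matches = [q for q in queries if fold(q) == fold(document)]
```

With these rules all three terms match the document, but the folding is far too aggressive: "dom" and "tom" collapse to the same key as well, which is exactly the kind of false positive I cannot accept.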

This double brokenness rules out every Polish stemmer currently available for Lucene, so I am back at square one. The search results do not have to be 100% accurate: some missing results are acceptable, but false positives are not. Is this at all possible using the machinery provided by Solr (I do not hold a PhD in linguistics), or should I ask the business to lower their expectations?

--
We read Knuth so you don't have to. - Tim Peters

Jarek Zgoda, R&D, Redefine
[EMAIL PROTECTED]
