Hello, group.
I'm trying to build a search facility for documents written in
"broken" Polish (by "broken" I mean text that does not comply with the
rules of the language). The documents should be searchable by terms
that are also in "broken" Polish, but broken in different ways than
the documents themselves. Here is an example:
document text: "włatcy móch" (in proper Polish this would be "władcy much")
example terms that should match: "włatcy much", "wlatcy moch", "wladcy much"
This double brokenness rules out every Polish stemmer currently
available for Lucene, so I am back at square one. The search results
do not have to be 100% accurate: some missing results are acceptable,
but false positives are not. Is this at all possible using the
machinery provided by Solr (I do not have a PhD in linguistics), or
should I ask the business to lower their expectations?
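
To make the requirement concrete, the matching described above could be
sketched as folding both the document text and the query to a common
key and comparing the keys. The Python below is only an illustration
outside Solr: the `fold` function and the `FOLD` table are hypothetical
names, the substitution rules are assumptions chosen to cover just the
example above, and merging o/u or d/t this broadly would probably
produce false positives on real data.

```python
import unicodedata

# Hypothetical folding rules (an assumption, not a vetted model of
# Polish misspellings): they cover only the example in this message.
FOLD = str.maketrans({
    "ł": "l",  # dropped diacritic: "wł" typed as "wl"
    "ó": "u",  # "ó" and "u" sound the same
    "o": "u",  # "ó" typed without the accent
    "d": "t",  # devoicing: "władcy" written as "włatcy"
})


def fold(text: str) -> str:
    """Reduce text to a crude comparison key."""
    # NFC makes sure "ó" is a single code point before translating.
    normalized = unicodedata.normalize("NFC", text).lower()
    # Apply the substitutions, then collapse runs of whitespace.
    return " ".join(normalized.translate(FOLD).split())


doc = "włatcy móch"
queries = ["włatcy much", "wlatcy moch", "wladcy much"]

# Keep the queries whose key equals the document's key.
matches = [q for q in queries if fold(q) == fold(doc)]
```

With these rules all three example terms, and the correct spelling
"władcy much", fold to the same key as the document text. Whether a
comparable mapping can be expressed in a Solr analysis chain without
letting in false positives is exactly what I am unsure about.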
--
We read Knuth so you don't have to. - Tim Peters
Jarek Zgoda, R&D, Redefine
[EMAIL PROTECTED]