Re: Advice on analysis/filtering?

Jarek Zgoda Thu, 16 Oct 2008 07:11:07 -0700

Message written on 2008-10-16, at 15:54, by ErickErickson:

Well, let me see. Your customers are telling you, in essence,
"for any random input, you cannot return false positives". Which
is nonsense, so I'd say you need to negotiate with your
customers. I flat guarantee that, for any algorithm you try,
you can write a counter-example in, oh, 15 seconds or so <G>.

They came to such expectations seeing Solr's own Spellcheck at work - if it can suggest correct versions, it should be able to sanitize broken words in documents and search them using sanitized input. To me, this seemed a reasonable request (provided, of course, that it can be achieved by reasonably abusing Solr's spellcheck component).

FuzzySearch tries to do some of this work for you, and that may be
acceptable, as this is a common issue. But it'll never be
perfect.
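
Fuzzy matching of this kind is usually built on an edit distance between terms. The sketch below is a plain Levenshtein distance in Python, not Lucene's actual FuzzyQuery code; the function names and the `max_edits` threshold are assumptions made for illustration. It shows both why the approach helps with the examples in this thread and why it cannot rule out false positives.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_match(term: str, word: str, max_edits: int = 1) -> bool:
    """Hypothetical helper: accept a word within max_edits of the term."""
    return edit_distance(term, word) <= max_edits
```

With `max_edits=1`, "much" matches "móch" as wanted, but it also matches "duch", which is an unrelated word. Raising the threshold to catch "wladcy" against "włatcy" (two edits) lets in even more unrelated terms.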

You might get some joy from ngrams, but I haven't
worked with it myself, just seen it recommended by people
whose opinions I respect...
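
For readers unfamiliar with the idea, here is a minimal sketch of character n-gram matching in Python. It is not how Solr's n-gram filters are configured or implemented; the padding character, the choice of bigrams, and the Dice score are all assumptions picked to keep the example short.

```python
def char_ngrams(word: str, n: int = 2) -> set[str]:
    """Set of character n-grams, with '_' marking the word boundaries."""
    padded = f"_{word}_"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over the two words' n-gram sets (0.0 to 1.0)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))
```

A misspelled word still shares most of its n-grams with the intended one, so a score threshold can stand in for exact term equality. As with fuzzy search, any threshold loose enough to accept the misspellings will also accept some unrelated words.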


Thank you for these suggestions.

Best
Erick


2008/10/16 Jarek Zgoda <[EMAIL PROTECTED]>
Hello, group.
I'm trying to create a search facility for documents in "broken" Polish (by
broken I mean "not language rules compliant"), searchable by terms in
"broken" Polish, but broken in many other ways than the documents. See this
example:
document text: "włatcy móch" (in proper Polish this would be "władcy much")
example terms that should match: "włatcy much", "wlatcy moch", "wladcy much"
This double brokenness ruled out any Polish stemmers currently available for
Lucene and now I am at point 0. The search results do not have to be 100%
accurate - some missing results are acceptable, but "false positives" are
not. Is it at all possible using machinery provided by Solr (I do not hold a
PhD in linguistics), or should I ask the business to lower their
expectations?
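
One way to picture the problem is as a normalization step applied to both documents and queries before indexing. The Python sketch below folds each string to a crude key; the four substitution rules are hypothetical, chosen only to cover the example above, and are not a model of Polish spelling (other diacritics such as ą, ę or ś are not handled).

```python
# Hypothetical folding rules, chosen only to cover the example in this
# message: drop the two diacritics involved, merge u with o, merge d with t.
FOLD = str.maketrans({"ł": "l", "ó": "o", "u": "o", "d": "t"})


def fold(text: str) -> str:
    """Lowercase the text and apply the folding table to every character."""
    return text.lower().translate(FOLD)


queries = ["włatcy much", "wlatcy moch", "wladcy much"]
document = "włatcy móch"
matches = [q for q in queries if fold(q) == fold(document)]
```

All three queries fold to the same key as the document, so they match. The same rules also make "dom" and "tom" identical, which is exactly the kind of false positive the business wants to exclude.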

--
We read Knuth so you don't have to. - Tim Peters

Jarek Zgoda, R&D, Redefine
[EMAIL PROTECTED]


