Re: Solr - Make Exact Search on Field with Fuzzy Query

Erick Erickson Wed, 10 Oct 2012 05:52:09 -0700

There's nothing really built into Solr to allow this. Are you
absolutely sure you can't just use copyField? Have you
actually tried it?

But I don't think you need to store the contents twice. Just
store it once and always highlight on that field whether you
search it or not. Since it's the raw text, you should be fine.
You'll have two tokenized versions of the field, of course, but
the extra indexed (not stored) version should take less space
than you might think. You probably want to store the version
with stemming turned on...

That said, storing twice only uses up some disk space; it
doesn't require additional memory for searching. So unless
you're running out of disk space, you can just keep two stored
versions around.

But if none of that works, you might write a custom filter that
emits two tokens for each input token at indexing time, similar
to what synonyms do. The first should be the original with some
special character appended, say $, and the second should be the
result of stemming (note that there will be two tokens even if
no stemming is done). So indexing "running" would index
"running$" and "run". Now, when you need to search for an exact
match on running, you search for running$.
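To make the token stream concrete, here is a minimal sketch in plain Java that simulates what such a filter would emit. It does not use Lucene's TokenFilter API; the class name DollarExpansion and the helper toyStem are hypothetical, and toyStem is a crude stand-in for a real stemmer (it only strips "ing", undoubling a trailing consonant, or a plural "s").

```java
import java.util.ArrayList;
import java.util.List;

// Simulates, outside Lucene, the token stream of a hypothetical
// "two tokens per input token" filter.
class DollarExpansion {

    // Toy stand-in for a real stemmer: strips "ing" (undoubling a trailing
    // consonant, so "runn" -> "run") or a plural "s". Not Porter, just a demo.
    static String toyStem(String word) {
        if (word.endsWith("ing") && word.length() > 5) {
            String base = word.substring(0, word.length() - 3);
            int n = base.length();
            char last = base.charAt(n - 1);
            if (last == base.charAt(n - 2) && "aeiou".indexOf(last) < 0) {
                base = base.substring(0, n - 1);
            }
            return base;
        }
        if (word.endsWith("s") && !word.endsWith("ss") && word.length() > 3) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // For every input token, emit the original with '$' appended, then the
    // stemmed form. Two tokens come out even when stemming changes nothing.
    static List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token + "$");
            out.add(toyStem(token));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand(List.of("running"))); // [running$, run]
        System.out.println(expand(List.of("run")));     // [run$, run]
    }
}
```

A real implementation would also have to set the position increment of the second token to 0 so that phrase queries still line up, the same way synonym filters stack tokens.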

This works in reverse too. Since the rule is "append $ to all
original tokens," "run" gets indexed as "run$" and "run". Now,
searching for "run" matches that doc, as does searching for
"run$". But "run$" does not match the doc that had "running",
since the two tokens emitted in that case are "running$" and
"run".
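The matching behaviour can be checked with a tiny in-memory inverted index. This is a plain-Java sketch, not Lucene: the class DollarMatching, its methods, and the toyStem helper (a crude stand-in for a real stemmer) are all hypothetical. An exact query keeps its trailing $, while an ordinary query is run through the stemmer before lookup.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DollarMatching {

    // Toy stand-in for a real stemmer (strips "ing" with undoubling, or "s").
    static String toyStem(String word) {
        if (word.endsWith("ing") && word.length() > 5) {
            String base = word.substring(0, word.length() - 3);
            int n = base.length();
            char last = base.charAt(n - 1);
            if (last == base.charAt(n - 2) && "aeiou".indexOf(last) < 0) {
                base = base.substring(0, n - 1);
            }
            return base;
        }
        if (word.endsWith("s") && !word.endsWith("ss") && word.length() > 3) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // term -> ids of docs containing it; each token is indexed as
    // original + "$" and as its stem.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String token : docs.get(docId).toLowerCase().split("\\s+")) {
                index.computeIfAbsent(token + "$", k -> new TreeSet<>()).add(docId);
                index.computeIfAbsent(toyStem(token), k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    // Exact query ("run$") is looked up as-is; a normal query is stemmed first.
    static Set<Integer> search(Map<String, Set<Integer>> index, String term) {
        String key = term.endsWith("$") ? term : toyStem(term);
        return index.getOrDefault(key, Collections.emptySet());
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = buildIndex(List.of("running", "run"));
        System.out.println(search(index, "run"));      // [0, 1]
        System.out.println(search(index, "run$"));     // [1]
        System.out.println(search(index, "running$")); // [0]
    }
}
```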

But look at what's happened here. You're indexing two tokens
for every one token in the input. Furthermore, you're adding
a bunch of unique tokens to the index. It's hard to see how this
results in any savings over just using copyField. You have
to index both tokens, since you have to distinguish between
the stemmed and unstemmed versions.
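A rough count makes the cost comparison concrete. The sketch below, plain Java with a hypothetical toy stemmer and hypothetical method names, tallies token occurrences and distinct terms for a short token list under the two layouts. It deliberately ignores real index overhead such as per-field term dictionaries, positions, and compression.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IndexCostSketch {

    // Toy stand-in for a real stemmer (strips "ing" with undoubling, or "s").
    static String toyStem(String word) {
        if (word.endsWith("ing") && word.length() > 5) {
            String base = word.substring(0, word.length() - 3);
            int n = base.length();
            char last = base.charAt(n - 1);
            if (last == base.charAt(n - 2) && "aeiou".indexOf(last) < 0) {
                base = base.substring(0, n - 1);
            }
            return base;
        }
        if (word.endsWith("s") && !word.endsWith("ss") && word.length() > 3) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // copyField layout: a stemmed field plus an unstemmed copy.
    // Returns {token occurrences, distinct terms summed over both fields}.
    static int[] copyFieldCost(List<String> tokens) {
        Set<String> stemmedTerms = new HashSet<>();
        Set<String> exactTerms = new HashSet<>();
        for (String t : tokens) {
            stemmedTerms.add(toyStem(t));
            exactTerms.add(t);
        }
        return new int[] {2 * tokens.size(), stemmedTerms.size() + exactTerms.size()};
    }

    // "$" layout: one field, two tokens per input token.
    // Returns {token occurrences, distinct terms}.
    static int[] dollarCost(List<String> tokens) {
        Set<String> terms = new HashSet<>();
        int occurrences = 0;
        for (String t : tokens) {
            terms.add(t + "$");
            terms.add(toyStem(t));
            occurrences += 2;
        }
        return new int[] {occurrences, terms.size()};
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("running", "runs", "run", "running");
        System.out.println(Arrays.toString(copyFieldCost(tokens))); // [8, 4]
        System.out.println(Arrays.toString(dollarCost(tokens)));    // [8, 4]
    }
}
```

In this toy model the distinct-term counts match by construction: every "$" term is just an unstemmed term under a different name, so the scheme moves the duplication into one field rather than removing it.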

You might be able to do something really exotic with payloads.
This is _really_ out of left field, but it just occurred to me. You'd
have to define a transformation from the original word into the
stemmed word that produced a unique value. Something like
no stemming  -> 0
removing ing -> 1
removing s   -> 2

etc. Actually, this would have to be some kind of function on the
letters removed, so that removing "ing" mapped to, say, the sum of
each letter's ordinal position in the alphabet times 100 raised to
that letter's position in the suffix. So "ing" would map to
('i' - 'a') + ('n' - 'a') * 100 + ('g' - 'a') * 10000, etc.
(You'd have to take considerable care to get this right for any
code sets that had more than 100 possible code points.)
Now you've included the information about what the original
word was, and could use the payload to reject the match in the
exact-match case. Of course, the other issue would be figuring
out the query syntax to pass the fact that you wanted an exact
match down to your custom scorer.
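Here is a minimal sketch of the suffix encoding and of the exact-match check a custom scorer might perform. It is plain Java and does not touch Lucene: real Lucene payloads are per-position byte arrays, whereas this sketch keeps them as long values, and the names SuffixPayload, Posting, payloadFor and exactMatches are hypothetical. It assumes the stem is a prefix of the original word; with a stemmer that undoubles consonants, "running" -> "run" actually removes "ning", not "ing".

```java
import java.util.ArrayList;
import java.util.List;

class SuffixPayload {

    // Letter at position i (0-based) of the removed suffix contributes
    // (letter - 'a') * 100^i, so "ing" -> 8 + 13*100 + 6*10000 = 61308.
    // No stemming means an empty suffix, which encodes to 0.
    // A long holds suffixes of up to nine letters without overflow.
    static long encode(String removedSuffix) {
        long value = 0;
        long place = 1;
        for (int i = 0; i < removedSuffix.length(); i++) {
            value += (removedSuffix.charAt(i) - 'a') * place;
            place *= 100;
        }
        return value;
    }

    // Peel off base-100 digits. A trailing 'a' encodes to a zero digit and
    // is silently lost ("ia" and "i" collide), one reason this scheme
    // needs considerable care.
    static String decode(long payload) {
        StringBuilder suffix = new StringBuilder();
        while (payload > 0) {
            suffix.append((char) ('a' + payload % 100));
            payload /= 100;
        }
        return suffix.toString();
    }

    // Payload for one token; assumes the stem is a prefix of the original.
    static long payloadFor(String original, String stem) {
        if (!original.startsWith(stem)) {
            throw new IllegalArgumentException(stem + " is not a prefix of " + original);
        }
        return encode(original.substring(stem.length()));
    }

    // Hypothetical posting for a stemmed term: doc id plus payload.
    static final class Posting {
        final int docId;
        final long payload;

        Posting(int docId, long payload) {
            this.docId = docId;
            this.payload = payload;
        }
    }

    // Exact-match filtering: rebuild the original word from the stemmed
    // term and the payload, and keep only postings equal to the query word.
    static List<Integer> exactMatches(String stemmedTerm, List<Posting> postings, String queryWord) {
        List<Integer> hits = new ArrayList<>();
        for (Posting p : postings) {
            if ((stemmedTerm + decode(p.payload)).equals(queryWord)) {
                hits.add(p.docId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(encode("ing")); // 61308
        // Doc 0 contained "running", doc 1 contained "run"; both index "run".
        List<Posting> runPostings = List.of(
                new Posting(0, payloadFor("running", "run")),
                new Posting(1, payloadFor("run", "run")));
        System.out.println(exactMatches("run", runPostings, "running")); // [0]
        System.out.println(exactMatches("run", runPostings, "run"));     // [1]
    }
}
```

A stemmed (non-exact) query would simply ignore the payload and accept every posting for "run", so only one indexed token per input token is needed; the cost moves into the payload storage and the custom scorer.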

But as you can see, any scheme is harder than just flipping a switch,
so I'd _really_ verify that you can't just use copyField....

Best
Erick

On Wed, Oct 10, 2012 at 7:38 AM, meghana <meghana.rav...@amultek.com> wrote:
> We are using Solr 3.6.
>
> We have a field named Description. We want a search feature with stemming
> and also without stemming (exact word/phrase search), with highlighting in
> both.
>
> For that, we did a lot of research and concluded that we should use a
> copy field with a field type that doesn't have a stemming filter factory.
> It is working fine for now.
>
> (The main field has stemming and the copy field does not.)
>
> The data for that field is very large and we have millions of
> documents, and since we want both searching and highlighting on them, we
> need to keep this copy field both stored and indexed, which will increase
> the index size a lot.
>
> We need to eliminate this duplication somehow, if possible.
>
> From recent research, we read that combining fuzzy search with dismax
> will fulfill our requirement. (We have tried it a bit but have not had
> success.)
>
> Please let me know if this is possible, or of any other solutions to make
> this happen.
>
> Thanks in advance
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Make-Exact-Search-on-Field-with-Fuzzy-Query-tp4012888.html
> Sent from the Solr - User mailing list archive at Nabble.com.
