problems when hunspell returns multiple stems

Michael Sokolov Tue, 18 Nov 2014 11:40:48 -0800

I find that a query for stemmed terms sometimes fails with the edismax query parser and hunspell stemmer. Looking at the analysis output for the query (text:following), I can see that it generates two different terms at the same position: "follow" and "following". Then edismax seems to generate a sloppy phrase query from that; in the debug output of the query I can see ( text:following text:follow)~2. This doesn't match anything, even though the words follow and following (as well as followed, follows, etc.) all occur in various documents.
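To make "two terms at the same position" concrete, here is a minimal toy model in Python. It is not Lucene's API; the function name `positions` and the (term, increment) pairs are assumptions chosen for illustration. It relies only on the general convention that a token with a position increment of 0 is stacked on the previous token's position.

```python
# Toy model of an analyzer's token stream; this is NOT Lucene's API.
from collections import defaultdict


def positions(tokens):
    """Group terms by absolute position from (term, position_increment) pairs."""
    by_position = defaultdict(list)
    pos = 0
    for term, increment in tokens:
        # An increment of 0 keeps the token on the previous token's position.
        pos += increment
        by_position[pos].append(term)
    return dict(by_position)


# Assumed analysis output for the single input word "following":
# the second stem has increment 0, so both stems share one position.
stacked = positions([("follow", 1), ("following", 0)])
print(stacked)
```

In this model a single input word that yields two stems occupies one position with two terms, which is the situation the query parser has to handle.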

First, I'm confused as to what the source of the sloppy query is. Here are the relevant settings from solrconfig:


<str name="defType">edismax</str>
<str name="qf">archive_id^1 author^20 chapter_title^15 isbn^1 publisher^5 subjects^5 text^1 title^120</str>
<str name="pf">chapter_title~2^1 subjects~2^20 text~10^1 title~2^4</str>
<str name="mm">100%</str>
<str name="q.op">OR</str>

Is there some process that generates a slop query for co-occurring terms?
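For anyone wanting to reproduce the debug output, a request along these lines should show the parsed query. This is only a sketch: the host, port, and core name in `base_url` are assumptions, not taken from the setup above, while `q`, `defType`, `debugQuery`, and `wt` are standard Solr request parameters.

```python
# Build a Solr select URL that asks for query debug output.
from urllib.parse import urlencode

# Hypothetical endpoint; replace host, port, and core name with your own.
base_url = "http://localhost:8983/solr/collection1/select"

params = {
    "q": "text:following",
    "defType": "edismax",
    "debugQuery": "true",  # include the parsed query in the response
    "wt": "json",
}

url = base_url + "?" + urlencode(params)
print(url)
```

The parsed query appears in the debug section of the response, which is where the ( text:following text:follow)~2 form can be inspected.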

As an aside, the same query does return a result when we use the lucene query parser: it matches one document. But when I search across our unstemmed field, it returns more.

It seems as if when hunspell returns multiple terms from a single one, this causes problems?

So in summary: why would hunspell generate "following" as a stem for "following"? Probably just a buggy dictionary entry; we could fix that, but I wouldn't expect the phrase behavior in that case from edismax either. Can anybody shed some light as to what's going on here?


Thanks

-Mike
