OK - please disregard; I found a rogue new component in our analyzer
that was messing everything up.
The hunspell behavior was perhaps a little confusing, but I don't
believe it leads to broken queries.
-Mike
On 11/18/2014 02:38 PM, Michael Sokolov wrote:
followup - hunspell has:
follow/SDRZGJ
follower/M
following/M
follow/G generates following
I guess the reason for the /M entries is to represent the nouns, which
have plural endings, so that
following->followings
-- I'm not really sure where the bug is, but it seems as if generating
multiple "stems" causes issues
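To make that concrete, here is a toy sketch in Python of how one surface
form can map back to more than one dictionary entry. This is not the real
hunspell algorithm: the suffix table is an assumption simplified from a
typical en_US affix file, only a few flags are modelled, and the function
name stems is made up for illustration.

```python
# Toy model of hunspell-style stemming; NOT the real hunspell algorithm.
# Dictionary entries are the .dic lines quoted above.
DICTIONARY = {
    "follow": set("SDRZGJ"),
    "follower": set("M"),
    "following": set("M"),
}

# Assumed suffix for each flag (simplified; real .aff rules carry
# conditions and stripping). Flags not listed here are ignored.
SUFFIXES = {
    "G": "ing",
    "S": "s",
    "D": "ed",
}


def stems(word):
    """Return every dictionary entry that can produce `word`."""
    found = []
    for entry, flags in DICTIONARY.items():
        if entry == word:
            # The word is itself a dictionary entry.
            found.append(entry)
            continue
        for flag in flags:
            suffix = SUFFIXES.get(flag)
            if suffix and entry + suffix == word:
                # The word is reachable from this entry via a suffix rule.
                found.append(entry)
                break
    return sorted(found)


print(stems("following"))  # ['follow', 'following']
print(stems("follows"))    # ['follow']
```

With both follow/G and following/M in the dictionary, "following" comes
back with two candidate stems, which matches the two terms at the same
position seen in the analysis output.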
On 11/18/2014 02:33 PM, Michael Sokolov wrote:
I find that a query for stemmed terms sometimes fails with the
edismax query parser and hunspell stemmer. Looking at the output of
analysis for the query (text:following) I can see that it generates
two different terms at the same position: "follow" and "following".
Then edismax seems to generate a sloppy phrase query from that; in
the debug output of the query I can see ( text:following
text:follow)~2. This doesn't match anything, even though the words
follow and following (as well as followed, follows, etc.) all
occur in various documents.
First, I'm confused as to what the source of the sloppy query is.
Here are the relevant settings from solrconfig:
<str name="defType">edismax</str>
<str name="qf">archive_id^1 author^20 chapter_title^15 isbn^1
publisher^5 subjects^5 text^1 title^120</str>
<str name="pf">chapter_title~2^1 subjects~2^20 text~10^1 title~2^4</str>
<str name="mm">100%</str>
<str name="q.op">OR</str>
Is there some process that generates a slop query for co-occurring terms?
As an aside, the same query matches one document when we use the lucene
query parser, but when I search across our unstemmed field, it returns
more. It seems as if hunspell returning multiple terms for a single
input term causes problems?
So in summary: why would hunspell generate "following" as a stem for
"following"? Probably just a buggy dictionary entry; we could fix
that, but I wouldn't expect the phrase behavior in that case from
edismax either. Can anybody shed some light on what's going on here?
Thanks
-Mike