We help clients that perform semantic expansion of terms to their hypernyms
at index time. For example, they will have a synonyms file that does the
following:

wing_tips => wing_tips, dress_shoes, shoes
dress_shoes => dress_shoes, shoes
oxfords => oxfords, dress_shoes, shoes
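
To make the effect of these rules concrete, here is a rough Python sketch
of the expansion. It is a toy model, not Solr's synonym filter: parse_rules
and expand are hypothetical helper names, and the only behavior modeled is
that every expansion of a token lands in the same position as the original
token.

```python
def parse_rules(text):
    """Parse 'term => a, b, c' lines into a dict of term -> expansions."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=>" not in line:
            continue  # skip blank or malformed lines
        lhs, rhs = line.split("=>", 1)
        rules[lhs.strip()] = [t.strip() for t in rhs.split(",") if t.strip()]
    return rules


def expand(tokens, rules):
    """Return (position, term) pairs; expansions share the token's position."""
    out = []
    for pos, tok in enumerate(tokens):
        # Tokens without a rule pass through unchanged.
        for term in rules.get(tok, [tok]):
            out.append((pos, term))
    return out


RULES = parse_rules("""
wing_tips => wing_tips, dress_shoes, shoes
dress_shoes => dress_shoes, shoes
oxfords => oxfords, dress_shoes, shoes
""")

if __name__ == "__main__":
    for pos, term in expand(["brown", "wing_tips"], RULES):
        print(pos, term)
```

Because every wing_tips document is also indexed under dress_shoes and
shoes, the broader terms always end up with a doc freq at least as high as
the specific ones.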

Then at query time, we rely on the differing IDF of these terms in the same
position to bring up matches on the rare, specific terms first, followed by
increasingly semantically broad matches. For example, a search for
wing_tips previously got turned into "wing_tips OR dress_shoes OR shoes".
Shoes, being very common, would get scored lowest. Wing tips, being very
specific, would get scored very highly.
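
As a rough illustration of why the rare term wins, the sketch below scores
each term with a BM25-style IDF formula. The doc counts are made up for the
example, and the formula is assumed to approximate what the similarity
does; real scores also depend on term frequency and length normalization.

```python
import math


def idf(doc_freq, doc_count):
    """BM25-style inverse document frequency (assumed formula)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical corpus statistics, chosen only for illustration.
DOC_COUNT = 10_000
DOC_FREQ = {"wing_tips": 20, "dress_shoes": 400, "shoes": 5_000}


def rank_terms(doc_freq=DOC_FREQ, doc_count=DOC_COUNT):
    """Return (term, idf) pairs, highest IDF first."""
    weights = {term: idf(df, doc_count) for term, df in doc_freq.items()}
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for term, weight in rank_terms():
        print(f"{term:12s} {weight:.3f}")
```

With each term scored separately, a document matching wing_tips picks up
the largest contribution, and a document matching only shoes picks up the
smallest.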

(I have a blog post about this, which uses Elasticsearch:
http://opensourceconnections.com/blog/2016/12/23/elasticsearch-synonyms-patterns-taxonomies/)

As our clients upgrade to Solr 6 and above, we're noticing our technique no
longer works due to SynonymQuery, which blends the doc freq of synonyms at
query time. SynonymQuery seems to be the right direction for most people :)
Still, I would like to figure out whether there's a setting anywhere to
return to the legacy behavior (a boolean query of term queries) so I don't
have to go back to the drawing board for clients that rely on this
technique.
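
The difference between the two behaviors can be sketched as follows. This
is a simplified model with made-up doc counts, not Lucene code. It assumes
that the blended query scores all terms in the position with a single doc
freq, taken here to be the highest doc freq among them; that detail is an
assumption about how the blending works, and the function names are
hypothetical.

```python
import math


def idf(doc_freq, doc_count):
    """BM25-style inverse document frequency (assumed formula)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def legacy_weights(terms, doc_freq, doc_count):
    """Legacy behavior: one term query per synonym, each with its own IDF."""
    return {t: idf(doc_freq[t], doc_count) for t in terms}


def blended_weight(terms, doc_freq, doc_count):
    """Blended behavior (assumption): all terms share one doc freq, the max."""
    shared_df = max(doc_freq[t] for t in terms)
    return idf(shared_df, doc_count)


# Hypothetical corpus statistics, chosen only for illustration.
DOC_COUNT = 10_000
DOC_FREQ = {"wing_tips": 20, "dress_shoes": 400, "shoes": 5_000}
TERMS = ["wing_tips", "dress_shoes", "shoes"]

if __name__ == "__main__":
    print("legacy :", legacy_weights(TERMS, DOC_FREQ, DOC_COUNT))
    print("blended:", blended_weight(TERMS, DOC_FREQ, DOC_COUNT))
```

Under that assumption, a wing_tips match is weighted no higher than a shoes
match, which is exactly the loss of ordering described above.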

I've been going through QueryBuilder and I don't see where we could go back
to the legacy behavior. The choice to build a SynonymQuery seems to be
based on position overlap.

Thanks!
-Doug



-- 
Consultant, OpenSource Connections. Contact info at
http://o19s.com/about-us/doug-turnbull/; Free/Busy (http://bit.ly/dougs_cal)
