On Fri, May 6, 2011 at 8:49 AM, Michael McCandless
<luc...@mikemccandless.com> wrote:

> Shouldn't we have field types in the example schema for the different
> languages?  I.e., text_zh, text_th, text_en, text_ja, text_nl, etc.

In fact, until we break out dedicated language field types, shouldn't
we default autophrase to off in Solr?

I think this is what ElasticSearch does (just inherits Lucene's
default for this) -- Shay, or any ElasticSearch users out there... can
you confirm?

Leaving autophrase on is catastrophic for non-whitespace languages
(CJK and others), and at best iffy for whitespace languages: it is
unexpected that the QueryParser would make a PhraseQuery when the user
hadn't asked for one, it is not clear that it really helps relevance,
and it definitely hurts performance. So leaving it on is doing far
more damage than good, as far as I can tell.
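
To make the failure mode concrete, here is a toy sketch in Python. It
is not Lucene's or Solr's actual QueryParser: toy_analyze and toy_parse
are made-up names, and the analyzer (one token per CJK character,
otherwise split on non-alphanumerics) is an assumption chosen only for
illustration. It shows how a single whitespace-delimited chunk that
analyzes to several tokens turns into a phrase clause when autophrase
is on, and into independent terms when it is off.

```python
import re

# Toy model only; names and behavior are illustrative assumptions,
# not the real Lucene/Solr implementation.

def toy_analyze(chunk):
    """Hypothetical analyzer: one token per character if the chunk
    contains CJK ideographs, else lowercase alphanumeric runs."""
    if any('\u4e00' <= ch <= '\u9fff' for ch in chunk):
        return list(chunk)
    return re.findall(r"[a-z0-9]+", chunk.lower())

def toy_parse(query, autophrase):
    """Split the query on whitespace, analyze each chunk, and emit
    one clause per chunk."""
    clauses = []
    for chunk in query.split():
        tokens = toy_analyze(chunk)
        if not tokens:
            continue  # chunk was all punctuation
        if len(tokens) == 1:
            clauses.append(("term", tokens[0]))
        elif autophrase:
            # All tokens must match, adjacent and in order.
            clauses.append(("phrase", tuple(tokens)))
        else:
            # Each token is an independent term.
            clauses.append(("any_of", tuple(tokens)))
    return clauses

# A run of CJK text has no spaces, so it is one chunk with many tokens.
cjk_on = toy_parse("中文搜索", autophrase=True)
cjk_off = toy_parse("中文搜索", autophrase=False)

# A whitespace language hits the same path on e.g. hyphenated words.
en_on = toy_parse("wi-fi router", autophrase=True)
```

In this model, with autophrase on, the whole CJK query becomes a single
exact phrase, so a document has to contain every character adjacent and
in order to match at all; with it off, each character is matched
independently. The "wi-fi" case is the whitespace-language version of
the same surprise: a phrase clause the user never asked for.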

Any objections to turning off autophrase by default in Solr, until we
have per-language field types?

Mike

http://blog.mikemccandless.com
