On Fri, May 6, 2011 at 8:49 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Shouldn't we have field types in the eg schema for the different
> languages? Ie, text_zh, text_th, text_en, text_ja, text_nl, etc.

In fact, until we break out dedicated language field types, shouldn't we default autophrase to off in Solr?

I think this is what ElasticSearch does (just inherits Lucene's default for this) -- Shay, or any ElasticSearch users out there... can you confirm?

Leaving autophrase on is catastrophic for non-whitespace languages (CJK and others), and at best iffy for whitespace languages (i.e., it's unexpected that the QueryParser would make a PhraseQuery when the user hadn't asked for one, it's not clear it really helps relevance for whitespace languages, and it definitely hurts performance), so leaving it on is doing far more damage than good, as far as I can tell.

Any objections to turning off autophrase by default in Solr, until we have per-language field types?

Mike

http://blog.mikemccandless.com