Re: Help with multi-lang searches

2018-10-22 Thread Tim Casey
Hi Sambhav, Calculate the percentage of letter pairs per language in the index. Given the letter pairs in the incoming token, find the closest "match" for the languages in the indexes. Even on a small number of tokens you will get close to the intended language. You can also calculate the "sourc
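The letter-pair idea above could be sketched roughly as follows. This is only an illustration, not Tim's actual implementation: the function names (`bigram_profile`, `similarity`, `guess_language`), the tiny sample texts, and the choice of cosine similarity as the "closest match" measure are all assumptions. In practice the per-language profiles would be computed from the content already in the index rather than from hard-coded strings.

```python
from collections import Counter
import math

def bigram_profile(text):
    """Relative frequency of adjacent letter pairs, counted within words."""
    counts = Counter()
    for word in text.lower().split():
        word = "".join(c for c in word if c.isalpha())
        for a, b in zip(word, word[1:]):
            counts[a + b] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}

def similarity(p, q):
    """Cosine similarity between two letter-pair profiles (an assumed metric)."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def guess_language(query, profiles):
    """Return the language whose profile is closest to the query's profile."""
    query_profile = bigram_profile(query)
    return max(profiles, key=lambda lang: similarity(query_profile, profiles[lang]))

# Hypothetical stand-ins for the per-language content in the index.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the weather is nice today",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y hace buen tiempo hoy",
    "it": "la volpe veloce salta sopra il cane pigro e oggi il tempo è bello",
}
PROFILES = {lang: bigram_profile(text) for lang, text in SAMPLES.items()}

if __name__ == "__main__":
    print(guess_language("the weather", PROFILES))
    print(guess_language("perro perezoso", PROFILES))
```

With profiles this small the guess is fragile for short or shared tokens such as "brexit"; larger per-language samples should make the match more stable.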

Re: Help with multi-lang searches

2018-10-22 Thread Alexandre Rafalovitch
Additional possibilities:
1) omitNorms and maybe omitTermFreqAndPositions for the fields, so that term frequency does not matter: http://lucene.apache.org/solr/guide/7_5/defining-fields.html#optional-field-type-override-properties
2) Constant score: http://lucene.apache.org/solr/guide/7_5/the-standard-

Help with multi-lang searches

2018-10-22 Thread Sambhav Kothari (BLOOMBERG/ LONDON)
Hi, We have a problem with searches across multiple languages. Our schema looks something like this:
field_en = English content for field
field_es = Spanish
field_it = Italian
etc.
When a user searches for a keyword, e.g. "brexit", the user can also specify several languages s/he wants to