Hi,

I know this topic has been discussed many times in the (distant) past, but I 
wonder whether there are newer best practices or trends.

In my application, I'm dealing with documents in different languages. Each 
document is monolingual; it has some fields containing free text and a set of 
fields that do not require any text analysis. For the free text, we need to 
apply a language-specific analysis based on the language of the document.

I'm in favor of using a single index for all documents rather than one index 
per language (any objections?). Thus, in schema.xml, I need to declare a 
separate field for each language (text_fr, text_en, etc.), each with its own 
appropriate analysis. Then, during indexing, I need to assign the free-text 
content of each document to the appropriate field. Thus, for each document, 
only one of the free-text fields would be populated.
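For concreteness, here is a rough sketch of the indexing-side routing described above. It is only an illustration under some assumptions: Python is used just for the example, the helper name `to_solr_doc` and the set `SUPPORTED_LANGS` are hypothetical, and so is the `lang` field (a plain string field that would also let us filter by language later). It only builds the document dict; sending it to Solr is left out.

```python
# Hypothetical sketch: route a monolingual document's free text into the
# per-language field (text_fr, text_en, ...) declared in schema.xml.
# Assumption: these are the languages that have a text_<lang> field.
SUPPORTED_LANGS = {"fr", "en"}


def to_solr_doc(doc_id, lang, free_text, extra_fields=None):
    """Build a Solr document dict in which only one free-text field is populated."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"no text_<lang> field declared for {lang!r}")
    solr_doc = {
        "id": doc_id,
        "lang": lang,               # hypothetical non-analyzed field, usable in a filter
        f"text_{lang}": free_text,  # the single free-text field for this document
    }
    # Fields that need no text analysis are copied through unchanged.
    solr_doc.update(extra_fields or {})
    return solr_doc


print(to_solr_doc("42", "fr", "Bonjour tout le monde", {"author": "Saïd"}))
```

A French document therefore ends up with `text_fr` set and no `text_en` key at all.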

My question is: at search time, what is the best way to search against the 
appropriate field?

I know that with dismax we can define in "qf" the set of fields we want to 
search against, e.g., <str name="qf">text_fr text_en</str>
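The same qf setting can also be passed per request instead of as a handler default. A minimal sketch, assuming Python and a hypothetical helper `dismax_params`; `defType`, `q` and `qf` are standard Solr request parameters, and the code only builds the query string without contacting a server:

```python
from urllib.parse import urlencode

# Per-language free-text fields declared in schema.xml.
LANG_FIELDS = ["text_fr", "text_en"]


def dismax_params(user_query):
    """Request parameters for a dismax query over all per-language fields."""
    return {
        "defType": "dismax",
        "q": user_query,
        "qf": " ".join(LANG_FIELDS),  # same content as <str name="qf">
    }


print(urlencode(dismax_params("open source search")))
```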

With this approach, does Solr choose the appropriate analysis for the query? 
That is, if a query is matched against a document with English free text 
(text_en is populated), does Solr analyze the query as if it were in English?

One problem with this approach is that each query is compared to all 
available documents; for example, an English query would be compared against 
French documents. As far as I know, if the query language is known, this can 
be avoided either by searching only the appropriate field (e.g., 
text_en:query) or by using a filter to select only the documents with English 
text. Am I correct? Or is there a better solution?
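The two options above, assuming the query language is already known, could look roughly like this. It is a sketch under assumptions: Python again, hypothetical helper names, and the hypothetical non-analyzed `lang` field from the indexing sketch; `fq` is Solr's standard filter-query parameter. The first option narrows qf to one field; the second keeps all fields but filters the document set.

```python
from urllib.parse import urlencode

LANG_FIELDS = ["text_fr", "text_en"]


def field_restricted_params(user_query, lang):
    """Option 1: search only the free-text field for the query's language."""
    return {"defType": "dismax", "q": user_query, "qf": f"text_{lang}"}


def filtered_params(user_query, lang):
    """Option 2: keep all fields in qf, but filter on the hypothetical lang field."""
    return {
        "defType": "dismax",
        "q": user_query,
        "qf": " ".join(LANG_FIELDS),
        "fq": f"lang:{lang}",  # only documents whose lang field equals the query language
    }


print(urlencode(field_restricted_params("hello world", "en")))
print(urlencode(filtered_params("hello world", "en")))
```

One design note: Solr caches filter queries independently of the main query, so a small set of `lang:xx` filters should be reused across requests.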

Thanks,
-Saïd
