Thanks, I'll try this approach.

Quoting Alexandre Rafalovitch <[email protected]>:
Have you looked at edismax and the 'qf' fields parameter? It allows you to define the fields to search. Also, you can define those parameters in solrconfig.xml and not have to send them down the wire.

Finally, you can define several different request handlers (e.g. /ensearch, /frsearch) and have each of them use different 'qf' values, possibly with 'fl' field also defined and with field name aliasing from language-specific to generic names.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Tue, Apr 9, 2013 at 2:32 PM, <[email protected]> wrote:

Hello,

I'm trying to index a large number of documents in different languages. I don't know the language of the document, so I'm using TikaLanguageIdentifierUpdateProcessorFactory to identify it.

So, this is my configuration in solrconfig.xml:

    <updateRequestProcessorChain name="langid">
      <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
        <bool name="langid">true</bool>
        <str name="langid.fl">title,subtitle,content</str>
        <str name="langid.langField">language_s</str>
        <str name="langid.threshold">0.3</str>
        <str name="langid.fallback">general</str>
        <str name="langid.whitelist">en,fr,de,it,es</str>
        <bool name="langid.map">true</bool>
        <bool name="langid.map.keepOrig">true</bool>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>

So, the detection works fine and I put some dynamic fields in schema.xml to store the results:

    <dynamicField name="*_en" type="text_en" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="*_fr" type="text_fr" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="*_de" type="text_de" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="*_it" type="text_it" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="*_es" type="text_es" indexed="true" stored="true" multiValued="true"/>

My main problem now is how to search the document without knowing the language of the searched document. I don't want to have a huge querystring like

    ?q=title_en:+term+subtitle_en:+term+title_de:+term...

Okay, using copyField and copy all fields into the "text" field...but "text" has the type text_general, so the language specific indexing is not working. I could use at least a combined field for every language (like text_en, text_fr...) but still, my querystring gets very long and to add new languages is terribly uncomfortable.

So, what can I do? Is there a better solution to index and search documents in many languages without knowing the language of the document and the query before?

- Geschan
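
For reference, the edismax suggestion quoted above could look roughly like the following in solrconfig.xml. This is only a sketch under assumptions: the handler name /search and the "id" field are hypothetical, /ensearch is the example name from the reply, and the field names (title_en, subtitle_en, content_en, ...) assume that langid.map renames title, subtitle and content with the detected language suffix.

```xml
<!-- Sketch: one handler that searches every language-specific field,
     so the query string does not need to name any field.
     The handler name /search is an assumption. -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- qf lists the fields to search; extend this list when a language is added -->
    <str name="qf">
      title_en subtitle_en content_en
      title_fr subtitle_fr content_fr
      title_de subtitle_de content_de
      title_it subtitle_it content_it
      title_es subtitle_es content_es
    </str>
  </lst>
</requestHandler>

<!-- Sketch: a language-specific handler, as in the /ensearch example.
     The fl aliases return the English fields under generic names;
     the "id" field is an assumption about the schema. -->
<requestHandler name="/ensearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title_en subtitle_en content_en</str>
    <str name="fl">id,language_s,title:title_en,subtitle:subtitle_en,content:content_en</str>
  </lst>
</requestHandler>
```

With handlers like these, a request such as /search?q=term would be matched against all the listed fields, and a new language would only require a change to the 'qf' default on the server rather than to every client query.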
