I am trying to use the “langid.map.individual” setting so that field “a” can be detected as, say, English and mapped to “a_en”, while in the same document field “b” is detected as, say, German and mapped to “b_de”.
What happens in my tests is that the global language is detected (for example, German), but BOTH fields are mapped to “_de” as a result. I cannot get individual detection or mapping to work. Am I misunderstanding the purpose of this setting?

Here is the resulting document from my test:

----------------
{
  "id": "1005!22345",
  "language": [
    "de"
  ],
  "a_de": "A title that should be detected as English with high confidence",
  "b_de": "Die Einführung einer anlasslosen Speicherung von Passagierdaten für alle Flüge aus einem Nicht-EU-Staat in die EU und umgekehrt ist näher gerückt. Der Ausschuss des EU-Parlaments für bürgerliche Freiheiten, Justiz und Inneres (LIBE) hat heute mit knapper Mehrheit für einen entsprechenden Richtlinien-Entwurf der EU-Kommission gestimmt. Bürgerrechtler, Grüne und Linke halten die geplante Richtlinie für eine andere Form der anlasslosen Vorratsdatenspeicherung, die alle Flugreisenden zu Verdächtigen mache.",
  "_version_": 1508494723734569000
}
----------------

I expected “a_de” to be “a_en”, and the “language” multi-valued field to contain both “en” and “de”.
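To make the expectation concrete, here is a rough Python sketch of the two behaviors. This is not Solr code: `detect` is a toy stop-word counter standing in for the real detector, the function names `map_by_document` and `map_individually` are made up for illustration, and the German text is shortened. It only contrasts "detect once for the whole document and map every field to that language" (what I observe) with "detect and map each field on its own" (what I expected).

```python
import re

# Toy stand-in for a real language detector: counts common function words.
# The word lists are illustrative assumptions, not taken from any library.
STOPWORDS = {
    "en": {"a", "that", "should", "be", "as", "with", "the", "and", "of"},
    "de": {"die", "der", "und", "für", "einer", "von", "ist", "hat", "mit"},
}


def detect(text):
    """Return the language code whose stop words occur most often in text."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {lang: sum(t in words for t in tokens)
              for lang, words in STOPWORDS.items()}
    return max(scores, key=scores.get)


def map_by_document(doc, fields):
    """Observed behavior: one detection over all fields, applied to each field."""
    lang = detect(" ".join(doc[f] for f in fields))
    out = {f"{f}_{lang}": doc[f] for f in fields}
    out["language"] = [lang]
    return out


def map_individually(doc, fields):
    """Expected behavior: each field is detected and mapped on its own."""
    out, langs = {}, []
    for f in fields:
        lang = detect(doc[f])
        out[f"{f}_{lang}"] = doc[f]
        if lang not in langs:
            langs.append(lang)
    out["language"] = langs
    return out


doc = {
    "a": "A title that should be detected as English with high confidence",
    "b": ("Die Einführung einer anlasslosen Speicherung von Passagierdaten "
          "für alle Flüge ist näher gerückt. Der Ausschuss des EU-Parlaments "
          "hat heute mit knapper Mehrheit für einen Entwurf der "
          "EU-Kommission gestimmt."),
}

observed = map_by_document(doc, ["a", "b"])
expected = map_individually(doc, ["a", "b"])
print(sorted(observed))
print(sorted(expected))
```

In this toy setup the longer German text dominates the combined input, so the document-level variant maps both fields to “_de”, while the per-field variant yields “a_en” and “b_de”.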
Here is my configuration in solrconfig.xml:

--------------------
<updateRequestProcessorChain name="langid" default="true">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <str name="langid">true</str>
      <str name="langid.fl">a,b</str>
      <str name="langid.map">true</str>
      <str name="langid.map.individual">true</str>
      <str name="langid.langField">language</str>
      <str name="langid.map.lcmap">af:uns,ar:uns,bg:uns,bn:uns,cs:uns,da:uns,el:uns,et:uns,fa:uns,fi:uns,gu:uns,he:uns,hi:uns,hr:uns,hu:uns,id:uns,ja:uns,kn:uns,ko:uns,lt:uns,lv:uns,mk:uns,ml:uns,mr:uns,ne:uns,nl:uns,no:uns,pa:uns,pl:uns,ro:uns,ru:uns,sk:uns,sl:uns,so:uns,sq:uns,sv:uns,sw:uns,ta:uns,te:uns,th:uns,tl:uns,tr:uns,uk:uns,ur:uns,vi:uns,zh-cn:uns,zh-tw:uns</str>
      <str name="langid.fallback">en</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
--------------------

The debug output of lang detect, during indexing, is as follows:

-------------------
DEBUG - 2015-08-03 14:37:54.450; org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Language detected de with certainty 0.9999964723182276
DEBUG - 2015-08-03 14:37:54.450; org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Detected main document language from fields [a, b]: de
DEBUG - 2015-08-03 14:37:54.450; org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor; Appending field a
DEBUG - 2015-08-03 14:37:54.451; org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor; Appending field b
DEBUG - 2015-08-03 14:37:54.453; org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Language detected de with certainty 0.9999964723182276
DEBUG - 2015-08-03 14:37:54.453; org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Mapping field a using individually detected language de
DEBUG - 2015-08-03 14:37:54.454; org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Doing mapping from a with language de to field a_de
DEBUG - 2015-08-03 14:37:54.454; org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Mapping field 1005!22345 to de
DEBUG - 2015-08-03 14:37:54.454; org.eclipse.jetty.webapp.WebAppClassLoader; loaded class org.apache.solr.common.SolrInputField from WebAppClassLoader=525571@80503
DEBUG - 2015-08-03 14:37:54.454; org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Removing old field a
DEBUG - 2015-08-03 14:37:54.455; org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor; Appending field a
DEBUG - 2015-08-03 14:37:54.455; org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor; Appending field b
DEBUG - 2015-08-03 14:37:54.456; org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Language detected de with certainty 0.9999980402022373
DEBUG - 2015-08-03 14:37:54.456; org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Mapping field b using individually detected language de
DEBUG - 2015-08-03 14:37:54.456; org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Doing mapping from b with language de to field b_de
DEBUG - 2015-08-03 14:37:54.456; org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Mapping field 1005!22345 to de
DEBUG - 2015-08-03 14:37:54.456; org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor; Removing old field b
-------------------

From this, my takeaway is that every time the LangDetectLanguageIdentifierUpdateProcessor is asked to detect the language, it is using field a AND b. But I can’t quite tell from this output.

Any insight appreciated.

Regards,
David