After language detection is enabled, SOLR (5.1) isn't indexing anything

Angel Todorov Wed, 22 Apr 2015 08:05:06 -0700

Hi guys,

I've enabled language detection in solrconfig.xml:


  <updateRequestProcessorChain name="langid">
    <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
      <lst name="defaults">
        <str name="langid.fl">content,title</str>
        <str name="langid.fallback">en</str>
        <str name="langid.langField">language_s</str>
        <str name="langid.lcmap">en_GB:en en_US:en</str>
        <str name="langid.map.lcmap">en_GB:en en_US:en</str>
      </lst>
    </processor>
  </updateRequestProcessorChain>


Then I have:


  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <!-- See below for information on defining
         updateRequestProcessorChains that can be used by name
         on each Update Request
      -->
    <lst name="defaults">
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>


When I try to index a document, it's not added to the SOLR index. If I
remove the above code, everything works fine.
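For reference: Solr only writes a document to the index in the final `solr.RunUpdateProcessorFactory` step of an update chain, and a chain selected through `update.chain` is used in place of the default chain rather than in addition to it. A chain that lists only the language identifier therefore accepts the update without any error yet never indexes anything. A minimal sketch of the same chain with the two processors Solr's default chain normally ends with, assuming nothing else needs to run:

```xml
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <str name="langid.fl">content,title</str>
      <str name="langid.fallback">en</str>
      <str name="langid.langField">language_s</str>
      <str name="langid.lcmap">en_GB:en en_US:en</str>
      <str name="langid.map.lcmap">en_GB:en en_US:en</str>
    </lst>
  </processor>
  <!-- Logs each update request; optional, but part of Solr's default chain -->
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- Actually adds the document to the index; keep it last in the chain -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```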


Do I need to make any specific changes to the schema.xml? Here is an
excerpt of it:


<field name="title" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="title_en" type="string" indexed="true" stored="true" required="false" multiValued="false" />
<field name="content" type="multilang_text_exact" indexed="true" stored="true"/>


<fieldType name="multilang_text_exact" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.LetterTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LetterTokenizerFactory"/>
  </analyzer>
</fieldType>


I don't get any errors in the SOLR console output.


Do I need to add _en and _<LANG ID> suffixes to all fields in my schema
for the above to work? I mean, do I need to have title, title_en, title_jp,
and so on, all manually defined in the schema? I still don't understand why a
document isn't added at all, and without any error being thrown.


Thank you,

Angel
