Dear all,

I'm a new user of Solr. I've managed to index a bunch of documents (in
fact, they are tweets) and everything works quite smoothly.

However, Solr doesn't seem to detect the language of my documents, nor does
it remove stopwords for that language, which I need in order to extract the
most frequent terms.

I've added the Tika lib jars and put the following piece of XML in my
solrconfig.xml:

    <updateRequestProcessorChain name="langid">
      <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
        <lst name="defaults">
          <str name="langid.fl">text</str>
          <str name="langid.langField">lang</str>
        </lst>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>
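
For context, as far as I understand a named chain like this is only applied
when an update request selects it, either through the update.chain request
parameter or through a default set on the update handler. A minimal sketch of
the second option follows. It is an illustration only: the /update handler
name and the solr.UpdateRequestHandler class are assumptions that depend on
the Solr version and on the existing solrconfig.xml.

    <!-- Assumed handler name and class; adjust to the installed Solr version -->
    <requestHandler name="/update" class="solr.UpdateRequestHandler">
      <lst name="defaults">
        <!-- Send every update through the "langid" chain defined above -->
        <str name="update.chain">langid</str>
      </lst>
    </requestHandler>

Even with the chain active, it would only write the detected language code
into the lang field. Stopword removal depends on the analyzer of the field
type being indexed, so it would need separate configuration in schema.xml.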

There is no error in the Tomcat log file, so I have no idea why this isn't
working.
Any hint on how to solve this problem will be much appreciated!
