Hi Steve,

You were right, it turned out to be an encoding issue, but a really weird one. I was using Windows Notepad to save the stopwords file in UTF-8 encoding, whereas I was using EditPlus to save the synonyms file. That was the only difference. The moment I switched to EditPlus for saving the stopwords file, stopword filtering started working for Russian, German and every other language I tried.

Anyway, thanks for suggesting a valid direction.

Regards,
Tushar

Steven A Rowe wrote:
>
> Hi Tushar,
>
> On 12/05/2008 at 5:18 AM, tushar kapoor wrote:
>> I am trying to filter Russian stopwords but have not been
>> successful with that.
> [...]
>> <filter class="solr.StopFilterFactory" ignoreCase="true"
>>         words="stopwords.txt"/>
>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>         ignoreCase="true" expand="false"/>
> [...]
>> Interestingly, Russian synonyms are working fine. English and Russian
>> synonyms get searched correctly.
>>
>> Also, if I add an English-language word to stopwords.txt, it
>> gets filtered correctly. It's the Russian words that are not
>> getting filtered as stopwords.
>
> It might be an encoding issue - StopFilterFactory delegates stopword file
> reading to SolrResourceLoader.getLines(), which uses an InputStreamReader
> instantiated with the UTF-8 charset. Is your stopwords.txt encoded as
> UTF-8?
>
> It's strange that synonyms are working fine, though - SynonymFilterFactory
> reads in the synonyms file using the same mechanism as StopFilterFactory -
> is it possible that your synonyms file is encoded as UTF-8, but your
> stopwords file is encoded with a different encoding, perhaps KOI8-R? Like
> UTF-8, KOI8-R includes the entirety of 7-bit ASCII, so English words would
> be properly decoded under UTF-8.
>
> Steve
>
>

--
View this message in context: http://www.nabble.com/Russian-stopwords-tp20851093p20868126.html
Sent from the Solr - User mailing list archive at Nabble.com.
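
For anyone who lands on this thread with the same symptom, here is a minimal Python sketch for checking a stopwords file's bytes. The thread does not establish what Notepad actually did differently. One common possibility, assumed here and not confirmed above, is that Notepad prepended a UTF-8 byte order mark (BOM), which a plain UTF-8 reader keeps as an invisible character glued to the first word. The sketch also illustrates Steve's KOI8-R point: ASCII words have the same bytes in both encodings, while Cyrillic ones do not. The names `diagnose` and `read_words` and the two-word sample list are hypothetical, and `read_words` only mimics a plain line reader; it is not Solr's code.

```python
import codecs


def diagnose(data: bytes) -> dict:
    """Report whether raw bytes start with a UTF-8 BOM and whether they decode as UTF-8."""
    try:
        data.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        valid = False
    return {"has_bom": data.startswith(codecs.BOM_UTF8), "valid_utf8": valid}


def read_words(data: bytes, encoding: str = "utf-8") -> list:
    """Decode bytes and return the non-empty, whitespace-trimmed lines."""
    return [ln.strip() for ln in data.decode(encoding).splitlines() if ln.strip()]


# Hypothetical stopword list: Russian "и" and English "the".
text = "и\nthe\n"

plain_utf8 = text.encode("utf-8")          # UTF-8 without a BOM
bom_utf8 = codecs.BOM_UTF8 + plain_utf8    # UTF-8 with a leading BOM (assumed Notepad case)
koi8r = text.encode("koi8_r")              # the encoding Steve suggested as a possibility

for name, data in [("plain", plain_utf8), ("bom", bom_utf8), ("koi8-r", koi8r)]:
    print(name, diagnose(data))

# A plain UTF-8 decode keeps the BOM as U+FEFF in front of the first word,
# so that word no longer equals the intended stopword.
print([repr(w) for w in read_words(bom_utf8)])

# The "utf-8-sig" codec drops a leading BOM if one is present.
print([repr(w) for w in read_words(bom_utf8, "utf-8-sig")])
```

If `has_bom` comes back true for your file, re-saving it as UTF-8 without a BOM (or with an editor that does not add one) is worth trying. If `valid_utf8` is false, the file is in some other encoding and needs converting.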