A colleague stumbled upon this:
http://stackoverflow.com/questions/361975/setting-the-default-java-character-encoding
The second answer, setting the environment variable JAVA_TOOL_OPTIONS, did the job:
JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8
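To check whether JAVA_TOOL_OPTIONS is actually reaching the JVM, a small program can print what the JVM thinks its default encoding is. The class below is a hypothetical helper, not part of Solr. When the variable is set, HotSpot JVMs also report it at startup with a "Picked up JAVA_TOOL_OPTIONS: ..." line on stderr.

```java
import java.nio.charset.Charset;

// Hypothetical helper: prints the JVM's default encoding settings.
// Run it with and without JAVA_TOOL_OPTIONS set to see whether the option is picked up.
public class ShowDefaultEncoding {
    // Canonical name of the charset used when no charset is given explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```

For example, run `set JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8` in a Windows command prompt (or `export JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF8` in a Unix shell) and start the servlet container from that same shell. The property has to be in place at JVM startup; calling `System.setProperty("file.encoding", ...)` later does not change the already-initialized default charset. On JDK 18 and later the default is UTF-8 anyway (JEP 400).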
Happy stop-wording!
Sounds like it may be a UTF-specific issue when you are _reading it in_. See
if you can change the default locale before starting the Java process (I
think it is set through an environment variable) and check whether that
makes a difference.
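The "reading it in" point can be illustrated with a minimal sketch, assuming the file holds UTF-8: the same bytes decode to the intended word only when the reader uses UTF-8, and code that names the charset explicitly does not depend on the locale at all. The class and method names are hypothetical; this is not how Solr itself loads stopwords.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical demo: the charset chosen when reading decides what text you get.
public class ExplicitCharsetDemo {
    // Decodes raw bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Reads a text file with an explicit charset instead of the platform default.
    static List<String> readUtf8Lines(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Cyrillic "и" ("and"), written as an escape so javac's source encoding does not matter.
        byte[] utf8 = "\u0438".getBytes(StandardCharsets.UTF_8);
        System.out.println("decoded as UTF-8:        " + decode(utf8, StandardCharsets.UTF_8));
        System.out.println("decoded as windows-1252: " + decode(utf8, Charset.forName("windows-1252")));
    }
}
```

Decoding the two UTF-8 bytes of "и" (D0 B8) as windows-1252 yields "Ð¸" (U+00D0, U+00B8), the kind of mojibake that can make a stopword fail to match. Whether the console shows the Cyrillic line correctly depends on the console's own encoding.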
If you have a very simple test case, I would be happy to check it on Mac
and Windows.
Just so everyone knows:
It turns out my stopwords.txt was OK after all. It works correctly on
Linux (Ubuntu) and, strangely, on a colleague's Windows 7 machine. My computer
also runs Windows 7. The only difference between the two Windows machines is the language
of the interface (French for mine, English fo
I'm encountering the same issue, but my Russian stopwords.txt IS encoded in
UTF-8.
I verified the encoding using EmEditor (I've used it for years, and I use it
for the existing English, French, Spanish, Portuguese and German Solr
configurations without issues).
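Besides an editor, a file can be checked programmatically: a strict UTF-8 decoder (one set to REPORT malformed input instead of silently replacing it) throws on any byte sequence that is not valid UTF-8. A minimal sketch, with a hypothetical class name and stopwords.txt assumed as the default file name:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical checker: reports whether a file's bytes are well-formed UTF-8.
public class Utf8Check {
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "stopwords.txt";
        byte[] bytes = Files.readAllBytes(Paths.get(file));
        System.out.println(file + " is valid UTF-8: " + isValidUtf8(bytes));
    }
}
```

Note that a file can pass this check and still start with a byte order mark, which is valid UTF-8.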
Just to make extra sure, I downloa
From: [EMAIL PROTECTED]
Sent: Saturday, December 06, 2008 1:17 AM
To: solr-user@lucene.apache.org
Subject: RE: Russian stopwords
Hi Steve,
You were right, it turned out to be an encoding issue, but a really weird
one. I was using Windows Notepad to save the stopwords file in UTF-8
encoding. On the other hand, I was using EditPlus to save the synonyms file. That
was the only difference. The moment I switched to EditPlus for sa
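One plausible explanation for the Notepad/EditPlus difference (an assumption; the thread does not confirm it): Windows Notepad of that time wrote a byte order mark (the bytes EF BB BF) at the start of files saved as UTF-8, and Java's UTF-8 decoder does not strip it. The first line then comes back as U+FEFF followed by the word and no longer equals the word itself. A sketch of a loader that removes it, using a hypothetical class name:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads a UTF-8 word list and drops a leading byte order mark.
public class BomAwareWordList {
    private static final char BOM = '\uFEFF';

    // Strips a BOM from the first line, trims each line and skips blank lines.
    static List<String> cleanLines(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            // The UTF-8 decoder keeps the BOM as U+FEFF, and trim() does not remove it.
            if (i == 0 && !line.isEmpty() && line.charAt(0) == BOM) {
                line = line.substring(1);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                words.add(line);
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "stopwords.txt";
        List<String> raw = Files.readAllLines(Paths.get(file), StandardCharsets.UTF_8);
        for (String word : cleanLines(raw)) {
            System.out.println(word);
        }
    }
}
```

Without any code, re-saving the file with an editor that can write UTF-8 without a BOM avoids the problem as well.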
Hi Tushar,
On 12/05/2008 at 5:18 AM, tushar kapoor wrote:
> I am trying to filter Russian stopwords but have not been
> successful with that.
[...]
> words="stopwords.txt"/>
> ignoreCase="true" expand="false"/>
[...]
> Interestingly, Russian synonyms are work