I'm trying to use the HunspellStemFilterFactory, but I am having trouble
loading a dictionary.
I downloaded the .dic and .aff files for en_GB, en_US and nl_NL from
the OpenOffice site, but they all give me the same error message.
When I use them as is, I get this error message:
Oct 26, 2012 2:39:37 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: Unable to load hunspell data! [dictionary=en_GB.dic,affix=en_GB.aff]
at org.apache.solr.analysis.HunspellStemFilterFactory.inform(HunspellStemFilterFactory.java:87)
at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:551)
....
Caused by: java.text.ParseException: The first non-comment line in the affix file must be a 'SET charset', was: 'FLAG num'
at org.apache.lucene.analysis.hunspell.HunspellDictionary.getDictionaryEncoding(HunspellDictionary.java:280)
at org.apache.lucene.analysis.hunspell.HunspellDictionary.<init>(HunspellDictionary.java:112)
at org.apache.solr.analysis.HunspellStemFilterFactory.inform(HunspellStemFilterFactory.java:85)
... 32 more
When I add the following as the first line of both the .dic and the .aff file:
SET UTF-8
the error message changes to:
Oct 26, 2012 10:16:42 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: Unable to load hunspell data! [dictionary=en_GB.dic,affix=en_GB.aff]
at org.apache.solr.analysis.HunspellStemFilterFactory.inform(HunspellStemFilterFactory.java:87)
at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:551)
....
Caused by: java.nio.charset.IllegalCharsetNameException: 'UTF-8'
at java.nio.charset.Charset.checkName(Charset.java:284)
at java.nio.charset.Charset.lookup2(Charset.java:458)
at java.nio.charset.Charset.lookup(Charset.java:437)
at java.nio.charset.Charset.forName(Charset.java:502)
at org.apache.lucene.analysis.hunspell.HunspellDictionary.getJavaEncoding(HunspellDictionary.java:293)
at org.apache.lucene.analysis.hunspell.HunspellDictionary.<init>(HunspellDictionary.java:113)
at org.apache.solr.analysis.HunspellStemFilterFactory.inform(HunspellStemFilterFactory.java:85)
... 32 more
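To rule out stray characters in the line I added, a standalone check along these lines could be run against the .aff file. This is only a sketch under assumptions: the class and method names (AffixHeaderCheck, firstNonCommentLine, isUsableCharset) are made up for illustration, and it merely imitates the idea of reading the first non-comment line and handing whatever follows "SET " to Charset.forName. It is not the actual Lucene code, and I do not know whether whitespace is really what trips it up here.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Solr or Lucene.
final class AffixHeaderCheck {

    /** Returns the first line that is neither blank nor a '#' comment, or null if there is none. */
    static String firstNonCommentLine(String content) {
        for (String line : content.split("\r?\n")) {
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue;
            }
            return line;
        }
        return null;
    }

    /** True if the JVM accepts the given string, exactly as passed, as a charset name. */
    static boolean isUsableCharset(String name) {
        try {
            Charset.forName(name);
            return true;
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, inspect that file; otherwise use a made-up sample
        // whose SET line carries a trailing space.
        String content = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1)
                : "# sample header\nSET UTF-8 \nFLAG num\n";

        String line = firstNonCommentLine(content);
        if (line == null || !line.startsWith("SET ")) {
            System.out.println("First non-comment line is not a SET line: [" + line + "]");
            return;
        }
        String raw = line.substring(4);
        // Brackets make leading or trailing whitespace visible.
        System.out.println("raw     [" + raw + "] usable: " + isUsableCharset(raw));
        System.out.println("trimmed [" + raw.trim() + "] usable: " + isUsableCharset(raw.trim()));
    }
}
```

If the raw value is rejected but the trimmed one is accepted, the line I added probably contains extra whitespace or a stray character; if both are accepted, the cause lies elsewhere.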
I am aware of a similar issue that was raised on this list in December 2011
and escalated to Jira
(https://issues.apache.org/jira/browse/SOLR-2934), but I am not sure
whether it was ever resolved. Or am I just missing something? In either
case, could anyone who has working dictionary files share them with me?
Any language will do, as long as it works!
I am using Solr 3.6.1 on a Mac running OS X 10.7.5.
- Rob