Hi,
I'm having issues with special characters in synonyms.txt on Solr 3.5.
I'm running a multi-lingual index and need certain terms to give results across
all languages no matter what language the user uses.
I figured that this could easily be resolved by just adding the different
words to synonyms.txt.
This works great as long as I don't use special characters such as åäö.
I've tried a couple of things so far but now I'm completely stuck.
This is completely ignored by Solr:
island, "\u00F6"
and alternatively:
island, "\u00F6n" (this should translate to "ön" which means "the island")
A search for island gives me results containing the word "island" but not
the word "ö" (island in Swedish), and vice versa.
Directly putting the letter "ö" into synonyms.txt like so:
island, ön
island, "ön"
results in the following exception on startup (both lines produce the same error):
java.lang.RuntimeException: java.nio.charset.MalformedInputException: Input length = 3
    at org.apache.solr.analysis.FSTSynonymFilterFactory.inform(FSTSynonymFilterFactory.java:92)
    at org.apache.solr.analysis.SynonymFilterFactory.inform(SynonymFilterFactory.java:50)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:546)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:126)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:461)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
    at org.mortbay.jetty.Server.doStart(Server.java:224)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.mortbay.start.Main.invokeMain(Main.java:194)
    at org.mortbay.start.Main.start(Main.java:534)
    at org.mortbay.start.Main.start(Main.java:441)
    at org.mortbay.start.Main.main(Main.java:119)
Caused by: java.nio.charset.MalformedInputException: Input length = 3
    at java.nio.charset.CoderResult.throwException(Unknown Source)
    at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
    at sun.nio.cs.StreamDecoder.read(Unknown Source)
    at java.io.InputStreamReader.read(Unknown Source)
    at java.io.BufferedReader.fill(Unknown Source)
    at java.io.BufferedReader.readLine(Unknown Source)
    at java.io.LineNumberReader.readLine(Unknown Source)
    at org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:82)
    at org.apache.lucene.analysis.synonym.SolrSynonymParser.add(SolrSynonymParser.java:70)
    at org.apache.solr.analysis.FSTSynonymFilterFactory.loadSolrSynonyms(FSTSynonymFilterFactory.java:122)
    at org.apache.solr.analysis.FSTSynonymFilterFactory.inform(FSTSynonymFilterFactory.java:84)
    ... 33 more
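Since the root cause in the trace is a MalformedInputException thrown while the file is being read, one thing that may be worth ruling out is the encoding of synonyms.txt itself. The sketch below is not Solr code. It assumes (and this is only an assumption) that the file is being decoded as strict UTF-8 and that my editor may have saved it as ISO-8859-1. The class name SynonymsEncodingCheck and the method isValidUtf8 are made up for illustration. It only shows that the same synonym line decodes cleanly when encoded as UTF-8 and fails a strict UTF-8 decode when encoded as ISO-8859-1.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of Solr or Lucene.
class SynonymsEncodingCheck {

    // Returns true if the bytes decode as strict UTF-8.
    // REPORT makes the decoder throw instead of silently replacing bad bytes.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            Charset.forName("UTF-8").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // MalformedInputException is a subclass of CharacterCodingException.
            return false;
        }
    }

    public static void main(String[] args) {
        // The synonym line from this post, written with a Java escape for "ö".
        String line = "island, \u00F6n\n";

        // Same text, two different on-disk encodings (ISO-8859-1 is an assumed example).
        byte[] latin1 = line.getBytes(Charset.forName("ISO-8859-1"));
        byte[] utf8 = line.getBytes(Charset.forName("UTF-8"));

        System.out.println("ISO-8859-1 bytes decode as UTF-8: " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes decode as UTF-8: " + isValidUtf8(utf8));
    }
}
```

To check a real file, the bytes of synonyms.txt could be read into a byte array and passed to isValidUtf8 in the same way. If that returned false, re-saving the file as UTF-8 would be the first thing to try.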
Does anyone have any ideas on how to solve this issue?
Thanks,
Carl