Hello Chris - I don't know the token filter you mention, but I would 
recommend Lucene's HyphenationCompoundWordTokenFilter. It works reasonably well 
if you provide hyphenation rules and a dictionary. It has some flaws, such 
as decompounding into irrelevant subwords, into overlapping subwords, or into 
subwords that do not add up to the whole compound word (minus genitives), but 
these can be fixed.
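
For illustration, a minimal sketch of how the filter could be wired into a 
Solr field type through its factory. The field type name and the two file 
names (hyphenator.xml for the hyphenation rules, dictionary.txt for the word 
list) are placeholders, and the size limits shown are only starting values 
that you would tune for your own data:

    <fieldType name="text_de_decompound" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- hyphenator: hyphenation grammar file (placeholder name)
             dictionary: list of valid subwords, one per line (placeholder name) -->
        <filter class="solr.HyphenationCompoundWordTokenFilterFactory"
                hyphenator="hyphenator.xml"
                dictionary="dictionary.txt"
                encoding="UTF-8"
                minWordSize="5"
                minSubwordSize="2"
                maxSubwordSize="15"
                onlyLongestMatch="false"/>
      </analyzer>
    </fieldType>

Setting onlyLongestMatch to true may reduce the overlapping subwords, though 
probably not all of the flaws listed above.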

Markus
 
-----Original message-----
> From:Chris Morley <ch...@depahelix.com>
> Sent: Wednesday 25th March 2015 17:59
> To: solr-user@lucene.apache.org
> Subject: German Compound Splitter words.fst causing problems.
> 
> Hello, Chris Morley here, of Wayfair.com. I am working on the German 
> compound-splitter by Dawid Weiss. 
>   
>   I tried to "upgrade" the words.fst file that comes with the German 
> compound-splitter using Solr 3.5, but it doesn't work. Below is the 
> IndexNotFoundException that I get.
>   
>  cmorley@Caracal01:~/Work/oss/git/apache-solr-3.5.0$ java -cp 
> lucene/build/lucene-core-3.5-SNAPSHOT.jar 
> org.apache.lucene.index.IndexUpgrader wordsFst
>  Exception in thread "main" org.apache.lucene.index.IndexNotFoundException: 
> org.apache.lucene.store.MMapDirectory@/home/cmorley/Work/oss/git/apache-solr-3.5.0/wordsFst
>  lockFactory=org.apache.lucene.store.NativeFSLockFactory@201a755e
>                  at 
> org.apache.lucene.index.IndexUpgrader.upgrade(IndexUpgrader.java:118)
>                  at 
> org.apache.lucene.index.IndexUpgrader.main(IndexUpgrader.java:85)
>   
>  The reason I'm attempting this at all is due to the answer here, 
> http://stackoverflow.com/questions/25450865/migrate-solr-1-4-index-files-to-4-7,
>  which says to do the upgrade in a two step process, first using Solr 3.5, 
> and then the latest Solr version (4.10.3).  When I try this running the unit 
> tests for my modified German compound-splitter I'm getting this same type of 
> error.  The thing is, this is an FST, not an index, which is a little 
> confusing.  The reason I'm following this answer anyway is that I'm 
> getting that exact same message when trying to build the (modified) project 
> with Maven, at the point at which it tries to load words.fst. See below.
>   
>  [main] ERROR com.wayfair.lucene.analysis.de.compound.GermanCompoundSplitter 
> - Format version is not supported (resource: 
> com.wayfair.lucene.analysis.de.compound.InputStreamDataInput@79a66240): 0 
> (needs to be between 3 and 4). This version of Lucene only supports indexes 
> created with release 3.0 and later.  Failed to initialize static data 
> structures for German compound splitter.
>   
>  Thanks,
>  -Chris.
> 
> 
> 
