Hello Chris,

I don't know the token filter you mention, but I would like to recommend Lucene's HyphenationCompoundWordTokenFilter. It works reasonably well if you provide the hyphenation rules and a dictionary. It has some flaws, such as decompounding into irrelevant subwords, into overlapping subwords, or into subwords that do not form the whole compound word (minus genitives), but these can be fixed.
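To illustrate the last point, here is a minimal, hypothetical sketch in plain Java (no Lucene dependency) of one possible fix. It is a dictionary-only splitter that accepts a segmentation only when its non-overlapping parts cover the whole word, allowing a single "s" after a part for the linking s or a trailing genitive s. The class name `WholeWordDecompounder`, the longest-match-first strategy and the minimum part length are assumptions made for this example. This is not how HyphenationCompoundWordTokenFilter is implemented, and it ignores hyphenation patterns entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, NOT Lucene's HyphenationCompoundWordTokenFilter:
// splits a word using only a dictionary and accepts a split only if the
// non-overlapping parts (plus an optional single "s" after each part)
// cover the whole word.
final class WholeWordDecompounder {

    private final Set<String> dictionary;
    private final int minPartLength;

    WholeWordDecompounder(Set<String> dictionary, int minPartLength) {
        if (minPartLength < 1) {
            throw new IllegalArgumentException("minPartLength must be >= 1");
        }
        this.dictionary = dictionary;
        this.minPartLength = minPartLength;
    }

    /** Returns the parts of the first full-coverage split found, or an empty list. */
    List<String> split(String word) {
        List<String> parts = new ArrayList<>();
        if (splitFrom(word.toLowerCase(Locale.ROOT), 0, parts)) {
            return Collections.unmodifiableList(parts);
        }
        return Collections.emptyList();
    }

    // Backtracking search; on failure, 'parts' is restored to its state on entry.
    private boolean splitFrom(String word, int start, List<String> parts) {
        if (start == word.length()) {
            // Reached the end: only a split into two or more parts is a compound.
            return parts.size() > 1;
        }
        // Try the longest candidate first, so "bahnhof" wins over "bahn" + "hof".
        for (int end = word.length(); end - start >= minPartLength; end--) {
            String candidate = word.substring(start, end);
            if (!dictionary.contains(candidate)) {
                continue;
            }
            parts.add(candidate);
            if (splitFrom(word, end, parts)) {
                return true;
            }
            // Allow one "s" after a part: a linking s (Fugen-s) inside the word,
            // or a genitive s at the very end.
            if (end < word.length() && word.charAt(end) == 's'
                    && splitFrom(word, end + 1, parts)) {
                return true;
            }
            parts.remove(parts.size() - 1);
        }
        return false;
    }

    public static void main(String[] args) {
        WholeWordDecompounder d = new WholeWordDecompounder(
                Set.of("arbeit", "amt", "bahn", "hof", "bahnhof", "uhr", "haus", "boot"), 3);
        for (String w : List.of("Arbeitsamt", "Bahnhofsuhr", "Hausboot", "Hausmeister")) {
            System.out.println(w + " -> " + d.split(w));
        }
    }
}
```

With this toy dictionary, "Hausmeister" yields no split at all instead of a lone "haus", because "meister" is missing and the remainder cannot be covered. Longest-match-first is only a simple heuristic; a real filter would still need hyphenation-based candidate generation and a policy for ambiguous splits.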
Markus

-----Original message-----
> From: Chris Morley <ch...@depahelix.com>
> Sent: Wednesday 25th March 2015 17:59
> To: solr-user@lucene.apache.org
> Subject: German Compound Splitter words.fst causing problems.
>
> Hello, Chris Morley here, of Wayfair.com. I am working on the German
> compound-splitter by Dawid Weiss.
>
> I tried to "upgrade" the words.fst file that comes with the German
> compound-splitter using Solr 3.5, but it doesn't work. Below is the
> IndexNotFoundException that I get.
>
> cmorley@Caracal01:~/Work/oss/git/apache-solr-3.5.0$ java -cp lucene/build/lucene-core-3.5-SNAPSHOT.jar org.apache.lucene.index.IndexUpgrader wordsFst
> Exception in thread "main" org.apache.lucene.index.IndexNotFoundException: org.apache.lucene.store.MMapDirectory@/home/cmorley/Work/oss/git/apache-solr-3.5.0/wordsFst lockFactory=org.apache.lucene.store.NativeFSLockFactory@201a755e
>     at org.apache.lucene.index.IndexUpgrader.upgrade(IndexUpgrader.java:118)
>     at org.apache.lucene.index.IndexUpgrader.main(IndexUpgrader.java:85)
>
> The reason I'm attempting this at all is the answer here,
> http://stackoverflow.com/questions/25450865/migrate-solr-1-4-index-files-to-4-7,
> which says to do the upgrade in a two-step process, first using Solr 3.5
> and then the latest Solr version (4.10.3). When I run the unit tests for
> my modified German compound-splitter, I get this same type of error. The
> thing is, this is an FST, not an index, which is a little confusing. The
> reason I'm following this answer, though, is that I get exactly the same
> message when building the (modified) project with Maven, at the point
> where it tries to load words.fst:
>
> [main] ERROR com.wayfair.lucene.analysis.de.compound.GermanCompoundSplitter - Format version is not supported (resource: com.wayfair.lucene.analysis.de.compound.InputStreamDataInput@79a66240): 0 (needs to be between 3 and 4). This version of Lucene only supports indexes created with release 3.0 and later. Failed to initialize static data structures for German compound splitter.
>
> Thanks,
> -Chris.