Yes, with the current nightly build, setting up Stempel is quite easy. All I did was:

svn co https://svn.apache.org/repos/asf/lucene/dev/trunk ./lucene-solr
cd lucene-solr/solr
ant example
cp ./contrib/analysis-extras/lucene-libs/lucene-analyzers-stempel-4.0-SNAPSHOT.jar ./lib
cp ./contrib/analysis-extras/build/apache-solr-analysis-extras-4.0-SNAPSHOT.jar ./lib

in solrconfig.xml:

<lib path="../../lib/apache-solr-analysis-extras-4.0-SNAPSHOT.jar" />
<lib path="../../lib/lucene-analyzers-stempel-4.0-SNAPSHOT.jar" />

in schema.xml:

<!-- Polish -->
<fieldType name="text_pl" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" />
    <filter class="solr.StempelPolishStemFilterFactory" language="Polish" />
  </analyzer>
</fieldType>

The end.

Anyway, I don't know whether the problem is the Polish stemmer or a badly configured fieldType, but the results are just wrong. Example:

index for type "text_pl": bilety
query for type "text_pl": bilet

Index Analyzer
org.apache.solr.analysis.StempelPolishStemFilterFactory {language=Polish, luceneMatchVersion=LUCENE_24}
term position     1
term text         bilić
term type         word
source start,end  0,6
payload

Query Analyzer
org.apache.solr.analysis.StempelPolishStemFilterFactory {language=Polish, luceneMatchVersion=LUCENE_24}
term position     1
term text         binąć
term type         word
source start,end  0,5
payload

But I expected the result to be "bilet" in both cases, which is the base form.

Any clues how to make it work properly for Polish? Maybe someone has good experience with hunspell-solr and Polish dictionaries?

Thanks for letting me know!

Cheers,
Jakub Godawa.

On Mon, 2010-11-15 at 08:35 -0500, Robert Muir wrote:
> https://issues.apache.org/jira/browse/SOLR-2237
>
> On Mon, Nov 15, 2010 at 5:04 AM, Jakub Godawa <jakub.god...@gmail.com> wrote:
> > I tried to reach the authors twice, but with no luck. I've seen some
> > posts where people finally were able to launch it (without much pain).
> > I don't know.
> > If any pro would be so nice as to try to run the stempel on
> > his/her machine and paste me some verbose step by step solution I
> > would really appreciate it.
> >
> > Cheers,
> > Jakub Godawa.
> >
> > 2010/11/13 Lance Norskog <goks...@gmail.com>:
> >> I don't know if the Stempel jar includes the Java source. At this point I
> >> think you should ask the author of Stempel to make a Solr front-end for it.
> >> It's very simple for him.
> >>
> >> Jakub Godawa wrote:
> >>>
> >>> Am I not doing it in the point no 4? I am compiling all the folder
> >>> that was extracted before, but now with that new class file.
> >>>
> >>> 2010/11/12 Lance Norskog<goks...@gmail.com>:
> >>>
> >>>> I think you have to compile all of the stempel source including your
> >>>> filter factory into one jar at the same time. Everybody does this; I
> >>>> don't know how different Java versions make class file binaries.
> >>>>
> >>>> On Thu, Nov 11, 2010 at 3:06 AM, Jakub Godawa<jakub.god...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hi! Sorry for such a break, but I was moving house... anyway:
> >>>>>
> >>>>> 1. I took the
> >>>>> ~/apache-solr/src/java/org/apache/solr/analysis/StandardFilterFactory.java
> >>>>> file and modified it (named as StempelFilterFactory.java) in Vim that
> >>>>> way:
> >>>>>
> >>>>> package org.getopt.solr.analysis;
> >>>>>
> >>>>> import org.apache.lucene.analysis.TokenStream;
> >>>>> import org.apache.lucene.analysis.standard.StandardFilter;
> >>>>>
> >>>>> public class StempelTokenFilterFactory extends BaseTokenFilterFactory {
> >>>>>   public StempelFilter create(TokenStream input) {
> >>>>>     return new StempelFilter(input);
> >>>>>   }
> >>>>> }
> >>>>>
> >>>>> 2. Then I put the file to the extracted stempel-1.0.jar in
> >>>>> ./org/getopt/solr/analysis/
> >>>>> 3. Then I created a class from it: jar -cf
> >>>>> StempelTokenFilterFactory.class StempelFilterFactory.java
> >>>>> 4. Then I created new stempel-1.0.jar archive: jar -cf stempel-1.0.jar
> >>>>> -C ./stempel-1.0/ .
> >>>>> 5. Then in schema.xml I've put:
> >>>>>
> >>>>> <fieldType name="text_pl" class="solr.TextField">
> >>>>>   <analyzer>
> >>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>     <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" />
> >>>>>   </analyzer>
> >>>>> </fieldType>
> >>>>>
> >>>>> 6. I started the solr server and I received the following error:
> >>>>>
> >>>>> 2010-11-11 11:50:56 org.apache.solr.common.SolrException log
> >>>>> SEVERE: java.lang.ClassFormatError: Incompatible magic value
> >>>>> 1347093252 in class file
> >>>>> org/getopt/solr/analysis/StempelTokenFilterFactory
> >>>>>         at java.lang.ClassLoader.defineClass1(Native Method)
> >>>>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
> >>>>>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> >>>>> ...
> >>>>>
> >>>>> Question: What is wrong? :) I use "jar (fastjar) 0.98" to create jars,
> >>>>> I googled that error but no answer gave me an idea of what is wrong
> >>>>> in my .java file.
> >>>>>
> >>>>> Please help, as I believe I am close to the end of that subject.
> >>>>>
> >>>>> Cheers,
> >>>>> Jakub Godawa.
> >>>>>
> >>>>> 2010/11/3 Lance Norskog<goks...@gmail.com>:
> >>>>>
> >>>>>> Here's the problem: Solr is a little dumb about these Filter classes,
> >>>>>> and so you have to make a Factory object for the Stempel Filter.
> >>>>>>
> >>>>>> There are a lot of other FilterFactory classes. You would have to just
> >>>>>> copy one and change the names to Stempel and it might actually work.
> >>>>>>
> >>>>>> This will take some Solr programming- perhaps the author can help you?
> >>>>>>
> >>>>>> On Tue, Nov 2, 2010 at 7:08 AM, Jakub Godawa<jakub.god...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Sorry, I am not a Java programmer at all. I would appreciate more
> >>>>>>> verbose (or step by step) help.
> >>>>>>>
> >>>>>>> 2010/11/2 Bernd Fehling<bernd.fehl...@uni-bielefeld.de>:
> >>>>>>>
> >>>>>>>> So you call org.getopt.solr.analysis.StempelTokenFilterFactory.
> >>>>>>>> In this case I would assume a file StempelTokenFilterFactory.class
> >>>>>>>> in your directory org/getopt/solr/analysis/.
> >>>>>>>>
> >>>>>>>> And a class which extends the BaseTokenFilterFactory, right?
> >>>>>>>> ...
> >>>>>>>> public class StempelTokenFilterFactory extends BaseTokenFilterFactory
> >>>>>>>> implements ResourceLoaderAware {
> >>>>>>>> ...
> >>>>>>>>
> >>>>>>>> Am 02.11.2010 14:20, schrieb Jakub Godawa:
> >>>>>>>>
> >>>>>>>>> This is what stempel-1.0.jar consists of after jar -xf:
> >>>>>>>>>
> >>>>>>>>> jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
> >>>>>>>>> org/:
> >>>>>>>>> egothor  getopt
> >>>>>>>>>
> >>>>>>>>> org/egothor:
> >>>>>>>>> stemmer
> >>>>>>>>>
> >>>>>>>>> org/egothor/stemmer:
> >>>>>>>>> Cell.class     Diff.class    Gener.class  MultiTrie2.class  Optimizer2.class  Reduce.class        Row.class    TestAll.class  TestLoad.class  Trie$StrEnum.class
> >>>>>>>>> Compile.class  DiffIt.class  Lift.class   MultiTrie.class   Optimizer.class   Reduce$Remap.class  Stock.class  Test.class     Trie.class
> >>>>>>>>>
> >>>>>>>>> org/getopt:
> >>>>>>>>> stempel
> >>>>>>>>>
> >>>>>>>>> org/getopt/stempel:
> >>>>>>>>> Benchmark.class  lucene  Stemmer.class
> >>>>>>>>>
> >>>>>>>>> org/getopt/stempel/lucene:
> >>>>>>>>> StempelAnalyzer.class  StempelFilter.class
> >>>>>>>>> jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
> >>>>>>>>> META-INF/:
> >>>>>>>>> MANIFEST.MF
> >>>>>>>>> jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
> >>>>>>>>> res:
> >>>>>>>>> tables
> >>>>>>>>>
> >>>>>>>>> res/tables:
> >>>>>>>>> readme.txt  stemmer_1000.out  stemmer_100.out  stemmer_2000.out
> >>>>>>>>> stemmer_200.out  stemmer_500.out  stemmer_700.out
> >>>>>>>>>
> >>>>>>>>> 2010/11/2 Bernd Fehling<bernd.fehl...@uni-bielefeld.de>:
> >>>>>>>>>
> >>>>>>>>>> Hi Jakub,
> >>>>>>>>>>
> >>>>>>>>>> if you unzip your stempel-1.0.jar do you have the
> >>>>>>>>>> required directory structure and file in there?
> >>>>>>>>>> org/getopt/stempel/lucene/StempelFilter.class
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Bernd
> >>>>>>>>>>
> >>>>>>>>>> Am 02.11.2010 13:54, schrieb Jakub Godawa:
> >>>>>>>>>>
> >>>>>>>>>>> Erick, I've put the jar files like that before. I also added the
> >>>>>>>>>>> directive and put the file in instanceDir/lib
> >>>>>>>>>>>
> >>>>>>>>>>> What is still a problem is that even though the files are loaded:
> >>>>>>>>>>> 2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader
> >>>>>>>>>>> replaceClassLoader
> >>>>>>>>>>> INFO: Adding
> >>>>>>>>>>> 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar'
> >>>>>>>>>>> to classloader
> >>>>>>>>>>>
> >>>>>>>>>>> I am not able to use the FilterFactory... maybe I am attempting it
> >>>>>>>>>>> in a wrong way?
> >>>>>>>>>>>
> >>>>>>>>>>> Cheers,
> >>>>>>>>>>> Jakub Godawa.
> >>>>>>>>>>>
> >>>>>>>>>>> 2010/11/2 Erick Erickson<erickerick...@gmail.com>:
> >>>>>>>>>>>
> >>>>>>>>>>>> The polish stemmer jar file needs to be findable by Solr, if you
> >>>>>>>>>>>> copy it to <solr_home>/lib and restart solr you should be set.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Alternatively, you can add another <lib> directive to the
> >>>>>>>>>>>> solrconfig.xml file
> >>>>>>>>>>>> (there are several examples in that file already).
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm a little confused about not being able to find TokenFilter,
> >>>>>>>>>>>> is that still a problem?
> >>>>>>>>>>>>
> >>>>>>>>>>>> HTH
> >>>>>>>>>>>> Erick
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Nov 2, 2010 at 8:07 AM, Jakub
> >>>>>>>>>>>> Godawa<jakub.god...@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Thank you Bernd! I couldn't make it run though. Here is my
> >>>>>>>>>>>>> problem:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
> >>>>>>>>>>>>> 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is
> >>>>>>>>>>>>> a directive: <lib path="../lib/stempel-1.0.jar" />
> >>>>>>>>>>>>> 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is
> >>>>>>>>>>>>> fieldType:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> (...)
> >>>>>>>>>>>>> <!-- Polish -->
> >>>>>>>>>>>>> <fieldType name="text_pl" class="solr.TextField">
> >>>>>>>>>>>>>   <analyzer>
> >>>>>>>>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>>>>>>>>     <filter class="org.getopt.stempel.lucene.StempelFilter" />
> >>>>>>>>>>>>>     <!--<filter class="org.getopt.solr.analysis.StempelTokenFilterFactory"
> >>>>>>>>>>>>>       protected="protwords.txt" /> -->
> >>>>>>>>>>>>>   </analyzer>
> >>>>>>>>>>>>> </fieldType>
> >>>>>>>>>>>>> (...)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 4. jar file is loaded but I got an error:
> >>>>>>>>>>>>> SEVERE: Could not start SOLR. Check solr/home property
> >>>>>>>>>>>>> java.lang.NoClassDefFoundError:
> >>>>>>>>>>>>> org/apache/lucene/analysis/TokenFilter
> >>>>>>>>>>>>>        at java.lang.ClassLoader.defineClass1(Native Method)
> >>>>>>>>>>>>>        at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
> >>>>>>>>>>>>>        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> >>>>>>>>>>>>> (...)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 5. Different class gave me that one:
> >>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: Error loading
> >>>>>>>>>>>>> class 'org.getopt.solr.analysis.StempelTokenFilterFactory'
> >>>>>>>>>>>>>        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
> >>>>>>>>>>>>>        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
> >>>>>>>>>>>>> (...)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Question is: How to make <fieldType /> and <filter /> work with
> >>>>>>>>>>>>> that Stempel? :)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>> Jakub Godawa.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2010/10/29 Bernd Fehling<bernd.fehl...@uni-bielefeld.de>:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Jakub,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I have ported the KStemmer for use in most recent Solr trunk
> >>>>>>>>>>>>>> version.
> >>>>>>>>>>>>>> My stemmer is located in the lib directory of Solr
> >>>>>>>>>>>>>> "solr/lib/KStemmer-2.00.jar"
> >>>>>>>>>>>>>> because it belongs to Solr.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Write it as FilterFactory and use it as Filter like:
> >>>>>>>>>>>>>> <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
> >>>>>>>>>>>>>>   protected="protwords.txt" />
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This is how my fieldType looks like:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> <fieldType name="text_kstem" class="solr.TextField"
> >>>>>>>>>>>>>>     positionIncrementGap="100">
> >>>>>>>>>>>>>>   <analyzer type="index">
> >>>>>>>>>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >>>>>>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>>>>>>>>>>>>>       words="stopwords.txt" enablePositionIncrements="false" />
> >>>>>>>>>>>>>>     <filter class="solr.WordDelimiterFilterFactory"
> >>>>>>>>>>>>>>       generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >>>>>>>>>>>>>>       catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
> >>>>>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory" />
> >>>>>>>>>>>>>>     <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
> >>>>>>>>>>>>>>       protected="protwords.txt" />
> >>>>>>>>>>>>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
> >>>>>>>>>>>>>>   </analyzer>
> >>>>>>>>>>>>>>   <analyzer type="query">
> >>>>>>>>>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >>>>>>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>>>>>>>>>>>>>       words="stopwords.txt" />
> >>>>>>>>>>>>>>     <filter class="solr.WordDelimiterFilterFactory"
> >>>>>>>>>>>>>>       generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >>>>>>>>>>>>>>       catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
> >>>>>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory" />
> >>>>>>>>>>>>>>     <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
> >>>>>>>>>>>>>>       protected="protwords.txt" />
> >>>>>>>>>>>>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
> >>>>>>>>>>>>>>   </analyzer>
> >>>>>>>>>>>>>> </fieldType>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>> Bernd
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Am 28.10.2010 14:56, schrieb Jakub Godawa:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi!
> >>>>>>>>>>>>>>> There is a polish stemmer http://www.getopt.org/stempel/ and I
> >>>>>>>>>>>>>>> have problems connecting it with solr 1.4.1
> >>>>>>>>>>>>>>> Questions:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 1. Where EXACTLY do I put "stemper-1.0.jar" file?
> >>>>>>>>>>>>>>> 2. How do I register the file, so I can build a fieldType
> >>>>>>>>>>>>>>> like:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> <fieldType name="text_pl" class="solr.TextField">
> >>>>>>>>>>>>>>>   <analyzer class="org.geoopt.solr.analysis.StempelTokenFilterFactory"/>
> >>>>>>>>>>>>>>> </fieldType>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 3. Is that the right approach to make it work?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks for verbose explanation,
> >>>>>>>>>>>>>>> Jakub.
> >>>>>>
> >>>>>> --
> >>>>>> Lance Norskog
> >>>>>> goks...@gmail.com
> >>>>
> >>>> --
> >>>> Lance Norskog
> >>>> goks...@gmail.com
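
A side note on the ClassFormatError quoted above (step 6 of the Nov 11 message): the reported magic value 1347093252 is 0x504B0304 in hex, i.e. the bytes 'P', 'K', 0x03, 0x04 that open a ZIP/JAR archive, whereas a compiled class file has to start with 0xCAFEBABE. That suggests the file named StempelTokenFilterFactory.class was an archive produced by "jar -cf" (as step 3 describes) and not the output of javac. Below is a minimal sketch that decodes the number; the class name MagicValue and its two helper methods are made up for illustration and are not part of Solr or Stempel.

```java
public class MagicValue {
    // Every valid .class file starts with these four bytes.
    public static final int CLASS_MAGIC = 0xCAFEBABE;
    // Local file header signature that opens a ZIP/JAR archive: 'P' 'K' 0x03 0x04.
    public static final int ZIP_MAGIC = 0x504B0304;

    /** Formats the 32-bit magic value as eight upper-case hex digits. */
    public static String toHex(int magic) {
        return String.format("%08X", magic);
    }

    /** Names the file type that the magic value belongs to, if it is one of the two known here. */
    public static String describe(int magic) {
        if (magic == CLASS_MAGIC) {
            return "Java class file";
        }
        if (magic == ZIP_MAGIC) {
            return "ZIP/JAR archive";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // Decimal value copied from the ClassFormatError message in the thread.
        int reported = 1347093252;
        System.out.println(toHex(reported) + " -> " + describe(reported));
    }
}
```

Running it prints "504B0304 -> ZIP/JAR archive", so the class loader was handed an archive under a .class name. Compiling the .java source with javac (with the Solr and Lucene jars on the classpath) and only then packing the resulting .class file with jar would presumably avoid this particular error.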