Yes, with the current nightly build, setting up Stempel is quite easy.

All I did was:

svn co https://svn.apache.org/repos/asf/lucene/dev/trunk ./lucene-solr

cd lucene-solr/solr
ant example

cp ./contrib/analysis-extras/lucene-libs/lucene-analyzers-stempel-4.0-SNAPSHOT.jar ./lib
cp ./contrib/analysis-extras/build/apache-solr-analysis-extras-4.0-SNAPSHOT.jar ./lib

in solrconfig.xml

<lib path="../../lib/apache-solr-analysis-extras-4.0-SNAPSHOT.jar" />
<lib path="../../lib/lucene-analyzers-stempel-4.0-SNAPSHOT.jar" />

in schema.xml

<!-- Polish -->
<fieldType name="text_pl" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" />
    <filter class="solr.StempelPolishStemFilterFactory" language="Polish" />
  </analyzer>
</fieldType>

The end.

Anyway, I don't know whether the problem is the Polish stemmer itself or a
badly configured fieldType, but the results are just wrong.

example:

index for type "text_pl": bilety
query for type "text_pl": bilet
 
Index Analyzer

org.apache.solr.analysis.StempelPolishStemFilterFactory
{language=Polish, luceneMatchVersion=LUCENE_24}

term position     : 1
term text         : bilić
term type         : word
source start,end  : 0,6
payload           : (empty)

Query Analyzer

org.apache.solr.analysis.StempelPolishStemFilterFactory
{language=Polish, luceneMatchVersion=LUCENE_24}

term position     : 1
term text         : binąć
term type         : word
source start,end  : 0,5
payload           : (empty)


But I would expect both to come out as "bilet", which is the base form.
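
To see why a search for "bilet" cannot find a document containing "bilety"
with these results, here is a rough Java sketch of term matching. It is not
Solr or Stempel code: the stem() lookup only replays the two outputs from the
analysis page above, and the class and method names are made up for
illustration. A hit needs the indexed term and the query term to be identical
after analysis, and here they are not.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class StemMismatchDemo {
    // Stand-in for the stemmer (an assumption, not the real Stempel):
    // it reproduces only the two results reported on the analysis page.
    // Escapes keep this file ASCII-only; the values are the two reported stems.
    static final Map<String, String> REPORTED_STEMS = Map.of(
            "bilety", "bili\u0107",
            "bilet", "bin\u0105\u0107");

    static String stem(String token) {
        return REPORTED_STEMS.getOrDefault(token, token);
    }

    // Minimal inverted index: analyzed term -> ids of documents containing it.
    static Map<String, Set<Integer>> index(Map<Integer, String> docs) {
        Map<String, Set<Integer>> idx = new HashMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            for (String token : doc.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                idx.computeIfAbsent(stem(token), k -> new HashSet<>()).add(doc.getKey());
            }
        }
        return idx;
    }

    // The query goes through the same analysis, then must match a term exactly.
    static Set<Integer> search(Map<String, Set<Integer>> idx, String query) {
        return idx.getOrDefault(stem(query.toLowerCase(Locale.ROOT)), Set.of());
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> idx = index(Map.of(1, "bilety"));
        System.out.println("query 'bilet'  -> " + search(idx, "bilet"));   // no documents
        System.out.println("query 'bilety' -> " + search(idx, "bilety"));  // document 1
    }
}
```

So only the exact indexed form finds the document; the singular does not,
because its stem differs from the stem of the plural.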

Any clues on how to make it work properly for Polish? Maybe someone has good
experience with hunspell-solr and Polish dictionaries?

Thanks for letting me know!

Cheers,
Jakub Godawa.




On Mon, 2010-11-15 at 08:35 -0500, Robert Muir wrote:
> https://issues.apache.org/jira/browse/SOLR-2237
> 
> On Mon, Nov 15, 2010 at 5:04 AM, Jakub Godawa <jakub.god...@gmail.com> wrote:
> > I tried to reach the authors twice, but with no luck. I've seen some
> > posts where people were finally able to launch it (without much pain).
> > I don't know. If any pro would be so nice as to try running Stempel on
> > his/her machine and paste me a verbose step-by-step solution, I
> > would really appreciate it.
> >
> > Cheers,
> > Jakub Godawa.
> >
> > 2010/11/13 Lance Norskog <goks...@gmail.com>:
> >> I don't know if the Stempel jar includes the Java source. At this point I
> >> think you should ask the author of Stempel to make a Solr front-end for it.
> >> It's very simple for him.
> >>
> >> Jakub Godawa wrote:
> >>>
> >>> Am I not doing that in step 4? I am compiling the whole folder
> >>> that was extracted before, but now with the new class file.
> >>>
> >>> 2010/11/12 Lance Norskog<goks...@gmail.com>:
> >>>
> >>>>
> >>>> I think you have to compile all of the Stempel source, including your
> >>>> filter factory, into one jar at the same time. Everybody does this; I
> >>>> don't know how different Java versions make class file binaries.
> >>>>
> >>>> On Thu, Nov 11, 2010 at 3:06 AM, Jakub Godawa<jakub.god...@gmail.com> wrote:
> >>>>
> >>>>>
> >>>>> Hi! Sorry for such a long break, but I was moving house... anyway:
> >>>>>
> >>>>> 1. I took the
> >>>>> ~/apache-solr/src/java/org/apache/solr/analysis/StandardFilterFactory.java
> >>>>> file and modified it (saved as StempelFilterFactory.java) in Vim this way:
> >>>>>
> >>>>> package org.getopt.solr.analysis;
> >>>>>
> >>>>> import org.apache.lucene.analysis.TokenStream;
> >>>>> import org.apache.lucene.analysis.standard.StandardFilter;
> >>>>>
> >>>>> public class StempelTokenFilterFactory extends BaseTokenFilterFactory {
> >>>>>  public StempelFilter create(TokenStream input) {
> >>>>>    return new StempelFilter(input);
> >>>>>  }
> >>>>> }
> >>>>>
> >>>>> 2. Then I put the file into the extracted stempel-1.0.jar, in
> >>>>> ./org/getopt/solr/analysis/
> >>>>> 3. Then I created a class from it: jar -cf
> >>>>> StempelTokenFilterFactory.class StempelFilterFactory.java
> >>>>> 4. Then I created a new stempel-1.0.jar archive: jar -cf stempel-1.0.jar
> >>>>> -C ./stempel-1.0/ .
> >>>>> 5. Then in schema.xml I put:
> >>>>>
> >>>>>    <fieldType name="text_pl" class="solr.TextField">
> >>>>>      <analyzer>
> >>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>>>        <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>        <filter
> >>>>> class="org.getopt.solr.analysis.StempelTokenFilterFactory" />
> >>>>>      </analyzer>
> >>>>>    </fieldType>
> >>>>>
> >>>>> 6. I started the Solr server and received the following error:
> >>>>>
> >>>>> 2010-11-11 11:50:56 org.apache.solr.common.SolrException log
> >>>>> SEVERE: java.lang.ClassFormatError: Incompatible magic value 1347093252 in class file org/getopt/solr/analysis/StempelTokenFilterFactory
> >>>>>        at java.lang.ClassLoader.defineClass1(Native Method)
> >>>>>        at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
> >>>>>        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> >>>>> ...
> >>>>>
> >>>>> Question: What is wrong? :) I use "jar (fastjar) 0.98" to create jars.
> >>>>> I googled that error, but none of the answers gave me an idea of what
> >>>>> is wrong in my .java file.
> >>>>>
> >>>>> Please help, as I believe I am close to the end of that subject.
> >>>>>
> >>>>> Cheers,
> >>>>> Jakub Godawa.
> >>>>>
> >>>>> 2010/11/3 Lance Norskog<goks...@gmail.com>:
> >>>>>
> >>>>>>
> >>>>>> Here's the problem: Solr is a little dumb about these Filter classes,
> >>>>>> and so you have to make a Factory object for the Stempel Filter.
> >>>>>>
> >>>>>> There are a lot of other FilterFactory classes. You would have to just
> >>>>>> copy one and change the names to Stempel and it might actually work.
> >>>>>>
> >>>>>> This will take some Solr programming; perhaps the author can help you?
> >>>>>>
> >>>>>> On Tue, Nov 2, 2010 at 7:08 AM, Jakub Godawa<jakub.god...@gmail.com> wrote:
> >>>>>>
> >>>>>>>
> >>>>>>> Sorry, I am not a Java programmer at all. I would appreciate more
> >>>>>>> verbose (or step-by-step) help.
> >>>>>>>
> >>>>>>> 2010/11/2 Bernd Fehling<bernd.fehl...@uni-bielefeld.de>:
> >>>>>>>
> >>>>>>>>
> >>>>>>>> So you call org.getopt.solr.analysis.StempelTokenFilterFactory.
> >>>>>>>> In this case I would assume a file StempelTokenFilterFactory.class
> >>>>>>>> in your directory org/getopt/solr/analysis/.
> >>>>>>>>
> >>>>>>>> And a class which extends the BaseTokenFilterFactory, right?
> >>>>>>>> ...
> >>>>>>>> public class StempelTokenFilterFactory extends BaseTokenFilterFactory
> >>>>>>>> implements ResourceLoaderAware {
> >>>>>>>> ...
> >>>>>>>> ...
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Am 02.11.2010 14:20, schrieb Jakub Godawa:
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> This is what stempel-1.0.jar consists of after jar -xf:
> >>>>>>>>>
> >>>>>>>>> jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
> >>>>>>>>> org/:
> >>>>>>>>> egothor  getopt
> >>>>>>>>>
> >>>>>>>>> org/egothor:
> >>>>>>>>> stemmer
> >>>>>>>>>
> >>>>>>>>> org/egothor/stemmer:
> >>>>>>>>> Cell.class     Diff.class    Gener.class  MultiTrie2.class  Optimizer2.class  Reduce.class        Row.class    TestAll.class  TestLoad.class  Trie$StrEnum.class
> >>>>>>>>> Compile.class  DiffIt.class  Lift.class   MultiTrie.class   Optimizer.class   Reduce$Remap.class  Stock.class  Test.class     Trie.class
> >>>>>>>>>
> >>>>>>>>> org/getopt:
> >>>>>>>>> stempel
> >>>>>>>>>
> >>>>>>>>> org/getopt/stempel:
> >>>>>>>>> Benchmark.class  lucene  Stemmer.class
> >>>>>>>>>
> >>>>>>>>> org/getopt/stempel/lucene:
> >>>>>>>>> StempelAnalyzer.class  StempelFilter.class
> >>>>>>>>> jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
> >>>>>>>>> META-INF/:
> >>>>>>>>> MANIFEST.MF
> >>>>>>>>> jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
> >>>>>>>>> res:
> >>>>>>>>> tables
> >>>>>>>>>
> >>>>>>>>> res/tables:
> >>>>>>>>> readme.txt  stemmer_1000.out  stemmer_100.out  stemmer_2000.out
> >>>>>>>>> stemmer_200.out  stemmer_500.out  stemmer_700.out
> >>>>>>>>>
> >>>>>>>>> 2010/11/2 Bernd Fehling<bernd.fehl...@uni-bielefeld.de>:
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Hi Jakub,
> >>>>>>>>>>
> >>>>>>>>>> if you unzip your stempel-1.0.jar do you have the
> >>>>>>>>>> required directory structure and file in there?
> >>>>>>>>>> org/getopt/stempel/lucene/StempelFilter.class
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Bernd
> >>>>>>>>>>
> >>>>>>>>>> Am 02.11.2010 13:54, schrieb Jakub Godawa:
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Erick, I've put the jar files there like that before. I also added the
> >>>>>>>>>>> directive and put the file in instanceDir/lib.
> >>>>>>>>>>>
> >>>>>>>>>>> What is still a problem is that even though the files are loaded:
> >>>>>>>>>>> 2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader
> >>>>>>>>>>> INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar' to classloader
> >>>>>>>>>>>
> >>>>>>>>>>> I am not able to use the FilterFactory... maybe I am attempting it in
> >>>>>>>>>>> the wrong way?
> >>>>>>>>>>>
> >>>>>>>>>>> Cheers,
> >>>>>>>>>>> Jakub Godawa.
> >>>>>>>>>>>
> >>>>>>>>>>> 2010/11/2 Erick Erickson<erickerick...@gmail.com>:
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> The Polish stemmer jar file needs to be findable by Solr; if you copy
> >>>>>>>>>>>> it to <solr_home>/lib and restart Solr you should be set.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Alternatively, you can add another <lib> directive to the solrconfig.xml
> >>>>>>>>>>>> file (there are several examples in that file already).
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm a little confused about not being able to find TokenFilter;
> >>>>>>>>>>>> is that still a problem?
> >>>>>>>>>>>>
> >>>>>>>>>>>> HTH
> >>>>>>>>>>>> Erick
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa<jakub.god...@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thank you Bernd! I couldn't make it run though. Here is my problem:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
> >>>>>>>>>>>>> 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a
> >>>>>>>>>>>>> directive: <lib path="../lib/stempel-1.0.jar" />
> >>>>>>>>>>>>> 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is a fieldType:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> (...)
> >>>>>>>>>>>>>  <!-- Polish -->
> >>>>>>>>>>>>>  <fieldType name="text_pl" class="solr.TextField">
> >>>>>>>>>>>>>    <analyzer>
> >>>>>>>>>>>>>      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >>>>>>>>>>>>>      <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>>>>>>>>      <filter class="org.getopt.stempel.lucene.StempelFilter" />
> >>>>>>>>>>>>>      <!-- <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" protected="protwords.txt" /> -->
> >>>>>>>>>>>>>    </analyzer>
> >>>>>>>>>>>>>  </fieldType>
> >>>>>>>>>>>>> (...)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 4. jar file is loaded but I got an error:
> >>>>>>>>>>>>> SEVERE: Could not start SOLR. Check solr/home property
> >>>>>>>>>>>>> java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
> >>>>>>>>>>>>>      at java.lang.ClassLoader.defineClass1(Native Method)
> >>>>>>>>>>>>>      at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
> >>>>>>>>>>>>>      at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> >>>>>>>>>>>>> (...)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 5. Different class gave me that one:
> >>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.getopt.solr.analysis.StempelTokenFilterFactory'
> >>>>>>>>>>>>>      at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
> >>>>>>>>>>>>>      at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
> >>>>>>>>>>>>> (...)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The question is: how do I make <fieldType /> and <filter /> work with
> >>>>>>>>>>>>> Stempel? :)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>> Jakub Godawa.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2010/10/29 Bernd Fehling<bernd.fehl...@uni-bielefeld.de>:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Jakub,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I have ported the KStemmer for use in the most recent Solr trunk version.
> >>>>>>>>>>>>>> My stemmer is located in the lib directory of Solr,
> >>>>>>>>>>>>>> "solr/lib/KStemmer-2.00.jar", because it belongs to Solr.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Write it as a FilterFactory and use it as a Filter like:
> >>>>>>>>>>>>>> <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory" protected="protwords.txt" />
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This is what my fieldType looks like:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>    <fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
> >>>>>>>>>>>>>>      <analyzer type="index">
> >>>>>>>>>>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >>>>>>>>>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>>>>>>>>>>>>>                words="stopwords.txt" enablePositionIncrements="false" />
> >>>>>>>>>>>>>>        <filter class="solr.WordDelimiterFilterFactory"
> >>>>>>>>>>>>>>                generateWordParts="1" generateNumberParts="1"
> >>>>>>>>>>>>>>                catenateWords="1" catenateNumbers="1"
> >>>>>>>>>>>>>>                catenateAll="0" splitOnCaseChange="1" />
> >>>>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory" />
> >>>>>>>>>>>>>>        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
> >>>>>>>>>>>>>>                protected="protwords.txt" />
> >>>>>>>>>>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
> >>>>>>>>>>>>>>      </analyzer>
> >>>>>>>>>>>>>>      <analyzer type="query">
> >>>>>>>>>>>>>>        <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >>>>>>>>>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
> >>>>>>>>>>>>>>                words="stopwords.txt" />
> >>>>>>>>>>>>>>        <filter class="solr.WordDelimiterFilterFactory"
> >>>>>>>>>>>>>>                generateWordParts="1" generateNumberParts="1"
> >>>>>>>>>>>>>>                catenateWords="0" catenateNumbers="0"
> >>>>>>>>>>>>>>                catenateAll="0" splitOnCaseChange="1" />
> >>>>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory" />
> >>>>>>>>>>>>>>        <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
> >>>>>>>>>>>>>>                protected="protwords.txt" />
> >>>>>>>>>>>>>>        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
> >>>>>>>>>>>>>>      </analyzer>
> >>>>>>>>>>>>>>    </fieldType>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>> Bernd
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Am 28.10.2010 14:56, schrieb Jakub Godawa:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi!
> >>>>>>>>>>>>>>> There is a Polish stemmer, http://www.getopt.org/stempel/, and I have
> >>>>>>>>>>>>>>> problems connecting it with Solr 1.4.1.
> >>>>>>>>>>>>>>> Questions:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 1. Where EXACTLY do I put the "stempel-1.0.jar" file?
> >>>>>>>>>>>>>>> 2. How do I register the file, so I can build a fieldType like:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> <fieldType name="text_pl" class="solr.TextField">
> >>>>>>>>>>>>>>>   <analyzer class="org.getopt.solr.analysis.StempelTokenFilterFactory"/>
> >>>>>>>>>>>>>>> </fieldType>
> >>>>>>>>>>>>>>> </fieldType>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> 3. Is that the right approach to make it work?
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks for verbose explanation,
> >>>>>>>>>>>>>>> Jakub.
> >>>>>>>>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Lance Norskog
> >>>>>> goks...@gmail.com
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Lance Norskog
> >>>> goks...@gmail.com
> >>>>
> >>>>
> >>
> >
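
A note on the ClassFormatError quoted above. The "magic value" in that
message is the first four bytes of the file the JVM tried to load as a class.
The small Java sketch below (class and method names are made up for
illustration) prints the reported number in hex and compares it with two
well-known signatures: 0xCAFEBABE, which starts every compiled class file,
and 0x504B0304 ("PK" followed by bytes 3 and 4), which starts a ZIP or JAR
archive.

```java
final class ClassFileMagic {
    // First four bytes of every valid compiled .class file.
    static final int CLASS_MAGIC = 0xCAFEBABE;
    // "PK\3\4": local file header signature that starts ZIP and JAR archives.
    static final int ZIP_MAGIC = 0x504B0304;

    // Render the decimal value from the error message as 8 hex digits.
    static String toHex(long magic) {
        return String.format("0x%08X", magic);
    }

    // Name the file type that the four leading bytes point to, if known.
    static String describe(long magic) {
        int m = (int) magic;
        if (m == CLASS_MAGIC) return "Java class file";
        if (m == ZIP_MAGIC) return "ZIP/JAR archive";
        return "unknown";
    }

    public static void main(String[] args) {
        long reported = 1347093252L; // value quoted in the ClassFormatError
        // prints "0x504B0304 -> ZIP/JAR archive"
        System.out.println(toHex(reported) + " -> " + describe(reported));
    }
}
```

If that reading is right, the StempelTokenFilterFactory.class produced in
step 3 is probably an archive created by "jar -cf" rather than compiler
output. jar only packages files; compiling would normally be done with javac,
with the Lucene and Solr jars on the classpath.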
