On Thu, Apr 12, 2012 at 03:46:56PM +, Michael Ludwig wrote:
> > From: Walter Underwood
>
> > German noun decompounding is a little more complicated than it might
> > seem.
> >
> > There can be transformations or inflections, like the "s" in
> > "Weihnachtsbaum" (Weihnachten/Baum).
>
> I remembe
German noun decompounding is a little more complicated than it might seem.
There can be transformations or inflections, like the "s" in "Weihnachtsbaum"
(Weihnachten/Baum).
Internal nouns should be recapitalized, like "Baum" above.
Some compounds probably should not be decompounded, like "Fahrrad
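To make the three points above concrete, here is a minimal sketch of a dictionary-driven decompounder, written for illustration only (it is not Lucene's CompoundAnalyzer or any other existing filter). It assumes a hypothetical hand-made map from the lowercase form a noun takes inside a compound (e.g. "weihnacht") to its capitalized base noun (e.g. "Weihnachten"). The map therefore handles both the transformation and the recapitalization. A linking "s" between parts is skipped, and the longest prefix is tried first so that a compound listed as a whole, such as "fahrrad", is not split.

```java
import java.util.*;

/**
 * Hypothetical sketch: dictionary-driven German decompounder.
 * The stem map is a toy assumption: it maps the lowercase form a noun takes
 * inside a compound to its capitalized base noun.
 */
final class Decompounder {
    private final Map<String, String> stems;

    Decompounder(Map<String, String> stems) {
        this.stems = stems;
    }

    /** Returns the base nouns of word, or the word itself if no full split exists. */
    List<String> split(String word) {
        List<String> parts = splitLower(word.toLowerCase(Locale.GERMAN));
        return parts != null ? parts : List.of(word);
    }

    /** Returns the parts covering rest, or null if rest cannot be covered by known stems. */
    private List<String> splitLower(String rest) {
        if (rest.isEmpty()) {
            return new ArrayList<>();
        }
        // Longest prefix first, so a compound listed as a whole stays whole.
        for (int end = rest.length(); end >= 2; end--) {
            String base = stems.get(rest.substring(0, end));
            if (base == null) {
                continue;
            }
            String tail = rest.substring(end);
            List<String> parts = splitLower(tail);
            // Otherwise try skipping a linking "s", as in Weihnacht-s-baum.
            if (parts == null && tail.length() > 1 && tail.charAt(0) == 's') {
                parts = splitLower(tail.substring(1));
            }
            if (parts != null) {
                parts.add(0, base); // the map value restores the capitalized base noun
                return parts;
            }
        }
        return null;
    }
}
```

With a map containing "weihnacht" → "Weihnachten" and "baum" → "Baum", `split("Weihnachtsbaum")` yields `[Weihnachten, Baum]`. If "fahrrad" → "Fahrrad" is listed alongside "rad" → "Rad", `split("Fahrrad")` stays `[Fahrrad]`. The obvious cost is that every stem and its compound form must be in the dictionary.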
Hi,
We've done a lot of tests with the HyphenationCompoundWordTokenFilter, using an
FOP XML hyphenation file generated from TeX for the Dutch language, and have
seen decent results. A bonus was that some tokens can now be stemmed properly,
because not all compounds are listed in the dictionary for the Hunspell
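The idea behind the hyphenation approach can be sketched as follows. This is an illustration of the technique, not Lucene's implementation. Hyphenation points only propose candidate boundaries, and a candidate subword is kept if it is in the dictionary and within the size limits. Parsing the FOP XML patterns is out of scope, so the break positions for the Dutch word "fietsenstalling" (fiet-sen-stal-ling) are hard-coded. The class name, the method name and the small dictionary are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch of hyphenation-driven decompounding (not Lucene's code). */
final class HyphenationSplitter {
    /**
     * Returns dictionary subwords found between hyphenation points.
     * breaks: hyphenation positions (assumed precomputed from TeX patterns).
     */
    static List<String> subwords(String word, int[] breaks, Set<String> dictionary,
                                 int minSub, int maxSub) {
        // Candidate boundaries: word start, each hyphenation point, word end.
        int[] bounds = new int[breaks.length + 2];
        bounds[0] = 0;
        System.arraycopy(breaks, 0, bounds, 1, breaks.length);
        bounds[bounds.length - 1] = word.length();

        String lower = word.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < bounds.length; i++) {
            for (int j = i + 1; j < bounds.length; j++) {
                if (i == 0 && j == bounds.length - 1) {
                    continue; // skip the whole word itself
                }
                int len = bounds[j] - bounds[i];
                if (len < minSub || len > maxSub) {
                    continue;
                }
                String part = lower.substring(bounds[i], bounds[j]);
                if (dictionary.contains(part)) {
                    out.add(part);
                }
            }
        }
        return out;
    }
}
```

With the dictionary {fietsen, stal, stalling} and breaks {4, 7, 11}, "fietsenstalling" yields `[fietsen, stal, stalling]`. These subwords can then be stemmed on their own, which is the bonus mentioned above.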
Paul,
Nearly two years ago I requested an evaluation license and tested BASIS Tech
Rosette for Lucene & Solr. It worked excellently, but the price was much, much
too high.
Yes, they also have compound analysis for several languages, including German.
Just configure your pipeline in Solr and set up the pr
Bernd,
Can you please say a little more?
I think it is fine for this list to contain some description of commercial
solutions that satisfy a request made on the list.
Is there any product at BASIS Tech that provides a compound analyzer with a large
dictionary of decomposed German compounds? If yes,
If you want the query "jacke" to match a document containing the word
"windjacke" or "kinderjacke", you could use a custom update processor.
This processor could search the indexed text for words matching the
pattern ".*jacke" and inject the word "jacke" into an additional field
which you can searc
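The core logic of such an update processor might look like this minimal sketch. It assumes a hypothetical, hand-maintained list of head nouns such as "jacke". In Solr, this check would run inside a custom UpdateRequestProcessor's processAdd, with the results written to an extra field. That wiring, the field name and the class name here are all assumptions and are omitted or hypothetical.

```java
import java.util.*;
import java.util.regex.Pattern;

/** Hypothetical sketch of the core logic of a head-noun-injecting update processor. */
final class HeadNounInjector {
    /** Returns each head noun that ends some word in text, e.g. "jacke" for "windjacke". */
    static List<String> headNouns(String text, List<String> heads) {
        List<Pattern> patterns = new ArrayList<>();
        for (String head : heads) {
            // Same idea as ".*jacke": anything followed by the head noun.
            patterns.add(Pattern.compile(".*" + Pattern.quote(head)));
        }
        Set<String> found = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (String word : text.toLowerCase(Locale.GERMAN).split("[^\\p{L}]+")) {
            for (int i = 0; i < heads.size(); i++) {
                if (patterns.get(i).matcher(word).matches()) {
                    found.add(heads.get(i));
                }
            }
        }
        return new ArrayList<>(found);
    }
}
```

For the text "Neue Windjacke und Kinderjacke", `headNouns(text, List.of("jacke"))` returns `[jacke]`, which would go into the additional field. A plain suffix rule cannot tell a true compound head from an accidental ending, so the head-noun list needs care.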
You might have a look at:
http://www.basistech.com/lucene/
On 12.04.2012 11:52, Michael Ludwig wrote:
> Given an input of "Windjacke" (probably "wind jacket" in English), I'd
> like the code that prepares the data for the index (tokenizer etc) to
> understand that this is a "Jacke" ("jacket")
Michael,
I've been on this list and the Lucene list for several years and have not
found this yet.
To my taste, it has been one of the "neglected topics".
There is a CompoundAnalyzer, but it requires the compounds to be
dictionary-based, as you indicate.
I am convinced there's a way to build the de-compound