subject:"Re\: Lexical analysis tools for German language data"

Re: Lexical analysis tools for German language data

2012-04-12 Thread Tomas Zerolo

On Thu, Apr 12, 2012 at 03:46:56PM +, Michael Ludwig wrote:
> > From: Walter Underwood
>
> > German noun decompounding is a little more complicated than it might
> > seem.
> >
> > There can be transformations or inflections, like the "s" in
> > "Weihnachtsbaum" (Weihnachten/Baum).
>
> I remembe

Re: Lexical analysis tools for German language data

2012-04-12 Thread Walter Underwood

German noun decompounding is a little more complicated than it might seem. There can be transformations or inflections, like the "s" in "Weihnachtsbaum" (Weihnachten/Baum). Internal nouns should be recapitalized, like "Baum" above. Some compounds probably should not be decompounded, like "Fahrrad
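The three points in this message (a linking element such as the "s" in "Weihnachtsbaum", recapitalizing the parts, and leaving some compounds whole) could be sketched roughly as follows. This is only a toy illustration in Python, not how Lucene or Solr does it: the names `LEXICON`, `KEEP_WHOLE`, `LINKING_ELEMENTS`, `resolve_head` and `decompound` are hypothetical, the word lists are made up for the example, and only two-part compounds are handled.

```python
# Toy data: all three collections are illustrative assumptions, not a real dictionary.
LEXICON = {"weihnachten", "baum", "wind", "jacke", "kinder", "fahren", "rad"}
KEEP_WHOLE = {"fahrrad"}  # compounds that should not be split
LINKING_ELEMENTS = ("", "s", "es", "n", "en")  # "" means no linking element


def resolve_head(head):
    """Return the lexicon entry the first part of a compound stands for, or None."""
    for link in LINKING_ELEMENTS:
        if link and not head.endswith(link):
            continue
        stem = head[: len(head) - len(link)]
        # Try the bare stem, then the stem with a dropped "-en" restored
        # (e.g. "weihnacht" + "en" -> "weihnachten").
        for form in (stem, stem + "en"):
            if form in LEXICON:
                return form
    return None


def decompound(word):
    """Split a two-part compound into recapitalized nouns, or return it unchanged."""
    lower = word.lower()
    if lower in KEEP_WHOLE:
        return [word]
    # Require at least three letters on each side of the split point.
    for i in range(3, len(lower) - 2):
        head, tail = lower[:i], lower[i:]
        if tail in LEXICON:
            base = resolve_head(head)
            if base:
                return [base.capitalize(), tail.capitalize()]
    return [word]
```

With this toy lexicon, `decompound("Weihnachtsbaum")` yields the parts "Weihnachten" and "Baum", while "Fahrrad" is returned whole only because it is on the exception list; without that list it would be split into "Fahren" and "Rad", which is the kind of over-splitting the message warns about.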

Re: Lexical analysis tools for German language data

2012-04-12 Thread Markus Jelsma

Hi, We've done a lot of tests with the HyphenationCompoundWordTokenFilter, using a FOP XML file generated from TeX for the Dutch language, and have seen decent results. A bonus was that some tokens can now be stemmed properly, because not all compounds are listed in the dictionary for the Hunspell

Re: Lexical analysis tools for German language data

2012-04-12 Thread Bernd Fehling

Paul, nearly two years ago I requested an evaluation license and tested BASIS Tech Rosette for Lucene & Solr. It worked excellently, but the price was much, much too high. Yes, they also have compound analysis for several languages including German. Just configure your pipeline in Solr and set up the pr

Re: Lexical analysis tools for German language data

2012-04-12 Thread Paul Libbrecht

Bernd, can you please say a little more? I think it is OK for this list to contain some description of commercial solutions that satisfy a request formulated on the list. Is there any product at BASIS Tech that provides a compound analyzer with a big dictionary of decomposed compounds in German? If yes,

Re: Lexical analysis tools for German language data

2012-04-12 Thread Valeriy Felberg

If you want the query "jacke" to match a document containing the word "windjacke" or "kinderjacke", you could use a custom update processor. This processor could search the indexed text for words matching the pattern ".*jacke" and inject the word "jacke" into an additional field which you can searc
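The idea in this message could be sketched as below. This is a plain Python stand-in for the logic only, not a Solr update processor: the function `add_compound_heads`, the field names `text` and `text_heads`, and the `HEAD_WORDS` list are all hypothetical names chosen for the example.

```python
import re

# Head words to look for at the end of tokens; an assumption for this example.
HEAD_WORDS = ("jacke",)


def add_compound_heads(doc, source_field="text", target_field="text_heads"):
    """Return a copy of doc with the matched head words in an additional field."""
    tokens = re.findall(r"\w+", doc.get(source_field, "").lower())
    heads = []
    for token in tokens:
        for head in HEAD_WORDS:
            # Equivalent to matching the pattern ".*jacke" against each token.
            if token.endswith(head) and head not in heads:
                heads.append(head)
    enriched = dict(doc)
    enriched[target_field] = heads
    return enriched
```

A document whose `text` is "Rote Windjacke und Kinderjacke" would get `text_heads` set to a list containing only "jacke", so a query against that additional field finds it. The obvious limitation is that the list of head words has to be maintained by hand.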

Re: Lexical analysis tools for German language data

2012-04-12 Thread Bernd Fehling

You might have a look at: http://www.basistech.com/lucene/

On 12.04.2012 11:52, Michael Ludwig wrote:
> Given an input of "Windjacke" (probably "wind jacket" in English), I'd
> like the code that prepares the data for the index (tokenizer etc) to
> understand that this is a "Jacke" ("jacket")

Re: Lexical analysis tools for German language data

2012-04-12 Thread Paul Libbrecht

Michael, I've been on this list and the Lucene list for several years and have not found this yet. It's been one of the "neglected topics", to my taste. There is a CompoundAnalyzer, but it requires the compounds to be dictionary based, as you indicate. I am convinced there's a way to build the de-compound


8 matches
