Hi Tom,

On Wed, 2007-10-10 at 12:28 +0200, Thomas Traeger wrote:
> in short: use stemming
ok :)

> 
> Try the SnowballPorterFilterFactory with German2 as language attribute 
> first and use synonyms for combined words, e.g. "Herrenhose" => "Herren", 
> "Hose".
so you use a combined approach?
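
A minimal sketch of how such a combined setup might look in schema.xml.
The filter factory classes and the German2 language value are the ones
named above; the field type name "text_de", the tokenizer choice, the
file name "synonyms.txt" and the indexed/stored settings are assumptions
for illustration only. The three field names are taken from the original
mail quoted below.

```xml
<!-- Sketch only: "text_de", the tokenizer and "synonyms.txt" are assumed names -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- synonyms.txt would hold lines such as: herrenhose => herren, hose -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <!-- Stemming runs after the synonym step, so synonyms.txt lists unstemmed words -->
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>
</fieldType>

<!-- Only the fields affected by the singular/plural issue use the stemmed type -->
<field name="name"        type="text_de" indexed="true" stored="true"/>
<field name="category"    type="text_de" indexed="true" stored="true"/>
<field name="producttype" type="text_de" indexed="true" stored="true"/>
```

Since the same analyzer is applied at index and query time here, the
documents would need to be reindexed after such a change.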

> 
> By using stemming you will maybe have some "interesting" results, but it 
> is much better to live with them than to have no or far fewer results ;o)
Do you have an example of what "interesting" results I can expect, just
to get an idea?

> 
> Find more info on the Snowball stemming algorithms here:
> 
> http://snowball.tartarus.org/
Thanks! I had already looked at this site, but what is missing is a
demo where one can see what's happening. I think I'll play around a
little with stemming to get a feeling for this.

> 
> Also have a look at the StopFilterFactory; here is a sample stopword list 
> for the German language:
> 
> http://snowball.tartarus.org/algorithms/german/stop.txt
Our application handles products; do you think such stopwords are useful
in this scenario too? I wouldn't expect a user to search for "keine
hose" or something like this :)
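
For reference, a stop filter would be a single extra line in the
analyzer chain of the affected field type. This is only a sketch: the
file name "german_stop.txt" is an assumption, and the Snowball list
linked above uses "|" comments, so it may have to be converted to one
word per line first, depending on the Solr version.

```xml
<!-- Sketch only: "german_stop.txt" is an assumed file name -->
<filter class="solr.StopFilterFactory" words="german_stop.txt" ignoreCase="true"/>
```

Placed before the stemming filter, it would match the unstemmed words
as they appear in the list.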

Thanx && cheers,
Martin

> 
> Good luck,
> 
> Tom
> 
> 
> Martin Grotzke schrieb:
> > Hello,
> >
> > with our application we have the issue that we get different
> > results for singular and plural searches (German language).
> >
> > E.g. for "hose" we get 1,000 documents back, but for "hosen"
> > we get 10,000 docs. The same applies to "t-shirt" or "t-shirts",
> > or e.g. "hut" and "hüte" - lots of cases :)
> >
> > This is absolutely correct according to the schema.xml, as right
> > now we do not have any stemming or synonyms included.
> >
> > Now we want to have similar search results for these singular/plural
> > searches. I'm thinking of a solution for this, and want to ask what
> > your experiences with this are.
> >
> > Basically I see two options: stemming and the usage of synonyms. Are
> > there others?
> >
> > My concern with stemming is that it might produce unexpected results,
> > so that docs are found that do not match the query from the user's point
> > of view. I assume that this needs a lot of testing with different data.
> >
> > The issue with synonyms is that we would have to create a file
> > containing all synonyms, so we would have to figure out all cases, in
> > contrast to a solution that is based on an algorithm.
> > The advantage of this approach is, IMHO, that it is very predictable
> > which results will be returned for a certain query.
> >
> > Some background information:
> > Our documents contain products (id, name, brand, category, producttype,
> > description, color etc). The singular/plural issue basically applies to
> > the fields name, category and producttype, so we would like to restrict
> > the solution to these fields.
> >
> > Do you have suggestions on how to handle this?
> >
> > Thanx in advance for sharing your experiences,
> > cheers,
> > Martin
> >
> >
> >   
> 
-- 
Martin Grotzke
http://www.javakaffee.de/blog/
