Re: Re: Re: Multi-lingual Search & Accent Marks

Walter Underwood Wed, 04 Sep 2019 07:41:57 -0700

On Sep 3, 2019, at 1:13 PM, Audrey Lorberfeld - audrey.lorberf...@ibm.com 
<audrey.lorberf...@ibm.com> wrote:
> 
> The main issue we are anticipating with the above strategy surrounds scoring. 
> Since we will be increasing the frequency of accented terms, we might bias 
> our page ranker...


You will not be increasing the frequency of the accented terms. Those 
frequencies will stay the same. You’ll be adding new unaccented terms. The new 
terms will probably have higher document frequencies than the accented terms, 
which gives them lower IDF weights. If so, the accented terms should be 
preferred for accented queries. You might or might not want that behavior.
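To show why a higher document frequency favors the accented term, here is a minimal sketch. It assumes the IDF formula used by Lucene's BM25Similarity, log(1 + (N - df + 0.5) / (df + 0.5)). The corpus size and df values are hypothetical and only illustrate the direction of the effect.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF in the form used by Lucene's BM25Similarity (assumed here)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical corpus statistics, for illustration only.
N = 1000
df_accented = 10     # e.g. documents containing "glück"
df_unaccented = 50   # e.g. documents containing "gluck" after expansion

idf_accented = bm25_idf(N, df_accented)
idf_unaccented = bm25_idf(N, df_unaccented)

# The rarer (accented) term gets the larger IDF, so a match on it
# contributes more to the score than a match on the unaccented variant.
print(f"idf(accented)   = {idf_accented:.3f}")
print(f"idf(unaccented) = {idf_unaccented:.3f}")
```

Only the ordering matters here. Any IDF that decreases as df grows gives the same preference for the accented term on accented queries.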

doc1: glück
doc1 terms: glück, gluck, glueck

doc2: glueck
doc2 terms: glueck

df for glück: 1
df for gluck: 1
df for glueck: 2

The df for the term “glück” is the same whether you expand or not.
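The two-document example can be reproduced with a short sketch that computes df with and without expansion. The EXPANSIONS table is a hypothetical, hard-coded stand-in for whatever analyzer chain produces the unaccented variants at index time. It is not a Solr or Lucene API.

```python
from collections import Counter

# Hypothetical expansion table for this example only; a real index
# would produce these variants in its analysis chain.
EXPANSIONS = {
    "glück": ["glück", "gluck", "glueck"],
}


def index_terms(text: str, expand: bool = True) -> set[str]:
    """Return the set of indexed terms for a document."""
    terms: set[str] = set()
    for token in text.split():
        if expand:
            terms.update(EXPANSIONS.get(token, [token]))
        else:
            terms.add(token)
    return terms


def document_frequencies(docs: dict[str, str], expand: bool = True) -> Counter:
    """Count how many documents contain each indexed term."""
    df: Counter = Counter()
    for text in docs.values():
        # index_terms returns a set, so each document counts once per term.
        df.update(index_terms(text, expand))
    return df


docs = {"doc1": "glück", "doc2": "glueck"}
df = document_frequencies(docs, expand=True)
df_plain = document_frequencies(docs, expand=False)

for term in ("glück", "gluck", "glueck"):
    print(term, df[term])
print("glück without expansion:", df_plain["glück"])
```

Expansion adds the new terms "gluck" and "glueck" to doc1 but leaves the count for "glück" at 1 either way.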

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
