On Sep 3, 2019, at 1:13 PM, Audrey Lorberfeld - audrey.lorberf...@ibm.com <audrey.lorberf...@ibm.com> wrote:
>
> The main issue we are anticipating with the above strategy surrounds scoring.
> Since we will be increasing the frequency of accented terms, we might bias
> our page ranker...

You will not be increasing the frequency of the accented terms. Those frequencies will stay the same. You’ll be adding new unaccented terms. The new terms will probably have higher frequencies than the accented terms. If so, the accented terms should be preferred for accented queries, because a term with a lower document frequency (df) gets a higher idf weight. You might or might not want that behavior.

doc1: glück
doc1 terms: glück, gluck, glueck
doc2: glueck
doc2 terms: glueck

df for glück: 1
df for gluck: 1
df for glueck: 2

The df for the term “glück” is the same whether you expand or not.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)
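
For anyone who wants to check the df numbers above, here is a minimal sketch in plain Python. It is not Solr code. The expansion table, the function names, and the two-document corpus are assumptions made up for illustration, and the idf formula is the BM25-style one as I understand Lucene's default similarity to use, so treat the weights as indicative only.

```python
import math
from collections import Counter

# Hypothetical index-time expansion table. In a real setup the analyzer
# chain would produce these variants; this dict only mimics the example.
EXPANSIONS = {"glück": ["glück", "gluck", "glueck"]}

# The two documents from the example, already tokenized.
DOCS = {"doc1": ["glück"], "doc2": ["glueck"]}


def index_terms(token, expand):
    """Return the terms indexed for one token, with or without expansion."""
    if expand:
        return EXPANSIONS.get(token, [token])
    return [token]


def document_frequencies(docs, expand):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for tokens in docs.values():
        terms = set()
        for token in tokens:
            terms.update(index_terms(token, expand))
        df.update(terms)  # each document counts once per term
    return df


def idf(doc_freq, num_docs):
    """BM25-style idf (assumed formula): rarer terms get a larger weight."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


plain = document_frequencies(DOCS, expand=False)
expanded = document_frequencies(DOCS, expand=True)

for term in ("glück", "gluck", "glueck"):
    print(term, "df without expansion:", plain[term],
          "df with expansion:", expanded[term])
```

The df for "glück" is 1 in both runs. Only "gluck" and "glueck" change, so in this toy corpus idf(1, 2) for "glück" comes out larger than idf(2, 2) for "glueck".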