On Sep 3, 2019, at 1:13 PM, Audrey Lorberfeld - audrey.lorberf...@ibm.com
wrote:
>
> The main issue we are anticipating with the above strategy surrounds scoring.
> Since we will be increasing the frequency of accented terms, we might bias
> our page ranker...
You will not be increasing the f
Thanks, Alex! We'll look into this.
--
Audrey Lorberfeld
Data Scientist, w3 Search
IBM
audrey.lorberf...@ibm.com
On 9/3/19, 4:27 PM, "Alexandre Rafalovitch" wrote:
What about combining:
1) KeywordRepeatFilterFactory
2) An existing folding filter (need to check it ignores Keyword marked word)
3) RemoveDuplicatesTokenFilterFactory
That may give what you are after without custom coding.
Regards,
Alex.
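
As a rough illustration of the three-step chain suggested above, here is a small Python simulation of what the combined filters would emit. It is only a sketch and does not call Solr or Lucene. The function names `fold_accents` and `analyze` are hypothetical, and the folding step simply strips Unicode combining marks, which is an assumption and not the same character table an actual Solr folding filter uses. It also assumes the folding filter leaves keyword-marked tokens untouched, which is the point Alex says still needs checking.

```python
import unicodedata


def fold_accents(token):
    """Strip combining marks (a stand-in for a folding filter, not Solr's table)."""
    decomposed = unicodedata.normalize("NFD", token)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def analyze(tokens):
    """Simulate keyword-repeat, then folding, then duplicate removal.

    Returns (position, text) pairs; both variants of a token share a position.
    """
    output = []
    for position, token in enumerate(tokens):
        # Step 1: repeat the token, marking the first copy as a keyword.
        copies = [(token, True), (token, False)]
        seen_at_position = set()
        for text, is_keyword in copies:
            # Step 2: fold only the copy that is not marked as a keyword.
            if not is_keyword:
                text = fold_accents(text)
            # Step 3: drop a copy identical to one already at this position.
            if text in seen_at_position:
                continue
            seen_at_position.add(text)
            output.append((position, text))
    return output


if __name__ == "__main__":
    for position, text in analyze(["caf\u00e9", "menu"]):
        print(position, text)
```

In this sketch an accented token yields two entries at the same position (original and folded), while an unaccented token yields one, so unaccented terms are not indexed twice.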
On Tue, 3 Sep 2019 at 16:14, Audrey Lorberfeld -
audrey
Toke,
Thank you! That makes a lot of sense.
In other news -- we just had a meeting where we decided to try out a hybrid
strategy. I'd love to know what you & everyone else think...
- Since we are concerned with the overhead created by "double-fielding" all
tokens per language (because I'm not
Audrey Lorberfeld - audrey.lorberf...@ibm.com wrote:
> Do you find that searching over both the original title field and the
> normalized title
> field increases the time it takes for your search engine to retrieve results?
It is not something we have measured as that index is fast enough (which
Toke,
Do you find that searching over both the original title field and the
normalized title field increases the time it takes for your search engine to
retrieve results?
--
Audrey Lorberfeld
Data Scientist, w3 Search
Digital Workplace Engineering
CIO, Finance and Operations
IBM
audrey.lorberf
Languages are the best. Thank you all so much!
--
Audrey Lorberfeld
Data Scientist, w3 Search
Digital Workplace Engineering
CIO, Finance and Operations
IBM
audrey.lorberf...@ibm.com
On 8/30/19, 4:09 PM, "Walter Underwood" wrote:
The right transliteration for accents is language-dependen
Thank you, Erick!
--
Audrey Lorberfeld
Data Scientist, w3 Search
Digital Workplace Engineering
CIO, Finance and Operations
IBM
audrey.lorberf...@ibm.com
On 8/30/19, 3:49 PM, "Erick Erickson" wrote:
It Depends (tm). In this case on how sophisticated/precise your users are. If
your users are exclusively extremely conversant in the language and are
expected to have keyboards that allow easy access to all the accents… then I
might leave them in. In some cases removing them can change the meani
Aita,
Thanks for that insight!
As the conversation has progressed, we are now leaning towards not having the
ASCII-folding filter in our pipelines in order to keep marks like umlauts and
tildes. Instead, we might add acute and grave accents to a file pointed at by
the MappingCharFilterFactory
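
To make the selective approach described here concrete, the Python sketch below folds only acute and grave accents and leaves umlauts and tildes alone. It is a simulation, not Solr configuration. The names `FOLDED_MARKS`, `fold_acute_and_grave`, and `mapping_lines` are hypothetical, and the `"source" => "target"` line shape produced by `mapping_lines` is my assumption about what a mapping file for the char filter looks like, so it should be checked against the Solr documentation before use.

```python
import unicodedata

# Combining grave (U+0300) and combining acute (U+0301): the only marks removed.
FOLDED_MARKS = {"\u0300", "\u0301"}


def fold_acute_and_grave(text):
    """Remove acute and grave accents; keep umlauts, tildes, and other marks."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in FOLDED_MARKS)
    return unicodedata.normalize("NFC", kept)


def mapping_lines(characters):
    """Build mapping lines for characters that change under the fold.

    The line format is an assumption, not taken from the thread.
    """
    lines = []
    for ch in characters:
        folded = fold_acute_and_grave(ch)
        if folded != ch:
            lines.append('"{}" => "{}"'.format(ch, folded))
    return lines


if __name__ == "__main__":
    for word in ["caf\u00e9", "\u00fcber", "ni\u00f1o", "voil\u00e0"]:
        print(word, "->", fold_acute_and_grave(word))
    for line in mapping_lines("\u00e9\u00e8\u00fc\u00f1"):
        print(line)
```

Working from decomposed marks keeps the rule short: any character carrying an acute or grave accent is covered without listing each one, and characters with an umlaut or tilde pass through unchanged.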