Ref Guide - Precision & Recall of Analyzers

Paras Lehana Wed, 06 Nov 2019 00:53:58 -0800

Hi Community,

In Ref Guide 8.3's *Understanding Analyzers, Tokenizers, and Filters*
<https://lucene.apache.org/solr/guide/8_3/understanding-analyzers-tokenizers-and-filters.html>
section, the text discusses how precision and recall depend on the way you
use analyzers at index time and at query time:


> For indexing, you often want to simplify, or normalize, words. For example,
> setting all letters to lowercase, eliminating punctuation and accents,
> mapping words to their stems, and so on. Doing so can *increase recall*
> because, for example, "ram", "Ram" and "RAM" would all match a query for
> "ram". To *increase query-time precision*, a filter could be employed to
> narrow the matches by, for example, *ignoring all-cap acronyms* if you’re
> interested in male sheep, but not Random Access Memory.
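To make the quoted recall claim concrete, here is a minimal Python sketch of a toy analyzer (split on non-word characters, then lowercase) applied identically at index and query time. The function names `analyze`, `build_index` and `search` and the three sample documents are hypothetical illustrations, not Solr or Lucene APIs.

```python
import re

def analyze(text):
    """Hypothetical toy analyzer: split on non-word characters, then lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]

def build_index(docs):
    """Inverted index: map each analyzed term to the set of doc ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, query):
    """Return the ids of docs that contain any analyzed query term."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits

# Hypothetical sample documents using "ram", "Ram" and "RAM".
docs = {1: "A ram grazing.", 2: "Ram, the male sheep.", 3: "16 GB of RAM"}
index = build_index(docs)
print(sorted(search(index, "ram")))  # [1, 2, 3]
```

Because lowercasing happens before indexing, all three surface forms collapse to the single term "ram", so the query retrieves every document: more matches (higher recall), whether or not each match is actually wanted.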


In the first case (recall), is it assumed that "ram" should match all
three? *[Q1]* I ask because, to increase recall, we have to decrease false
negatives (relevant documents that are not retrieved). Otherwise (if the
three are not all meant to match the query), precision actually decreases
here, because false positives increase.

This reasoning does fit the second case, where precision should increase
because we are decreasing false positives (documents wrongly retrieved as
relevant).
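The Q1 reasoning can be checked with the standard set-based definitions (precision = relevant retrieved / retrieved, recall = relevant retrieved / relevant). This is a minimal sketch; the document ids and relevance judgments are assumptions chosen to mirror the ram example (docs 1 and 2 about sheep, doc 3 about memory).

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    tp = len(retrieved & relevant)  # true positives: relevant docs we retrieved
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

exact_case = {1}        # assumed result of a case-sensitive match on "ram"
normalized = {1, 2, 3}  # assumed result after lowercasing at index time

# Assumption A: the user wants every "ram" (all three docs are relevant).
print(precision_recall(normalized, {1, 2, 3}))  # (1.0, 1.0)

# Assumption B: only the sheep docs (1 and 2) are relevant.
print(precision_recall(exact_case, {1, 2}))     # (1.0, 0.5)
print(precision_recall(normalized, {1, 2}))     # precision 2/3, recall 1.0
```

Under assumption B, normalization raises recall from 0.5 to 1.0 but lowers precision from 1.0 to 2/3, which is the trade-off described in Q1.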

However, the text suggests "employing a filter that ignores all-cap
acronyms". How are we supposed to do that at query time? *[Q2]* Wouldn't we
have to remove the lowercase filter (LCF) at index time for that to work?
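The concern behind Q2 can be illustrated with a toy model. This is a hedged sketch under stated assumptions, not how Solr's analysis chain is actually configured: `drop_acronyms` is a hypothetical filter that discards all-caps tokens of two or more letters, and `lowercase` stands in for a lowercase filter. If the index chain only lowercases, "RAM" is already stored as "ram" and a query-side acronym filter cannot tell it apart; only when the acronym filter also runs before lowercasing at index time does doc 3 drop out.

```python
import re

def tokens(text):
    """Split text on non-word characters."""
    return re.findall(r"\w+", text)

def drop_acronyms(toks):
    """Hypothetical filter: drop all-caps tokens of 2+ characters, e.g. "RAM"."""
    return [t for t in toks if not (len(t) > 1 and t.isupper())]

def lowercase(toks):
    """Stand-in for a lowercase filter."""
    return [t.lower() for t in toks]

def acronyms_then_lowercase(toks):
    """Hypothetical chain: acronym filter first, while case is still visible."""
    return lowercase(drop_acronyms(toks))

def matching_docs(docs, query, index_chain, query_chain):
    """Ids of docs sharing at least one analyzed term with the query."""
    q_terms = set(query_chain(tokens(query)))
    return sorted(d for d, text in docs.items()
                  if q_terms & set(index_chain(tokens(text))))

docs = {1: "A ram grazing.", 2: "Ram, the male sheep.", 3: "16 GB of RAM"}

# Acronym filter at query time only: the index already holds "ram" for doc 3.
query_only = matching_docs(docs, "ram", lowercase, acronyms_then_lowercase)
# Acronym filter on both sides, applied before lowercasing.
both_sides = matching_docs(docs, "ram", acronyms_then_lowercase,
                           acronyms_then_lowercase)
print(query_only)  # [1, 2, 3]
print(both_sides)  # [1, 2]
```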


-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*
