Hi Community,

In Ref Guide 8.3's *Understanding Analyzers, Tokenizers, and Filters* <https://lucene.apache.org/solr/guide/8_3/understanding-analyzers-tokenizers-and-filters.html> section, the text discusses how precision and recall depend on the analyzers used at index time and query time:

> For indexing, you often want to simplify, or normalize, words. For example,
> setting all letters to lowercase, eliminating punctuation and accents,
> mapping words to their stems, and so on. Doing so can *increase recall*
> because, for example, "ram", "Ram" and "RAM" would all match a query for
> "ram". To *increase query-time precision*, a filter could be employed to
> narrow the matches by, for example, *ignoring all-cap acronyms* if you're
> interested in male sheep, but not Random Access Memory.

In the first case (about recall), is it assumed that "ram" should match all three? *[Q1]* I ask because, to increase recall, we have to decrease false negatives (relevant documents that are not retrieved). Otherwise (if the three are not all intended to match the query), precision actually decreases here, since false positives increase.

That is consistent with the second case, where precision should increase because we are decreasing false positives (documents wrongly marked relevant). However, the text describes the method as "employing a filter that ignores all-cap acronyms". How are we supposed to do that at query time? *[Q2]* Wouldn't we instead have to remove the lowercase filter (LCF) at index time?

--
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn: *8173*
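
P.S. To make Q1 concrete, here is a minimal Python sketch of how I understand the trade-off. It is not Solr code: the three-document corpus, the relevance labels (judged for the "male sheep" meaning), and the helper names `search` and `precision_recall` are all hypothetical, and "normalization" is reduced to plain lowercasing applied to both the indexed token and the query.

```python
# Toy corpus: each document is a single token. Relevance is judged for the
# information need "male sheep"; all labels are made up for illustration.
DOCS = {
    "d1": ("ram", True),    # lowercase common noun: relevant
    "d2": ("Ram", True),    # capitalized (e.g. sentence-initial): still relevant
    "d3": ("RAM", False),   # all-caps acronym (Random Access Memory): not relevant
}


def search(query, normalize):
    """Return ids of documents whose normalized token equals the normalized query."""
    return {doc_id for doc_id, (token, _) in DOCS.items()
            if normalize(token) == normalize(query)}


def precision_recall(retrieved):
    """Compute (precision, recall) of a retrieved id set against the labels in DOCS."""
    relevant = {doc_id for doc_id, (_, is_relevant) in DOCS.items() if is_relevant}
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


lowercased = search("ram", str.lower)        # lowercase both index and query
exact = search("ram", lambda token: token)   # no normalization at all

print(sorted(lowercased), precision_recall(lowercased))
print(sorted(exact), precision_recall(exact))
```

With these made-up labels, lowercasing retrieves all three documents (recall 1.0, precision 2/3), while exact matching retrieves only "ram" (precision 1.0, recall 0.5). Neither variant retrieves "ram" and "Ram" while skipping "RAM", which is the behaviour the guide seems to describe and the reason for Q2.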