On 7/27/2017 7:20 AM, Itay K wrote:
> I'm trying to measure precision and recall for a search engine which is
> crawling data sources of an organization.
>
> Are there any best practices regarding these indexes and specific
> industries (e.g. for financial organizations, the recommended percentage
> for precision and recall is ~60%)?
>
> Is there any best practice in general for the recommended percentage?
>
> I read an article from 2005 regarding measured precision and recall for web
> search engines, but unfortunately my use case isn't a web application and I
> believe that since then a lot has changed.
I don't believe you can assign concrete numbers to these aspects of a search engine, at least not in a way that has meaning after the query, the index, or the user's expectation changes. Recall is all about numbers, but precision is a completely subjective measurement that is going to vary from person to person. Results that are highly relevant for one user might be completely irrelevant for another, even when both users enter the exact same search terms. Also, the search terms that one user enters are likely to be different from the search terms that another user enters, even if they are looking for exactly the same thing.

I cannot think of a way of calculating percentages for precision and recall that would give meaningful numbers when very different searches and expectations must be examined. A recall count for one search will have little meaning when compared to the recall count for a different search ... and as already mentioned, precision is COMPLETELY subjective.

IMHO, tuning precision and recall is not about getting some calculated numbers as high as you can. In order to tune successfully, you have to know what people are searching for, what they expect to find, and come up with a configuration that will produce the best balance between precision and recall when applied to the combination of the data in the index and what's actually being searched. Frequently the tuning process involves educating the users, in addition to (or sometimes instead of) changing the search engine configuration. Six months after tuning the search, as the index and your users change, you may need completely different settings to get good results.

Changes that affect precision and recall are usually a tradeoff between those two factors. Improving one of them will often make the other worse. They must be approached with a goal of bringing them into balance for the searches done by a majority of the system users.

Thanks,
Shawn
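
P.S. If you still want per-query numbers to watch while tuning, here is a minimal Python sketch of how precision and recall are commonly computed for a single query against a hand-judged set of relevant documents. The function name, the document IDs, and the judgments are all made up for illustration. The judged set is exactly the subjective part described above, so the results only mean something for that one query and that one judge.

```python
def precision_recall(returned_ids, relevant_ids):
    """Precision and recall for one query.

    returned_ids: document IDs the engine returned (hypothetical data).
    relevant_ids: IDs a human judged relevant (hypothetical, subjective).
    """
    returned = set(returned_ids)
    relevant = set(relevant_ids)
    hits = returned & relevant
    # Guard against empty sets so the division never fails.
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data only: one query, results from two imaginary configurations.
judged_relevant = {"d1", "d2", "d3", "d4"}
strict = ["d1", "d2"]                          # tight matching
loose = ["d1", "d2", "d3", "d5", "d6", "d7"]   # looser matching

print(precision_recall(strict, judged_relevant))  # (1.0, 0.5)
print(precision_recall(loose, judged_relevant))   # (0.5, 0.75)
```

With this invented data, loosening the match raises recall from 0.5 to 0.75 but drops precision from 1.0 to 0.5, which is the tradeoff mentioned above. A different judge marking a different relevant set would produce different numbers for the same results.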