I'm trying to think through a Solr-based email alerting engine that would have the following properties:
1. Users can enter queries they want to be alerted on, and alert queries should use the same syntax as my regular Solr (dismax) queries.
   1a. Corollary: because of not just tf-idf but also dismax pf and qf boosting, the set of documents matching a given query will vary widely in quality. The first page of search results will be quite good, but the last page won't be worth looking at.
2. The email alerting engine shouldn't bother alerting people about *all* new results for a given query. In particular, it should skip the poor-quality tail of results and alert only on "the good stuff".

Unfortunately, my current understanding of Solr/Lucene is that there's no good automatic way to partition a set of query results into "good stuff" vs. "not good stuff". The main option I know of is to filter out documents below a certain score threshold, but if you search the Lucene/Solr mailing lists, people advise that this is unlikely to be fruitful. (It ultimately boils down to the fact that Lucene/Solr scores weren't designed to mean anything as absolute numbers, only when compared with other scores for the same query.)

This makes me wonder whether there's something wrong with my original requirements, or whether people have thought of some other way to approach this.

Interestingly, Google appears to have solved this at least to some degree with Google Alerts (http://www.google.com/alerts): there you can choose to receive "Only the best results" rather than "All the results". I'm not clear on how they determine which results are "best", but their UI certainly implies they've come up with some scheme for it.

Thanks,
Chris
