Hi,

I'm implementing custom dynamic results filtering to improve fuzzy / phonetic search support in my search application. I use the CommonsHttpSolrServer object to connect remotely to Solr. I would like to index multiple fuzzy / phonetic match encodings (e.g. one of the packaged phonetic encodings, my own phonetic encoding, my own or a packaged q-gram encoding that captures string overlap, etc.) and then filter out the results I consider "false positives" in a dynamic, custom way. The general approaches I've seen for this are:

1. Use Solr's fuzzy queries. I haven't been able to achieve acceptable performance with fuzzy queries, and they also lack the dynamic flexibility described above. For example, whether or not I filter a phonetic match out of the results may depend on many things (whether there were exact matches on relevant entities, who the user is, etc.), and I can't achieve this flexibility with a fuzzy field query.

2. Create an RMI-based client/server setup so that I can pass a custom Collector to the SolrIndexSearcher (as in Ch. 9 of Lucene in Action, but with a custom Collector added). A custom Collector seems like exactly what I want, but I don't see a way to achieve this with any of the packaged SolrServer implementations that support a remote setup like this. I also worry about the stability of the remote object framework, since it has been moved over to contrib and it seems there may be serialization issues or other instability (http://lucene.472066.n3.nabble.com/extending-SolrIndexSearcher-td472809.html).

3. Continue to use the CommonsHttpSolrServer object for querying my index, but add post-processing to dynamically filter results. This seems doable but unnatural, and potentially inefficient, given that I would need to support pagination and facet counts in such a framework.

Is there an easier way to do custom dynamic results filtering (e.g. via a custom Collector) while still using CommonsHttpSolrServer? Do people have any other suggestions or insights about the approaches summarized above?

Thanks,
Dave
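
P.S. To make approach 3 more concrete, here is a rough sketch of the kind of client-side post-processing I have in mind. It is only an illustration under some assumptions: PostFilterPager, filteredPage, and the fetch / keep parameters are names I made up for this example, and fetch simply stands in for whatever issues a query through CommonsHttpSolrServer with a given start and rows. The idea is to scan raw results in batches, drop the "false positives", and keep going until the requested filtered page is full.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Predicate;

// Hypothetical helper, not part of Solr or SolrJ.
final class PostFilterPager {

    /**
     * Returns the filtered results in positions [offset, offset + pageSize),
     * counted AFTER filtering. Raw hits are pulled in batches of batchSize.
     *
     * fetch: (start, rows) -> raw hits; a stand-in for a remote Solr query.
     * keep:  the dynamic filter; returns false for hits to drop.
     */
    static <T> List<T> filteredPage(
            BiFunction<Integer, Integer, List<T>> fetch,
            Predicate<T> keep,
            int offset, int pageSize, int batchSize) {
        List<T> page = new ArrayList<>();
        int rawStart = 0; // position in the unfiltered result list
        int kept = 0;     // number of hits that have passed the filter so far

        while (page.size() < pageSize) {
            List<T> batch = fetch.apply(rawStart, batchSize);
            if (batch.isEmpty()) {
                break; // no more raw results
            }
            for (T hit : batch) {
                if (!keep.test(hit)) {
                    continue; // treated as a false positive
                }
                if (kept >= offset && page.size() < pageSize) {
                    page.add(hit);
                }
                kept++;
            }
            rawStart += batch.size();
            if (batch.size() < batchSize) {
                break; // short batch: the raw results are exhausted
            }
        }
        return page;
    }
}
```

This shows why the approach worries me: reaching a deep filtered page means re-scanning every raw result before it (unless the client caches how far it got), and the sketch does nothing about facet counts, which would still reflect the unfiltered result set.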