Hi,

I'm implementing custom dynamic results filtering to improve fuzzy /
phonetic search support in my search application.  I use the
CommonsHttpSolrServer object to connect remotely to Solr.  I would like to
be able to index multiple fuzzy / phonetic match encodings, e.g. one of the
packaged phonetic encodings, my own phonetic encoding, my own or a packaged
q-gram encoding that will capture string overlap, etc., and then be able to
filter out the results I consider "false positives" in a dynamic, custom
way.  The general approaches I've seen for this are:

1. Use Solr's fuzzy queries.  I haven't been able to achieve acceptable
performance using fuzzy queries, and they also lack the dynamic flexibility
described above.  For example, whether or not I filter a phonetic match from
the results may depend on many things (whether or not there were exact
matches on relevant entities, who the user is, etc.), and I can't achieve
this flexibility with a fuzzy field query.

2. Create an RMI-based client/server setup so that I can use the
SolrIndexSearcher to pass in a custom Collector (as in Ch. 9 of Lucene in
Action, but with a custom Collector added).  A custom Collector seems like
exactly what I want, but I don't see a way to achieve this using any of the
packaged SolrServer implementations that support a remote setup like this.
I also worry about the stability of the remote object framework, since it's
been moved over to contrib and it seems that there may be serialization
issues or other instability
(http://lucene.472066.n3.nabble.com/extending-SolrIndexSearcher-td472809.html).

3. Continue to use the CommonsHttpSolrServer object for querying my index,
but add in post-processing to dynamically filter results.  This seems doable
but unnatural and potentially inefficient given that I need to worry about
supporting pagination and facet counts in such a framework.
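
For what it's worth, a rough sketch of the kind of post-processing I mean in
approach 3 is below.  PageFetcher is a hypothetical stand-in for whatever
wraps the CommonsHttpSolrServer query call (it is not a SolrJ type), and
filteredPage, batchSize, and the keep predicate are names made up purely for
illustration.  It over-fetches raw hits in batches and keeps going until one
filtered page is full:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PostFilterPager {

    /**
     * Hypothetical stand-in for a remote query: returns up to `rows` raw
     * (unfiltered) hits starting at offset `start`, or an empty list when
     * there are no more hits.
     */
    public interface PageFetcher<T> {
        List<T> fetch(int start, int rows);
    }

    /**
     * Builds page `pageIndex` (0-based) of the filtered results by scanning
     * raw hits from the beginning in batches of `batchSize`.
     */
    public static <T> List<T> filteredPage(PageFetcher<T> fetcher,
                                           Predicate<T> keep,
                                           int pageIndex,
                                           int pageSize,
                                           int batchSize) {
        int skip = pageIndex * pageSize;  // filtered hits belonging to earlier pages
        List<T> page = new ArrayList<>();
        int rawStart = 0;                 // offset into the unfiltered result list

        while (page.size() < pageSize) {
            List<T> batch = fetcher.fetch(rawStart, batchSize);
            if (batch.isEmpty()) {
                break;                    // raw results exhausted
            }
            for (T hit : batch) {
                if (!keep.test(hit)) {
                    continue;             // treated as a "false positive"
                }
                if (skip > 0) {
                    skip--;               // belongs to an earlier page
                    continue;
                }
                page.add(hit);
                if (page.size() == pageSize) {
                    break;
                }
            }
            rawStart += batch.size();
        }
        return page;
    }
}
```

Note that this only handles pagination: every page request re-scans raw hits
from the beginning, and any facet counts returned by Solr would still reflect
the unfiltered result set, which is exactly the inefficiency I'm worried
about.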

Is there an easier way to do custom dynamic results filtering (like via a
custom Collector) while still using CommonsHttpSolrServer?  Do people have
any other suggestions or insights about the approaches summarized above?

Thanks,
Dave
