Hey Hamish,

You might want to check out LUCENE-5402. I added support for index-time pruning for suggesters that consume from the index itself, and I plan to add the same support to file-based suggesters as well. Using this functionality from Solr requires more changes; I am planning to support it in the new SuggesterComponent (SOLR-5378).
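To make the pruning idea concrete, here is a rough sketch in plain Python. It is not the Lucene or Solr API, and it is not what the LUCENE-5402 patch does internally; the function names (`build_dictionary`, `suggest`), the `tenant` and `title` keys, and the sample documents are all made up for illustration. The only assumption is that the suggestion dictionary is built from documents matching a filter, so one client's terms never reach another client's suggestions.

```python
from collections import defaultdict


def build_dictionary(docs, tenant):
    """Count title terms, keeping only documents owned by `tenant`."""
    counts = defaultdict(int)
    for doc in docs:
        if doc["tenant"] != tenant:
            continue  # pruned: this document's terms never enter the dictionary
        for term in doc["title"].lower().split():
            counts[term] += 1
    return dict(counts)


def suggest(dictionary, prefix, limit=5):
    """Return up to `limit` terms starting with `prefix`, most frequent first."""
    prefix = prefix.lower()
    matches = [term for term in dictionary if term.startswith(prefix)]
    matches.sort(key=lambda term: (-dictionary[term], term))
    return matches[:limit]


# Hypothetical sample data: two tenants with overlapping prefixes.
docs = [
    {"tenant": "a.example.com", "title": "Harbour survey"},
    {"tenant": "b.example.com", "title": "Harvest records"},
    {"tenant": "a.example.com", "title": "Harbour charts"},
]

dict_a = build_dictionary(docs, "a.example.com")
print(suggest(dict_a, "har"))  # ['harbour']
```

Client A only ever sees "harbour" here; "harvest" belongs to client B and is dropped before the dictionary is built. The cost is one dictionary per filter, which is the same scaling question as the dynamic-field approach.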
Hope that helps!

Areek

On Wed, Jan 15, 2014 at 6:10 PM, Hamish Campbell <hamish.campb...@koordinates.com> wrote:

> Thanks Tomás, I'll take a look.
>
> Still interested to hear from anyone about using queries to populate the
> list - I'm willing to give up a bit of performance for the flexibility it
> would provide.
>
> On Thu, Jan 16, 2014 at 1:06 PM, Tomás Fernández Löbbe <tomasflo...@gmail.com> wrote:
>
> > I think your use case is the one described in LUCENE-5350, maybe you want
> > to take a look at the patch and comments there.
> >
> > Tomás
> >
> > On Wed, Jan 15, 2014 at 12:58 PM, Hamish Campbell <hamish.campb...@koordinates.com> wrote:
> >
> > > Hi all,
> > >
> > > I'm looking into options for filtering the search suggestions dictionary.
> > >
> > > Using Solr 4.6.0, Suggester component and fst.FuzzyLookupFactory using a
> > > field based dictionary, we're indexing records for a multi-tenanted SaaS
> > > platform. SearchHandler records are always filtered by the particular
> > > client warehouse (e.g. by domain); however, we need a way to apply a
> > > similar filter to the spell check dictionary to prevent leaking terms
> > > between clients. In other words: when client A searches for a document
> > > title they should not receive spelling suggestions for client B's
> > > document titles.
> > >
> > > This has been asked a couple of times, on the mailing list and on
> > > StackOverflow. Some of the suggested approaches:
> > >
> > > 1. Use dynamic fields to create dictionaries per-warehouse (mentioned here:
> > > http://lucene.472066.n3.nabble.com/Filtering-down-terms-in-suggest-tt4069627.html)
> > >
> > > That might be a reasonable option for us (we already considered a similar
> > > approach), but at what point does this stop scaling efficiently? How many
> > > dynamic fields are too many?
> > >
> > > 2. Run a query to populate the suggestion list (also mentioned in that
> > > thread)
> > >
> > > If I understand this correctly, this would give us a lot of flexibility
> > > and power: for example to give a more nuanced result set using the user's
> > > permissions to expose private documents in their spelling suggestions.
> > >
> > > I expect this would be a slow query, but our total document count is
> > > currently relatively small (on the order of 10^3 objects) and I imagine
> > > you could create a specific word index with the appropriate fields to
> > > keep this in check. Is this a feasible approach, and if so, how do you
> > > build a dynamic suggestion list?
> > >
> > > 3. Other options:
> > >
> > > It seems like this is a common problem - and we could throw some
> > > resources at building an extension to provide some limited suggestion
> > > dictionary filtering. Is anyone already doing something similar, or has
> > > found a clever hack around this, or can suggest a starting point?
> > >
> > > Thanks everyone!
> > >
> > > --
> > > Hamish Campbell
> > > Koordinates Ltd <http://koordinates.com/?_bzhc=esig>
> > > PH +64 9 966 0433
> > > FAX +64 9 966 0045
>
> --
> Hamish Campbell
> Koordinates Ltd <http://koordinates.com/?_bzhc=esig>
> PH +64 9 966 0433
> FAX +64 9 966 0045