Pesky users. Life would be sooooo much easier if they'd just leave devs alone <G>....
Right. Well, you can certainly create your own SearchComponent and attach
your custom filter at that point; note that I'm skimping on the details here.

From left field, you might create a custom FunctionQuery that returns 0 for
excluded documents. Since that gets multiplied into the score, the resulting
score is 0. Returning 1 for docs that should be kept wouldn't change the
score. But other than that, I'll leave it to the folks in the code. Chris,
you there? <G>..

Best
Erick

On Wed, Apr 18, 2012 at 5:14 PM, Chris Collins <ch...@geekychris.com> wrote:
> Great question.
>
> The set could be in the millions. I oversimplified the use case somewhat to
> protect the innocent :-}. If a user is querying a large set of documents
> (for the sake of argument let's say it's high tens of millions, but it could
> be in the small billions), they may want to mark a result set or a subset of
> those docs with a label/tag and use that label/tag later. Now let's throw in
> that it's a multi-tenant system and we don't want to keep re-indexing
> documents to add these tags. Really what I would want to do is execute a
> query filtering by this labeled set: the server fetches the labeled set out
> of local cache, over the wire, or off disk, and then incorporates it by one
> means or another as a filter (a DocSet, or a hashtable in the hit collector).
>
> Personally I think the dictionary approach wouldn't be a good one. It may
> produce the most optimal filter mechanism, but it will cost a bunch to
> construct the OpenBitSet.
>
> In a prior company I built a more generic version of this for not only
> filtering but for sorting, aggregate stats, etc. We didn't use Solr. I
> was curious whether there was any methodology for plugging in such a scheme
> without taking a branch of Solr and hacking at it.
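[Editor's note: Erick's FunctionQuery suggestion boils down to multiplying each document's score by 1 or 0 depending on membership in the labeled set. A minimal, Lucene-free sketch of that gating logic follows; the class and method names are hypothetical, and in Solr 3.x-era code the `multiplier` logic would live inside a custom ValueSource's per-document value method, with the external id read from a field cache.]

```java
import java.util.Set;

// Sketch of the 0/1 score-multiplier idea: excluded docs get their
// score multiplied by 0, kept docs by 1 (leaving the score unchanged).
// Class/method names are illustrative, not a real Solr API.
public class LabelGate {
    private final Set<String> labeledIds; // the labeled set L

    public LabelGate(Set<String> labeledIds) {
        this.labeledIds = labeledIds;
    }

    // What a custom ValueSource would return per document,
    // given that doc's external id (e.g. read from a field cache).
    public float multiplier(String externalId) {
        return labeledIds.contains(externalId) ? 1.0f : 0.0f;
    }

    // The score as it comes out after the function value is multiplied in.
    public float gatedScore(float rawScore, String externalId) {
        return rawScore * multiplier(externalId);
    }
}
```

[One caveat with this approach: excluded documents end up with a score of 0 but are still collected, so they appear at the tail of results rather than being removed outright, unless something downstream drops zero-score hits.]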
> This was a multi-tenant system where we were producing aggregate graphs,
> filtering, and ranking by things such as entity-level sentiment, so we
> produced a rather generic solution that, as you pointed out, reinvented
> perhaps some things that smell similar. It was about 7B docs and was
> multi-tenant. Users were able to override these "features" on a document
> level, which was necessary so their counts, sorts, etc. worked correctly.
> Given how long it took me to build and debug that, if I can take something
> close off the shelf... well, you know the rest of the story :-}
>
> C
>
>
> On Apr 18, 2012, at 4:38 AM, Erick Erickson wrote:
>
>> I guess my question is "what advantage are you trying
>> to get here?"
>>
>> At the start, this feels like an "XY" problem. How are
>> you intending to use the fq after you've built it? Because
>> if there's any way to just create an "fq" clause, Solr
>> will take care of it for you: caching it, autowarming
>> it when searchers are re-opened, etc. Otherwise, it seems
>> to me you're going to be re-inventing a bunch of stuff;
>> you'll have to intercept the queries coming in in order
>> to apply the filter from the cache, etc.
>>
>> Which also may be another way of asking "How big
>> is this set of document IDs?" If it's in the 100s, I'd
>> just go with an fq. If it's more than that, I'd index
>> some kind of set identifier that you could use in
>> your fqs.
>>
>> And if this is gibberish, ignore me <G>..
>>
>> Best
>> Erick
>>
>> On Tue, Apr 17, 2012 at 4:34 PM, Chris Collins <ch...@geekychris.com> wrote:
>>> Hi, I am a long-time Lucene user but new to Solr. I would like to use
>>> something like the filterCache, but build such a cache not from a query
>>> but from custom code. I'll ask my question using techniques and
>>> vocab I am familiar with. I'm not sure it's actually the right way, so I
>>> apologize if it's just the wrong approach.
>>>
>>> The scenario is that I would like to filter a result set by a set of
>>> labeled documents; I will call that set L.
>>> L contains app-specific document IDs that are indexed as literals in the
>>> Lucene field "myid".
>>> I imagine I could build an OpenBitSet by enumerating the TermDocs
>>> and looking for the intersecting ids in my label set.
>>> Then I would have a bitset that I assume I could use in a Filter.
>>>
>>> Another approach would be to implement a hit collector, compute a
>>> FieldCache from that myid field, and look for the intersection in a
>>> hashtable of L at scoring time, throwing out results that are not
>>> contained in the hashtable.
>>>
>>> Of course I am working within the confines / concepts that Solr has
>>> laid out. Without going completely off the reservation, is there a neat
>>> way of doing such a thing with Solr?
>>>
>>> Glad to clarify if my question makes absolutely no sense.
>>>
>>> Best
>>>
>>> C
>>
>
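[Editor's note: Chris's first idea above — enumerate the "myid" term/doc pairs once and set a bit for every internal doc whose external id is in L — reduces to building a bitset like the following. This is a Lucene-free sketch with hypothetical names; in Lucene 3.x-era code the loop would walk IndexReader.termDocs() for the "myid" field and set bits in an OpenBitSet, which a custom Filter's getDocIdSet() could then hand back to the searcher.]

```java
import java.util.BitSet;
import java.util.Map;
import java.util.Set;

// Sketch: build a filter bitset by intersecting the indexed external
// ids with the labeled set L, then use it to keep or drop hits.
// The Map stands in for the "myid" term -> internal docid mapping you
// would get from enumerating TermDocs on a real index.
public class LabelFilterSketch {
    public static BitSet buildBits(Map<String, Integer> docIdByExternalId,
                                   Set<String> labeled, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (Map.Entry<String, Integer> e : docIdByExternalId.entrySet()) {
            if (labeled.contains(e.getKey())) {
                bits.set(e.getValue()); // doc's external id is in L: keep it
            }
        }
        return bits;
    }

    // What the filter (or a hit-collector-side check, in Chris's second
    // approach) would consult per document at collection time.
    public static boolean accept(BitSet bits, int internalDocId) {
        return bits.get(internalDocId);
    }
}
```

[The trade-off Chris mentions is visible here: the bitset makes the per-hit check O(1), but building it requires a full pass over the term/doc pairs, which is the construction cost he is worried about for millions of labels.]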