The "bailout" bits are just a safety valve. Say the post filter is
expensive. Say also that the query is *:*. That would mean that
your post filter would be called for each and every document in
the corpus (absent other filters, of course). To avoid really
_long_ queries in this case, I put an arbitrary limit in my
filter: after N documents have been evaluated, it just fails all
the docs after that.

Not mandatory at all, but it can be useful. The catch is that you
definitely return incomplete results: numFound won't be accurate,
facets won't be complete, and so on. Which is why I also return a
warning to that effect and a "refine your query" message or something.
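
A minimal sketch of the bailout idea in plain Java, in case it helps.
Everything here is a hypothetical stand-in (BailoutSketch,
BailoutCollector, maxEvaluated, the even-number "ACL check"); it is not
Solr API. In an actual plugin I'd expect the same counting logic to
live in the collect(int doc) method of the DelegatingCollector that
your PostFilter hands back, but treat that mapping as an assumption
and check it against your Solr version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;
import java.util.function.IntPredicate;

// Hypothetical stand-in for a post filter's collector; NOT Solr API.
class BailoutSketch {

    static final class BailoutCollector {
        private final int maxEvaluated;            // N: arbitrary cap on expensive checks
        private final IntPredicate expensiveCheck; // stand-in for the costly ACL/business rule
        private final IntConsumer delegate;        // stand-in for the downstream collector
        private int evaluated = 0;
        private boolean bailedOut = false;

        BailoutCollector(int maxEvaluated, IntPredicate expensiveCheck, IntConsumer delegate) {
            this.maxEvaluated = maxEvaluated;
            this.expensiveCheck = expensiveCheck;
            this.delegate = delegate;
        }

        // Called once per document that matched the main query.
        void collect(int doc) {
            if (evaluated >= maxEvaluated) {
                // Past the cap: fail the doc without running the expensive check.
                bailedOut = true;
                return;
            }
            evaluated++;
            if (expensiveCheck.test(doc)) {
                delegate.accept(doc); // doc passes the filter, hand it downstream
            }
        }

        // True once at least one doc was dropped because of the cap.
        boolean bailedOut() { return bailedOut; }

        int evaluated() { return evaluated; }
    }

    public static void main(String[] args) {
        List<Integer> passed = new ArrayList<>();
        // Assumed toy rule: only even doc ids are "authorized".
        BailoutCollector collector =
            new BailoutCollector(3, doc -> doc % 2 == 0, doc -> passed.add(doc));

        for (int doc = 0; doc < 10; doc++) { // simulates a *:* query over 10 docs
            collector.collect(doc);
        }

        System.out.println("passed=" + passed + " evaluated=" + collector.evaluated());
        if (collector.bailedOut()) {
            System.out.println("Results are incomplete, please refine your query.");
        }
    }
}
```

The bailedOut flag is the part that matters for the caller: it's what
lets you attach the "results are incomplete" warning to the response
instead of silently returning a short list.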

Best,
Erick

On Wed, Jan 18, 2017 at 9:44 AM, Mike Thomsen <mikerthom...@gmail.com> wrote:
> I finally got a chance to deep dive into this and have a preliminary
> working plugin. I'm starting to look at optimization strategies for how to
> speed processing up and am wondering if you can give me some more
> information about your "bailout" strategy.
>
> Thanks,
>
> Mike
>
> On Wed, Dec 21, 2016 at 9:08 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> "grab the response" is a bit ambiguous here in Solr terms. Sure,
>> a SearchComponent (you can write a plugin) gets the response,
>> but it only sees the final list being returned to the user, i.e. if you
>> have rows=15 it sees only 15 docs. Not sure that's adequate,
>> in the case above you could easily not be allowed to see any of
>> the top N docs. Plus, doing anything like this would give very
>> skewed things like facets, grouping, etc. Say the facets were
>> calculated over 534 hits but the user was only allowed to see 10 docs...
>> Very confusing.
>>
>> The most robust solution would be a "post filter", another bit
>> of custom code that you write (plugin). See:
>> http://yonik.com/advanced-filter-caching-in-solr/
>> A post filter sees _all_ the documents that satisfy the query,
>> and makes an include/exclude decision on each one (just
>> like any other fq clause). So facets, grouping and all the rest
>> "just work". Do be aware that if the ACL calculations are expensive
>> you need to be prepared for the system administrator doing a
>> *:* query. I usually build in a bailout and stop passing documents
>> after some number and pass back a result about "please narrow
>> down your search". Of course if your business logic is such that
>> you can calculate them all "fast enough", you're golden.
>>
>> All that said, if there's any way you can build this into tokens in the
>> doc and use a standard fq clause it's usually much easier. That may
>> take some creative work at indexing time if it's even possible.
>>
>> Best,
>> Erick
>>
>> On Wed, Dec 21, 2016 at 5:56 PM, Mike Thomsen <mikerthom...@gmail.com>
>> wrote:
>> > We're trying out some ideas on locking down solr and would like to know
>> > if there is a public API that allows you to grab the response before it is
>> > sent and inspect it. What we're trying to do is something for which a
>> > filter query is not a good option to really get where we want to be.
>> > Basically, it's an integration with some business logic to make a final
>> > pass at ensuring that certain business rules are followed in the event a
>> > query returns documents a user is not authorized to see.
>> >
>> > Thanks,
>> >
>> > Mike
>>
