Re: Using payloads and user provided data in score

Erick Erickson Wed, 22 Jul 2015 21:08:07 -0700

I'm not quite getting it here. I'm guessing that you either do not
allow fielded queries or you strictly control the fields a user
sees to pick from, say with a drop-down list of fields to choose
from or something. Otherwise your security stuff goes out the
window.

Assuming you do NOT have such a thing and the user is just typing
words in a box, then you have to figure out, once, at the
app layer, what fields they have access to and just append a
qf=field_secure1 field_secure2 ...
parameter (the field list is whitespace-separated) to the query.

That's it. You do not have to rewrite the user query at all, the q
parameter is just passed through as is.
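
A minimal sketch of that app-layer step, in Python. The FIELD_ACCESS table,
the allowed_fields_for lookup and the field names are hypothetical stand-ins
for whatever access model you already have; only q, qf and defType are actual
Solr parameters. It assumes edismax, where qf is a whitespace-separated list.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical access model: access level -> fields that level may search.
FIELD_ACCESS = {
    "public": ["title", "description"],
    "secret": ["title", "description", "field_secure1", "field_secure2"],
}

def allowed_fields_for(access_level):
    """Return the searchable fields for a level (placeholder lookup)."""
    return FIELD_ACCESS.get(access_level, [])

def build_query_string(user_query, access_level):
    """Pass q through untouched and append a qf built from the user's fields."""
    fields = allowed_fields_for(access_level)
    if not fields:
        raise PermissionError("no searchable fields for this access level")
    params = {
        "defType": "edismax",
        "q": user_query,         # the user's words, passed through as is
        "qf": " ".join(fields),  # edismax expects whitespace-separated fields
    }
    return urlencode(params)
```

Calling build_query_string("foo bar", "secret") yields a query string you can
append to the handler URL; the q value itself is never parsed or rewritten.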

bq:  I guess in a search component I could look up all of the fields
that are in the index and only run queries against fields they should be
able to see once I know what is in the index (this is what you're
suggesting right?).

Kind of, except not in a search component. You have to have modeled
the access rights somewhere, so I'm not getting why you can't just use
that model to generate the list of restricted fields the user has access to.
You haven't explained that model other than to say it's "complex". So I
have no clue whether you're talking about not _knowing_ what fields are
in the docs in the first place (quite possible with dynamic fields) or
whether you do know the complete field list but calculating the user's access
rights to which fields is complex.

But I should emphasize again that my assumption is that once calculated,
this list is invariant so it does not need to be done for every request. Indeed,
what I'm envisioning is not writing any Solr code at all, all done in
the app layer.

As far as extra work, there isn't any as far as Solr is concerned.
It's exactly as though you were specifying this in, say, the request
handler. So I don't get your concern about lots and lots of fields.
Now, I'm assuming a simple document model with some number
of fields. The access rights to which of those fields a user can
see may be a complex calculation, but again you only need to do it
once. For that matter, you could pre-calculate that set of fields
or otherwise cache it.
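
One way the "calculate once, then cache" idea might look, again as a Python
sketch. compute_allowed_fields is a made-up stand-in for the real (possibly
expensive) access calculation, and the authorization names are invented; the
call counter is only there to show that the cache is doing its job.

```python
from functools import lru_cache

def compute_allowed_fields(authorizations):
    """Stand-in for the expensive access-rights calculation (hypothetical)."""
    compute_allowed_fields.calls += 1
    fields = {"title"}
    if "secret" in authorizations:
        fields.add("field_secure1")
    if "topsecret" in authorizations:
        fields.add("field_secure2")
    return tuple(sorted(fields))

compute_allowed_fields.calls = 0  # counter, only here to show the cache working

@lru_cache(maxsize=1024)
def cached_allowed_fields(authorizations):
    """Cache per distinct authorization set; the argument must be hashable."""
    return compute_allowed_fields(authorizations)

def qf_for(user_authorizations):
    """Build the qf value, computing the field list at most once per auth set."""
    return " ".join(cached_allowed_fields(frozenset(user_authorizations)))
```

This relies on the assumption stated above, that the list is invariant once
calculated. If the field set can change, the cache would need clearing, for
example with cached_allowed_fields.cache_clear().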

Now, this breaks down if the document model isn't that simple,
say the same field in doc1 can be seen by userX, but userX
can't see the _same_ field in doc2. That's an ugly problem...

And let's further say there are a number of fields that _everyone_
can see. They can be placed in an <appends> section of the request
handler so you don't have to specify them for each request.
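
Something along these lines in solrconfig.xml, as a sketch only. The field
names are placeholders, and it assumes edismax, which as far as I know merges
multiple qf values, so the appended fields are searched in addition to
whatever qf the app layer sends:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
  </lst>
  <!-- "appends" params are added to every request, on top of what the client sends -->
  <lst name="appends">
    <!-- fields that everyone may search (placeholder names) -->
    <str name="qf">title description</str>
  </lst>
</requestHandler>
```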

Best,
Erick

On Wed, Jul 22, 2015 at 4:12 PM, Jamie Johnson <[email protected]> wrote:
> Looks like this may be what I'm looking for
>
> *SolrRequestInfo*
>
> I have not tried this yet but looks promising.
>
> Assuming this works, thinking about your suggestion I would need to rewrite
> the user's query with the appropriate fields. Are there any utilities for
> doing this?  I'd be looking to rewrite a fielded query like +field:value
> possibly to something like +(field.secure:value field.secure2:value)
>
> Again thanks for suggestions
> On Jul 22, 2015 5:20 PM, "Jamie Johnson" <[email protected]> wrote:
>
>> I answered my own question, looks like the field infos are always read
>> within the IndexSearcher so that cost is already being paid.
>>
>> I would potentially have to duplicate information in multiple fields if it
>> was present at multiple authorization levels, is there a limit to the
>> number of fields within a document?  I'm also concerned this might skew my
>> search results as terms that had more authorizations would appear in more
>> fields and would result in more matches on query.  I'll play with this a
>> little but I am still wondering about my original question.
>>
>> On Wed, Jul 22, 2015 at 4:45 PM, Jamie Johnson <[email protected]> wrote:
>>
>>> I had thought about this in the past, but thought it might be too
>>> expensive.  I guess in a search component I could look up all of the fields
>>> that are in the index and only run queries against fields they should be
>>> able to see once I know what is in the index (this is what you're
>>> suggesting right?).
>>>
>>> My concern would be that the number of fields per document would grow too
>>> large to support this.  Our controls aren't simple like user or admin they
>>> are complex combinations of authorizations so I would think there might be
>>> a large number of fields that are generated using this approach.  Would
>>> retrieving all field infos from Solr be expensive on each request to see
>>> what they should be able to query?
>>>
>>> On Wed, Jul 22, 2015 at 4:19 PM, Erick Erickson <[email protected]>
>>> wrote:
>>>
>>>> Why don't you handle it all at the app level? Here's what I mean:
>>>>
>>>> I'm assuming that you're using edismax here, but the same principle
>>>> applies if not.
>>>>
>>>> Your handler (say the "/select" handler) has a "qf" parameter which
>>>> defines
>>>> the fields that are searched over in the absence of a field qualifier,
>>>> e.g.
>>>> q=whatever&qf=title+description   (qf is whitespace-separated; "+" is a URL-encoded space)
>>>>
>>>> causes the search term to be looked for in the two fields "title" and
>>>> "description"
>>>> You can also set up the qf fields in the "/select" handler as one of
>>>> the items in
>>>> the <defaults> section....
>>>>
>>>> But, the qf param in the <defaults> section is just that... a default.
>>>> So individual
>>>> queries can override it. What I have in mind is that you'd look up the
>>>> user's
>>>> field-access list and append that list as necessary to the query and
>>>> just pass it
>>>> on through.
>>>>
>>>> Things to watch out for:
>>>> 1> if the user specifies a field, you'll have to strip that off if
>>>> they don't have rights,
>>>> i.e. q=field1:whatever whenever
>>>> ignores the qf parameter for "whatever" but does respect the qf param
>>>> for "whenever".
>>>> 2> If you have some kind of date field say that you want to facet
>>>> over, you'd have
>>>> to control that.
>>>> 3> if you have a "bag of words" where you use copyField to add a bunch
>>>> of fields'
>>>> data to an uber-field then the user can infer some things from that
>>>> info, so you probably
>>>> want to be careful about what copyFields you use.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Wed, Jul 22, 2015 at 12:21 PM, Jamie Johnson <[email protected]>
>>>> wrote:
>>>> > I am looking for a way to prevent fields that users shouldn't be able
>>>> to
>>>> > know exist from contributing to the score.  The goal is to provide a
>>>> way to
>>>> > essentially hide certain fields from requests based on an access level
>>>> > provided on the query.  I have managed to make terms that users
>>>> shouldn't
>>>> > be able to see not impact the score by implementing a custom Similarity
>>>> > class that looks at the term's payloads and returns 0 for the score if
>>>> they
>>>> > shouldn't know the field exists.  The issue however is that I don't
>>>> have
>>>> > access to the request at this point so getting the user's access level
>>>> is
>>>> > proving problematic.  Is there a way to get the current request that is
>>>> > being processed via some thread local variable or something similar
>>>> that
>>>> > Solr maintains?  If not is there another approach that I could be
>>>> using to
>>>> > access information from the request within my Similarity
>>>> implementation?
>>>> > Any thoughts on this would be greatly appreciated.
>>>> >
>>>> > -Jamie
>>>>
>>>
>>>
>>
