bq: Your "ugly problem" is my situation I think ;) No, your problem is much worse ;(
The _contents_ of fields are restricted, which is horrible.

OK, here's another idea out of waaaaaaay left field: Payloads. It hinges on there being an OK number of possible combinations, which seems to be the case here; "OK" here means < 1B, say. It also hinges on being able to pre-calculate the access rights for each term as you index it. Then you attach a payload to each term which is, in effect, the authorization token for that term, expressing your possibilities: A, B, A&B, A|B, whatever. A payload is simply a small value (a float, in the common case) that gets carried along with the term and is accessible at scoring time. At scoring time, you then "drop out" any terms that have "bad" auth tokens.

WARNING: this is totally off the top of my head, so I'm sure there are gotchas in here. Like, does returning 0 from the scoring negate the search? No clue whether this can work for you, but here's some sample code that could give you an idea of how it all works:

https://lucidworks.com/blog/end-to-end-payload-example-in-solr/

Good luck. You're going places Solr wasn't designed to deal with, so whatever you do will be "exciting". And you're right, creating huge clauses will be a performance issue; the payloads thing may help you tame that.

Best,
Erick

On Thu, Jul 23, 2015 at 7:30 AM, Jamie Johnson <jej2...@gmail.com> wrote:

> Sorry for being vague, I'll try to explain more. In my use case a particular field does not have a security control, it's the data in the field. So for instance if I had a schema with a field called name, there could be data that should be secured at A, B, A&B, A|B, etc within that field. So again it's not the field that has this control it's the data in the field. My thought based on your suggestion was to dynamically generate the fields based on the authorizations, this way the user would only see name, but it would get translated to the fields in the index that they can see.
> So at index time if a field was added to the solr document that said name:foo with authorizations A&B I would need to translate that to name_A&B_txt:foo. Then subsequently on search I would check what fields in the index the user should be able to see and rewrite queries that said name:foo to name_A&B_txt:foo (assuming the user can see A&B).
>
> We do not explicitly control the fields the user or calling application has access to because I don't want to expose the name_A&B_txt:foo fields to calling applications, they know that a field "name" exists, based on that I need to translate a name:foo query into the appropriately controlled version. Does that make sense?
>
> My biggest concern with this (beyond the query rewrite) is how it will impact scoring (especially in the case information is available with multiple markings, i.e. name_A_txt has a value of foo and name_B_txt has a value of foo and the user has authorizations A and B) and possibly bumping up against the maximum clause limit as we expand the query.
>
> These reasons were why I thought it best to use payloads to make terms with authorizations a user can't see not impact the score and then resolve the actual object the user can see using a store that already supports this type of access pattern (specifically Accumulo in this case).
>
> Your "ugly problem" is my situation I think ;)
>
> On Thu, Jul 23, 2015 at 12:06 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> I'm not quite getting it here. I'm guessing that you do not allow fielded queries or you strictly control the fields a user sees to pick from. Otherwise your security stuff goes out the window, say you have a drop-down list of fields to choose from or something.
>>
>> Assuming you do NOT have such a thing, the user is just typing words in a box, then you have to figure out, once at the app layer, what fields they have access to and just append a qf=field_secure1,field_secure2.....
>> parameter to the query.
>>
>> That's it. You do not have to rewrite the user query at all, the q parameter is just passed through as is.
>>
>> bq: I guess in a search component I could look up all of the fields that are in the index and only run queries against fields they should be able to see once I know what is in the index (this is what you're suggesting right?).
>>
>> Kind of, except not in a search component. You have to have modeled the access rights somewhere, so I'm not getting why you can't just use that model to generate the list of restricted fields the user has access to. You haven't explained that model other than to say it's "complex". So I have no clue whether you're talking about not _knowing_ what fields are in the docs in the first place (quite possible with dynamic fields) or whether you do know the complete field list but calculating the user's access rights to which fields is complex.
>>
>> But I should emphasize again that my assumption is that once calculated, this list is invariant so it does not need to be done for every request. Indeed, what I'm envisioning is not writing any Solr code at all, all done in the app layer.
>>
>> As far as extra work, there isn't any as far as Solr is concerned. It's exactly as though you were specifying this in, say, the request handler. So I don't get your concern about lots and lots of fields. Now, I'm assuming a simple document model with some number of fields. The access rights to which of those fields a user can see may be a complex calculation, but again you only need to do it once. For that matter, you could pre-calculate that set of fields or otherwise cache it.
>>
>> Now, this breaks down if the document model isn't that simple, say the same field in doc1 can be seen by userX, but userX can't see the _same_ field in doc2. That's an ugly problem...
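
A minimal sketch of that app-layer step, assuming a hypothetical in-memory access model. FIELDS_BY_AUTH, allowed_fields and build_params are illustrative names, not Solr APIs, and the qf list is joined with spaces, which is the form the edismax documentation shows. The permitted-field list is computed once per authorization set and cached; the user's q is passed through untouched.

```python
from functools import lru_cache
from urllib.parse import urlencode

# Hypothetical access model: authorization -> index fields it unlocks.
FIELDS_BY_AUTH = {
    "A": ["field_secure1"],
    "B": ["field_secure2"],
}


@lru_cache(maxsize=1024)
def allowed_fields(auths: frozenset) -> tuple:
    """Compute (once per distinct authorization set) the searchable fields."""
    fields = []
    for auth in sorted(auths):
        fields.extend(FIELDS_BY_AUTH.get(auth, []))
    return tuple(fields)


def build_params(user_query: str, auths: frozenset) -> str:
    """Return a query string with q passed through and qf appended."""
    fields = allowed_fields(auths)
    if not fields:
        # An empty qf would let the handler's default fields apply,
        # so refuse rather than send the request.
        raise PermissionError("user has no searchable fields")
    return urlencode({"defType": "edismax", "q": user_query, "qf": " ".join(fields)})
```

For a user holding A and B, build_params("foo", frozenset({"A", "B"})) yields a query string whose qf lists both secure fields. Note that this only covers unfielded queries; a fielded q such as field1:whatever still has to be checked separately.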
>>
>> And let's further say there are a number of fields that _everyone_ can see. They can be placed in an <appends> section of the request handler so you don't have to specify them for each request.
>>
>> Best,
>> Erick
>>
>> On Wed, Jul 22, 2015 at 4:12 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>> > Looks like this may be what I'm looking for
>> >
>> > *SolrRequestInfo*
>> >
>> > I have not tried this yet but looks promising.
>> >
>> > Assuming this works, thinking about your suggestion I would need to rewrite the users query with the appropriate fields, are there any utilities for doing this? I'd be looking to rewrite a fielded query like +field:value possibly to something like +(field.secure:value field.secure2:value)
>> >
>> > Again thanks for suggestions
>> > On Jul 22, 2015 5:20 PM, "Jamie Johnson" <jej2...@gmail.com> wrote:
>> >
>> >> I answered my own question, looks like the field infos are always read within the IndexSearcher so that cost is already being paid.
>> >>
>> >> I would potentially have to duplicate information in multiple fields if it was present at multiple authorization levels, is there a limit to the number of fields within a document? I'm also concerned this might skew my search results as terms that had more authorizations would appear in more fields and would result in more matches on query. I'll play with this a little but I am still wondering about my original question.
>> >>
>> >> On Wed, Jul 22, 2015 at 4:45 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>> >>
>> >>> I had thought about this in the past, but thought it might be too expensive. I guess in a search component I could look up all of the fields that are in the index and only run queries against fields they should be able to see once I know what is in the index (this is what you're suggesting right?).
>> >>>
>> >>> My concern would be that the number of fields per document would grow too large to support this. Our controls aren't simple like user or admin they are complex combinations of authorizations so I would think there might be a large number of fields that are generated using this approach. Would retrieving all field infos from Solr be expensive on each request to see what they should be able to query?
>> >>>
>> >>> On Wed, Jul 22, 2015 at 4:19 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>> >>>
>> >>>> Why don't you handle it all at the app level? Here's what I mean:
>> >>>>
>> >>>> I'm assuming that you're using edismax here, but the same principle applies if not.
>> >>>>
>> >>>> Your handler (say the "/select" handler) has a "qf" parameter which defines the fields that are searched over in the absence of a field qualifier, e.g.
>> >>>> q=whatever&qf=title,description
>> >>>>
>> >>>> causes the search term to be looked for in the two fields "title" and "description"
>> >>>> You can also set up the qf fields in the "/select" handler as one of the items in the <defaults> section....
>> >>>>
>> >>>> But, the qf param in the <defaults> section is just that... a default. So individual queries can override it. What I have in mind is that you'd look up the user's field-access list and append that list as necessary to the query and just pass it on through.
>> >>>>
>> >>>> Things to watch out for:
>> >>>> 1> if the user specifies a field, you'll have to strip that off if they don't have rights, i.e. q=field1:whatever whenever
>> >>>> ignores the qf parameter for "whatever" but does respect the qf param for "whenever".
>> >>>> 2> If you have some kind of date field say that you want to facet over, you'd have to control that.
>> >>>> 3> if you have a "bag of words" where you use copyField to add a bunch of fields' data to an uber-field then the user can infer some things from that info, so you probably want to be careful about what copyFields you use.
>> >>>>
>> >>>> Best,
>> >>>> Erick
>> >>>>
>> >>>> On Wed, Jul 22, 2015 at 12:21 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>> >>>> > I am looking for a way to prevent fields that users shouldn't be able to know exist from contributing to the score. The goal is to provide a way to essentially hide certain fields from requests based on an access level provided on the query. I have managed to make terms that users shouldn't be able to see not impact the score by implementing a custom Similarity class that looks at the terms' payloads and returns 0 for the score if they shouldn't know the field exists. The issue however is that I don't have access to the request at this point so getting the user's access level is proving problematic. Is there a way to get the current request that is being processed via some thread local variable or something similar that Solr maintains? If not is there another approach that I could be using to access information from the request within my Similarity implementation?
>> >>>> > Any thoughts on this would be greatly appreciated.
>> >>>> >
>> >>>> > -Jamie
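
The payload idea in the original question (a per-term authorization token, with unauthorized terms contributing 0 to the score) can be pictured with a toy model. This is only a sketch under stated assumptions: it is plain Python, not the Lucene/Solr Similarity API, and TOKENS, can_see and score are hypothetical names. Each payload id stands for a pre-calculated authorization expression, stored here in disjunctive normal form.

```python
# Toy model only: NOT Lucene/Solr code. Every name here is hypothetical.
# Payload id -> authorization expression in disjunctive normal form:
# access is granted if the user holds every marking in at least one clause.
TOKENS = {
    1: [{"A"}],          # A
    2: [{"B"}],          # B
    3: [{"A", "B"}],     # A&B
    4: [{"A"}, {"B"}],   # A|B
}


def can_see(payload_id: int, user_auths: set) -> bool:
    """True if the user's authorizations satisfy the term's token."""
    # Unknown payload ids fall back to an empty expression, i.e. deny.
    return any(clause <= user_auths for clause in TOKENS.get(payload_id, []))


def score(matches, user_auths: set) -> float:
    """Sum raw term scores, dropping terms the user may not see.

    matches is an iterable of (raw_score, payload_id) pairs, one per
    matched term.
    """
    return sum(raw for raw, pid in matches if can_see(pid, user_auths))
```

With user authorizations {"A"}, a term tagged A&B is dropped while terms tagged A or A|B still count. Whether a document whose only matches score 0 is still returned is a separate question that this toy model does not answer.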