On Wed, Nov 19, 2014 at 11:56 AM, Yonik Seeley <yo...@heliosearch.com> wrote:
> On Wed, Nov 19, 2014 at 9:22 AM, Philip Durbin
> <philip_dur...@harvard.edu> wrote:
>> On Wed, Nov 19, 2014 at 5:45 AM, Yonik Seeley <yo...@heliosearch.com> wrote:
>>> On Tue, Nov 18, 2014 at 3:47 PM, Philip Durbin
>>> <philip_dur...@harvard.edu> wrote:
>>>> Solr JOINs are a way to enforce simple document security, as explained
>>>> by Yonik Seeley at
>>>> http://lucene.472066.n3.nabble.com/document-level-security-filter-solution-for-Solr-tp4126992p4126994.html
>>>>
>>>> I'm trying to tweak this pattern so that I don't have to keep the
>>>> security information in each of my primary Solr documents.
>>>>
>>>> I just posted the gist at
>>>> https://gist.github.com/pdurbin/4d27fea7b431ef3bf4f9 as an example of
>>>> my working Solr JOIN based on data in `before.json`. Permissions per
>>>> user are embedded in the primary documents like this:
>>>>
>>>> {
>>>>   "id": "dataset_3",
>>>>   "perms_ss": [
>>>>     "alice",
>>>>     "bob"
>>>>   ]
>>>> },
>>>> {
>>>>   "id": "dataset_4",
>>>>   "perms_ss": [
>>>>     "alice",
>>>>     "bob",
>>>>     "public"
>>>>   ]
>>>> },
>>>>
>>>> User documents have been created to do the JOIN on:
>>>>
>>>> {
>>>>   "id": "alice",
>>>>   "groups_s": "alice"
>>>> },
>>>>
>>>> The JOIN looks like this:
>>>>
>>>> {!join+from=groups_s+to=perms_ss}id:public+OR+{!join+from=groups_s+to=perms_ss}id:alice
>>>
>>> It would probably be faster written as a single join:
>>> fq={!join+from=groups_s+to=perms_ss}id:(public alice)
>>
>> Hmm, I can't get the single JOIN to work on the "before" example
>> (perms embedded in each primary doc) in the gist I posted so I guess
>> I'll live with the slower version with "OR".
>>
>>> Or, if you're using Heliosearch you could cache the filters separately
>>> for better hit rates on commonly used perms via the "filter" keyword:
>>> fq=filter({!join+from=groups_s+to=perms_ss}id:public) OR
>>> filter({!join+from=groups_s+to=perms_ss}id:alice)
>>
>> Getting back to my original question about keeping permission
>> information out of my primary documents, I noticed that
>> http://heliosearch.org describes the Pseudo-Join feature as "selects a
>> set of documents based on their relationship to a **second** set of
>> documents" (emphasis mine) so I assume I can't take the perms out of
>> my primary Solr documents and put them in a **third** set of
>> "permission assignments" documents with definition points and role
>> assignees:
>> https://gist.github.com/pdurbin/4d27fea7b431ef3bf4f9#file-after-json
>> . That is, the three sets of documents would be:
>>
>> 1. primary (datasets, with no permission info)
>> 2. users
>> 3. permission assignments
>
> You should be able to chain joins to follow any number of links.
> I don't quite understand how you mean to use your schema... but something like
>
> fq={!join from=definition_point_s to=id}role_assignee_ss:alice
>
> That's only following a single link and ignoring the group_s field, so
> I'm probably missing something.

No, no, this is PERFECT! I think...

Again, my goal is to get away from putting the permissions in the
primary documents.

In the "before" example, I put the permissions in the primary
documents. Then I JOIN on those documents using a secondary set of
"group" documents: the "public" group, the "alice" group, the "bob"
group, etc.

As of the commit below, using your suggestion, in the "after" example
I've taken the permissions out of the primary documents. Instead, the
permissions go into a set of "permission assignments" documents. This
means that when permissions change, rather than reindexing my primary
documents (which is a somewhat expensive operation with many database
calls), I think I'll be able to reindex only the "permission
assignments" documents.

As you noted, the third set of documents about "groups" isn't being
used, so I deleted it.

I'm going to play around with this in our actual code.

Thanks, Yonik!

Phil

p.s. You were right about the single JOIN as well, so that's in the
commit too (looking for both the "alice" group and the "public" group
at the same time). In my haste I forgot that when testing this stuff
with curl I need to replace spaces with the plus (+) sign.

p.p.s. I can't seem to figure out how to link to a specific diff in a
gist, but what you see below is the third revision. This one:
https://gist.github.com/pdurbin/4d27fea7b431ef3bf4f9/0c0a9120299e3b0c112dc1687b89de83598fcb02

murphy:4d27fea7b431ef3bf4f9 pdurbin$ git show 0c0a912 | cat
commit 0c0a9120299e3b0c112dc1687b89de83598fcb02
Author: Philip Durbin <philipdur...@gmail.com>
Date:   Wed Nov 19 12:48:00 2014 -0500

    A solution from Yonik Seeley! Permissions are gone from primary docs

    Details at http://lucene.472066.n3.nabble.com/Solr-JOIN-keeping-permission-data-out-of-primary-documents-tp4169739p4169934.html

diff --git a/after.json b/after.json
index dd817e5..c2516d9 100644
--- a/after.json
+++ b/after.json
@@ -12,22 +12,6 @@
     "id": "dataset_4"
   },
   {
-    "id": "public",
-    "groups_s": "public"
-  },
-  {
-    "id": "alice",
-    "groups_s": "alice"
-  },
-  {
-    "id": "bob",
-    "groups_s": "bob"
-  },
-  {
-    "id": "charlie",
-    "groups_s": "charlie"
-  },
-  {
     "id": "dataset_1_perms",
     "definition_point_s": "dataset_1",
     "role_assignee_ss": [
diff --git a/test.after.alice b/test.after.alice
index 4fbc13f..a6ceb16 100755
--- a/test.after.alice
+++ b/test.after.alice
@@ -1,2 +1,2 @@
 #!/bin/bash
-diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq=({!join+FIXME)' | jq '.response.docs[] | {id}') alice.expected
+diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq={!join+from=definition_point_s+to=id}role_assignee_ss:(public+alice)' | jq '.response.docs[] | {id}') alice.expected
diff --git a/test.after.bob b/test.after.bob
index 0e834e0..9ae57e7 100755
--- a/test.after.bob
+++ b/test.after.bob
@@ -1,2 +1,2 @@
 #!/bin/bash
-diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq=({!join+FIXME)' | jq '.response.docs[] | {id}') bob.expected
+diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq={!join+from=definition_point_s+to=id}role_assignee_ss:(public+bob)' | jq '.response.docs[] | {id}') bob.expected
diff --git a/test.after.charlie b/test.after.charlie
index 89176ad..1527c3f 100755
--- a/test.after.charlie
+++ b/test.after.charlie
@@ -1,2 +1,2 @@
 #!/bin/bash
-diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq=({!join+FIXME)' | jq '.response.docs[] | {id}') charlie.expected
+diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq={!join+from=definition_point_s+to=id}role_assignee_ss:(public+charlie)' | jq '.response.docs[] | {id}') charlie.expected
murphy:4d27fea7b431ef3bf4f9 pdurbin$

-- 
Philip Durbin
Software Developer for http://dataverse.org
http://www.iq.harvard.edu/people/philip-durbin
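
The p.s. above mentions that spaces have to become plus signs when the filter is tested by hand with curl. One way to sidestep that step might be to build the request URL in a script and let the library do the encoding. The following is only a rough sketch: the endpoint and the field names (definition_point_s, role_assignee_ss) are taken from the test scripts in the commit, while the function names permission_filter and select_url are made up for illustration. It also assumes that user and group names contain no characters that are special in Solr query syntax.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint as it appears in the test.after.* scripts from the commit.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def permission_filter(user, groups=("public",)):
    """Build the single-join filter for one user plus shared groups.

    Hypothetical helper: joins from the "permission assignments"
    documents (definition_point_s) to the primary documents (id).
    Assumes names need no Solr query escaping.
    """
    assignees = " ".join([*groups, user])
    return (
        "{!join from=definition_point_s to=id}"
        "role_assignee_ss:(" + assignees + ")"
    )


def select_url(user):
    """Return a select URL whose fq parameter is fully URL-encoded."""
    params = {
        "rows": "100",
        "wt": "json",
        "indent": "true",
        "q": "*:*",
        "fq": permission_filter(user),
    }
    # urlencode() turns spaces into "+" and percent-encodes braces,
    # parentheses and colons, so no manual replacement is needed.
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    url = select_url("alice")
    print(url)
    # Decoding the query string gives back the readable filter.
    print(parse_qs(urlsplit(url).query)["fq"][0])
```

Because the braces are percent-encoded here, a URL built this way should not need curl's --globoff option, which the hand-written scripts use to pass the literal { and } through.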