On Wed, Nov 19, 2014 at 11:56 AM, Yonik Seeley <yo...@heliosearch.com> wrote:
> On Wed, Nov 19, 2014 at 9:22 AM, Philip Durbin
> <philip_dur...@harvard.edu> wrote:
>> On Wed, Nov 19, 2014 at 5:45 AM, Yonik Seeley <yo...@heliosearch.com> wrote:
>>> On Tue, Nov 18, 2014 at 3:47 PM, Philip Durbin
>>> <philip_dur...@harvard.edu> wrote:
>>>> Solr JOINs are a way to enforce simple document security, as explained
>>>> by Yonik Seeley at
>>>> http://lucene.472066.n3.nabble.com/document-level-security-filter-solution-for-Solr-tp4126992p4126994.html
>>>>
>>>> I'm trying to tweak this pattern so that I don't have to keep the
>>>> security information in each of my primary Solr documents.
>>>>
>>>> I just posted the gist at
>>>> https://gist.github.com/pdurbin/4d27fea7b431ef3bf4f9 as an example of
>>>> my working Solr JOIN based on data in `before.json`. Permissions per
>>>> user are embedded in the primary documents like this:
>>>>
>>>>     {
>>>>         "id": "dataset_3",
>>>>         "perms_ss": [
>>>>             "alice",
>>>>             "bob"
>>>>         ]
>>>>     },
>>>>     {
>>>>         "id": "dataset_4",
>>>>         "perms_ss": [
>>>>             "alice",
>>>>             "bob",
>>>>             "public"
>>>>         ]
>>>>     },
>>>>
>>>> User documents have been created to do the JOIN on:
>>>>
>>>>     {
>>>>         "id": "alice",
>>>>         "groups_s": "alice"
>>>>     },
>>>>
>>>> The JOIN looks like this:
>>>>
>>>> {!join+from=groups_s+to=perms_ss}id:public+OR+{!join+from=groups_s+to=perms_ss}id:alice
>>>
>>> It would probably be faster written as a single join:
>>> fq={!join+from=groups_s+to=perms_ss}id:(public alice)
>>
>> Hmm, I can't get the single JOIN to work on the "before" example
>> (perms embedded in each primary doc) in the gist I posted so I guess
>> I'll live with the slower version with "OR".
>>
>>> Or, if you're using Heliosearch you could cache the filters separately
>>> for better hit rates on commonly used perms via the "filter" keyword:
>>> fq=filter({!join+from=groups_s+to=perms_ss}id:public) OR
>>> filter({!join+from=groups_s+to=perms_ss}id:alice)
>>
>> Getting back to my original question about keeping permission
>> information out of my primary documents, I noticed that
>> http://heliosearch.org describes the Pseudo-Join feature as "selects a
>> set of documents based on their relationship to a **second** set of
>> documents" (emphasis mine) so I assume I can't take the perms out of
>> my primary Solr documents and put them in a **third** set of
>> "permission assignments" documents with definition points and role
>> assignees:
>> https://gist.github.com/pdurbin/4d27fea7b431ef3bf4f9#file-after-json
>>
>> That is, the three sets of documents would be:
>>
>> 1. primary (datasets, with no permission info)
>> 2. users
>> 3. permission assignments
>
> You should be able to chain joins to follow any number of links.
> I don't quite understand how you mean to use your schema... but something like
>
> fq={!join from=definition_point_s to=id}role_assignee_ss:alice
>
> That's only following a single link and ignoring the group_s field, so
> I'm probably missing something.

No, no, this is PERFECT! I think...

Again my goal is to get away from putting the permissions in the
primary documents.

In the "before" example, I put the permissions in the primary
documents. Then I JOIN on those documents using a secondary set of
"group" documents: the "public" group, the "alice" group, the "bob"
group, etc.

As of the commit below, using your suggestion, in the "after" example
I've taken the permissions out of the primary documents. Instead the
permissions go into a set of "permission assignments" documents. This
means that when permissions change, rather than re-indexing my primary
documents (which is a somewhat expensive operation with many database
calls), I think I'll be able to reindex only the "permission
assignments" documents. As you noted, the third set of documents (the
"groups") isn't being used, so I deleted it.

I'm going to play around with this in our actual code. Thanks, Yonik!

Phil

p.s. You were right about the single JOIN as well, so that's in the
commit too (looking for both the "alice" group and the "public" group
at the same time). In my haste I forgot that when testing this stuff
with curl I need to replace spaces with the plus (+) sign.
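
One way to avoid the hand-encoding mistake is to let a library build the URL.
Below is a minimal Python sketch using only the standard library. The base URL
and parameters are copied from the test scripts; the `select_url` function
name is my own, and it assumes plain user names like "alice" that contain no
Solr query syntax characters.

```python
from urllib.parse import parse_qs, urlencode

# Base URL as used in the test scripts; adjust for your own core.
SELECT_URL = "http://localhost:8983/solr/collection1/select"


def select_url(user):
    """Build the select URL for one user, letting urlencode escape the fq."""
    # Assumption: `user` is a plain name; it is dropped into the query as-is.
    fq = "{!join from=definition_point_s to=id}role_assignee_ss:(public %s)" % user
    params = {"rows": 100, "wt": "json", "indent": "true", "q": "*:*", "fq": fq}
    # urlencode turns spaces into "+" and percent-encodes braces, colons, etc.
    return SELECT_URL + "?" + urlencode(params)


url = select_url("alice")
print(url)

# Decoding the query string gives back the readable filter query.
print(parse_qs(url.split("?", 1)[1])["fq"][0])
```

Since the braces come out percent-encoded, a URL built this way should not
need curl's --globoff either.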

p.p.s. I can't seem to figure out how to link to a specific diff in a
gist but what you see below is the third revision. This one:
https://gist.github.com/pdurbin/4d27fea7b431ef3bf4f9/0c0a9120299e3b0c112dc1687b89de83598fcb02

murphy:4d27fea7b431ef3bf4f9 pdurbin$ git show 0c0a912 | cat
commit 0c0a9120299e3b0c112dc1687b89de83598fcb02
Author: Philip Durbin <philipdur...@gmail.com>
Date:   Wed Nov 19 12:48:00 2014 -0500

    A solution from Yonik Seeley! Permissions are gone from primary docs

    Details at http://lucene.472066.n3.nabble.com/Solr-JOIN-keeping-permission-data-out-of-primary-documents-tp4169739p4169934.html

diff --git a/after.json b/after.json
index dd817e5..c2516d9 100644
--- a/after.json
+++ b/after.json
@@ -12,22 +12,6 @@
         "id": "dataset_4"
     },
     {
-        "id": "public",
-        "groups_s": "public"
-    },
-    {
-        "id": "alice",
-        "groups_s": "alice"
-    },
-    {
-        "id": "bob",
-        "groups_s": "bob"
-    },
-    {
-        "id": "charlie",
-        "groups_s": "charlie"
-    },
-    {
         "id": "dataset_1_perms",
         "definition_point_s": "dataset_1",
         "role_assignee_ss": [
diff --git a/test.after.alice b/test.after.alice
index 4fbc13f..a6ceb16 100755
--- a/test.after.alice
+++ b/test.after.alice
@@ -1,2 +1,2 @@
 #!/bin/bash
-diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq=({!join+FIXME)' | jq '.response.docs[] | {id}') alice.expected
+diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq={!join+from=definition_point_s+to=id}role_assignee_ss:(public+alice)' | jq '.response.docs[] | {id}') alice.expected
diff --git a/test.after.bob b/test.after.bob
index 0e834e0..9ae57e7 100755
--- a/test.after.bob
+++ b/test.after.bob
@@ -1,2 +1,2 @@
 #!/bin/bash
-diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq=({!join+FIXME)' | jq '.response.docs[] | {id}') bob.expected
+diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq={!join+from=definition_point_s+to=id}role_assignee_ss:(public+bob)' | jq '.response.docs[] | {id}') bob.expected
diff --git a/test.after.charlie b/test.after.charlie
index 89176ad..1527c3f 100755
--- a/test.after.charlie
+++ b/test.after.charlie
@@ -1,2 +1,2 @@
 #!/bin/bash
-diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq=({!join+FIXME)' | jq '.response.docs[] | {id}') charlie.expected
+diff <(curl -s --globoff 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*%3A*&fq={!join+from=definition_point_s+to=id}role_assignee_ss:(public+charlie)' | jq '.response.docs[] | {id}') charlie.expected
murphy:4d27fea7b431ef3bf4f9 pdurbin$
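
The test scripts in the diff pipe the Solr response through jq to keep just
the ids, then diff that against an expected file. For anyone without jq, a
rough Python equivalent of the jq step might look like the sketch below. The
`doc_ids` name is my own, and the sample response body is hand-written for
illustration, not real Solr output.

```python
import json


def doc_ids(response_body):
    """Keep only each document's id, like jq '.response.docs[] | {id}'."""
    return [doc["id"] for doc in json.loads(response_body)["response"]["docs"]]


# Hand-written stand-in for a wt=json response; real responses carry more fields.
sample = """
{"response": {"numFound": 2, "start": 0,
              "docs": [{"id": "dataset_3"}, {"id": "dataset_4"}]}}
"""

print(doc_ids(sample))  # ['dataset_3', 'dataset_4']
```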


-- 
Philip Durbin
Software Developer for http://dataverse.org
http://www.iq.harvard.edu/people/philip-durbin
