Take a look at LucidWorks Search and its access control:
http://docs.lucidworks.com/display/help/Search+Filters+for+Access+Control
Role-based security is an easier nut to crack.
Karl Wright of ManifoldCF had a Solr patch for document access control at
one point:
SOLR-1895 - ManifoldCF SearchComponent plugin for enforcing ManifoldCF
security at search time
https://issues.apache.org/jira/browse/SOLR-1895
http://www.slideshare.net/lucenerevolution/wright-nokia-manifoldcfeurocon-2011
For some other thoughts:
http://wiki.apache.org/solr/SolrSecurity#Document_Level_Security
I'm not sure if external file fields will be of any value in this situation.
There is also a proposal for bitwise operations:
SOLR-1913 - QParserPlugin plugin for Search Results Filtering Based on
Bitwise Operations on Integer Fields
https://issues.apache.org/jira/browse/SOLR-1913
But the bottom line is clear: updating all documents in the index is a
non-starter.
-- Jack Krupansky
-----Original Message-----
From: Oleg Burlaca
Sent: Sunday, July 14, 2013 11:02 AM
To: solr-user@lucene.apache.org
Subject: ACL implementation: Pseudo-join performance & Atomic Updates
Hello all,
Situation:
We have a collection of files in Solr with ACLs applied: each file has a
multi-valued field that contains the list of userIDs that can read it.
Here is some sample data:
Id | content | userId
1 | text text | 4,5,6,2
2 | text text | 4,5,9
3 | text text | 4,2
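At search time we then restrict results with a filter query on that field,
roughly like this (the collection name is just for illustration):

  /solr/collection1/select?q=foo&fq=userId:4

i.e. a user only sees documents whose userId field contains their ID.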
Problem:
when the ACL is changed for a big folder, we recompute the ACL for all child
items and reindex them in Solr using atomic updates (updating only the
'userId' field), but because Solr internally deletes and reindexes the whole
document, performance is very poor.
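For reference, the atomic update we send for each child item looks roughly
like this (host, port and collection name are illustrative; the 'set'
operation replaces only the ACL field):

  curl 'http://localhost:8983/solr/collection1/update?commit=true' \
    -H 'Content-Type: application/json' \
    -d '[{"Id":"1", "userId":{"set":[4,5,2]}}]'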
Question:
I suppose the delete/reindex approach will not change soon (it's probably
inherent in the current Solr architecture)?
Possible solution: assuming atomic updates will be super fast on an index
without fulltext, keep a separate ACLIndex and FullTextIndex and use
Pseudo-Joins:
Example: searching for 'foo' as user '999':
/solr/FullTextIndex/select/?q=foo&fq={!join fromIndex=ACLIndex from=Id to=Id}userId:999
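Written out with curl (so the local params get URL-encoded; host and port
are illustrative):

  curl 'http://localhost:8983/solr/FullTextIndex/select' \
    --data-urlencode 'q=foo' \
    --data-urlencode 'fq={!join fromIndex=ACLIndex from=Id to=Id}userId:999'

As far as I understand, fromIndex requires both cores to live in the same
Solr instance.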
Question: what about performance here? What if the index has 100,000
records?
Note that the worst case is when everyone has access to all the files: then
the join filter matches the entire index.
I would be happy to get any links that deal with pseudo-join performance on
large datasets (i.e. a large initial filtered set of IDs).
Regards,
Oleg
P.S. We found that storing the list of all users that have access to each
record is better overall, because there are many more read requests (people
accessing the library) than write requests (a user is added or removed).