Searching with access controls
I'm trying to index data in a system that implements some rather nasty access controls on the data.

Basically, there are users, and communities, and users are members of the communities. Potentially a user could be a member of hundreds or even thousands of communities (there's no enforced upper limit).

Now I'm trying for a solution such that a user only gets documents that are either "public" or belong to a community that they're a member of.

I figure there are two approaches (if there are other/better ones, please let me know).

1) For each document in the index, I store userid in a multivalued field. I simply store every single userid that IS allowed access to the document. This has the advantage of the query being quite simple (e.g. useraccess:MYUSERID) but I will have to store HEAPS of data, and potentially have to do many more updates (as users join/leave communities).

2) For each document in the index, store the community id that it belongs to. The obvious advantage here is fewer updates and less storage. HOWEVER, this means queries get bigger and bigger as users are in more and more communities (e.g. communityid:(myCID1 OR myCID2 OR myCID3)).

Does anyone have any thoughts on this? Are there blindingly obvious options I'm missing that would take all this complication away? What performance implications does each of these methods have?

Many thanks in advance for any comments or helpful suggestions :)

--
Martyn
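To make the trade-off between the two approaches concrete, here is a minimal sketch of the query string each one would send. The field names follow the examples in the post (spelled here as useraccess and communityid); the function names are made up for illustration, and how "public" documents are matched is left out because the post does not say how they are marked.

```python
from typing import Sequence


def option1_query(user_id: str) -> str:
    """Option 1: every document lists each user id allowed to see it."""
    return f"useraccess:{user_id}"


def option2_query(community_ids: Sequence[str]) -> str:
    """Option 2: every document stores its community id.

    The query ORs together all communities the user belongs to,
    so it grows by one clause per community.
    """
    if not community_ids:
        # An empty "communityid:()" group would not be a useful query.
        raise ValueError("user belongs to no communities")
    return "communityid:(" + " OR ".join(community_ids) + ")"


if __name__ == "__main__":
    print(option1_query("MYUSERID"))
    print(option2_query(["myCID1", "myCID2", "myCID3"]))
```

Option 1 keeps the query constant in size but moves the cost to indexing (every membership change touches documents); option 2 keeps the index stable but the query length scales with the user's memberships.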
Re: Searching with access controls
I was just reading about the limit on boolean operators in a query (it seems to default to 1024 in Solr).

Using option 2 would mean that a user can't be in any more than 1024 communities (assuming no other boolean logic in the query).

Potentially a huge number of communities (10,000+ ?). Each community could easily have say 100 documents each, and there's some other "global" type documents too.

Say 500,000 - 1,000,000 documents?

What do you mean by "You could also store user documents in the collection to avoid passing the security info" ?

I'm not really a Java programmer of any significance, but I work with people who are, and I can bully them into helping out. (I'm a Perl guy myself).

Thanks,

--
Martyn

On Thu, 2006-08-10 at 23:43 -0400, Yonik Seeley wrote:
> On 8/10/06, Martyn Smith <[EMAIL PROTECTED]> wrote:
> > I'm trying to index data in a system that implements some rather nasty
> > access controls on the data.
> >
> > Basically, there are users, and communities, and users are members of
> > the communities. Potentially a user could be a member of hundreds or
> > even thousands of communities (there's no enforced upper limit).
>
> I think option 2 (storing the community id with the document) is the way to
> go.
> If it's not fast enough, custom query handlers and using filters may help.
> You could also store user documents in the collection to avoid passing
> the security info (this would definitely require a custom query
> handler).
>
> What are the number of documents, and number of communities?
>
> -Yonik
>
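The numbers in this message can be checked with a few lines of arithmetic. This is only a sketch: the 1024 limit is the default reported above, and the function and parameter names are invented for illustration.

```python
MAX_BOOLEAN_CLAUSES = 1024  # default limit as reported in this message


def fits_clause_limit(num_communities: int, other_clauses: int = 0,
                      limit: int = MAX_BOOLEAN_CLAUSES) -> bool:
    """True if one OR clause per community (plus any other clauses) fits."""
    return num_communities + other_clauses <= limit


def estimated_documents(num_communities: int, docs_per_community: int,
                        global_docs: int = 0) -> int:
    """Rough index size: community documents plus "global" documents."""
    return num_communities * docs_per_community + global_docs


if __name__ == "__main__":
    print(fits_clause_limit(1024))            # exactly at the limit
    print(fits_clause_limit(1024, 1))         # one extra clause tips it over
    print(estimated_documents(10_000, 100))   # upper end of the estimate
```

With 10,000 communities of 100 documents each, the estimate lands at the top of the 500,000 - 1,000,000 range quoted above, before counting any "global" documents.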
Re: Searching with access controls
We're not really sure how big the userbase is going to get, but it could become huge. I think initially we need to be able to cope with several thousand users, and probably only several thousand communities.

I'll certainly have a look at "faceted browsing" :), and yeah, a query handler that does that sounds quite useful.

I think I need to have a read on what "filters" actually are :)

Thanks though, it looks like I've got some more reading to do ...

--
Martyn

On Fri, 2006-08-11 at 00:07 -0400, Yonik Seeley wrote:
> On 8/10/06, Martyn Smith <[EMAIL PROTECTED]> wrote:
> > I was just reading about the limit on boolean operators in a query (it
> > seems to default to 1024 in Solr).
> >
> > Using option 2 would mean that a user can't be in any more than 1024
> > communities (assuming no other boolean logic in the query).
> >
> > Potentially a huge number of communities (10,000+ ?). Each community
> > could easily have say 100 documents each, and there's some other
> > "global" type documents too.
> >
> > Say 500,000 - 1,000,000 documents?
>
> How many users for this system?
>
> > What do you mean by "You could also store user documents in the
> > collection to avoid passing the security info" ?
>
> Store a document of type "user" that contains the communities they belong to.
> Create a custom query handler that takes a base query in addition to
> the user id.
> Get the user document, get a filter for each community they belong to
> from the filter cache, union them all, and then do a filtered query.
>
> If the number of users is low, you could cache the resulting filter
> from unioning all the communities. If the number of users is high
> compared to the number of communities, cache the community filters
> instead.
>
> Search the archives for faceted browsing... many of the techniques may
> be applicable.
>
> -Yonik
>
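For anyone else wondering what the filter approach quoted above amounts to, here is a toy model of it, not Solr code. Plain Python sets of document ids stand in for Solr's cached filters, and every name here (CommunityFilterCache, filtered_search, the public_docs argument) is hypothetical. Treating "public" documents as one extra set that is always unioned in is an assumption; the quoted message does not cover them.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List


class CommunityFilterCache:
    """Caches one filter (a set of doc ids) per community."""

    def __init__(self, load_filter: Callable[[str], Iterable[int]]):
        self._load = load_filter
        self._cache: Dict[str, FrozenSet[int]] = {}
        self.misses = 0  # counts how often a filter had to be built

    def get(self, community_id: str) -> FrozenSet[int]:
        if community_id not in self._cache:
            self._cache[community_id] = frozenset(self._load(community_id))
            self.misses += 1
        return self._cache[community_id]


def filtered_search(base_hits: Iterable[int],
                    user_communities: Iterable[str],
                    cache: CommunityFilterCache,
                    public_docs: Iterable[int] = ()) -> List[int]:
    """Union the user's community filters, then filter the base query hits."""
    allowed = set(public_docs)  # assumption: public docs are always visible
    for community_id in user_communities:
        allowed |= cache.get(community_id)
    # Keep the base query's ranking order, dropping disallowed documents.
    return [doc_id for doc_id in base_hits if doc_id in allowed]


if __name__ == "__main__":
    index = {"c1": {1, 2}, "c2": {3}, "c3": {4, 5}}
    cache = CommunityFilterCache(lambda cid: index[cid])
    # The user document would supply this membership list.
    print(filtered_search([5, 4, 3, 2, 1, 9], ["c1", "c2"], cache, {9}))
```

This caches per-community filters, which matches the "many users, fewer communities" case in the quoted advice; caching the unioned set per user instead would suit a small user base.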
Re: Faceted Searching Presentation @ ApacheCon US
Will this be available on-line anywhere after your presentation?

I'd be very interested to see it :)

--
Martyn

On Tue, 2006-08-15 at 18:13 -0700, Chris Hostetter wrote:
> I'm stoked to announce that I'll be presenting at this year's ApacheCon US,
> in Austin, Texas on October 13th.
>
> I'll be discussing how CNET uses Solr to power our Faceted searching
> pages, showing some examples of how you can use the Solr RequestHandler
> API to implement very customized Faceted searching plugins, and
> (hopefully) demonstrating the new general purpose Faceted searching
> functionality in the Standard and DisMax request handlers (assuming I have
> time to write it)
>
> More info can be found at the ApacheCon website...
> http://www.us.apachecon.com/html/sessions.html#FR26
>
>
> -Hoss
>
Re: Faceted Searching Presentation @ ApacheCon US
I can't make it to Texas very easily :(

On Tue, 2006-08-15 at 22:03 -0700, Chris Hostetter wrote:
> : Will this be available on-line anywhere after your presentation?
> :
> : I'd be very interested to see it :)
>
> The slides, or the code?
>
> If I have time to write the code, it will be in Subversion.
>
> As for the slides, I think so -- but I can't make any promises; besides:
>
> 1) I'm a very animated speaker ... my slides typically don't contain
> most of the juicy stuff I talk about.
> 2) If I say yes, then what's your incentive to come to the conference? :)
>
>
>
> -Hoss
>
>