On 30-Apr-08, at 5:31 PM, Kevin Osborn wrote:
I have an index of about 3,000,000 products and about 8500
customers. Each customer has access to anywhere from about 50 to
about 500,000 of the products.
Our current method uses a bitset in the filter. Each customer has a
bitset in the cache, and for each docId they have access to, the
corresponding bit is set. This is probably the best approach
performance-wise for searches, but it consumes a lot of memory,
especially because every document a customer does not have access to
also consumes space (a 0). It is also probably the cause of our
problems when either the customer access lists (stored in files) or
the index is updated.
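Roughly, the per-customer filter looks something like this sketch
(simplified; the class and method names are just illustrative):

    import java.util.BitSet;

    public class CustomerAccessBits {
        // One bit per document in the index, whether the customer can see it or not.
        static BitSet buildAccessBits(int maxDoc, int[] allowedDocIds) {
            BitSet bits = new BitSet(maxDoc);
            for (int docId : allowedDocIds) {
                bits.set(docId);
            }
            return bits;
        }
        // Memory: ~3,000,000 bits / 8 = ~375 KB per customer,
        // times ~8500 customers = roughly 3 GB of cache just for access control.
    }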
Is there a better way to manage access control? I was thinking of
storing each user's access list as a specific document type in the
index, basically a single multi-valued field, but I'm not quite sure
where to go from here.
The best way to go about this is to refactor the problem into the true
constraints that exist. It is unlikely that the ~2,125,000,000
customer-product pairs (8500 customers times an average of roughly
250,000 products each) were manually created. Surely they resulted
from groups of less fine-grained control. Could those groups be the
filters you use?
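For example, if access is really driven by membership in a modest
number of product groups, you could index a group field on each
product and build each customer's filter out of group terms. A rough
sketch, assuming a hypothetical "group" field (not something from your
setup):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.CachingWrapperFilter;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.QueryWrapperFilter;
    import org.apache.lucene.search.TermQuery;

    public class GroupAccessFilter {
        // A customer's access filter becomes an OR over their group terms. Reuse one
        // Filter instance per distinct group set (e.g. via a map keyed by the sorted
        // group list) so customers with the same groups share a cached filter instead
        // of each holding a full per-customer bitset.
        static Filter forGroups(String[] customerGroups) {
            BooleanQuery groups = new BooleanQuery();
            for (String g : customerGroups) {
                groups.add(new TermQuery(new Term("group", g)), BooleanClause.Occur.SHOULD);
            }
            return new CachingWrapperFilter(new QueryWrapperFilter(groups));
        }
    }

In Solr terms this is just a filter query on that field (e.g.
group:(a OR b)), which the filterCache will reuse for every customer
with the same groups.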
Another option is to look for ways to transform the data based on its
intrinsic characteristics. Even if there are no longer explicit
control categories that you can leverage, you can look for groups of
documents that many users share access to, or large groups of docs
that only a few users have access to, and compose a single query's
filter out of those groups. This is probably pretty hard. A simpler
application of the idea is to look for a partitioning of the documents
such that few users with access to one set also have access to the
other. Put those sets in two separate Solr instances/cores. Assuming a
perfect partitioning, that halves filter memory consumption.
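To see why, a quick back-of-the-envelope calculation with the numbers
from your post split evenly:

    public class PartitionMath {
        public static void main(String[] args) {
            long customers = 8500;
            long wholeIndexDocs = 3000000L;  // today, one bitset spans the whole index
            long perCoreDocs = 1500000L;     // after a perfect 50/50 split, a customer's
                                             // bitset only spans the core they actually use
            long beforeBytes = customers * wholeIndexDocs / 8;
            long afterBytes = customers * perCoreDocs / 8;
            System.out.println("before: " + beforeBytes / (1024 * 1024) + " MB");
            System.out.println("after:  " + afterBytes / (1024 * 1024) + " MB");
        }
    }

That prints roughly 3000 MB before versus roughly 1500 MB after.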
Also consider that filters matching fewer than 3000 docs are currently
stored as hash sets (size proportional to the number of matching docs)
rather than bitsets, and thus consume less memory. That threshold is
configurable (but don't go too high).
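In Solr that threshold lives in solrconfig.xml; the stock example
config has an entry along these lines (check your version for the
exact element and defaults):

    <!-- inside the <query> section: doc sets smaller than maxSize are kept
         as hash sets rather than bitsets -->
    <HashDocSet maxSize="3000" loadFactor="0.75"/>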
-Mike