I have an index of about 3,000,000 products and about 8,500 customers. Each customer has access to anywhere from about 50 to about 500,000 of the products.

Our current method uses a bitset in the filter: each customer has a bitset in the cache, with a bit set for every docId they have access to. This is probably the best approach performance-wise for searches, but it consumes a lot of memory, especially because every document a customer doesn't have access to still takes up space (a 0 bit). It is also probably the cause of our problems when either the customer access lists (stored in files) or the index is updated.
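To put a rough number on that memory cost, here is a back-of-the-envelope sketch using the figures above. It assumes a plain uncompressed bitset of one bit per document, and it compares against a hypothetical sorted list of 4-byte docIds purely for scale. The function names are illustrative and not part of any Lucene API.

```python
# Rough memory estimate for per-customer access bitsets.
# Assumptions (not from the original post): one bit per document with no
# compression, and 4-byte docIds for the sorted-list comparison.

NUM_DOCS = 3_000_000
NUM_CUSTOMERS = 8_500


def bitset_bytes(num_docs):
    """Bytes needed for one bit per document, rounded up to whole bytes."""
    return (num_docs + 7) // 8


def sorted_ids_bytes(num_accessible, bytes_per_id=4):
    """Bytes needed to store only the accessible docIds as integers."""
    return num_accessible * bytes_per_id


per_customer = bitset_bytes(NUM_DOCS)
total = per_customer * NUM_CUSTOMERS
# Number of accessible docs below which the ID list is smaller than the bitset.
break_even = per_customer // 4

print(f"bitset per customer: {per_customer:,} bytes")
print(f"all customers:       {total:,} bytes")
print(f"50-doc customer as ID list: {sorted_ids_bytes(50):,} bytes")
print(f"ID list is smaller below {break_even:,} accessible docs")
```

Under these assumptions every customer costs the same 375,000 bytes whether they can see 50 products or 500,000, which is about 3.2 GB across all 8,500 customers if every bitset is cached at once.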

Is there a better way to manage access control? I was thinking of storing the user access list as a specific document type in the index. Basically, a single multi-value field. But I'm not quite sure where to go from here.

Another approach is to add an "acl" field to each document, whose contents are the list of customer IDs with access to that document.

Then your query is an implicit acl:<customer id> AND (actual query).
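As a minimal sketch of the idea, here is a toy in-memory inverted index in which the acl terms are indexed like any other term and every search is intersected with the customer's acl postings. This is not Lucene code; ToyIndex, its methods, and the sample documents are all hypothetical and exist only to show the shape of the query.

```python
from collections import defaultdict


class ToyIndex:
    """Tiny inverted index: (field, term) -> set of doc ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text, acl):
        # Index each word of the text under the "text" field.
        for word in text.lower().split():
            self.postings[("text", word)].add(doc_id)
        # Index each customer id with access under the "acl" field.
        for customer_id in acl:
            self.postings[("acl", str(customer_id))].add(doc_id)

    def search(self, customer_id, word):
        # Equivalent of: acl:<customer id> AND (text:<word>)
        allowed = self.postings.get(("acl", str(customer_id)), set())
        matches = self.postings.get(("text", word.lower()), set())
        return sorted(allowed & matches)


index = ToyIndex()
index.add(1, "red widget", acl=[10, 20])
index.add(2, "blue widget", acl=[20])
index.add(3, "red gadget", acl=[10])

print(index.search(10, "widget"))  # customer 10 sees only doc 1
print(index.search(20, "widget"))  # customer 20 sees docs 1 and 2
print(index.search(30, "widget"))  # unknown customer sees nothing
```

The trade-off in this layout is that the access information lives with the document, so changing who can see a document means updating that document's acl terms rather than rebuilding a per-customer structure.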

No idea if that would work for your case, but we use it to control access to source code in our Krugle enterprise product, though we're using LDAP-provided groups (more in line with Mike's suggestion) rather than individual user IDs.

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"If you can't find it, you can't fix it"
