I have an index of about 3,000,000 products and about 8,500
customers. Each customer has access to anywhere from about 50 to about
500,000 of the products.

Our current method uses a bitset in the filter: each customer has a
bitset in the cache, with the bit set for every docId they have access
to. This is probably the best approach performance-wise for searches,
but it consumes a lot of memory, especially because every document a
customer does not have access to also takes up space (a 0 bit). It is
also probably the cause of our problems whenever the customer access
lists (stored in files) or the index is updated.
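
To put numbers on the memory concern, here is a rough estimate using the
figures above. It is a sketch under stated assumptions: one uncompressed
bit per document per customer, compared against a hypothetical
alternative that stores only the visible docIds as a sorted array of
4-byte ints. Real filter implementations may add overhead or compress.

```python
# Back-of-the-envelope memory estimate for per-customer bitset filters,
# using the sizes quoted above (3,000,000 docs, 8,500 customers).
NUM_DOCS = 3_000_000
NUM_CUSTOMERS = 8_500
# Assumption: the sparse alternative is a sorted array of 32-bit ints.
BYTES_PER_DOC_ID = 4


def bitset_bytes(num_docs: int) -> int:
    """One bit per document in the index, visible to the customer or not."""
    return (num_docs + 7) // 8


def sorted_ids_bytes(num_visible: int) -> int:
    """Only the visible docIds are stored, 4 bytes each, uncompressed."""
    return num_visible * BYTES_PER_DOC_ID


per_customer = bitset_bytes(NUM_DOCS)
total = per_customer * NUM_CUSTOMERS
# Below this many visible docs, a sorted id array is smaller than a bitset.
break_even = per_customer // BYTES_PER_DOC_ID

if __name__ == "__main__":
    print(f"bitset per customer: {per_customer:,} bytes")
    print(f"all customers:       {total / 2**30:.2f} GiB")
    for visible in (50, 500_000):
        print(f"{visible:>7,} visible docs as sorted ints: "
              f"{sorted_ids_bytes(visible):,} bytes")
    print(f"sorted ints are smaller below ~{break_even:,} visible docs")
```

Under these assumptions every customer costs the same 375,000 bytes, so
the 8,500 bitsets come to roughly 3 GiB whether a customer sees 50
products or 500,000. A sorted id list is far cheaper for small customers
but larger than the bitset once a customer sees more than about 93,750
products, so neither representation wins for every customer.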

Is there a better way to manage access control? I was thinking of
storing each customer's access list as a specific document type in the
index, basically a single multi-value field. But I'm not quite sure
where to go from here.

Another approach is to add an extra "acl" field whose contents are the
list of customer ids with access to that document.

Then every query becomes an implicit acl:<customer id> AND (actual query).
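
The acl-field idea can be sketched with a toy inverted index. `ToyIndex`
and its methods are hypothetical and are not Lucene's API; the sketch
only shows that once access is stored as terms in a multi-valued "acl"
field, the access check is just one more posting list to intersect with
the user's query. In a real index you would likely want the acl values
indexed untokenized so each customer id stays a single term.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical in-memory inverted index: (field, term) -> set of docIds."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, fields):
        # `fields` maps a field name to a list of terms (multi-valued fields allowed).
        for field, terms in fields.items():
            for term in terms:
                self.postings[(field, term)].add(doc_id)

    def term(self, field, value):
        # .get() avoids creating empty entries in the defaultdict for misses.
        return self.postings.get((field, value), set())

    def search(self, customer_id, field, value):
        # Implicit acl:<customer id> AND (actual query): intersect the two postings.
        return sorted(self.term("acl", customer_id) & self.term(field, value))


if __name__ == "__main__":
    idx = ToyIndex()
    idx.add(0, {"name": ["drill"], "acl": ["c1", "c2"]})
    idx.add(1, {"name": ["drill"], "acl": ["c2"]})
    idx.add(2, {"name": ["saw"], "acl": ["c1"]})
    print(idx.search("c1", "name", "drill"))  # [0]
    print(idx.search("c2", "name", "drill"))  # [0, 1]
```

One trade-off compared with cached bitsets: the access information lives
in the index itself, so nothing per-customer has to be rebuilt when
docIds change, but granting a customer access to many products means
reindexing every one of those documents.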

No idea if that would work for your case, but we use it to control
access to source code in our Krugle enterprise product, though we're
using LDAP-provided groups (more in line with Mike's suggestion) rather
than individual user ids.
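
With groups, the acl field holds group ids rather than user ids, and at
query time the user is expanded into their groups. A minimal sketch,
assuming the groups have already been fetched (the LDAP lookup is not
shown) and that the result is passed as a string to Lucene's classic
query parser; `build_acl_query`, the `acl` field name and the id format
are illustrative assumptions.

```python
import re

# Assumption: principal ids are plain tokens that need no query-syntax escaping.
_SAFE_ID = re.compile(r"[A-Za-z0-9_]+")


def build_acl_query(principals, user_query):
    """Wrap a user's query with a required clause on the hypothetical acl field.

    `principals` is the user's own id plus the groups they belong to.
    `+` marks a required clause; the explicit OR inside acl:(...) means
    "any of these ids", regardless of the parser's default operator.
    """
    ids = list(principals)
    if not ids:
        raise ValueError("no principals: nothing would be visible")
    for p in ids:
        if not _SAFE_ID.fullmatch(p):
            raise ValueError(f"principal id needs escaping: {p!r}")
    acl_clause = " OR ".join(ids)
    return f"+acl:({acl_clause}) +({user_query})"


if __name__ == "__main__":
    print(build_acl_query(["u42", "eng", "qa"], "name:drill"))
    # +acl:(u42 OR eng OR qa) +(name:drill)
```

A likely benefit of groups is that each document lists a handful of
group ids instead of thousands of individual customer ids, so the acl
field stays small.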
-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"If you can't find it, you can't fix it"