Thanks for the responses.

Groups probably wouldn't work: while there will be some overlap between 
customers, each will have a very different overall set of accessible resources.

I'll try the suggestion about simply reindexing, or using the no-cache option, 
and see how I get on.

Failing that, are there hooks for writing custom filter modules that use other 
parts of the records to decide whether to include them in a result set? In our 
use case, the documents represent articles, which have an "issue" field. Each 
customer has defined issues (or ranges of issues) that they have subscriptions 
to, so the upper bound for "what to filter" would probably be fairly small 
(10k - 20k issues/ranges). This could probably be used with the no-cache option 
you've pointed me to.
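
As a rough illustration of the issue/range idea, and not a custom filter module, a customer's subscriptions could perhaps be turned into an ordinary uncached filter query. The sketch below assumes "issue" is an integer field and that subscriptions arrive as inclusive (lo, hi) pairs; the function names are hypothetical. Merging adjacent ranges keeps the clause count down, which matters because Solr's maxBooleanClauses setting (typically 1024 by default) would otherwise need raising for 10k - 20k clauses.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent inclusive (lo, hi) integer ranges."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            # Extends or touches the previous range: widen it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def issue_filter(ranges, field="issue"):
    """Build an uncached fq value from subscribed issue ranges.

    `field` is an assumed field name; a single issue is passed as (n, n).
    """
    if not ranges:
        raise ValueError("customer has no subscribed issues")
    clauses = [
        str(lo) if lo == hi else f"[{lo} TO {hi}]"
        for lo, hi in merge_ranges(ranges)
    ]
    # {!cache=false} asks Solr not to store this filter in the filter cache.
    return "{!cache=false}" + field + ":(" + " OR ".join(clauses) + ")"


example_fq = issue_filter([(5, 5), (1, 3), (4, 4), (10, 12)])
```

The resulting string would be sent as an fq parameter alongside the user's query.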

Best wishes,

Phil.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 23 January 2012 17:34
To: solr-user@lucene.apache.org
Subject: Re: Filtering search results by an external set of values

A second, but arguably quite expert option, is to use the no-cache option.
See: https://issues.apache.org/jira/browse/SOLR-2429

The idea here is that you can specify that a filter is "expensive" and it will 
only be run after all the other filters etc. have been applied. Furthermore, 
it will not be cached, and only documents that pass through all the other 
filters will be matched against this filter. It has been specifically used for 
ACL calculations...
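
A minimal sketch of what such a request might look like, using the cache and cost local params added by SOLR-2429. The field names (doc_type, issue) and the cost value are assumptions for illustration. Note that, as far as I understand it, only query types that support post-filtering (frange, for example, or a custom one) are truly run after everything else, and only when the cost is 100 or more; for a plain query like the one here, cache=false just skips the filter cache and cost influences ordering.

```python
from urllib.parse import urlencode, parse_qsl


def build_params(user_query, cheap_filter, expensive_filter, cost=200):
    """Return Solr request parameters as a list of (name, value) pairs.

    The expensive filter is marked uncached and given a high cost so it
    is evaluated after the cheaper, cached filter.
    """
    expensive_fq = "{!cache=false cost=%d}%s" % (cost, expensive_filter)
    return [
        ("q", user_query),
        ("fq", cheap_filter),   # ordinary filter, cached as usual
        ("fq", expensive_fq),   # uncached, evaluated late
    ]


params = build_params("title:solr", "doc_type:article", "issue:[100 TO 200]")
query_string = urlencode(params)  # ready to append to .../select?
```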

That said, see exactly how painful storing auth tokens is. I can index, on a 
relatively underpowered laptop, 11M Wiki documents in 5 minutes or so. If your 
worst-case rights update takes 1/2 hour to re-index and it only happens once a 
month, why be complex?
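
To put rough numbers on that, here is a back-of-the-envelope estimate. It assumes indexing time scales linearly from the 11M documents in 5 minutes figure above, which is a simplification: real throughput depends on document size, analysis chain and hardware. The function name is hypothetical.

```python
def estimate_reindex_minutes(doc_count, ref_docs=11_000_000, ref_minutes=5.0):
    """Linear extrapolation from a reference indexing run."""
    return doc_count * ref_minutes / ref_docs


# At the quoted rate, a half-hour re-index corresponds to about 66M documents.
minutes_for_66m = estimate_reindex_minutes(66_000_000)
```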

And groups, as Jan says, often make even this unnecessary.

Best
Erick

On Mon, Jan 23, 2012 at 5:16 AM, Jan Høydahl <jan....@cominvent.com> wrote:
> Hi,
>
> Do you have any kind of "group" membership for your users? If you have, 
> a resource's list of security access tokens could be smaller and avoid 
> re-indexing most resources when adding "normal" users, who mostly 
> belong to groups. The common way is to add filters on the query. You 
> may do it yourself or have some framework/plugin do it for you, see 
> http://wiki.apache.org/solr/SolrSecurity#Document_Level_Security
>
> --
> Jan Høydahl, search solution architect Cominvent AS - 
> www.cominvent.com Solr Training - www.solrtraining.com
>
> On 23. jan. 2012, at 11:49, John, Phil (CSS) wrote:
>
>> Hi,
>>
>>
>>
>> We're building quite a large shared index of resources, using Solr. 
>> The application that makes use of these resources is a multitenant 
>> one (i.e., many customers using the same index). For resources that 
>> are "private" to a customer, it's fairly easy to tag a document with 
>> their customer ID and use a FilterQuery to limit results to just 
>> their "stuff".
>>
>>
>>
>> We are soon going to be adding a large number (many tens of millions) 
>> of records that will be shared amongst customers. Not all customers 
>> will have access to the same shared resources, e.g.:
>>
>>
>>
>> * Shared resource 1:
>>   - Customer 1
>>   - Customer 3
>>
>> * Shared resource 2:
>>   - Customer 2
>>   - Customer 1
>>
>>
>>
>> The issue is, what is the best way to model this in Solr? Should we 
>> have multiple customer_id fields on each record, and then use the 
>> filter query as with "private" resources, or is there a better way of doing 
>> it?
>> What happens if we need to do a bulk change - e.g. adding a new 
>> customer, or a previous customer has a large change in what shared 
>> resources they have access to? Am I right in thinking that we'd need 
>> to go through every shared resource, read it, make the required 
>> change, and reindex it?
>>
>>
>>
>> I'm wondering whether, instead of updating these resources directly, 
>> I could construct a set of documents that would act as a filter at 
>> query time, determining which shared resources to return?
>>
>>
>>
>> Kind regards,
>>
>>
>>
>> Phil John
>>
>> Technical Lead, Capita Software Services
>>
>> Knights Court, Solihull Parkway
>>
>> Birmingham Business Park B37 7YB
>>
>> Office: 0870 400 5000
>>
>> Fax: 0870 400 5001
>> email: philj...@capita.co.uk <mailto:philj...@capita.co.uk>
>>
>>
>>
>> Part of Capita plc www.capita.co.uk <http://www.capita.co.uk>
>>
>
