Re: searching only within allowed documents

Yonik Seeley Tue, 10 Jun 2008 14:18:41 -0700

On Mon, Jun 9, 2008 at 7:44 PM, Stephen Weiss <[EMAIL PROTECTED]> wrote:
> However, in the plain text search, the user automatically searches through
> *all* of the folders to which they have subscribed.  This means, for (good!)
> users who have subscribed to a large (1000+) number of folders, the filter
> query would be quite long,


This is not a well-solved problem in Lucene & Solr in general.

> and would exceed the default number of boolean
> parameters allowed.

Solr allows you to specify filters in separate parameters that are
applied to the main query, but cached separately.

q=the user query&fq=folder:f13&fq=folder:f24
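
A rough sketch of building that request from a client, in Python. The
host and path are placeholders, and the folder values are just the ones
from the example above. One caveat worth keeping in mind: Solr
intersects multiple fq parameters, so the two filters here only match
documents that satisfy both. To accept any one of several folders, the
alternatives would have to go into a single fq, such as
folder:(f13 OR f24), and a very long list of that kind can run into the
boolean clause limit again.

```python
from urllib.parse import urlencode


def build_select_url(base_url, user_query, filters):
    """Build a select URL with one q parameter and one fq per filter."""
    params = [("q", user_query)]
    # Each filter goes in its own fq parameter so Solr can cache it
    # separately from the main query.
    params.extend(("fq", f) for f in filters)
    # urlencode percent-encodes the values and keeps repeated keys.
    return base_url + "?" + urlencode(params)


# Placeholder host and path; adjust to the actual Solr installation.
url = build_select_url(
    "http://localhost:8983/solr/select",
    "the user query",
    ["folder:f13", "folder:f24"],
)
```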

The other option is to have a user field and index the users that have
access to the specific document.  The downside to this is that the
document must be re-indexed to reflect permission changes (like a new
user that now has access to it).  This may or may not be feasible,
depending on how many users you have to support and how fast
permissions must change.
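
To make the trade-off concrete, here is a minimal sketch in Python. The
field names (id, text, user) are made up for illustration and assume a
multi-valued user field in the schema. The query side becomes a single
short filter, but every permission change produces a whole new document
that has to be sent to the index again.

```python
def make_doc(doc_id, text, allowed_users):
    """Build a document dict with a multi-valued 'user' field (hypothetical schema)."""
    return {"id": doc_id, "text": text, "user": sorted(allowed_users)}


def user_filter(user_id):
    """Filter query restricting results to documents this user may see."""
    return "user:" + user_id


def grant_access(doc, user_id):
    """Return a new version of the document that must be re-indexed."""
    users = set(doc["user"])
    users.add(user_id)
    return make_doc(doc["id"], doc["text"], users)


doc = make_doc("d1", "hello world", {"u7"})
updated = grant_access(doc, "u9")  # the whole document is rebuilt
fq = user_filter("u9")             # one short filter, however many folders
```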

> Now, I'm reading on this tutorial page for Lucene:
>  http://www.lucenetutorial.com/techniques/permission-filtering.html that the
> best way to do this would involve some combination of HitCollector &
> FieldCache.  From what the author is saying, this sounds like exactly what I
> need.  Unfortunately, I am almost completely Java-illiterate, and on top of
> that, I'm not really finding any explanation of:
>
> a) What exactly I would do with the HitCollector & FieldCache objects that
> would help me achieve this goal - even just at the level of Lucene, there's
> no real explanation in the tutorial
> or

I think he's saying that with the FieldCache, you can get the external
String id of each matching document and then through some other
external mechanism, determine if that document should be allowed.  So
that still leaves that application-specific part to be solved.
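
As I read it, the logic would look something like the sketch below. It
is plain Python rather than the Lucene API: the external_ids list
stands in for the per-document array of ids that the FieldCache would
supply, and is_allowed is a placeholder for whatever
application-specific permission check exists outside the index.

```python
def collect_allowed(matching_docs, external_ids, is_allowed):
    """Keep only the hits whose external id passes the permission check.

    matching_docs: internal document numbers that matched the query
    external_ids:  external_ids[n] is the external id of document n
    is_allowed:    application-specific callable, external id -> bool
    """
    allowed = []
    for doc_num in matching_docs:
        ext_id = external_ids[doc_num]  # array lookup, no stored-field load
        if is_allowed(ext_id):
            allowed.append(ext_id)
    return allowed


# Toy data: four indexed documents, three of which matched the query.
external_ids = ["a1", "b2", "c3", "d4"]
permitted = {"a1", "d4"}  # would come from the external permission system
hits = [0, 2, 3]
result = collect_allowed(hits, external_ids, permitted.__contains__)
```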

> b) Where exactly these classes fit in to Solr (if they do at all)

A custom request handler or a custom query component would be the
likely place to add/change behavior.

> So far I have already written my own (tiny, tiny) Tokenizer and
> TokenizerFactory for correctly parsing the tags that come in from the
> database, and that works great,

What's the format of the tags? You might be able to use an existing
tokenizer (a regex one, perhaps).

-Yonik
