On Mon, Jun 9, 2008 at 7:44 PM, Stephen Weiss <[EMAIL PROTECTED]> wrote:
> However, in the plain text search, the user automatically searches through
> *all* of the folders to which they have subscribed. This means, for (good!)
> users who have subscribed to a large (1000+) number of folders, the filter
> query would be quite long,

This is not a well-solved problem in Lucene & Solr in general.

> and would exceed the default number of boolean
> parameters allowed.

Solr allows you to specify filters in separate parameters that are
applied to the main query, but cached separately.

  q=the user query&fq=folder:f13&fq=folder:f24

The other option is to have a user field and index the users that have
access to the specific document. The downside to this is that the
document must be re-indexed to reflect permission changes (like a new
user that now has access to it). This may or may not be feasible,
depending on how many users you have to support and how fast
permissions must change.

> Now, I'm reading on this tutorial page for Lucene:
> http://www.lucenetutorial.com/techniques/permission-filtering.html that the
> best way to do this would involve some combination of HitCollector &
> FieldCache. From what the author is saying, this sounds like exactly what I
> need. Unfortunately, I am almost completely Java-illiterate, and on top of
> that, I'm not really finding any explanation of:
>
> a) What exactly I would do with the HitCollector & FieldCache objects that
> would help me achieve this goal - even just at the level of Lucene, there's
> no real explanation in the tutorial
> or

I think he's saying that with the FieldCache, you can get the external
String id of each matching document and then through some other
external mechanism, determine if that document should be allowed. So
that still leaves that application-specific part to be solved.

> b) Where exactly these classes fit in to Solr (if they do at all)

A custom request handler or a custom query component would be the
likely place to add/change behavior.

> So far I have already written my own (tiny, tiny) Tokenizer and
> TokenizerFactory for correctly parsing the tags that come in from the
> database, and that works great,

What's the format of the tags... you might be able to use an existing
tokenizer (a regex one perhaps).

-Yonik
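A minimal Python sketch of the request form Yonik shows (`q=the user query&fq=folder:f13&fq=folder:f24`), with one `fq` parameter per filter. The base URL `http://localhost:8983/solr/select`, the `folder` field name, and the `build_select_url` helper are assumptions for illustration. One caveat: Solr intersects multiple `fq` parameters, so a document must match every one of them. Each extra `fq` therefore narrows the results rather than widening them.

```python
from urllib.parse import urlencode


def build_select_url(base_url, user_query, folder_ids):
    """Build a Solr select URL with one fq parameter per folder id.

    base_url and the "folder" field name are assumptions for illustration.
    """
    # A list of (name, value) pairs lets urlencode emit "fq" repeatedly
    # instead of collapsing the filters into a single parameter.
    params = [("q", user_query)]
    params += [("fq", "folder:" + folder_id) for folder_id in folder_ids]
    return base_url + "?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr/select", "the user query", ["f13", "f24"]
)
print(url)
# http://localhost:8983/solr/select?q=the+user+query&fq=folder%3Af13&fq=folder%3Af24
```

`urlencode` percent-encodes the colon and turns spaces into `+`, so the filter values reach Solr intact. Each distinct `fq` string becomes its own entry in Solr's filter cache.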
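The "user field" alternative can be modeled in memory to show where its re-indexing cost comes from. This is a plain-Python stand-in, not Solr code. The `users` field name and the `visible_to` / `grant_access` helpers are hypothetical. A Lucene document is replaced as a whole rather than updated field by field, so granting one more user access means rebuilding and re-adding the full document.

```python
def visible_to(index, user):
    """Return ids of documents whose (hypothetical) "users" field lists user.

    Plays the role of a filter such as fq=users:<user>.
    """
    return sorted(doc_id for doc_id, doc in index.items() if user in doc["users"])


def grant_access(index, doc_id, user):
    """Give user access to one document by re-adding the whole document.

    Every stored field is copied into a new document, mirroring a full
    delete-and-re-add rather than an in-place field update.
    """
    old = index[doc_id]
    index[doc_id] = {"text": old["text"], "users": old["users"] | {user}}


index = {
    "doc1": {"text": "first message", "users": {"alice", "bob"}},
    "doc2": {"text": "second message", "users": {"bob"}},
}
print(visible_to(index, "alice"))  # ['doc1']
grant_access(index, "doc2", "alice")
print(visible_to(index, "alice"))  # ['doc1', 'doc2']
```

Queries stay cheap, because filtering is a single term lookup. The cost is paid whenever permissions change, which is the trade-off described above.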
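The HitCollector plus FieldCache idea can be sketched in plain Python to show the shape of the technique. In Lucene's Java API, the corresponding pieces are:

- a `HitCollector` subclass, whose `collect(int doc, float score)` is called once per matching document;
- `FieldCache.DEFAULT.getStrings(reader, field)`, which returns one String per internal document number.

In the sketch, a list stands in for that cache and `search` stands in for the searcher. `is_allowed` is a hypothetical hook for the application-specific permission check that, as noted above, still has to be solved separately.

```python
class PermissionFilteringCollector:
    """Keeps only matches whose external id passes a permission check."""

    def __init__(self, id_cache, is_allowed):
        # id_cache[doc] is the external id of internal document number doc,
        # standing in for the array a FieldCache lookup would return.
        self.id_cache = id_cache
        # is_allowed is a hypothetical, application-specific callback.
        self.is_allowed = is_allowed
        self.hits = []

    def collect(self, doc, score):
        external_id = self.id_cache[doc]
        if self.is_allowed(external_id):
            self.hits.append((external_id, score))


def search(matches, collector):
    """Stand-in for a searcher: call collect() once per (doc, score) match."""
    for doc, score in matches:
        collector.collect(doc, score)
    return collector.hits


id_cache = ["msg-1", "msg-2", "msg-3"]
allowed_ids = {"msg-1", "msg-3"}
collector = PermissionFilteringCollector(id_cache, lambda ext_id: ext_id in allowed_ids)
print(search([(0, 1.2), (1, 0.9), (2, 0.5)], collector))
# [('msg-1', 1.2), ('msg-3', 0.5)]
```

The cache lookup is a cheap array index. The permission check runs once per match, so its speed determines how well this approach scales to large result sets.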
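Where such a check might plug into Solr can also be modeled. Solr's `SearchHandler` runs a list of search components, calling each one's `prepare` step and then each one's `process` step. The Python below only imitates that flow. `PermissionComponent`, the subscription lookup, and the in-memory document list are all hypothetical. The point is that a component running before the query step can restrict the search without building a huge boolean query.

```python
class Request:
    def __init__(self, user, query):
        self.user = user
        self.query = query
        self.allowed_folders = None  # None means "no restriction"
        self.results = []


class PermissionComponent:
    """Hypothetical component: looks up the folders this user may search."""

    def __init__(self, subscriptions):
        self.subscriptions = subscriptions  # user -> folder ids (assumed lookup)

    def prepare(self, req):
        req.allowed_folders = set(self.subscriptions.get(req.user, ()))

    def process(self, req):
        pass  # all of its work happens in prepare()


class QueryComponent:
    """Stand-in for the component that runs the search over (id, folder, text) docs."""

    def __init__(self, docs):
        self.docs = docs

    def prepare(self, req):
        pass

    def process(self, req):
        req.results = [
            doc_id
            for doc_id, folder, text in self.docs
            if req.query in text
            and (req.allowed_folders is None or folder in req.allowed_folders)
        ]


def handle(components, req):
    # Like a search handler: every prepare() runs before any process().
    for component in components:
        component.prepare(req)
    for component in components:
        component.process(req)
    return req.results


docs = [("d1", "f13", "solr filters"), ("d2", "f24", "solr facets"), ("d3", "f99", "solr cache")]
components = [PermissionComponent({"stephen": ["f13", "f24"]}), QueryComponent(docs)]
print(handle(components, Request("stephen", "solr")))  # ['d1', 'd2']
```

An unknown user gets an empty folder set and therefore no results. That is a deliberately conservative default in this sketch.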
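The regex-tokenizer suggestion can be tried out before touching Solr. The tag format was never stated, so this sketch assumes comma-separated tags such as `java, solr ,lucene`. That format and the `split_tags` helper are hypothetical. If the Solr version in use includes it, `solr.PatternTokenizerFactory` splits text on a configurable regular expression in a similar way.

```python
import re

# Assumed tag format: tags separated by commas, with optional whitespace.
TAG_SEPARATOR = re.compile(r"\s*,\s*")


def split_tags(raw):
    """Split a raw tag string on the separator pattern, dropping empty tokens."""
    return [tag for tag in TAG_SEPARATOR.split(raw.strip()) if tag]


print(split_tags("java, solr ,lucene"))  # ['java', 'solr', 'lucene']
```

If the real tags need only a single split pattern like this, a configured pattern tokenizer could replace the custom Tokenizer and TokenizerFactory.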