If you're asking whether there's a way to find, say, all the values for the "auth" field associated with a document... no. The nature of an inverted index makes this hard (think of finding all the definitions in a dictionary where the word "earth" was in the definition).
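To make the dictionary analogy concrete, here is a minimal sketch in plain Java. It is a toy, not Lucene code: the class `InvertedIndexSketch`, its methods, and the `groupA`/`groupB` tokens are all made up for illustration. It only shows the asymmetry: term-to-documents is a single lookup, while document-to-terms has to walk every term in the index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy index; a stand-in for one indexed field such as "auth".
final class InvertedIndexSketch {
    // term -> list of doc ids containing that term (the "postings")
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    void add(int docId, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
        }
    }

    // Cheap direction: what the index is built for, one map lookup.
    List<Integer> docsFor(String term) {
        return postings.getOrDefault(term, Collections.emptyList());
    }

    // Expensive direction: every term's postings must be scanned
    // to learn which terms a single document contains.
    List<String> termsFor(int docId) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            if (entry.getValue().contains(docId)) {
                found.add(entry.getKey());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(0, "groupA", "groupB");
        index.add(1, "groupB");
        index.add(2, "groupC");
        System.out.println(index.docsFor("groupB"));
        System.out.println(index.termsFor(0));
    }
}
```

With a handful of terms the scan in `termsFor` is harmless; with millions of distinct terms it is the "read every definition in the dictionary" problem described above.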
Best
Erick

On Mon, Aug 29, 2011 at 9:21 AM, Jamie Johnson <jej2...@gmail.com> wrote:
> Thanks Erick, if I did not know the token up front that could be in
> the index is there not an efficient way to get the field for a
> specific document and do some custom processing on it?
>
> On Mon, Aug 29, 2011 at 8:34 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>> Start here I think:
>>
>> http://lucene.apache.org/java/3_0_2/api/core/index.html?org/apache/lucene/index/TermDocs.html
>>
>> Best
>> Erick
>>
>> On Mon, Aug 29, 2011 at 8:24 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> Thanks for the reply. The fields I want are indexed, but how would I
>>> go directly at the fields I wanted?
>>>
>>> In regards to indexing the auth tokens I've thought about this and am
>>> trying to get confirmation if that is reasonable given our
>>> constraints.
>>>
>>> On Mon, Aug 29, 2011 at 8:20 AM, Erick Erickson <erickerick...@gmail.com>
>>> wrote:
>>>> Yeah, loading the document inside a Collector is a
>>>> definite no-no. Have you tried going directly
>>>> at the fields you want (assuming they're
>>>> indexed)? That *should* be much faster, but
>>>> whether it'll be fast enough is a good question. I'm
>>>> thinking some of the Terms methods here. You
>>>> *might* get some joy out of making sure lazy
>>>> field loading is enabled (and make sure the
>>>> fields you're accessing for your logic are
>>>> indexed), but I'm not entirely sure about
>>>> that bit.
>>>>
>>>> This kind of problem is sometimes handled
>>>> by indexing "auth tokens" with the documents
>>>> and including an OR clause on the query
>>>> with the authorizations for a particular
>>>> user, but that works best if there is an upper
>>>> limit (in the 100s) of tokens that a user can possibly
>>>> have, often this works best with some kind of
>>>> grouping. Making this work when a user can
>>>> have tens of thousands of auth tokens is...er...
>>>> contra-indicated...
>>>>
>>>> Hope this helps a bit...
>>>> Erick
>>>>
>>>> On Sun, Aug 28, 2011 at 11:59 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>> Just a bit more information. Inside my class which extends
>>>>> FilteredDocIdSet all of the time seems to be getting spent in
>>>>> retrieving the document from the readerCtx, doing this
>>>>>
>>>>> Document doc = readerCtx.reader.document(docid);
>>>>>
>>>>> If I comment out this and just return true things fly along as I
>>>>> expect. My query is returning a total of 2 million documents also.
>>>>>
>>>>> On Sun, Aug 28, 2011 at 11:39 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>> I have a need to post process Solr results based on some access
>>>>>> controls which are setup outside of Solr, currently we've written
>>>>>> something that extends SearchComponent and in the prepare method I'm
>>>>>> doing something like this
>>>>>>
>>>>>>     QueryWrapperFilter qwf = new QueryWrapperFilter(rb.getQuery());
>>>>>>     Filter filter = new CustomFilter(qwf);
>>>>>>     FilteredQuery fq = new FilteredQuery(rb.getQuery(), filter);
>>>>>>     rb.setQuery(fq);
>>>>>>
>>>>>> Inside my CustomFilter I have a FilteredDocIdSet which checks if the
>>>>>> document should be returned. This works as I expect but for some
>>>>>> reason is very very slow. Even if I take out any of the machinery
>>>>>> which does any logic with the document and only return true in the
>>>>>> FilteredDocIdSets match method the query still takes an inordinate
>>>>>> amount of time as compared to not including this custom filter. So my
>>>>>> question, is this the most appropriate way of handling this? What
>>>>>> should the performance out of such a setup be expected to be? Any
>>>>>> information/pointers would be greatly appreciated.
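
For anyone reading this thread later, the "go directly at the indexed fields" suggestion quoted above can be sketched roughly as follows. This is plain Java over a toy in-memory postings map, not Lucene's API: `AuthBitSetSketch`, `allowedDocs`, and the token names are hypothetical. In Lucene 3.x the postings walk would presumably go through the `TermDocs` class linked in the thread. The idea is to walk the postings once per auth token the user holds, record the permitted doc ids in a bit set, and make the per-document check a bit test instead of a stored-document load.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: precompute the set of documents a user may see.
final class AuthBitSetSketch {

    // postings: auth token -> doc ids carrying that token (toy stand-in for the index)
    static BitSet allowedDocs(Map<String, int[]> postings,
                              List<String> userTokens,
                              int maxDoc) {
        BitSet allowed = new BitSet(maxDoc);
        for (String token : userTokens) {
            int[] docs = postings.get(token);
            if (docs == null) {
                continue; // token not present in the index
            }
            for (int doc : docs) {
                allowed.set(doc);
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("groupA", new int[] {0, 2});
        postings.put("groupB", new int[] {1});

        BitSet allowed = allowedDocs(postings, Arrays.asList("groupA"), 3);

        // The per-document check is now a constant-time bit test.
        for (int doc = 0; doc < 3; doc++) {
            System.out.println("doc " + doc + " allowed: " + allowed.get(doc));
        }
    }
}
```

The cost here grows with the number of tokens the user holds, which is consistent with the caveat above that the approach works best when that number stays in the hundreds.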