Re: Post Processing Solr Results

Erick Erickson Mon, 29 Aug 2011 06:59:28 -0700

If you're asking whether there's a way to find, say,
all the values for the "auth" field associated with
a document... no. An inverted index maps terms to
documents rather than documents to their terms,
which makes this hard (think of finding all
the definitions in a dictionary where the word
"earth" appears in the definition).
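
To make the asymmetry concrete, here is a minimal sketch of a toy
inverted index. It is not Lucene's API: the class InvertedIndexSketch
and its methods are made up for illustration, and a real index is far
more compact. It only shows that term-to-documents is a single lookup,
while document-to-terms has to walk every term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: NOT Lucene's API, just a sorted map from term to doc ids.
class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Record that the given document contains each of the given terms.
    void add(int docId, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // The direction the index is built for: one map lookup.
    Set<Integer> docsFor(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    // The reverse direction: every term's postings must be inspected,
    // so the cost grows with the number of distinct terms in the index.
    List<String> termsFor(int docId) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<Integer>> entry : postings.entrySet()) {
            if (entry.getValue().contains(docId)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

With millions of distinct terms, calling something like termsFor once
per matching document is what makes per-document post-processing slow.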


Best
Erick

On Mon, Aug 29, 2011 at 9:21 AM, Jamie Johnson <jej2...@gmail.com> wrote:
> Thanks Erick. If I don't know up front which tokens could be in the
> index, is there no efficient way to get the field for a specific
> document and do some custom processing on it?
>
> On Mon, Aug 29, 2011 at 8:34 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>> Start here I think:
>>
>> http://lucene.apache.org/java/3_0_2/api/core/index.html?org/apache/lucene/index/TermDocs.html
>>
>> Best
>> Erick
>>
>> On Mon, Aug 29, 2011 at 8:24 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>> Thanks for the reply.  The fields I want are indexed, but how would I
>>> go directly at the fields I wanted?
>>>
>>> As for indexing the auth tokens, I've thought about this and am
>>> trying to confirm whether that is reasonable given our
>>> constraints.
>>>
>>> On Mon, Aug 29, 2011 at 8:20 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>> Yeah, loading the document inside a Collector is a
>>>> definite no-no. Have you tried going directly
>>>> at the fields you want (assuming they're
>>>> indexed)? That *should* be much faster, but
>>>> whether it'll be fast enough is a good question. I'm
>>>> thinking some of the Terms methods here. You
>>>> *might* get some joy out of making sure lazy
>>>> field loading is enabled (and make sure the
>>>> fields you're accessing for your logic are
>>>> indexed), but I'm not entirely sure about
>>>> that bit.
>>>>
>>>> This kind of problem is sometimes handled
>>>> by indexing "auth tokens" with the documents
>>>> and including an OR clause on the query
>>>> with the authorizations for a particular
>>>> user. That works best if there is an upper
>>>> limit (in the 100s) on the number of tokens a
>>>> user can have, and often works best with some
>>>> kind of grouping. Making this work when a user can
>>>> have tens of thousands of auth tokens is...er...
>>>> contra-indicated...
>>>>
>>>> Hope this helps a bit...
>>>> Erick
>>>>
>>>> On Sun, Aug 28, 2011 at 11:59 PM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>> Just a bit more information.  Inside my class which extends
>>>>> FilteredDocIdSet, all of the time seems to be spent
>>>>> retrieving the document from the readerCtx, like this:
>>>>>
>>>>> Document doc = readerCtx.reader.document(docid);
>>>>>
>>>>> If I comment this out and just return true, things fly along as I
>>>>> expect.  Also, my query is returning a total of 2 million documents.
>>>>>
>>>>> On Sun, Aug 28, 2011 at 11:39 AM, Jamie Johnson <jej2...@gmail.com> wrote:
>>>>>> I need to post-process Solr results based on some access
>>>>>> controls which are set up outside of Solr.  Currently we've written
>>>>>> something that extends SearchComponent, and in the prepare method I'm
>>>>>> doing something like this:
>>>>>>
>>>>>>     QueryWrapperFilter qwf = new QueryWrapperFilter(rb.getQuery());
>>>>>>     Filter filter = new CustomFilter(qwf);
>>>>>>     FilteredQuery fq = new FilteredQuery(rb.getQuery(), filter);
>>>>>>     rb.setQuery(fq);
>>>>>>
>>>>>> Inside my CustomFilter I have a FilteredDocIdSet which checks if the
>>>>>> document should be returned.  This works as I expect but for some
>>>>>> reason is very slow.  Even if I take out all of the machinery
>>>>>> which does any logic with the document and only return true in the
>>>>>> FilteredDocIdSet's match method, the query still takes an inordinate
>>>>>> amount of time compared to not including this custom filter.  So my
>>>>>> question: is this the most appropriate way of handling this?  What
>>>>>> performance should be expected from such a setup?  Any
>>>>>> information/pointers would be greatly appreciated.
>>>>>>
>>>>>
>>>>
>>>
>>
>
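
On the auth-token approach quoted above (index the tokens with each
document, then OR the user's tokens into the query), here is a minimal
sketch of building that clause as a query string. The class and method
names are hypothetical, the field name "auth" is only an example, and
how the clause gets attached to the request (for instance as a filter
query) is left out. It assumes the token list is small, in line with
the "100s" limit mentioned in the thread.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of Solr or Lucene.
class AuthClauseSketch {
    // Builds a clause of the form field:("t1" OR "t2") from a user's tokens.
    static String authClause(String field, List<String> tokens) {
        if (tokens.isEmpty()) {
            // An empty clause would be malformed; the caller must decide
            // what a user with no tokens is allowed to see.
            throw new IllegalArgumentException("user has no auth tokens");
        }
        StringJoiner or = new StringJoiner(" OR ", field + ":(", ")");
        for (String token : tokens) {
            // Quote each token and escape backslashes and embedded quotes
            // so the query parser reads it as one literal value.
            String escaped = token.replace("\\", "\\\\").replace("\"", "\\\"");
            or.add("\"" + escaped + "\"");
        }
        return or.toString();
    }
}
```

For a user holding groupA and groupB this yields
auth:("groupA" OR "groupB"), which restricts results at query time
instead of inspecting each matching document afterwards.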
