Re: Anti-Pattern in lucent-join jar?

Darin Amos Fri, 05 Dec 2014 11:46:52 -0800

In this case I was thinking of something like the following, where you either
change the Query implementation or create your own similar query:


If you consider this query: q={!scorejoin from=parent to=id}type:child

public class ScoreJoinQuery extends Query {

        private final Query q;          // the wrapped term query, type:child
        private final IndexSearcher s;  // the searcher captured at construction time

        public ScoreJoinQuery(Query q, IndexSearcher s) {
                this.q = q;
                this.s = s;
        }

        .
        .
        .
        public Weight createWeight(…..) {
                return new Weight() {
                        .
                        .
                        .
                        public Scorer scorer() {
                                // Term collection happens here, when the query is
                                // executed, rather than when it is constructed.
                                TermsWithScoreCollector collector = new TermsWithScoreCollector();
                                ScoreJoinQuery.this.s.search(ScoreJoinQuery.this.q, collector);

                                // do the rest..

                        }

                };
        }
}

This is what I was thinking in my head, but I don’t really believe it offers
any value above how the scorejoin query works today.
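
To make the timing difference concrete, here is a toy model in plain Java. It is
only a sketch and deliberately not the Lucene API: Snapshot, EagerJoin and
DeferredJoin are made-up names. Snapshot stands in for an IndexSearcher over one
point-in-time view of the index, and collectJoinTerms() plays the role of
running the wrapped query with a terms collector. Note that DeferredJoin uses
whichever searcher executes it, which is closer to Roman's createWeight()
approach than to my sketch above, where the searcher is fixed in the constructor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model (NOT the Lucene API): eager vs. deferred join-term collection. */
final class JoinTimingSketch {

    /** Hypothetical stand-in for an IndexSearcher: a fixed view of the child docs' parent ids. */
    static final class Snapshot {
        private final List<String> parentIdsOfChildren;

        Snapshot(List<String> parentIdsOfChildren) {
            this.parentIdsOfChildren = new ArrayList<>(parentIdsOfChildren);
        }

        /** Plays the role of searcher.search(fromQuery, collector). */
        Set<String> collectJoinTerms() {
            return new TreeSet<>(parentIdsOfChildren);
        }
    }

    /** Eager: terms are collected while the query object is being built. */
    static final class EagerJoin {
        private final Set<String> terms;

        EagerJoin(Snapshot searcherAtBuildTime) {
            this.terms = searcherAtBuildTime.collectJoinTerms();
        }

        /** The run-time searcher is ignored, so the result can be stale. */
        Set<String> execute(Snapshot searcherAtRunTime) {
            return terms;
        }
    }

    /** Deferred: terms are collected when the query runs, against the executing searcher. */
    static final class DeferredJoin {
        Set<String> execute(Snapshot searcherAtRunTime) {
            return searcherAtRunTime.collectJoinTerms();
        }
    }

    public static void main(String[] args) {
        Snapshot before = new Snapshot(Arrays.asList("p1", "p2"));
        EagerJoin eager = new EagerJoin(before);
        DeferredJoin deferred = new DeferredJoin();

        // A new searcher is opened after a child of p3 has been indexed.
        Snapshot after = new Snapshot(Arrays.asList("p1", "p2", "p3"));

        System.out.println("eager:    " + eager.execute(after));
        System.out.println("deferred: " + deferred.execute(after));
    }
}
```

Run against the newer snapshot, the eager query still reports only p1 and p2,
while the deferred one also sees p3. That is the "out of sync" case Roman
described for a query that is built once and held on to.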



> On Dec 5, 2014, at 2:16 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
> 
> Not sure I understand. It is the searcher which executes the query, how
> would you 'convince' it to pass the query? First the Weight is created,
> weight instance creates scorer - you would have to change the API to do the
> passing (or maybe not...?)
> In my case, the relationships were across index segments, so I had to
> collect them first - but in some other situations, when you look only at
> the data inside one index segment, it _might_ be better to wait
> 
> 
> 
> On Fri, Dec 5, 2014 at 1:25 PM, Darin Amos <dari...@gmail.com> wrote:
> 
>> Couldn’t you just keep passing the wrapped query and searcher down to
>> Weight.scorer()?
>> 
>> This would allow you to wait until the query is executed to do term
>> collection. If you want to protect against creating and executing the query
>> with different searchers, you would have to make the query factory (or
>> constructor) only visible to the query parser or parser plugin?
>> 
>> I might not have followed you, this discussion challenges my understanding
>> of Lucene and SOLR.
>> 
>> Darin
>> 
>> 
>> 
>>> On Dec 5, 2014, at 12:47 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
>>> 
>>> Hi Mikhail, I think you are right, it won't be a problem for SOLR, but it
>>> is likely an antipattern inside a Lucene component, because custom
>>> components may create join queries, hold on to them and then execute them
>>> much later against a different searcher. One approach would be to postpone
>>> term collection until the query actually runs. I looked far and wide for an
>>> appropriate place, but only found createWeight() - but at least it gives
>>> developers NO opportunity to shoot themselves in the foot! ;-)
>>> 
>>> Since it may serve as an inspiration to someone, here is a link:
>>> 
>>> https://github.com/romanchyla/montysolr/blob/master-next/contrib/adsabs/src/java/org/apache/lucene/search/SecondOrderQuery.java#L101
>>> 
>>> roman
>>> 
>>> On Fri, Dec 5, 2014 at 4:52 AM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
>>> 
>>>> Thanks Roman! Let's expand on it for the sake of completeness.
>>>> Such an issue is not possible in Solr, because caches are associated with
>>>> the searcher. As long as you follow this design (see Solr userCache) and
>>>> don't update what has already been cached, there is no chance to shoot
>>>> yourself in the foot.
>>>> There were a few caches inside Lucene (the old FieldCache,
>>>> CachingWrapperFilter, ExternalFileField, etc.), but they are properly
>>>> mapped onto segment keys, which excludes such leakage across different
>>>> searchers.
>>>> 
>>>> On Fri, Dec 5, 2014 at 6:43 AM, Roman Chyla <roman.ch...@gmail.com> wrote:
>>>> 
>>>>> +1, additionally (as follows from your observation) the query can get
>>>>> out of sync with the index, e.g. if it was saved for later use and run
>>>>> against a newly opened searcher
>>>>> 
>>>>> Roman
>>>>> On 4 Dec 2014 10:51, "Darin Amos" <dari...@gmail.com> wrote:
>>>>> 
>>>>>> Hello All,
>>>>>> 
>>>>>> I have been doing a lot of research into building some custom queries
>>>>>> and I have been looking at the Lucene Join library as a reference. I
>>>>>> noticed something that I believe could actually have a negative side
>>>>>> effect.
>>>>>> 
>>>>>> Specifically I was looking at the JoinUtil.createJoinQuery(…) method,
>>>>>> and within that method you see the following code:
>>>>>> 
>>>>>>       TermsWithScoreCollector termsWithScoreCollector =
>>>>>>           TermsWithScoreCollector.create(fromField, multipleValuesPerDocument, scoreMode);
>>>>>>       fromSearcher.search(fromQuery, termsWithScoreCollector);
>>>>>> 
>>>>>> As you can see, when the JoinQuery is being built, the code executes
>>>>>> the query that it wraps with its own collector to collect all the
>>>>>> scores. If I were to write a query parser using this library (which
>>>>>> someone has done here), doesn’t this reduce the benefit of the SOLR
>>>>>> query cache? The wrapped query is executed when the Join Query is
>>>>>> constructed, not when it is executed.
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>> Darin
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>> Principal Engineer,
>>>> Grid Dynamics
>>>> 
>>>> <http://www.griddynamics.com>
>>>> <mkhlud...@griddynamics.com>
>>>> 
>> 
>>
