Re: Nested Join Queries

Gerald Blanck Tue, 13 Nov 2012 17:59:41 -0800

Thank you Mikhail.  Unfortunately BlockJoinQuery is not an option we can
leverage.


- We have modeled our document types as different indexes/cores.
- The relationships we are attempting to join across are not
single-parent to many-children relationships.  They are in fact
many-to-many.
- Additionally, memory usage is a concern.

FYI.  After making the code change I mentioned in my original post (always
following the else block in JoinQParserPlugin), we have completed a full
test cycle and did not experience any adverse impacts from the change.  Our
join query functionality now returns the results we wanted.  I would still
be interested in hearing an explanation as to why the code is written as it
is in v4.0.0.
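
To make "the results we wanted" concrete, here is a minimal sketch of the
semantics we expect from queries A, B and C in my original post.  It does not
use Solr at all: the cores are plain in-memory lists, and the class name, the
helper methods and the sample documents are all made up for illustration.  It
only models the usual reading of {!join from=F to=T fromIndex=X}q, which is:
run q against X, collect the values of field F, and return the documents in
the current core whose field T holds any of those values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, not Solr code: each "core" is a list of documents,
// and each document maps a field name to its set of values.
class NestedJoinSketch {
    // Invented sample data. Book b2 has two authors, so the
    // book/author relationship is many-to-many.
    static final List<Map<String, Set<String>>> BOOKS = List.of(
        Map.of("id", Set.of("b1"), "authorid", Set.of("a1"), "bestseller", Set.of("true")),
        Map.of("id", Set.of("b2"), "authorid", Set.of("a1", "a2"), "bestseller", Set.of("false")),
        Map.of("id", Set.of("b3"), "authorid", Set.of("a3"), "bestseller", Set.of("false")));

    static final List<Map<String, Set<String>>> AUTHORS = List.of(
        Map.of("id", Set.of("a1"), "bookid", Set.of("b1", "b2")),
        Map.of("id", Set.of("a2"), "bookid", Set.of("b2")),
        Map.of("id", Set.of("a3"), "bookid", Set.of("b3")));

    // Stand-in for a term query such as bestseller:true.
    static List<Map<String, Set<String>>> term(
            List<Map<String, Set<String>>> core, String field, String value) {
        List<Map<String, Set<String>>> hits = new ArrayList<>();
        for (Map<String, Set<String>> doc : core) {
            if (doc.getOrDefault(field, Set.of()).contains(value)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // Stand-in for a join: collect the "from" values of fromDocs, then keep
    // every document in toCore whose "to" field shares at least one value.
    static List<Map<String, Set<String>>> join(
            List<Map<String, Set<String>>> fromDocs, String from,
            List<Map<String, Set<String>>> toCore, String to) {
        Set<String> keys = new HashSet<>();
        for (Map<String, Set<String>> doc : fromDocs) {
            keys.addAll(doc.getOrDefault(from, Set.of()));
        }
        List<Map<String, Set<String>>> hits = new ArrayList<>();
        for (Map<String, Set<String>> doc : toCore) {
            if (!Collections.disjoint(doc.getOrDefault(to, Set.of()), keys)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // Returns the id of each document, in result order.
    static List<String> ids(List<Map<String, Set<String>>> docs) {
        List<String> out = new ArrayList<>();
        for (Map<String, Set<String>> doc : docs) {
            out.addAll(doc.get("id"));
        }
        return out;
    }

    // A) against the book core: bestseller:true
    static List<Map<String, Set<String>>> queryA() {
        return term(BOOKS, "bestseller", "true");
    }

    // B) against the author core: join from book.id to author.bookid
    static List<Map<String, Set<String>>> queryB() {
        return join(queryA(), "id", AUTHORS, "bookid");
    }

    // C) against the book core: join from author.id to book.authorid
    static List<Map<String, Set<String>>> queryC() {
        return join(queryB(), "id", BOOKS, "authorid");
    }

    public static void main(String[] args) {
        System.out.println("A: " + ids(queryA())); // prints "A: [b1]"
        System.out.println("B: " + ids(queryB())); // prints "B: [a1]"
        System.out.println("C: " + ids(queryC())); // prints "C: [b1, b2]"
    }
}
```

With this sample data, C returns every book by an author of the bestseller
b1, including the co-authored b2.  That is the behaviour we see from Solr
after the change, whereas the unmodified v4.0.0 code returned nothing for C.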

Thanks.




On Tue, Nov 13, 2012 at 8:31 AM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Please find reference materials
>
>
> http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
> http://blog.griddynamics.com/2012/08/block-join-query-performs.html
>
>
>
>
> On Tue, Nov 13, 2012 at 3:25 PM, Gerald Blanck <
> gerald.bla...@barometerit.com> wrote:
>
>> Thank you.  I've not heard of BlockJoin.  I will look into it today.
>>  Thanks.
>>
>>
>> On Tue, Nov 13, 2012 at 5:05 AM, Mikhail Khludnev <
>> mkhlud...@griddynamics.com> wrote:
>>
>>> Replied. pls check maillist.
>>>
>>>
>>>
>>> On Tue, Nov 13, 2012 at 11:44 AM, Mikhail Khludnev <
>>> mkhlud...@griddynamics.com> wrote:
>>>
>>>> Gerald,
>>>>
>>>> I wonder if you tried to approach BlockJoin for your problem? Can you
>>>> afford less frequent updates?
>>>>
>>>>
>>>> On Wed, Nov 7, 2012 at 5:40 PM, Gerald Blanck <
>>>> gerald.bla...@barometerit.com> wrote:
>>>>
>>>>> Thank you Erick for your reply.  I understand that search is not an
>>>>> RDBMS.  Yes, we do have a huge combinatorial explosion if we
>>>>> de-normalize and duplicate data.  In fact, I believe our use case is
>>>>> exactly what the Solr developers were trying to solve with the
>>>>> addition of the Join query.  And while the example I gave illustrates
>>>>> the problem we are solving with the Join functionality, it is
>>>>> simplistic in nature compared to what we have in actuality.
>>>>>
>>>>> Am still looking for an answer here if someone can shed some light.
>>>>> Thanks.
>>>>>
>>>>>
>>>>> On Sat, Nov 3, 2012 at 9:38 PM, Erick Erickson <
>>>>> erickerick...@gmail.com> wrote:
>>>>>
>>>>> > I'm going to go a bit sideways on you, partly because I can't answer
>>>>> > the question <G>...
>>>>> >
>>>>> > But, every time I see someone doing what looks like substituting
>>>>> > "core" for "table" and then trying to use Solr like a DB, I get on my
>>>>> > soap-box and preach......
>>>>> >
>>>>> > In this case, consider de-normalizing your DB so you can ask the
>>>>> > query in terms of search rather than joins. e.g.
>>>>> >
>>>>> > Make each document a combination of the author and the book, with an
>>>>> > additional field "author_has_written_a_bestseller". Now your query
>>>>> > becomes a really simple search,
>>>>> > "author:name AND author_has_written_a_bestseller:true".
>>>>> > True, this kind of approach isn't as flexible as an RDBMS, but it's a
>>>>> > _search_ rather than a query. Yes, it replicates data, but unless you
>>>>> > have a huge combinatorial explosion, that's not a problem.
>>>>> >
>>>>> > And the join functionality isn't called "pseudo" for nothing. It was
>>>>> > written for a specific use-case. It is often expensive, especially
>>>>> > when the field being joined has many unique values.
>>>>> >
>>>>> > FWIW,
>>>>> > Erick
>>>>> >
>>>>> >
>>>>> > On Fri, Nov 2, 2012 at 11:32 AM, Gerald Blanck <
>>>>> > gerald.bla...@barometerit.com> wrote:
>>>>> >
>>>>> > > At a high level, I have a need to be able to execute a query that
>>>>> > > joins across cores, and that query during its joining may join back
>>>>> > > to the originating core.
>>>>> > >
>>>>> > > Example:
>>>>> > > Find all Books written by an Author who has written a best selling
>>>>> > > Book.
>>>>> > >
>>>>> > > In Solr query syntax
>>>>> > > A) against the book core -
>>>>> > >    bestseller:true
>>>>> > > B) against the author core -
>>>>> > >    {!join fromIndex=book from=id to=bookid}bestseller:true
>>>>> > > C) against the book core -
>>>>> > >    {!join fromIndex=author from=id to=authorid}{!join fromIndex=book from=id to=bookid}bestseller:true
>>>>> > >
>>>>> > > A - returns results
>>>>> > > B - returns results
>>>>> > > C - does not return results
>>>>> > >
>>>>> > > Given that A and C use the same core, I started looking for join
>>>>> > > code that compares the originating core to the fromIndex and found
>>>>> > > this in JoinQParserPlugin (line #159).
>>>>> > >
>>>>> > >         if (info.getReq().getCore() == fromCore) {
>>>>> > >           // if this is the same core, use the searcher passed in...
>>>>> > >           // otherwise we could be warming and get an older searcher
>>>>> > >           // from the core.
>>>>> > >           fromSearcher = searcher;
>>>>> > >         } else {
>>>>> > >           // This could block if there is a static warming query with a
>>>>> > >           // join in it, and if useColdSearcher is true.
>>>>> > >           // Deadlock could result if two cores both had useColdSearcher
>>>>> > >           // and had joins that used eachother.
>>>>> > >           // This would be very predictable though (should happen every
>>>>> > >           // time if misconfigured)
>>>>> > >           fromRef = fromCore.getSearcher(false, true, null);
>>>>> > >
>>>>> > >           // be careful not to do anything with this searcher that
>>>>> > >           // requires the thread local SolrRequestInfo in a manner that
>>>>> > >           // requires the core in the request to match
>>>>> > >           fromSearcher = fromRef.get();
>>>>> > >         }
>>>>> > >
>>>>> > > I found that if I were to modify the above code so that it always
>>>>> > > follows the logic in the else block, I get the results I expect.
>>>>> > >
>>>>> > > Can someone explain to me why the code is written as it is?  And if
>>>>> > > we were to run with only the else block being executed, what type
>>>>> > > of adverse impacts we might have?
>>>>> > >
>>>>> > > Does anyone have other ideas on how to solve this issue?
>>>>> > >
>>>>> > > Thanks in advance.
>>>>> > > -Gerald
>>>>> > >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> *Gerald Blanck*
>>>>>
>>>>> baro*m*eter*IT*
>>>>>
>>>>> 1331 Tyler Street NE, Suite 100
>>>>> Minneapolis, MN 55413
>>>>>
>>>>>
>>>>> 612.208.2802
>>>>>
>>>>> gerald.bla...@barometerit.com
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>> Principal Engineer,
>>>> Grid Dynamics
>>>>
>>>> <http://www.griddynamics.com>
>>>>  <mkhlud...@griddynamics.com>
>>>>
>>>>
>>>
>>>
>>> --
>>> Sincerely yours
>>> Mikhail Khludnev
>>> Principal Engineer,
>>> Grid Dynamics
>>>
>>> <http://www.griddynamics.com>
>>>  <mkhlud...@griddynamics.com>
>>>
>>>
>>
>>
>> --
>>
>> *Gerald Blanck*
>>
>> baro*m*eter*IT*
>>
>> 1331 Tyler Street NE, Suite 100
>> Minneapolis, MN 55413
>>
>>
>> 612.208.2802
>>
>> gerald.bla...@barometerit.com
>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <mkhlud...@griddynamics.com>
>
>


-- 

*Gerald Blanck*

baro*m*eter*IT*

1331 Tyler Street NE, Suite 100
Minneapolis, MN 55413


612.208.2802

gerald.bla...@barometerit.com
