Re: Nested Join Queries

Gerald Blanck Wed, 14 Nov 2012 14:44:03 -0800

Mikhail-

Let me know how to contribute a test case and I will put it on my to do
list.


When your many-to-many BlockJoin solution matures I would love to see it.

Thanks.
-Gerald


On Tue, Nov 13, 2012 at 11:52 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Gerald,
> Nice to hear that your problem is solved. Can you contribute a test
> case to reproduce this issue?
>
> FWIW, my team successfully deals with Many-to-Many in BlockJoin. It works,
> but the solution is still a little immature.
>
>
>
> On Wed, Nov 14, 2012 at 5:59 AM, Gerald Blanck <
> gerald.bla...@barometerit.com> wrote:
>
>> Thank you Mikhail.  Unfortunately BlockJoinQuery is not an option we can
>> leverage.
>>
>> - We have modeled our document types as different indexes/cores.
>> - Our relationships which we are attempting to join across are not
>> single-parent to many-children relationships.  They are in fact many to
>> many.
>> - Additionally, memory usage is a concern.
>>
>> FYI.  After making the code change I mentioned in my original post, we
>> have completed a full test cycle and did not experience any adverse impacts
>> from the change.  And our join query functionality returns the results we
>> wanted.  I would still be interested in hearing an explanation as to why
>> the code is written as it is in v4.0.0.
>>
>> Thanks.
>>
>>
>>
>>
>> On Tue, Nov 13, 2012 at 8:31 AM, Mikhail Khludnev <
>> mkhlud...@griddynamics.com> wrote:
>>
>>> Please find reference materials
>>>
>>>
>>> http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
>>> http://blog.griddynamics.com/2012/08/block-join-query-performs.html
>>>
>>>
>>>
>>>
>>> On Tue, Nov 13, 2012 at 3:25 PM, Gerald Blanck <
>>> gerald.bla...@barometerit.com> wrote:
>>>
>>>> Thank you.  I've not heard of BlockJoin.  I will look into it today.
>>>>  Thanks.
>>>>
>>>>
>>>> On Tue, Nov 13, 2012 at 5:05 AM, Mikhail Khludnev <
>>>> mkhlud...@griddynamics.com> wrote:
>>>>
>>>>> Replied. Please check the mailing list.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Nov 13, 2012 at 11:44 AM, Mikhail Khludnev <
>>>>> mkhlud...@griddynamics.com> wrote:
>>>>>
>>>>>> Gerald,
>>>>>>
>>>>>> I wonder whether you have tried BlockJoin for your problem? Can you
>>>>>> afford less frequent updates?
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 7, 2012 at 5:40 PM, Gerald Blanck <
>>>>>> gerald.bla...@barometerit.com> wrote:
>>>>>>
>>>>>>> Thank you Erick for your reply.  I understand that search is not an
>>>>>>> RDBMS.  Yes, we do have a huge combinatorial explosion if we
>>>>>>> de-normalize and duplicate data.  In fact, I believe our use case is
>>>>>>> exactly what the Solr developers were trying to solve with the
>>>>>>> addition of the Join query.  And while the example I gave illustrates
>>>>>>> the problem we are solving with the Join functionality, it is
>>>>>>> simplistic in nature compared to what we have in actuality.
>>>>>>>
>>>>>>> Am still looking for an answer here if someone can shed some light.
>>>>>>> Thanks.
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Nov 3, 2012 at 9:38 PM, Erick Erickson <
>>>>>>> erickerick...@gmail.com> wrote:
>>>>>>>
>>>>>>> > I'm going to go a bit sideways on you, partly because I can't answer
>>>>>>> > the question <G>...
>>>>>>> >
>>>>>>> > But, every time I see someone doing what looks like substituting
>>>>>>> > "core" for "table" and then trying to use Solr like a DB, I get on my
>>>>>>> > soap-box and preach......
>>>>>>> >
>>>>>>> > In this case, consider de-normalizing your DB so you can ask the
>>>>>>> > query in terms of search rather than joins. e.g.
>>>>>>> >
>>>>>>> > Make each document a combination of the author and the book, with an
>>>>>>> > additional field "author_has_written_a_bestseller". Now your query
>>>>>>> > becomes a really simple search,
>>>>>>> > "author:name AND author_has_written_a_bestseller:true".
>>>>>>> > True, this kind of approach isn't as flexible as an RDBMS, but it's a
>>>>>>> > _search_ rather than a query. Yes, it replicates data, but unless you
>>>>>>> > have a huge combinatorial explosion, that's not a problem.
>>>>>>> >
>>>>>>> > And the join functionality isn't called "pseudo" for nothing. It was
>>>>>>> > written for a specific use-case. It is often expensive, especially
>>>>>>> > when the field being joined has many unique values.
>>>>>>> >
>>>>>>> > FWIW,
>>>>>>> > Erick
>>>>>>> >
>>>>>>> >
>>>>>>> > On Fri, Nov 2, 2012 at 11:32 AM, Gerald Blanck <
>>>>>>> > gerald.bla...@barometerit.com> wrote:
>>>>>>> >
>>>>>>> > > At a high level, I have a need to be able to execute a query that
>>>>>>> > > joins across cores, and that query during its joining may join back
>>>>>>> > > to the originating core.
>>>>>>> > >
>>>>>>> > > Example:
>>>>>>> > > Find all Books written by an Author who has written a best selling
>>>>>>> > > Book.
>>>>>>> > >
>>>>>>> > > In Solr query syntax
>>>>>>> > > A) against the book core - bestseller:true
>>>>>>> > > B) against the author core -
>>>>>>> > >    {!join fromIndex=book from=id to=bookid}bestseller:true
>>>>>>> > > C) against the book core -
>>>>>>> > >    {!join fromIndex=author from=id to=authorid}{!join fromIndex=book from=id to=bookid}bestseller:true
>>>>>>> > >
>>>>>>> > > A - returns results
>>>>>>> > > B - returns results
>>>>>>> > > C - does not return results
>>>>>>> > >
>>>>>>> > > Given that A and C use the same core, I started looking for join
>>>>>>> > > code that compares the originating core to the fromIndex and found
>>>>>>> > > this in JoinQParserPlugin (line #159).
>>>>>>> > >
>>>>>>> > >         if (info.getReq().getCore() == fromCore) {
>>>>>>> > >           // if this is the same core, use the searcher passed in...
>>>>>>> > >           // otherwise we could be warming and get an older searcher
>>>>>>> > >           // from the core.
>>>>>>> > >           fromSearcher = searcher;
>>>>>>> > >         } else {
>>>>>>> > >           // This could block if there is a static warming query with
>>>>>>> > >           // a join in it, and if useColdSearcher is true.
>>>>>>> > >           // Deadlock could result if two cores both had
>>>>>>> > >           // useColdSearcher and had joins that used eachother.
>>>>>>> > >           // This would be very predictable though (should happen
>>>>>>> > >           // every time if misconfigured)
>>>>>>> > >           fromRef = fromCore.getSearcher(false, true, null);
>>>>>>> > >
>>>>>>> > >           // be careful not to do anything with this searcher that
>>>>>>> > >           // requires the thread local SolrRequestInfo in a manner
>>>>>>> > >           // that requires the core in the request to match
>>>>>>> > >           fromSearcher = fromRef.get();
>>>>>>> > >         }
>>>>>>> > >
>>>>>>> > > I found that if I were to modify the above code so that it always
>>>>>>> > > follows the logic in the else block, I get the results I expect.
>>>>>>> > >
>>>>>>> > > Can someone explain to me why the code is written as it is?  And if
>>>>>>> > > we were to run with only the else block being executed, what type
>>>>>>> > > of adverse impacts we might have?
>>>>>>> > >
>>>>>>> > > Does anyone have other ideas on how to solve this issue?
>>>>>>> > >
>>>>>>> > > Thanks in advance.
>>>>>>> > > -Gerald
>>>>>>> > >
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> *Gerald Blanck*
>>>>>>>
>>>>>>> baro*m*eter*IT*
>>>>>>>
>>>>>>> 1331 Tyler Street NE, Suite 100
>>>>>>> Minneapolis, MN 55413
>>>>>>>
>>>>>>>
>>>>>>> 612.208.2802
>>>>>>>
>>>>>>> gerald.bla...@barometerit.com
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sincerely yours
>>>>>> Mikhail Khludnev
>>>>>> Principal Engineer,
>>>>>> Grid Dynamics
>>>>>>
>>>>>> <http://www.griddynamics.com>
>>>>>>  <mkhlud...@griddynamics.com>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>


-- 

*Gerald Blanck*

baro*m*eter*IT*

1331 Tyler Street NE, Suite 100
Minneapolis, MN 55413


612.208.2802

gerald.bla...@barometerit.com
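
For readers following the thread, here is a minimal sketch (not from any of the posters) of how queries A, B and C from the original post nest inside one another. The class name NestedJoinQuery and the helper join() are hypothetical, chosen only for illustration. The code assembles the query strings and nothing more: it does not contact Solr, and it says nothing about whether query C returns results on a given Solr version.

```java
// Hypothetical helper: builds the join query strings quoted in the thread.
// It only concatenates text; no Solr library or server is involved.
final class NestedJoinQuery {

    private NestedJoinQuery() {}

    /** Prefixes an inner query with Solr join local params. */
    static String join(String fromIndex, String from, String to, String inner) {
        return "{!join fromIndex=" + fromIndex
                + " from=" + from
                + " to=" + to + "}" + inner;
    }

    public static void main(String[] args) {
        // A) run against the book core
        String a = "bestseller:true";
        // B) run against the author core: authors of best-selling books
        String b = join("book", "id", "bookid", a);
        // C) run against the book core: all books by those authors
        String c = join("author", "id", "authorid", b);

        System.out.println("A: " + a);
        System.out.println("B: " + b);
        System.out.println("C: " + c);
    }
}
```

Query C is simply query B wrapped in a second join, which is why its outermost fromIndex (author) differs from the core it runs against (book) while the innermost fromIndex points back at the originating core.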
