Re: Nested Join Queries

Mikhail Khludnev Tue, 13 Nov 2012 02:45:37 -0800

Gerald,

I wonder whether you have tried BlockJoin for your problem. Can you
afford less frequent updates?



On Wed, Nov 7, 2012 at 5:40 PM, Gerald Blanck <gerald.bla...@barometerit.com> wrote:

> Thank you Erick for your reply.  I understand that search is not an RDBMS.
>  Yes, we do have a huge combinatorial explosion if we de-normalize and
> duplicate data.  In fact, I believe our use case is exactly what the Solr
> developers were trying to solve with the addition of the Join query.  And
> while the example I gave illustrates the problem we are solving with the
> Join functionality, it is simplistic in nature compared to what we have in
> actuality.
>
> I am still looking for an answer here, if someone can shed some light.
> Thanks.
>
>
> On Sat, Nov 3, 2012 at 9:38 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > I'm going to go a bit sideways on you, partly because I can't answer the
> > question <G>...
> >
> > But, every time I see someone doing what looks like substituting "core" for
> > "table" and then trying to use Solr like a DB, I get on my soap-box and
> > preach......
> >
> > In this case, consider de-normalizing your DB so you can ask the query in
> > terms of search rather than joins. e.g.
> >
> > Make each document a combination of the author and the book, with an
> > additional field "author_has_written_a_bestseller". Now your query becomes
> > a really simple search,
> > "author:name AND author_has_written_a_bestseller:true".
> > True, this kind of approach isn't as flexible as an RDBMS, but it's a
> > _search_ rather than a query. Yes, it replicates data, but unless you have
> > a huge combinatorial explosion, that's not a problem.
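As a rough illustration of the de-normalized layout described above, the sketch below flattens books into one document per (author, book) pair and computes the flag once at index time. It is plain Java over in-memory objects, not Solr or SolrJ code; the class and method names are invented for illustration, and only the field name author_has_written_a_bestseller comes from the message.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model; none of these types are Solr classes.
final class DenormalizedBooks {

    // One source row per book, as it might come from the database.
    static final class Book {
        final String title;
        final String author;
        final boolean bestseller;

        Book(String title, String author, boolean bestseller) {
            this.title = title;
            this.author = author;
            this.bestseller = bestseller;
        }
    }

    // One flattened search document per (author, book) pair.
    static final class Doc {
        final String title;
        final String author;
        final boolean authorHasWrittenABestseller;

        Doc(String title, String author, boolean authorHasWrittenABestseller) {
            this.title = title;
            this.author = author;
            this.authorHasWrittenABestseller = authorHasWrittenABestseller;
        }
    }

    // Index-time step: find authors with at least one bestseller, then
    // copy that flag onto every document for the same author.
    static List<Doc> flatten(List<Book> books) {
        Set<String> bestsellingAuthors = new HashSet<>();
        for (Book b : books) {
            if (b.bestseller) {
                bestsellingAuthors.add(b.author);
            }
        }
        List<Doc> docs = new ArrayList<>();
        for (Book b : books) {
            docs.add(new Doc(b.title, b.author, bestsellingAuthors.contains(b.author)));
        }
        return docs;
    }

    // Query-time step: a plain filter on the flag, with no join involved.
    static List<String> titlesByBestsellingAuthors(List<Doc> docs) {
        List<String> titles = new ArrayList<>();
        for (Doc d : docs) {
            if (d.authorHasWrittenABestseller) {
                titles.add(d.title);
            }
        }
        return titles;
    }
}
```

The trade-off in this sketch is that the flag is derived data: when a book becomes a bestseller, every document for that author has to be re-indexed.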
> >
> > And the join functionality isn't called "pseudo" for nothing. It was
> > written for a specific use-case. It is often expensive, especially when the
> > field being joined has many unique values.
> >
> > FWIW,
> > Erick
> >
> >
> > On Fri, Nov 2, 2012 at 11:32 AM, Gerald Blanck <gerald.bla...@barometerit.com> wrote:
> >
> > > At a high level, I have a need to be able to execute a query that joins
> > > across cores, and that query during its joining may join back to the
> > > originating core.
> > >
> > > Example:
> > > Find all Books written by an Author who has written a best selling Book.
> > >
> > > In Solr query syntax
> > > A) against the book core - bestseller:true
> > > B) against the author core - {!join fromIndex=book from=id to=bookid}bestseller:true
> > > C) against the book core - {!join fromIndex=author from=id to=authorid}{!join fromIndex=book from=id to=bookid}bestseller:true
> > >
> > > A - returns results
> > > B - returns results
> > > C - does not return results
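To make the expected result of query C concrete, here is a small in-memory sketch of the join semantics as commonly described: collect the "from" field values of the documents matching the inner query, then return the documents in the target core whose "to" field holds one of those values. It is plain Java, not Solr code, and all class and method names are invented for illustration. It only shows what C would be expected to return; it says nothing about why the real query returns nothing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the two cores; none of these types are Solr classes.
final class NestedJoinModel {

    // A document in the book core: id, authorid, bestseller.
    static final class Book {
        final String id;
        final String authorId;
        final boolean bestseller;

        Book(String id, String authorId, boolean bestseller) {
            this.id = id;
            this.authorId = authorId;
            this.bestseller = bestseller;
        }
    }

    // A document in the author core: id plus a multi-valued bookid field.
    static final class Author {
        final String id;
        final List<String> bookIds;

        Author(String id, String... bookIds) {
            this.id = id;
            this.bookIds = Arrays.asList(bookIds);
        }
    }

    // A) bestseller:true against the book core.
    static Set<String> queryA(List<Book> books) {
        Set<String> ids = new TreeSet<>();
        for (Book b : books) {
            if (b.bestseller) {
                ids.add(b.id);
            }
        }
        return ids;
    }

    // B) join from book.id to author.bookid over the results of A.
    static Set<String> queryB(List<Book> books, List<Author> authors) {
        Set<String> fromValues = queryA(books);
        Set<String> ids = new TreeSet<>();
        for (Author a : authors) {
            for (String bookId : a.bookIds) {
                if (fromValues.contains(bookId)) {
                    ids.add(a.id);
                    break;
                }
            }
        }
        return ids;
    }

    // C) join from author.id to book.authorid over the results of B.
    static Set<String> queryC(List<Book> books, List<Author> authors) {
        Set<String> fromValues = queryB(books, authors);
        Set<String> ids = new TreeSet<>();
        for (Book b : books) {
            if (fromValues.contains(b.authorId)) {
                ids.add(b.id);
            }
        }
        return ids;
    }
}
```

With one author who wrote a bestseller and one other book, this model returns both of that author's books for C, which is the result the example query is after.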
> > >
> > > Given that A and C use the same core, I started looking for join code that
> > > compares the originating core to the fromIndex and found this in
> > > JoinQParserPlugin (line #159).
> > >
> > >         if (info.getReq().getCore() == fromCore) {
> > >           // if this is the same core, use the searcher passed in... otherwise we could be warming and
> > >           // get an older searcher from the core.
> > >           fromSearcher = searcher;
> > >         } else {
> > >           // This could block if there is a static warming query with a join in it, and if useColdSearcher is true.
> > >           // Deadlock could result if two cores both had useColdSearcher and had joins that used eachother.
> > >           // This would be very predictable though (should happen every time if misconfigured)
> > >           fromRef = fromCore.getSearcher(false, true, null);
> > >
> > >           // be careful not to do anything with this searcher that requires the thread local
> > >           // SolrRequestInfo in a manner that requires the core in the request to match
> > >           fromSearcher = fromRef.get();
> > >         }
> > >
> > > I found that if I were to modify the above code so that it always follows
> > > the logic in the else block, I get the results I expect.
> > >
> > > Can someone explain to me why the code is written as it is?  And if we were
> > > to run with only the else block being executed, what type of adverse
> > > impacts might we have?
> > >
> > > Does anyone have other ideas on how to solve this issue?
> > >
> > > Thanks in advance.
> > > -Gerald
> > >
> >
>
>
>
> --
>
> *Gerald Blanck*
>
> baro*m*eter*IT*
>
> 1331 Tyler Street NE, Suite 100
> Minneapolis, MN 55413
>
>
> 612.208.2802
>
> gerald.bla...@barometerit.com
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
