You may also want to keep an eye on SOLR-8925, which adds support for distributed, cross-collection graph traversals. This may be useful for traversing the relationships.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Apr 15, 2016 at 9:56 AM, Joel Bernstein <joels...@gmail.com> wrote:

> Solr now has full distributed join capabilities as part of the Streaming
> Expression library. Keep in mind that these are distributed joins, so they
> shuffle records to worker nodes to perform the joins. These are comparable
> to joins done by SQL over MapReduce systems, but they are very responsive
> and can respond with sub-second response time for fairly large joins in
> parallel mode. But these joins do lend themselves to large distributed
> architectures (lots of shards and replicas). Target QPS also needs to be
> taken into account and tested in deciding whether these joins will meet the
> specific use case.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, Apr 15, 2016 at 9:17 AM, Dennis Gove <dpg...@gmail.com> wrote:
>
>> The Streaming API with Streaming Expressions (or Parallel SQL if you want
>> to use SQL) can give you the functionality you're looking for. See
>> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
>> and
>> https://cwiki.apache.org/confluence/display/solr/Parallel+SQL+Interface.
>> SQL queries coming in through the Parallel SQL Interface are translated
>> down into Streaming Expressions - if you need to do something that SQL
>> doesn't yet support, you should check out the Streaming Expressions to see
>> if they can support it.
>>
>> With these you could store your data in separate collections (or the same
>> collection with different docType field values) and then during search
>> perform a join (inner, outer, hash) across the collections. You could, if
>> you wanted, even join with data NOT in Solr using the jdbc streaming
>> function.
>>
>> - Dennis Gove
>>
>>
>> On Fri, Apr 15, 2016 at 3:21 AM, Bastien Latard - MDPI AG <
>> lat...@mdpi.com.invalid> wrote:
>>
>>> '*would I then be able to query a specific field of articles or other
>>> "table" (with the same OR BETTER performance)?*'
>>> -> And especially, would I be able to get only 1 article in the result...
>>>
>>> On 15/04/2016 09:06, Bastien Latard - MDPI AG wrote:
>>>
>>> Thanks Jack.
>>>
>>> I know that Solr is a search engine, but this replaces a search in my
>>> mysql DB with this model:
>>>
>>>
>>> *My goal is to improve my environment (and my performance at the same
>>> time).*
>>>
>>> *Yes, I have a Solr data model... but atm I created 4 different indexes
>>> for "similar service usage".*
>>> *So atm, for 70 million documents, I am duplicating journal data and
>>> publisher data all the time in 1 index (for all articles from the same
>>> journal/pub) in order to be able to retrieve all data in 1 query...*
>>>
>>> *I found yesterday that it is possible to create something like an array
>>> of <entity> in the data-conf.xml.*
>>> e.g. (pseudo code - incomplete):
>>> <entity name="solr_publisher" query="select name from publishers">
>>> <entity name="solr_journal" query="select name as j_name from journals
>>> WHERE publisher_id='${solr_publisher.id}'">
>>> <entity name="solr_articles" query="select title, abstract from articles
>>> WHERE journal_id='${solr_journal.id}'">
>>> <entity name="solr_authors" query="select given_name, last_name from
>>> authors WHERE article_id='${solr_articles.id}'">
>>>
>>>
>>> * Would this be a good option? Is this the denormalization you were
>>> proposing? *
>>>
>>> *If yes, would I then be able to query a specific field of articles or
>>> other "table" (with the same OR BETTER performance)? If yes, I would
>>> probably merge all the different indexes together.
*
>>> *I'm currently joining everything in mysql, so duplicating the fields in
>>> Solr (pseudo code):*
>>> <entity name="all" query="select * from articles INNER JOIN journal on
>>> [...]">
>>> *So I have an index for author queries, a general one for articles (only
>>> the needed info from other tables) ...*
>>>
>>> Thanks in advance for the tips. :)
>>>
>>> Kind regards,
>>> Bastien
>>>
>>> On 14/04/2016 16:23, Jack Krupansky wrote:
>>>
>>> Solr is a search engine, not a database.
>>>
>>> JOINs? Although Solr does have some limited JOIN capabilities, they are
>>> more for special situations, not the front-line go-to technique for data
>>> modeling for search.
>>>
>>> Rather, denormalization is the front-line go-to technique for data
>>> modeling in Solr.
>>>
>>> In any case, the first step in data modeling is always to focus on your
>>> queries - what information will be coming into your apps and what
>>> information will the apps want to access based on those inputs.
>>>
>>> But wait... you say you are upgrading, which suggests that you have an
>>> existing Solr data model, and probably queries as well. So...
>>>
>>> 1. Share at least a summary of your existing Solr data model as well as
>>> at least a summary of the kinds of queries you perform today.
>>> 2. Tell us what exactly is driving your inquiry - are queries too slow,
>>> too cumbersome, not sufficiently powerful, or... what exactly is the
>>> problem you need to solve.
>>>
>>>
>>> -- Jack Krupansky
>>>
>>> On Thu, Apr 14, 2016 at 10:12 AM, Bastien Latard - MDPI AG <
>>> lat...@mdpi.com.invalid> wrote:
>>>
>>>> Hi Guys,
>>>>
>>>> *I am upgrading from solr 4.2 to 6.0.*
>>>> *I successfully (after some time) migrated the config files and other
>>>> parameters...*
>>>>
>>>> Now I'm just wondering if my indexes are following the best
>>>> practices...(and they are probably not :-) )
>>>>
>>>> What would be best if we have this kind of sql data to write in
>>>> Solr:
>>>>
>>>>
>>>> I have several different services which need (more or less) different
>>>> data based on these JOINs...
>>>>
>>>> e.g.:
>>>> Service A needs lots of data (but not all),
>>>> Service B needs only a little data (some fields already included in A),
>>>> Service C needs a bit more data than B (some fields already included in
>>>> A/B)...
>>>>
>>>> *1. Would it be better to create one single index?*
>>>> *-> i.e.: this will duplicate journal info for every single article*
>>>>
>>>> *2. Would it be better to create several specific indexes, one for each
>>>> group of similar services?*
>>>> *-> i.e.: this will use more space on the disks (and there are
>>>> ~70 million documents to join)*
>>>>
>>>> *3. Would it be better to create an index per table and make a join?*
>>>> *-> if yes, how??*
>>>>
>>>> Kind regards,
>>>> Bastien
>>>>
>>>>
>>>
>>> Kind regards,
>>> Bastien Latard
>>> Web engineer
>>> --
>>> MDPI AG
>>> Postfach, CH-4005 Basel, Switzerland
>>> Office: Klybeckstrasse 64, CH-4057
>>> Tel. +41 61 683 77 35
>>> Fax: +41 61 302 89 18
>>> E-mail: latard@mdpi.com
>>> http://www.mdpi.com/
>>>
>>
>
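
For option 3 in the original question (one index per table, joined at query time), the approach Dennis and Joel describe could look roughly like the sketch below. It only builds an innerJoin Streaming Expression and the URL that would submit it to a collection's /stream handler; nothing is sent. The collection names (articles, journals), the field names (journal_id, id, title, name), and the host and port are assumptions made up for this example, not details from the thread, and the expression syntax should be checked against the Streaming Expressions documentation for the Solr version in use.

```python
from urllib.parse import urlencode


def search_expr(collection, fields, sort_field):
    """Build a search() stream over one collection (names are hypothetical).

    The sort is included because both inputs of the join are expected to
    arrive ordered on the join key; qt="/export" requests the whole sorted
    result set instead of a single page of hits.
    """
    field_list = ",".join(fields)
    return (
        f'search({collection}, q="*:*", fl="{field_list}", '
        f'sort="{sort_field} asc", qt="/export")'
    )


def inner_join_expr(left, right, left_key, right_key):
    """Wrap two stream expressions in innerJoin() on left_key = right_key."""
    return f'innerJoin({left}, {right}, on="{left_key}={right_key}")'


def stream_url(base_url, collection, expr):
    """Build the /stream request URL, passing the expression as 'expr'."""
    return f"{base_url}/{collection}/stream?" + urlencode({"expr": expr})


# Assumed layout: one collection per table, articles referencing journals.
articles = search_expr("articles", ["journal_id", "title"], "journal_id")
journals = search_expr("journals", ["id", "name"], "id")
expr = inner_join_expr(articles, journals, "journal_id", "id")
url = stream_url("http://localhost:8983/solr", "articles", expr)

print(expr)
print(url)
```

Whether this beats a denormalized index depends on the points Joel raises: the join shuffles records at query time, so target QPS and shard count need to be tested for the specific use case.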