Hi again,

I got the join to work. A teammate pointed out that one of the search functions in the innerJoin query was missing a field used by the join. Adding the e1 field to the fl parameter of the second search function gave the result I expected:
http://localhost:8983/solr/gettingstarted/stream?stream=innerJoin(search(gettingstarted , fl="id", q=text:John, sort="id asc",zkHost="localhost:9983",qt="/export"), search(gettingstarted, fl="id,e1", q=text:Friends, sort="id asc",zkHost="localhost:9983",qt="/export"), on="id=e1")

I am still interested in whether we can specify a join using an arbitrary number of searches.

Cheers

Akiel


From: Akiel Ahmed/UK/IBM@IBMGB
To: solr-user@lucene.apache.org
Date: 16/12/2015 17:05
Subject: Re: Solr 6 Distributed Join

Hi Dennis,

Thank you for your help. I used your explanation to construct an innerJoin query; I think I am getting further but didn't get the results I expected. The following describes what I did - is there any chance you can tell where I am going wrong?

Solr 6 Developer Builds: #2738 and #2743

1. Modified server/solr/configsets/basic_configs/conf/managed-schema so it reads:

    <?xml version="1.0" encoding="UTF-8" ?>
    <schema name="search" version="1.5">
      <uniqueKey>id</uniqueKey>
      <field name="id" type="id" indexed="true" stored="true" required="true" multiValued="false" docValues="true"/>
      <field name="_version_" type="solr_version" indexed="true" stored="true" required="false" multiValued="false" docValues="true"/>
      <field name="type" type="id" indexed="true" stored="true" required="false" multiValued="false" docValues="true"/>
      <field name="e1" type="id" indexed="true" stored="true" required="false" multiValued="false" docValues="true"/>
      <field name="e2" type="id" indexed="true" stored="true" required="false" multiValued="false" docValues="true"/>
      <field name="text" type="free_text" indexed="true" stored="true" required="false" multiValued="false"/>
      <fieldType name="id" class="solr.StrField" sortMissingLast="true"/>
      <fieldType name="solr_version" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
      <fieldType name="free_text" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        </analyzer>
      </fieldType>
    </schema>

2. Modified server/solr/configsets/basic_configs/conf/solrconfig.xml, adding the following near the bottom of the file so it is the last request handler:

    <requestHandler name="/stream" class="solr.StreamHandler">
      <lst name="invariants">
        <str name="wt">json</str>
        <str name="distrib">false</str>
      </lst>
    </requestHandler>

3. Used solr -e cloud to set up a Solr cloud instance, picking all the defaults except that I chose basic_configs.

4. After Solr was running I ingested the following data via the Solr Web UI (/update handler, Document Type = CSV):

    id,type,e1,e2,text
    1,ABC,,,John Smith
    2,ABC,,,Jane Smith
    3,ABC,,,MiKe Smith
    4,ABC,,,John Doe
    5,ABC,,,Jane Doe
    6,ABC,,,MiKe Doe
    7,ABC,,,John Smith
    8,DEF,,,Chicken Burger
    9,DEF,,,Veggie Burger
    10,DEF,,,Beef Burger
    11,DEF,,,Chicken Donar
    12,DEF,,,Chips
    13,DEF,,,Drink
    20,GHI,1,2,Friends
    21,GHI,3,4,Friends
    22,GHI,5,6,Friends
    23,GHI,7,6,Friends
    24,GHI,6,4,Friends
    25,JKL,1,8,Order
    26,JKL,2,9,Order
    27,JKL,3,10,Order
    28,JKL,4,11,Order
    29,JKL,5,12,Order
    30,JKL,6,13,Order

5. Navigating to the following URL in a browser returned an expected result:

    http://localhost:8983/solr/gettingstarted/select?q={!join from=id to=e1}text:John&fl="id"

    <response>
      ...
      <result>
        <doc>
          <str name="id">20</str>
          <str name="e1">1</str>
          <str name="e2">2</str>
          ...
        </doc>
        <doc>
          <str name="id">28</str>
          <str name="e1">4</str>
          <str name="e2">11</str>
          ...
        </doc>
        <doc>
          <str name="id">23</str>
          <str name="e1">7</str>
          <str name="e2">6</str>
          ...
        </doc>
      </result>
    </response>

6. Navigating to the following URL in a browser does NOT return what I expected:

    http://localhost:8983/solr/gettingstarted/stream?stream=innerJoin(search(gettingstarted , fl="id", q=text:John, sort="id asc",zkHost="localhost:9983",qt="/export"), search(gettingstarted, fl="id", q=text:Friends, sort="id asc",zkHost="localhost:9983",qt="/export"), on="id=e1")

    {"result-set":{"docs":[
    {"EOF":true,"RESPONSE_TIME":124}]}}

I also have a join-related question. Is there any chance I can specify a query and join for more than 2 things? For example:

    innerJoin(search(gettingstarted, fl="id", q=text:John, ...) as s1,
              search(gettingstarted, fl="id", q=text:Chicken, ...) as s2
              search(gettingstarted, fl="id", q=text:Friends, ...) as s3)
              on="s1.id=s3.e1", on="s2.id=s3.e2")

Sorry if the query does not make sense, but given the data above my intention is to find a single result made up of 3 documents: s1.id=1,s2.id=8,s3.id=25

Is that possible? If yes, will Solr 6 support an arbitrary number of queries and associated joins?

Cheers

Akiel


From: Dennis Gove <dpg...@gmail.com>
To: Akiel Ahmed/UK/IBM@IBMGB, solr-user@lucene.apache.org
Date: 11/12/2015 15:34
Subject: Re: Solr 6 Distributed Join

Akiel,

Without seeing your full url I assume that you're missing the stream=innerJoin(.....) part of it. A full sample url would look like this:

http://localhost:8983/solr/careers/stream?stream=innerJoin(search(careers, fl="personId,companyId,title", q=companyId:*, sort="companyId asc",zkHost="localhost:2181",qt="/export"),search(companies, fl="id,companyName", q=*:*, sort="id asc",zkHost="localhost:2181",qt="/export"),on="companyId=id")

This example will return a join of career records with the company name for all career records with a non-null companyId.

And the pieces have the following meaning:

http://localhost:8983/solr/careers/stream?
- you have a collection called careers available on localhost:8983 and you're hitting its stream handler

?stream=
- you are passing the stream parameter to the stream handler

zkHost="localhost:2181"
- there is a zk instance running on localhost:2181 where solr can get clusterstate information. Note that since you're sending the request to the careers collection, this param is not required in the search(careers....) part but is required in the search(companies....) part. For simplicity I usually just provide it for all.

qt="/export"
- tells solr to use the export handler. This assumes all your fields are in docValues. If you'd rather not use the export handler then you probably want to provide the rows=##### param to tell solr to return a large # of rows for each underlying search. Without it solr will default to, I believe, 10 rows.

CCing the user list so others can see this as well.

We're working on additional documentation for Streaming Aggregation and Expressions. The page can be found at https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions but it's missing a lot of things we've added recently.

- Dennis

On Fri, Dec 11, 2015 at 9:51 AM, Akiel Ahmed <ahmed...@uk.ibm.com> wrote:

> Hi,
>
> Sorry, this is out of the blue - I have joined the Solr mailing list, but
> I don't know if it is the correct place to ask my question. If you are
> not the best person to talk to can you please point me in the right
> direction.
>
> I want to try using the Solr 6 distributed joins but can't find enough
> material on the web to make it work. I have added the stream handler to my
> solrconfig.xml (see below) and when issuing an inner join query (see below)
> I get an error - the localparm named stream is missing so I get a
> NullPointerException. Is there a way to play with the join via the Solr web
> UI, or if not do you have a code snippet via a SolrJ client that performs a
> join?
>
> solrconfig.xml
>
> <requestHandler name="/stream" class="solr.StreamHandler">
>   <lst name="invariants">
>     <str name="wt">json</str>
>     <str name="distrib">false</str>
>   </lst>
> </requestHandler>
>
> query
> innerJoin(
>   search(getting_started, _search_field:john),
>   search(getting_started, _search_field:friends),
>   on="id=_link_from_id")
>
> Cheers
>
> Akiel
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
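
For anyone who would rather assemble the /stream request in a script than paste it into a browser, here is a minimal sketch of how the working innerJoin expression from the top of this thread could be built and percent-encoded. It is not taken from the thread. The helper names (search_expr, inner_join_expr, stream_url) are hypothetical and not part of SolrJ or any Solr client, and the host, ports and collection name are simply the values used in the messages above.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

# Values copied from the thread; assumptions to adjust for your own cluster.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "gettingstarted"
ZK_HOST = "localhost:9983"


def search_expr(collection, fl, q, sort, zk_host=ZK_HOST):
    """Build a search(...) expression string (hypothetical helper)."""
    return (
        f'search({collection}, fl="{fl}", q={q}, sort="{sort}", '
        f'zkHost="{zk_host}", qt="/export")'
    )


def inner_join_expr(left, right, on):
    """Wrap two stream expressions in innerJoin(...) (hypothetical helper)."""
    return f'innerJoin({left}, {right}, on="{on}")'


def stream_url(expr, base=SOLR_BASE, collection=COLLECTION):
    """Return the /stream URL with the expression percent-encoded."""
    query = urlencode({"stream": expr}, quote_via=quote)
    return f"{base}/{collection}/stream?{query}"


# The right-hand search lists e1 in fl, which was the fix reported above.
left = search_expr(COLLECTION, "id", "text:John", "id asc")
right = search_expr(COLLECTION, "id,e1", "text:Friends", "id asc")
expr = inner_join_expr(left, right, on="id=e1")
url = stream_url(expr)

# Sanity check: decoding the query string gives back the same expression.
decoded = parse_qs(urlsplit(url).query)["stream"][0]
assert decoded == expr

print(url)
```

The point of the encoding step is that the expression contains spaces, quotes and parentheses; a browser will usually tolerate them when pasted raw, but an HTTP client library may not.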