Hi Joel, I have managed to get the join to work, but so far it only works when I use qt="/select". It does not work when I use qt="/export".
For the display of the fields, is there a way to list them in the order that I want? Currently the order is quite random: I can get a field from collection1, followed by a field from collection3, then collection1 again, and then collection2. It would be good if we could arrange the fields to display in the order that we want.

Regards,
Edwin

On 4 May 2017 at 09:56, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> Hi Joel,
>
> It works when I started off with just one expression.
>
> Could it be that the data size is too big for export after the join, which
> causes the error?
>
> Regards,
> Edwin
>
> On 4 May 2017 at 02:53, Joel Bernstein <joels...@gmail.com> wrote:
>
>> I was just testing with the query below and it worked for me. Some of the
>> error messages I was getting with the syntax were not what I was expecting,
>> though, so I'll look into the error handling. But the joins do work when
>> the syntax is correct. The query below joins to the same collection three
>> times, but the mechanics are exactly the same joining three different
>> tables. In this example each join narrows down the result set.
>>
>> hashJoin(parallel(collection2,
>>                   workers=3,
>>                   sort="id asc",
>>                   innerJoin(search(collection2, q="*:*", fl="id",
>>                                    sort="id asc", qt="/export", partitionKeys="id"),
>>                             search(collection2, q="year_i:42", fl="id, year_i",
>>                                    sort="id asc", qt="/export", partitionKeys="id"),
>>                             on="id")),
>>          hashed=search(collection2, q="day_i:7", fl="id, day_i",
>>                        sort="id asc", qt="/export"),
>>          on="id")
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>> On Wed, May 3, 2017 at 1:29 PM, Joel Bernstein <joels...@gmail.com> wrote:
>>
>> > Start off with just this expression:
>> >
>> > search(collection2,
>> >        q=*:*,
>> >        fl="a_s,b_s,c_s,d_s,e_s",
>> >        sort="a_s asc",
>> >        qt="/export")
>> >
>> > And then check the logs for exceptions.
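[Editor's note] The expressions in this thread are sent to Solr's /stream handler as the `expr` URL parameter, so they must be URL-encoded before being pasted into a browser or script. A minimal Python sketch, assuming a local Solr on port 8983 and the collection/field names used above:

```python
from urllib.parse import urlencode

# The streaming expression, written with whitespace for readability.
expr = """
search(collection2,
       q=*:*,
       fl="a_s,b_s,c_s,d_s,e_s",
       sort="a_s asc",
       qt="/export")
"""

# Solr's /stream handler accepts the expression as the "expr" parameter;
# urlencode escapes the quotes, newlines, slashes, and asterisks for us.
params = urlencode({"expr": expr.strip(), "indent": "true"})
url = "http://localhost:8983/solr/collection2/stream?" + params
print(url)
```

Fetching that URL (e.g. with curl) returns the tuples as JSON, the same way the hand-built URLs later in this thread do.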
>> >
>> > Joel Bernstein
>> > http://joelsolr.blogspot.com/
>> >
>> > On Wed, May 3, 2017 at 12:35 PM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>> >
>> >> Hi Joel,
>> >>
>> >> I am getting this error after I added qt=/export and removed the rows
>> >> param. Do you know what could be the reason?
>> >>
>> >> {
>> >>   "error":{
>> >>     "metadata":[
>> >>       "error-class","org.apache.solr.common.SolrException",
>> >>       "root-error-class","org.apache.http.MalformedChunkCodingException"],
>> >>     "msg":"org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk",
>> >>     "trace":"org.apache.solr.common.SolrException: org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk
>> >>   at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:79)
>> >>   at org.apache.solr.response.JSONWriter.writeIterator(JSONResponseWriter.java:523)
>> >>   at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:175)
>> >>   at org.apache.solr.response.JSONWriter$2.put(JSONResponseWriter.java:559)
>> >>   at org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:64)
>> >>   at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
>> >>   at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:193)
>> >>   at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)
>> >>   at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)
>> >>   at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)
>> >>   at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)
>> >>   at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
>> >>   at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:732)
>> >>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
>> >>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>> >>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:296)
>> >>   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
>> >>   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>> >>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>> >>   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>> >>   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>> >>   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>> >>   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>> >>   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>> >>   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>> >>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>> >>   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>> >>   at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>> >>   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>> >>   at org.eclipse.jetty.server.Server.handle(Server.java:534)
>> >>   at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>> >>   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>> >>   at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
>> >>   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
>> >>   at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>> >>   at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>> >>   at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>> >>   at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>> >>   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>> >>   at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>> >>   at java.lang.Thread.run(Thread.java:745)
>> >> Caused by: org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk
>> >>   at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:255)
>> >>   at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)
>> >>   at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)
>> >>   at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:215)
>> >>   at org.apache.http.impl.io.ChunkedInputStream.close(ChunkedInputStream.java:316)
>> >>   at org.apache.http.conn.BasicManagedEntity.streamClosed(BasicManagedEntity.java:164)
>> >>   at org.apache.http.conn.EofSensorInputStream.checkClose(EofSensorInputStream.java:228)
>> >>   at org.apache.http.conn.EofSensorInputStream.close(EofSensorInputStream.java:174)
>> >>   at sun.nio.cs.StreamDecoder.implClose(StreamDecoder.java:378)
>> >>   at sun.nio.cs.StreamDecoder.close(StreamDecoder.java:193)
>> >>   at java.io.InputStreamReader.close(InputStreamReader.java:199)
>> >>   at org.apache.solr.client.solrj.io.stream.JSONTupleStream.close(JSONTupleStream.java:92)
>> >>   at org.apache.solr.client.solrj.io.stream.SolrStream.close(SolrStream.java:193)
>> >>   at org.apache.solr.client.solrj.io.stream.CloudSolrStream.close(CloudSolrStream.java:464)
>> >>   at org.apache.solr.client.solrj.io.stream.HashJoinStream.close(HashJoinStream.java:231)
>> >>   at org.apache.solr.client.solrj.io.stream.ExceptionStream.close(ExceptionStream.java:93)
>> >>   at org.apache.solr.handler.StreamHandler$TimerStream.close(StreamHandler.java:452)
>> >>   at org.apache.solr.client.solrj.io.stream.TupleStream.lambda$writeMap$0(TupleStream.java:71)
>> >>   ... 40 more",
>> >>     "code":500}}
>> >>
>> >> Regards,
>> >> Edwin
>> >>
>> >> On 4 May 2017 at 00:00, Joel Bernstein <joels...@gmail.com> wrote:
>> >>
>> >> > I've reformatted the expression below and made a few changes. You have
>> >> > put things together properly. But these are MapReduce joins that require
>> >> > exporting the entire result sets. So you will need to add qt=/export to
>> >> > all the searches and remove the rows param. In Solr 6.6 there is a new
>> >> > "shuffle" expression that does this automatically.
>> >> >
>> >> > To test things you'll want to break down each expression and make sure
>> >> > it's behaving as expected.
>> >> >
>> >> > For example, first run each search. Then run the innerJoin, not in
>> >> > parallel mode. Then run it in parallel mode. Then try the whole thing.
>> >> > hashJoin(parallel(collection2,
>> >> >                   innerJoin(search(collection2,
>> >> >                                    q=*:*,
>> >> >                                    fl="a_s,b_s,c_s,d_s,e_s",
>> >> >                                    sort="a_s asc",
>> >> >                                    partitionKeys="a_s",
>> >> >                                    qt="/export"),
>> >> >                             search(collection1,
>> >> >                                    q=*:*,
>> >> >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
>> >> >                                    sort="a_s asc",
>> >> >                                    partitionKeys="a_s",
>> >> >                                    qt="/export"),
>> >> >                             on="a_s"),
>> >> >                   workers="2",
>> >> >                   sort="a_s asc"),
>> >> >          hashed=search(collection3,
>> >> >                        q=*:*,
>> >> >                        fl="a_s,k_s,l_s",
>> >> >                        sort="a_s asc",
>> >> >                        qt="/export"),
>> >> >          on="a_s")
>> >> >
>> >> > Joel Bernstein
>> >> > http://joelsolr.blogspot.com/
>> >> >
>> >> > On Wed, May 3, 2017 at 11:26 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>> >> >
>> >> > > Hi Joel,
>> >> > >
>> >> > > Thanks for the clarification.
>> >> > >
>> >> > > Would like to check, is this the correct way to do the join? Currently, I
>> >> > > could not get any results after putting in the hashJoin for the 3rd,
>> >> > > smallerStream collection (collection3).
>> >> > > http://localhost:8983/solr/collection1/stream?expr=
>> >> > > hashJoin(parallel(collection2,
>> >> > >                   innerJoin(search(collection2,
>> >> > >                                    q=*:*,
>> >> > >                                    fl="a_s,b_s,c_s,d_s,e_s",
>> >> > >                                    sort="a_s asc",
>> >> > >                                    partitionKeys="a_s",
>> >> > >                                    rows=200),
>> >> > >                             search(collection1,
>> >> > >                                    q=*:*,
>> >> > >                                    fl="a_s,f_s,g_s,h_s,i_s,j_s",
>> >> > >                                    sort="a_s asc",
>> >> > >                                    partitionKeys="a_s",
>> >> > >                                    rows=200),
>> >> > >                             on="a_s"),
>> >> > >                   workers="2",
>> >> > >                   sort="a_s asc"),
>> >> > >          hashed=search(collection3,
>> >> > >                        q=*:*,
>> >> > >                        fl="a_s,k_s,l_s",
>> >> > >                        sort="a_s asc",
>> >> > >                        rows=200),
>> >> > >          on="a_s")
>> >> > > &indent=true
>> >> > >
>> >> > > Regards,
>> >> > > Edwin
>> >> > >
>> >> > > On 3 May 2017 at 20:59, Joel Bernstein <joels...@gmail.com> wrote:
>> >> > >
>> >> > > > Sorry, it's just called hashJoin
>> >> > > >
>> >> > > > Joel Bernstein
>> >> > > > http://joelsolr.blogspot.com/
>> >> > > >
>> >> > > > On Wed, May 3, 2017 at 2:45 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>> >> > > >
>> >> > > > > Hi Joel,
>> >> > > > >
>> >> > > > > I am getting this error when I used the innerHashJoin.
>> >> > > > >
>> >> > > > > "EXCEPTION":"Invalid stream expression innerHashJoin(parallel( innerJoin
>> >> > > > >
>> >> > > > > I also can't find the documentation on innerHashJoin for the Streaming
>> >> > > > > Expressions.
>> >> > > > >
>> >> > > > > Are you referring to hashJoin?
>> >> > > > >
>> >> > > > > Regards,
>> >> > > > > Edwin
>> >> > > > >
>> >> > > > > On 3 May 2017 at 13:20, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>> >> > > > >
>> >> > > > > > Hi Joel,
>> >> > > > > >
>> >> > > > > > Thanks for the info.
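[Editor's note] The reason the partitionKeys in the expressions above must match the join keys when running under parallel() is that each worker only sees its own partition of each stream; hashing on the join key guarantees matching rows from both collections land on the same worker. A toy Python sketch of the idea (illustrative only; the field names reuse the thread's examples and Solr's actual hash partitioning differs):

```python
# Route each tuple to a worker by hashing its join key ("a_s"), so rows
# with the same key from both collections always meet on the same worker.
def partition(tuples, key, workers):
    buckets = [[] for _ in range(workers)]
    for t in tuples:
        buckets[hash(t[key]) % workers].append(t)
    return buckets

left = [{"a_s": "a1", "b_s": "x"}, {"a_s": "a2", "b_s": "y"}]
right = [{"a_s": "a1", "f_s": "p"}, {"a_s": "a2", "f_s": "q"}]

left_parts = partition(left, "a_s", 2)   # collection2's partitions
right_parts = partition(right, "a_s", 2) # collection1's partitions

# Each worker w can now join left_parts[w] with right_parts[w]
# independently, because matching keys are co-located.
```

If the searches were partitioned on different keys (or not at all), matching rows could end up on different workers and the parallel join would silently drop them, which is consistent with the empty results reported above.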
>> >> > > > > >
>> >> > > > > > Regards,
>> >> > > > > > Edwin
>> >> > > > > >
>> >> > > > > > On 3 May 2017 at 02:04, Joel Bernstein <joels...@gmail.com> wrote:
>> >> > > > > >
>> >> > > > > >> Also take a look at the documentation for the "fetch" streaming
>> >> > > > > >> expression.
>> >> > > > > >>
>> >> > > > > >> Joel Bernstein
>> >> > > > > >> http://joelsolr.blogspot.com/
>> >> > > > > >>
>> >> > > > > >> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein <joels...@gmail.com> wrote:
>> >> > > > > >>
>> >> > > > > >> > Yes, you can join more than one collection with Streaming Expressions.
>> >> > > > > >> > Here are a few things to keep in mind:
>> >> > > > > >> >
>> >> > > > > >> > * You'll likely want to use the parallel function around the largest
>> >> > > > > >> > join. You'll need to use the join keys as the partitionKeys.
>> >> > > > > >> > * innerJoin: requires that the streams be sorted on the join keys.
>> >> > > > > >> > * innerHashJoin: has no sorting requirement.
>> >> > > > > >> >
>> >> > > > > >> > So a strategy for a three-collection join might look like this:
>> >> > > > > >> >
>> >> > > > > >> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)), smallerStream)
>> >> > > > > >> >
>> >> > > > > >> > The largest join can be done in parallel using an innerJoin. You can
>> >> > > > > >> > then wrap the stream coming out of the parallel function in an
>> >> > > > > >> > innerHashJoin to join it to another stream.
>> >> > > > > >> >
>> >> > > > > >> > Joel Bernstein
>> >> > > > > >> > http://joelsolr.blogspot.com/
>> >> > > > > >> >
>> >> > > > > >> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>> >> > > > > >> >
>> >> > > > > >> >> Hi,
>> >> > > > > >> >>
>> >> > > > > >> >> Is it possible to join more than 2 collections using one of the
>> >> > > > > >> >> streaming expressions (e.g. innerJoin)? If not, are there other ways
>> >> > > > > >> >> we can do it?
>> >> > > > > >> >>
>> >> > > > > >> >> Currently, I may need to join 3 or 4 collections together, and to
>> >> > > > > >> >> output selected fields from all these collections together.
>> >> > > > > >> >>
>> >> > > > > >> >> I'm using Solr 6.4.2.
>> >> > > > > >> >>
>> >> > > > > >> >> Regards,
>> >> > > > > >> >> Edwin
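[Editor's note] The innerJoin/hashJoin distinction discussed in this thread can be sketched in a few lines of Python: a merge join streams two inputs that are already sorted on the join key, while a hash join first reads one side fully into memory and therefore needs no sorting. This is a rough illustration under the assumption of unique join keys, not Solr's implementation:

```python
def inner_join_sorted(left, right, key):
    """Streaming merge join: both inputs must already be sorted on `key`
    (the reason Solr's innerJoin requires sort on the join keys).
    Unique join keys assumed for brevity."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][key] < right[j][key]:
            i += 1
        elif left[i][key] > right[j][key]:
            j += 1
        else:
            out.append({**left[i], **right[j]})
            i += 1
            j += 1
    return out

def hash_join(left, hashed, key):
    """Hash join: reads the `hashed` stream fully into memory first, so
    neither input needs to be sorted (like Solr's hashJoin `hashed` stream).
    Unique join keys assumed."""
    table = {t[key]: t for t in hashed}
    return [{**t, **table[t[key]]} for t in left if t[key] in table]

left = [{"a_s": "a1", "b_s": 1}, {"a_s": "a2", "b_s": 2}]          # sorted on a_s
right = [{"a_s": "a2", "k_s": 9}, {"a_s": "a3", "k_s": 8}]         # sorted on a_s
unsorted_right = [{"a_s": "a3", "k_s": 8}, {"a_s": "a2", "k_s": 9}]

merged = inner_join_sorted(left, right, "a_s")
hashed_result = hash_join(left, unsorted_right, "a_s")
```

The trade-off mirrors Joel's advice: do the big join with the sort-based innerJoin (which never holds a whole side in memory and so can run over /export streams in parallel), then hashJoin the result against the smaller collection, which is cheap to hold in memory.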