Hi Alessandro,

Exactly. The response time varies, but let's take another concrete example. This is my call:
http://localhost:8983/solr/bandwpl/mlt?q=id:10812&fl=id
This is my result:

{
  "responseHeader":{
    "status":0,
    "QTime":6232},
  "response":{"numFound":4564,"start":0,"docs":[
      {
        "id":"11335"},
      {
        "id":"14984"},
      {
        "id":"13948"},
      {
        "id":"11105"},
      {
        "id":"12122"},
      {
        "id":"12315"},
      {
        "id":"19145"},
      {
        "id":"11843"},
      {
        "id":"11640"},
      {
        "id":"19053"}]
  },
  "interestingTerms":[
    "content:hinduski",1.0,
    "content:hindus",1.0174515,
    "content:głowa",1.0453196,
    "content:życie",1.0666888,
    "content:czas",1.0824177,
    "content:kobieta",1.0927386,
    "content:indie",1.119314,
    "content:quentin",1.1349105,
    "content:madras",1.239089,
    "content:musieć",1.2626213,
    "content:matka",1.2966589,
    "content:chcieć",1.299024,
    "content:domu",1.3370595,
    "content:stać",1.4053295,
    "content:sari",1.4284334,
    "content:ojciec",1.4596463,
    "content:lindsay",1.5857035,
    "content:wiedzieć",1.6952671,
    "content:powiedzieć",1.8430523,
    "content:baba",1.8915937,
    "content:mieć",2.1113522,
    "content:Nata",2.4373012,
    "content:Gopal",2.518996,
    "content:david",3.0211911,
    "content:Trixie",7.082156]}

Cheers,
Roland

2015-09-30 10:16 GMT+02:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:

> I am still missing why you quote the number of documents...
> If you have 5600 Polish books, but you use the MLT only when you land on
> the page of a specific book...
> I think I still miss the point!
> MLT on 1 Polish book takes 7 secs?
>
> 2015-09-30 9:10 GMT+01:00 Szűcs Roland <szucs.rol...@bookandwalk.hu>:
>
> > Hi Alessandro,
> >
> > You are right. I forgot to mention one important factor. For 3000 Hungarian
> > e-books the approach you mentioned is absolutely fine, as the response time
> > is some 0.7 sec. But when I use the same mlt for 5600 Polish e-books the
> > response time is 7 sec, which is definitely not acceptable for the users.
> >
> > Regards,
> > Roland
> >
> > 2015-09-29 17:19 GMT+02:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:
> >
> > > Hi Roland,
> > > you said "The main goal is that when a customer is on the product page".
> > > But if you are on a product page, I guess you have the product id.
> > > If you have the product id, you can simply execute the MLT request with
> > > the single doc id as input.
> > >
> > > Why do you need to calculate beforehand?
> > >
> > > Cheers
> > >
> > > 2015-09-29 15:44 GMT+01:00 Szűcs Roland <szucs.rol...@bookandwalk.hu>:
> > >
> > > > Hello Upayavira,
> > > >
> > > > The main goal is that when a customer is on the product page of an e-book
> > > > and does not like it for some reason, I want to immediately offer her/him
> > > > alternative e-books on the same topic. If I expect the customer to
> > > > click on a button like "similar e-books", I lose half of them as they are
> > > > too lazy to click anywhere. So I would like to present the alternatives
> > > > on the product pages without any clicking.
> > > >
> > > > I assumed the best idea was to calculate the similar e-books for all the
> > > > others (n*(n-1) similarity calculations) and present only the top 5. I
> > > > planned to do it when our server is not busy. At this point I found the
> > > > description of mlt as a search component, which seemed to be a good
> > > > candidate as it calculates the similar documents for the whole result set
> > > > of the query. So if I say q=*:* and the mlt component is enabled, I get
> > > > similar documents for my entire document set. The only problem with this
> > > > approach was that the mlt search component does not give back the
> > > > interesting terms for my tag cloud calculation.
> > > >
> > > > That's why I tried to mix the flexibility of the mlt component (multiple
> > > > docs accepted as input) with the robustness of MoreLikeThisHandler (having
> > > > interesting terms).
> > > >
> > > > If there is no solution, I will use the mlt component and solve the tag
> > > > cloud calculation another way.
> > > > By the way, if I am not mistaken, the 5.3.1
> > > > version takes the union of the feature sets of the mlt component and
> > > > handler.
> > > >
> > > > Best Regards,
> > > > Roland
> > > >
> > > > 2015-09-29 14:38 GMT+02:00 Upayavira <u...@odoko.co.uk>:
> > > >
> > > > > Let's take a step back. So, you have 3000 or so docs, and you want to
> > > > > know which documents are similar to these.
> > > > >
> > > > > Why do you want to know this? What feature do you need to build that
> > > > > will use that information? Knowing this may help us to arrive at the
> > > > > right technology for you.
> > > > >
> > > > > For example, you might want to investigate offline clustering algorithms
> > > > > (e.g. [1], which might be a bit dense to follow). A good book on machine
> > > > > learning, if you are okay with Python, is "Programming Collective
> > > > > Intelligence", as it explains the usual algorithms with simple for loops,
> > > > > making it very clear.
> > > > >
> > > > > Or, you could do searches, and then cluster the results at search time
> > > > > (so if you search for 100 docs, it will identify clusters within those
> > > > > 100 matching documents). That might get you there. See [2].
> > > > >
> > > > > So, if you let us know what the end-goal is, perhaps we can suggest an
> > > > > alternative approach, rather than burying ourselves neck-deep in MLT
> > > > > problems.
> > > > >
> > > > > Upayavira
> > > > >
> > > > > [1] http://mylazycoding.blogspot.co.uk/2012/03/cluster-apache-solr-data-using-apache_13.html
> > > > > [2] https://cwiki.apache.org/confluence/display/solr/Result+Clustering
> > > > >
> > > > > On Tue, Sep 29, 2015, at 12:42 PM, Szűcs Roland wrote:
> > > > > > Hello Upayavira,
> > > > > >
> > > > > > Thanks for dealing with my issue. I have already applied termVectors=true
> > > > > > to all fields involved in the more like this calculation.
> > > > > > I have just 3 000
> > > > > > documents, each of them represented by a relatively big term vector with
> > > > > > more than 20 000 unique terms. If I run the more like this handler for a
> > > > > > Solr doc, it takes close to 1 sec to get back the first 10 similar
> > > > > > documents. After this I have to pass the doc ids to my other application,
> > > > > > which finds the cover of the e-book and other metadata and puts them on
> > > > > > the web. The end-to-end process takes too much time from the customer's
> > > > > > perspective; that is why I tried to find a solution for offline more like
> > > > > > this calculation. But if my app has to call the MoreLikeThisHandler for
> > > > > > each doc, it adds overhead to the offline calculation.
> > > > > >
> > > > > > Best Regards,
> > > > > > Roland
> > > > > >
> > > > > > 2015-09-29 13:01 GMT+02:00 Upayavira <u...@odoko.co.uk>:
> > > > > >
> > > > > > > If MoreLikeThis is slow for large documents that are indexed, have you
> > > > > > > enabled term vectors on the similarity fields?
> > > > > > >
> > > > > > > Basically, what more like this does is this:
> > > > > > >
> > > > > > > * decide on what terms in the source doc are "interesting", and pick the
> > > > > > >   25 most interesting ones
> > > > > > > * build and execute a boolean query using these interesting terms.
> > > > > > >
> > > > > > > Looking at the first phase of this in more detail:
> > > > > > >
> > > > > > > If you pass in a document using stream.body, it will analyse this
> > > > > > > document into terms, and then calculate the most interesting terms from
> > > > > > > that.
> > > > > > >
> > > > > > > If you reference a document in your index with a field that is stored, it
> > > > > > > will take the stored version, analyse it, and identify the
> > > > > > > interesting terms from there.
> > > > > > >
> > > > > > > If, however, you have stored term vectors against that field, this work
> > > > > > > is not needed. You have already done much of the work, and the
> > > > > > > identification of your "interesting terms" will be much faster.
> > > > > > >
> > > > > > > Thus, on the content field of your documents, add termVectors="true" in
> > > > > > > your schema, and re-index. Then you could well find MLT becoming a lot
> > > > > > > more efficient.
> > > > > > >
> > > > > > > Upayavira
> > > > > > >
> > > > > > > On Tue, Sep 29, 2015, at 10:39 AM, Szűcs Roland wrote:
> > > > > > > > Hi Alessandro,
> > > > > > > >
> > > > > > > > My original goal was to get offline suggestions based on content
> > > > > > > > similarity for every e-book we have. We wanted to run a bulk more like
> > > > > > > > this calculation in the evening, when the usage of our site is low and
> > > > > > > > we submit a new e-book. Real-time more like this can take a while, as we
> > > > > > > > typically have long documents (2-5 MB of text) with all the content
> > > > > > > > indexed.
> > > > > > > >
> > > > > > > > When we upload a new document we wanted to recalculate the more like
> > > > > > > > this suggestions and the tf-idf based tag clouds. Both of them are
> > > > > > > > delivered by the MoreLikeThisHandler, but only for one document, as you
> > > > > > > > wrote.
> > > > > > > >
> > > > > > > > The text input is not good for us because we need the similar doc list
> > > > > > > > for each of the matched documents. If I put together the text of 10
> > > > > > > > documents, I cannot tell which suggestion relates to which matched
> > > > > > > > document, and the tag cloud will also belong to the mixed text.
> > > > > > > >
> > > > > > > > Most likely we will use the MoreLikeThisHandler for each of the
> > > > > > > > documents, parse the JSON response, and store the result in a SQL
> > > > > > > > database.
> > > > > > > >
> > > > > > > > Thanks for your help.
> > > > > > > >
> > > > > > > > 2015-09-29 11:18 GMT+02:00 Alessandro Benedetti <benedetti.ale...@gmail.com>:
> > > > > > > >
> > > > > > > > > Hi Roland,
> > > > > > > > > what is your exact requirement?
> > > > > > > > > Do you want to basically build a "description" for a set of documents
> > > > > > > > > and then find documents in the index similar to this description?
> > > > > > > > >
> > > > > > > > > By default, based on my experience (and on the code), this is the
> > > > > > > > > entry point for the Lucene More Like This:
> > > > > > > > >
> > > > > > > > > > org.apache.lucene.queries.mlt.MoreLikeThis
> > > > > > > > > >
> > > > > > > > > > /**
> > > > > > > > > >  * Return a query that will return docs like the passed lucene document ID.
> > > > > > > > > >  *
> > > > > > > > > >  * @param docNum the documentID of the lucene doc to generate the 'More Like This" query for.
> > > > > > > > > >  * @return a query that will return docs like the passed lucene document ID.
> > > > > > > > > >  */
> > > > > > > > > > public Query like(int docNum) throws IOException {
> > > > > > > > > >   if (fieldNames == null) {
> > > > > > > > > >     // gather list of valid fields from lucene
> > > > > > > > > >     Collection<String> fields = MultiFields.getIndexedFields(ir);
> > > > > > > > > >     fieldNames = fields.toArray(new String[fields.size()]);
> > > > > > > > > >   }
> > > > > > > > > >
> > > > > > > > > >   return createQuery(retrieveTerms(docNum));
> > > > > > > > > > }
> > > > > > > > >
> > > > > > > > > It means that, talking about "documents", you can feed only one Solr doc.
> > > > > > > > >
> > > > > > > > > But you can also feed the MLT with simple text.
> > > > > > > > >
> > > > > > > > > So you should study your use case better and understand which option
> > > > > > > > > fits better:
> > > > > > > > >
> > > > > > > > > 1) customising the MLT component starting from Lucene
> > > > > > > > >
> > > > > > > > > 2) doing some processing client side and using the "text" similarity
> > > > > > > > > feature.
> > > > > > > > >
> > > > > > > > > Cheers
> > > > > > > > >
> > > > > > > > > 2015-09-29 10:05 GMT+01:00 Roland Szűcs <roland.sz...@bookandwalk.com>:
> > > > > > > > >
> > > > > > > > > > Hi all,
> > > > > > > > > >
> > > > > > > > > > Is it possible to feed multiple Solr ids to a MoreLikeThisHandler?
> > > > > > > > > >
> > > > > > > > > > <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
> > > > > > > > > >   <lst name="defaults">
> > > > > > > > > >     <str name="mlt.match.include">false</str>
> > > > > > > > > >     <str name="mlt.interestingTerms">details</str>
> > > > > > > > > >     <str name="mlt.fl">title,content</str>
> > > > > > > > > >     <str name="mlt.minwl">4</str>
> > > > > > > > > >     <str name="mlt.qf">title^12 content^1</str>
> > > > > > > > > >     <str name="mlt.mintf">2</str>
> > > > > > > > > >     <int name="mlt.count">10</int>
> > > > > > > > > >     <str name="mlt.boost">true</str>
> > > > > > > > > >     <str name="wt">json</str>
> > > > > > > > > >     <str name="indent">true</str>
> > > > > > > > > >   </lst>
> > > > > > > > > > </requestHandler>
> > > > > > > > > >
> > > > > > > > > > When I call this:
> > > > > > > > > > http://localhost:8983/solr/bandwhu/mlt?q=id:8&fl=id
> > > > > > > > > > it works fine.
> > > > > > > > > > Is there any way to make a kind of "bulk" call to the more like
> > > > > > > > > > this handler? I need the interesting terms as well, and as far as I
> > > > > > > > > > know, if I use more like this as a search component it does not
> > > > > > > > > > return them, so it is not an alternative.
> > > > > > > > > >
> > > > > > > > > > Thanks in advance,
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Roland Szűcs
> > > > > > > > > > Connect with me on Linkedin <https://www.linkedin.com/pub/roland-sz%C5%B1cs/28/226/24/hu>
> > > > > > > > > > <https://bookandwalk.hu/>
> > > > > > > > > > CEO
> > > > > > > > > > Phone: +36 1 210 81 13
> > > > > > > > > > Bookandwalk.hu <https://bokandwalk.hu/>
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > --------------------------
> > > > > > > > >
> > > > > > > > > Benedetti Alessandro
> > > > > > > > > Visiting card - http://about.me/alessandro_benedetti
> > > > > > > > > Blog - http://alexbenedetti.blogspot.co.uk
> > > > > > > > >
> > > > > > > > > "Tyger, tyger burning bright
> > > > > > > > > In the forests of the night,
> > > > > > > > > What immortal hand or eye
> > > > > > > > > Could frame thy fearful symmetry?"
> > > > > > > > >
> > > > > > > > > William Blake - Songs of Experience -1794 England
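
A minimal sketch of the per-document approach discussed in this thread: call the MoreLikeThisHandler once per id and split each JSON response into similar ids and (term, score) pairs. It assumes the /mlt handler and the bandwpl core from the example call at the top, and that interestingTerms arrives as the flat term, score list shown in the result. The function names are made up for illustration, and storing the output in a database is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: handler path and core name copied from the example call in this thread.
SOLR_MLT_URL = "http://localhost:8983/solr/bandwpl/mlt"


def build_mlt_url(doc_id, base_url=SOLR_MLT_URL):
    """Build the MoreLikeThisHandler URL for one source document id."""
    params = {"q": f"id:{doc_id}", "fl": "id", "wt": "json"}
    return base_url + "?" + urlencode(params)


def parse_mlt_response(payload):
    """Return (similar_ids, terms) from a decoded handler response.

    With mlt.interestingTerms=details the JSON list alternates
    term, score, term, score, so it is paired up here.
    """
    similar_ids = [doc["id"] for doc in payload["response"]["docs"]]
    flat = payload.get("interestingTerms", [])
    terms = list(zip(flat[0::2], flat[1::2]))
    return similar_ids, terms


def fetch_mlt(doc_id):
    """Call the handler for one document (needs a running Solr)."""
    with urlopen(build_mlt_url(doc_id)) as resp:
        return parse_mlt_response(json.load(resp))


def bulk_mlt(doc_ids, fetch=fetch_mlt):
    """Hypothetical offline loop: one handler call per document id."""
    return {doc_id: fetch(doc_id) for doc_id in doc_ids}
```

Calling bulk_mlt(["10812", "11335"]) would return one entry per id. Passing a different fetch function allows a dry run without a running Solr.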