Sure, I will proceed tomorrow with the Jira and the simple patch + tests. In the meantime let's try to collect some additional feedback.
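To make the discussion a bit more concrete, here is a minimal, self-contained sketch of the difference (plain Java, no Lucene dependency, and not the actual patch). The names `PerFieldTermsSketch`, `topField` and `perFieldTermFreqs` are made up for illustration, and the nested map of document frequencies is only a stand-in for `ir.docFreq(new Term(fieldName, word))`. The first method mirrors the top-field selection quoted below; the second shows one possible way to keep the field -> term relation by counting terms per field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PerFieldTermsSketch {

    // Mirrors the quoted loop: the word is attached to the single field
    // with the highest document frequency. docFreqByField is a hypothetical
    // stand-in for the index lookup (field -> term -> document frequency).
    static String topField(String word, String[] fieldNames,
                           Map<String, Map<String, Integer>> docFreqByField) {
        String topField = fieldNames[0];
        int docFreq = 0;
        for (String fieldName : fieldNames) {
            int freq = docFreqByField
                    .getOrDefault(fieldName, Collections.<String, Integer>emptyMap())
                    .getOrDefault(word, 0);
            if (freq > docFreq) {
                topField = fieldName;
                docFreq = freq;
            }
        }
        return topField;
    }

    // Possible alternative (an assumption, not the committed design):
    // keep one term-frequency map per field, so a term stays tied to
    // the field it was extracted from.
    static Map<String, Map<String, Integer>> perFieldTermFreqs(
            Map<String, List<String>> tokensByField) {
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : tokensByField.entrySet()) {
            Map<String, Integer> freqs = new HashMap<>();
            for (String token : entry.getValue()) {
                freqs.merge(token, 1, Integer::sum);
            }
            result.put(entry.getKey(), freqs);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy index statistics: "pool" is far more common in facilities.
        Map<String, Map<String, Integer>> docFreqByField = new HashMap<>();
        docFreqByField.put("description", Collections.singletonMap("pool", 10));
        docFreqByField.put("facilities", Collections.singletonMap("pool", 50));

        // Tokens of the source document, per field.
        Map<String, List<String>> tokensByField = new LinkedHashMap<>();
        tokensByField.put("description", new ArrayList<>(Arrays.asList("pool", "quiet", "pool")));
        tokensByField.put("facilities", new ArrayList<>(Arrays.asList("pool", "wifi")));

        String[] fields = {"description", "facilities"};
        System.out.println("top field for 'pool': "
                + topField("pool", fields, docFreqByField));
        System.out.println("per-field term frequencies: "
                + perFieldTermFreqs(tokensByField));
    }
}
```

With the single-map approach, "pool" from the description ends up queried only against facilities; with per-field maps each field could contribute its own term queries. How the per-field terms should then be scored and cut to the top N is still open.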
Cheers

On 29 December 2015 at 12:43, Anshum Gupta <ans...@anshumgupta.net> wrote:

> Feel free to create a JIRA and put up a patch if you can.
>
> On Tue, Dec 29, 2015 at 4:26 PM, Alessandro Benedetti <abenede...@apache.org>
> wrote:
>
> > Hi guys,
> > While I was exploring the way we build the More Like This query, I
> > discovered a part I am not convinced of:
> >
> > Let's see how we build the query:
> > org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)
> >
> > 1) we extract the terms from the interesting fields, adding them to a map:
> >
> > Map<String, Int> termFreqMap = new HashMap<>();
> >
> > *(we lose the field -> term relation; we no longer know which field the
> > term came from!)*
> >
> > org.apache.lucene.queries.mlt.MoreLikeThis#createQueue
> >
> > 2) we build the queue that will contain the query terms; at this point we
> > connect these terms to some field again, but:
> >
> > ...
> >> // go through all the fields and find the largest document frequency
> >> String topField = fieldNames[0];
> >> int docFreq = 0;
> >> for (String fieldName : fieldNames) {
> >>   int freq = ir.docFreq(new Term(fieldName, word));
> >>   topField = (freq > docFreq) ? fieldName : topField;
> >>   docFreq = (freq > docFreq) ? freq : docFreq;
> >> }
> >> ...
> >
> > We identify the topField as the field with the highest document frequency
> > for the term t.
> > Then we build the termQuery:
> >
> > queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq, tf));
> >
> > In this way we lose a lot of precision.
> > Not sure why we do that.
> > I would prefer to keep the relation between terms and fields.
> > That could improve the quality of the MLT query a lot.
> > If I run the MLT on 2 fields, *description* and *facilities* for example,
> > it is likely I want to find documents with similar terms in the
> > description and similar terms in the facilities, without mixing things up
> > and losing the semantics of the terms.
> >
> > Let me know your opinion,
> >
> > Cheers
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>
> --
> Anshum Gupta
>

--
Benedetti Alessandro