Answered my own question...

mlt.mintf: Minimum Term Frequency - the frequency below which terms will be ignored in the source doc.

Our "source doc" is a short list of terms, not a large content field, so most terms occur in it only once. In our case I need to set that value to 1 (rather than the default of 2). Now I'm getting results, and they are indeed relevant.

Thanks,
Andy Pickler

On Wed, May 22, 2013 at 12:20 PM, Andy Pickler <andy.pick...@gmail.com> wrote:

> I'm developing a recommendation feature in our app using the
> MoreLikeThisHandler <http://wiki.apache.org/solr/MoreLikeThisHandler>,
> and so far it is doing a great job. We're using a user's "competency
> keywords" as the MLT field list and the user's corresponding document in
> Solr as the "comparison document". I have found that for one user I'm not
> receiving any recommendations, and I'm not sure why.
>
> Solr: 4.1.0
>
> *relevant schema*:
>
> <field name="competencyKeywords" type="short-mlt-text" indexed="true"
>        stored="true" multiValued="true" termVectors="true"/>
>
> <fieldType name="short-mlt-text" class="solr.TextField"
>            positionIncrementGap="100" autoGeneratePhraseQueries="true">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.PorterStemFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> *user's values*:
>
> <arr name="competencyKeywords">
>   <str>Healthcare Cost Trends</str>
> </arr>
>
> Is it possible that among all the ~40,000 users in this index (about 500
> of which have the same competency keywords), the words "healthcare",
> "cost" and "trends" are just judged by Lucene to not be "significant"? I
> realize that I may not understand how the MLT Handler is doing things under
> the covers...I've only been guessing until now based on the (otherwise
> excellent) results I've been seeing.
>
> Thanks,
> Andy Pickler
>
> P.S. For some additional information, the following query:
>
> /mlt?q=objectId:user91813&mlt.fl=competencyKeywords&mlt.interestingTerms=details&debugQuery=true&mlt.match.include=false
>
> ...produces the following results...
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">2</int>
>   </lst>
>   <result name="response" numFound="0" start="0"/>
>   <lst name="interestingTerms"/>
>   <lst name="debug">
>     <str name="rawquerystring">objectId:user91813</str>
>     <str name="querystring">objectId:user91813</str>
>     <str name="parsedquery"/>
>     <str name="parsedquery_toString"/>
>     <lst name="explain"/>
>   </lst>
> </response>
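
For anyone who finds this thread later: the fix amounts to adding mlt.mintf=1 to the MLT request quoted above. Below is a minimal sketch of building that URL in Python. The host, port, and core name in SOLR_MLT_URL and the helper build_mlt_url are assumptions for illustration only; the field name, document id, and other parameters are taken from the query in this thread.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port and core name are assumptions,
# not taken from the thread. Adjust to your own Solr instance.
SOLR_MLT_URL = "http://localhost:8983/solr/collection1/mlt"


def build_mlt_url(object_id, min_term_freq=1):
    """Build a MoreLikeThisHandler request URL (illustrative helper).

    min_term_freq is sent as mlt.mintf. A value of 1 keeps terms that
    occur only once in the source document, which matters for short
    keyword fields such as competencyKeywords.
    """
    params = {
        "q": f"objectId:{object_id}",
        "mlt.fl": "competencyKeywords",
        "mlt.mintf": min_term_freq,
        "mlt.interestingTerms": "details",
        "mlt.match.include": "false",
    }
    # urlencode percent-encodes reserved characters such as ':' in the query.
    return f"{SOLR_MLT_URL}?{urlencode(params)}"


mlt_url = build_mlt_url("user91813")
print(mlt_url)
```

The same setting could presumably also be placed in the handler's defaults in solrconfig.xml so that every request picks it up, but I have only tried it as a request parameter.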