Answered my own question...

mlt.mintf: Minimum Term Frequency - the frequency below which terms will be
ignored in the source doc

Our "source doc" is a limited set of terms, not a large content field, so a
term may occur only once in it.  In our case I needed to set that value to 1
(rather than the default of 2).  Now I'm getting results, and they are indeed
relevant.
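
For anyone hitting the same thing, here is a rough sketch of why the default
cut everything out, plus the adjusted query. It is only an illustration: the
token list is my guess at what the analyzer chain emits for "Healthcare Cost
Trends", and surviving_terms is a made-up helper, not Solr's or Lucene's
actual code.

```python
from collections import Counter
from urllib.parse import urlencode


def surviving_terms(tokens, min_tf):
    """Keep only terms that occur at least min_tf times in the source doc."""
    counts = Counter(tokens)
    return sorted(term for term, n in counts.items() if n >= min_tf)


# Assumed analyzer output for "Healthcare Cost Trends" (illustrative only).
tokens = ["healthcar", "cost", "trend"]

# With the default mlt.mintf=2, every term occurs too rarely to be kept.
print(surviving_terms(tokens, 2))

# With mlt.mintf=1, all three terms remain candidates.
print(surviving_terms(tokens, 1))

# The original query with mlt.mintf=1 added (request path only, no host).
params = {
    "q": "objectId:user91813",
    "mlt.fl": "competencyKeywords",
    "mlt.mintf": 1,
    "mlt.interestingTerms": "details",
    "mlt.match.include": "false",
}
mlt_query = "/mlt?" + urlencode(params)
print(mlt_query)
```

The only change from the query quoted below is the extra mlt.mintf=1
parameter; everything else is left as it was.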

Thanks,
Andy Pickler

On Wed, May 22, 2013 at 12:20 PM, Andy Pickler <andy.pick...@gmail.com> wrote:

> I'm developing a recommendation feature in our app using the
> MoreLikeThisHandler <http://wiki.apache.org/solr/MoreLikeThisHandler>,
> and so far it is doing a great job.  We're using a user's "competency
> keywords" as the MLT field list and the user's corresponding document in
> Solr as the "comparison document".  I have found that for one user I'm not
> receiving any recommendations, and I'm not sure why.
>
> Solr: 4.1.0
>
> *relevant schema*:
>
> <field name="competencyKeywords" type="short-mlt-text" indexed="true"
> stored="true" multiValued="true" termVectors="true"/>
>
>     <fieldType name="short-mlt-text" class="solr.TextField"
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"/>
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.PorterStemFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"/>
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.PorterStemFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
> *user's values*:
>
> <arr name="competencyKeywords">
> <str>Healthcare Cost Trends</str>
> </arr>
>
> Is it possible that among all the ~40,000 users in this index (about 500
> of which have the same competency keywords), the words "healthcare",
> "cost" and "trends" are just judged by Lucene to not be "significant"?  I
> realize that I may not understand how the MLT Handler is doing things under
> the covers...I've only been guessing until now based on the (otherwise
> excellent) results I've been seeing.
>
> Thanks,
> Andy Pickler
>
> P.S.  For some additional information, the following query:
>
>
> /mlt?q=objectId:user91813&mlt.fl=competencyKeywords&mlt.interestingTerms=details&debugQuery=true&mlt.match.include=false
>
> ...produces the following results...
>
> <response>
> <lst name="responseHeader">
> <int name="status">0</int>
> <int name="QTime">2</int>
> </lst>
> <result name="response" numFound="0" start="0"/>
> <lst name="interestingTerms"/>
> <lst name="debug">
> <str name="rawquerystring">objectId:user91813</str>
> <str name="querystring">objectId:user91813</str>
> <str name="parsedquery"/>
> <str name="parsedquery_toString"/>
> <lst name="explain"/>
> </lst>
> </response>
>
