Hi Maria,
Would it help to add a filter to your query to restrict the results to
just those where the description field is populated? E.g. add
fq=description:[* TO *]
to your query parameters.
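In case it is useful, here is a rough sketch of how the request might look with that filter added. The host, port and core name ("mycore") are placeholders I have made up, so swap in your own. It only shows how the parameters fit together; it narrows the returned documents, and I have not checked whether it changes the interesting-terms statistics.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name are placeholders.
SOLR_MLT_URL = "http://localhost:8983/solr/mycore/mlt"


def mlt_params(doc_id: str) -> dict:
    """Return the MLT parameters from the original query plus the suggested fq."""
    return {
        "q": f'id:"{doc_id}"',
        "mlt": "true",
        "mlt.fl": "description",
        "mlt.mindf": 1,
        "mlt.mintf": 1,
        "mlt.maxqt": 5,
        "mlt.interestingTerms": "details",
        "wt": "json",
        # Suggested filter: keep only documents with a populated description.
        "fq": "description:[* TO *]",
    }


# urlencode takes care of escaping the quotes, brackets and spaces.
url = f"{SOLR_MLT_URL}?{urlencode(mlt_params('0c7c4d74-0f37-44ea-8933-cd2ee7964457'))}"
print(url)
```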
Apologies if I'm misunderstanding the problem!
Best,
Matt
On 28/01/2019 16:29, Maria Mestre wrote:
Hi all,
First of all, I’m not a Java developer, and I’m a Solr newbie. I have worked with
Elasticsearch for some years (not contributing, just as a user), so I think I
have the basics of text search engines covered. I am always learning new things
though!
I created an index in Solr and used more-like-this on it, by passing a
document_id. My data has a special feature, which is that one of the fields is
called “description” but is only populated about 10% of the time. Most of the
time it is empty. I am using that field to query similar documents.
So I query the /mlt endpoint using these parameters (for example):
{q=id:"0c7c4d74-0f37-44ea-8933-cd2ee7964457",
mlt=true,
mlt.fl=description,
mlt.mindf=1,
mlt.mintf=1,
mlt.maxqt=5,
wt=json,
mlt.interestingTerms=details}
The issue I have is that when retrieving the key scored terms
(interestingTerms), the code uses the total number of documents in the index,
not the total number of documents with populated “description” field. This is
where it’s done in the code:
https://github.com/apache/lucene-solr/blob/master/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java#L651
The effect of this choice is that the “idf” does not vary much, given that
numDocs >> number of documents with “description”, so the key terms end up
being just the terms with the highest term frequencies.
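To illustrate the effect with numbers, here is a small sketch. It assumes the idf has the form 1 + ln((numDocs + 1) / (docFreq + 1)), which is my understanding of Lucene's ClassicSimilarity, and every figure in it (index size, document frequencies) is made up for the example. It compares how far apart a rare and a common term sit when numDocs is the whole index versus only the documents with a description.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Assumed ClassicSimilarity-style idf: 1 + ln((N + 1) / (df + 1))."""
    return 1.0 + math.log((num_docs + 1) / (doc_freq + 1))


def idf_ratio(rare_df: int, common_df: int, num_docs: int) -> float:
    """How many times larger the rare term's idf is than the common term's."""
    return classic_idf(rare_df, num_docs) / classic_idf(common_df, num_docs)


# Made-up figures: 1,000,000 documents, about 10% with a description.
total_docs = 1_000_000
docs_with_description = 100_000
rare_df, common_df = 50, 5_000

ratio_whole_index = idf_ratio(rare_df, common_df, total_docs)
ratio_description_only = idf_ratio(rare_df, common_df, docs_with_description)

print(f"numDocs = whole index:      rare/common idf ratio = {ratio_whole_index:.2f}")
print(f"numDocs = description only: rare/common idf ratio = {ratio_description_only:.2f}")
```

With the larger numDocs the ratio is closer to 1, so when the idf is multiplied by the term frequency, the term frequency has relatively more say in the ranking of terms.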
It is inconsistent because the MLT-search then uses these extracted key terms
and scores all documents using an idf which is computed only on the subset of
documents with “description”. So one part of the MLT uses a different numDocs
than another part. This sounds like an odd choice, and not expected at all, and
I wonder if I’m missing something.
Best,
Maria
--
Matt Pearce
Flax - Open Source Enterprise Search
www.flax.co.uk