Hi all,
I was trying out the MoreLikeThis support and was getting some odd
results. I realized that unless the fields being used for the
similarity calculation have a stored term vector, the MoreLikeThis
code from Lucene will re-analyze the field using the StandardAnalyzer,
which in my case is quite different from what I'm using in the Solr
schema.
So the first note is for anybody using MoreLikeThis: make sure you
also specify termVectors="true" in the Solr schema for any fields
being passed to the query in the mlt.fl parameter.
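For example, a field definition along these lines should do it. This is only a sketch: the field name "description" and the type "text" are placeholders I made up for illustration, not something from the sample schema, so substitute whatever your own schema defines.

```xml
<!-- Hypothetical field: "description" and "text" are placeholder names.
     termVectors="true" stores the term vector at index time, so
     MoreLikeThis does not need to re-analyze the stored text. -->
<field name="description" type="text" indexed="true" stored="true"
       termVectors="true"/>
```

With that in place, the same field name would be what you pass as mlt.fl (here, mlt.fl=description). Existing documents need to be reindexed before they pick up term vectors.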
The second note is that the Wiki page and the example schema might
want to include some reference to the termVectors field attribute.
For example, the sample schema says:
<!-- Valid attributes for fields:
  name: mandatory - the name for the field
  type: mandatory - the name of a previously defined type from
    the <types> section
  indexed: true if this field should be indexed (searchable or sortable)
  stored: true if this field should be retrievable
  compressed: [false] if this field should be stored using gzip compression
    (this will only apply if the field type is compressable; among
    the standard field types, only TextField and StrField are)
  multiValued: true if this field may contain multiple values per document
  omitNorms: (expert) set to true to omit the norms associated with
    this field (this disables length normalization and index-time
    boosting for the field, and saves some memory). Only full-text
    fields or fields that need an index-time boost need norms.
-->
This initially made me think these were the only valid attributes
for fields. Likewise, the wiki page at
http://wiki.apache.org/solr/SchemaXml makes no mention of
termVectors, termPositions, or termOffsets. I would edit that
page, but there currently isn't a section that covers all the
attributes, only the common ones.
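For reference, here is roughly how the three attributes would appear on a field, alongside the documented ones. Again, this is a sketch with a made-up field name and type. My understanding is that termPositions and termOffsets add position and offset information to the stored term vector, at the cost of a larger index, and that only termVectors matters for the MoreLikeThis case above.

```xml
<!-- Hypothetical field: "content" and "text" are placeholder names.
     termVectors:   store the term vector for this field
     termPositions: also store term positions with the term vector
     termOffsets:   also store character offsets with the term vector -->
<field name="content" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```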
Thanks,
-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"