Yes, the LimitTokenCountFilterFactory will do the trick.

I have some examples in the book showing, for a given input string, what the output tokens will be.

Otherwise, the Solr Javadoc does give one generic example, but without showing how it actually works:
http://lucene.apache.org/core/4_3_1/analyzers-common/org/apache/lucene/analysis/miscellaneous/LimitTokenCountFilterFactory.html
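
For the copyField approach described in the question below, a minimal schema.xml sketch might look like the following. The field and type names ("body", "body_first_n", "text_first_n") and the limit of 10 are hypothetical, and "text_general" is assumed to be the field type from the stock example schema; only the maxTokenCount attribute on the filter is taken from the Javadoc.

```xml
<!-- Hypothetical field type: same analysis as the source field, then truncated -->
<fieldType name="text_first_n" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Keep only the first 10 tokens emitted by the chain above -->
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="10"/>
  </analyzer>
</fieldType>

<!-- Hypothetical fields: the full text, plus a truncated copy for separate weighting -->
<field name="body" type="text_general" indexed="true" stored="true"/>
<field name="body_first_n" type="text_first_n" indexed="true" stored="false"/>
<copyField source="body" dest="body_first_n"/>
```

Since the limit is applied after tokenization, it counts tokens rather than characters. The truncated field could then presumably be given its own boost at query time, for example in the qf parameter of a dismax-style query.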

The new Apache Solr Reference? No mention of the filter.

-- Jack Krupansky

-----Original Message-----
From: Daniel Collins
Sent: Wednesday, June 26, 2013 3:38 AM
To: solr-user@lucene.apache.org
Subject: How to truncate a particular field, LimitTokenCountAnalyzer or LimitTokenCountFilter?

We have a requirement to grab the first N words in a particular field and
weight them differently for scoring purposes.  So I thought to use a
<copyField> and have some extra filter on the destination to truncate it
down (post tokenization).

Did a quick search and found both a LimitTokenCountAnalyzer
and a LimitTokenCountFilter mentioned. If I read the wiki right, the Filter
is the correct approach for Solr, as we have the schema-able analyzer chain,
so we don't need to code anything, right?

The Analyzer version would be more useful if we were explicitly coding up a
set of operations in Java, so that's what people using Lucene directly would
tend to use.

Just in search of confirmation really.
