Thank you, Ahmet; this is exactly what I was looking for. It looks like
the shingle filter can produce terms of three or more words as well,
which is great. I'm going to try this with both Western and CJK language
tokenizers and see how it turns out.
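A minimal schema.xml sketch of the setup described in this thread is below. It assumes a Solr schema where you can add a field type. The type name `text_shingle` is hypothetical, and `yourFieldName` is the placeholder from the request parameters quoted later. With `maxShingleSize="3"` and `outputUnigrams="false"`, the filter should emit 2-word and 3-word shingles but no single words. The field needs `termVectors="true"` so that TermVectorComponent has stored vectors to report. The whitespace tokenizer suits Western text; for CJK text you would swap in a different tokenizer.

```xml
<!-- Hypothetical field type name; adjust to your schema -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits on whitespace; replace with a CJK-aware tokenizer for CJK text -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emit 2- and 3-word shingles; drop the single-word tokens -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="false"/>
  </analyzer>
</fieldType>

<!-- Term vectors must be stored for TermVectorComponent to return them -->
<field name="yourFieldName" type="text_shingle" indexed="true" stored="true" termVectors="true"/>
```

Documents must be reindexed after this change before any shingle terms show up in the term vectors.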

On Tue, Feb 9, 2010 at 5:07 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
>> I've been looking at the Solr TermVectorComponent
>> (http://wiki.apache.org/solr/TermVectorComponent) and it seems to have
>> something similar to this, but it looks to me like this is a component
>> that is processed at query time (?) and is limited to 1-gram terms.
>
> If you use <filter class="solr.ShingleFilterFactory" maxShingleSize="2" 
> outputUnigrams="false"/> it can give you info about 2-gram terms.
>
>> Also, the tf/idf scores are a little different, as they come back as
>> separate integer values.
>
> The wiki's example output displays only the tf and df values, which are
> integers. You can calculate tf*idf (a double) with these parameters:
>
> &qt=tvrh&tv=true&fl=yourFieldName&tv.tf=true&tv.df=true&tv.tf_idf=true
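The `qt=tvrh` parameter above selects a request handler named `tvrh`, which must exist in solrconfig.xml. The sketch below shows one way to define it. It is recalled from the Solr example configuration rather than taken from this thread, so check it against your own solrconfig.xml. The handler runs a normal search and then appends TermVectorComponent output for the matching documents.

```xml
<!-- Registers TermVectorComponent under the name "tvComponent" -->
<searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/>

<!-- Handler selected with qt=tvrh; runs the standard search components,
     then tvComponent last -->
<requestHandler name="tvrh" class="org.apache.solr.handler.component.SearchHandler" startup="lazy">
  <lst name="defaults">
    <!-- Turn term vectors on by default for this handler -->
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>
```

With this handler in place, you can append the `tv.tf`, `tv.df` and `tv.tf_idf` parameters to an ordinary query (including a `q` parameter). tf*idf is calculated only when both `tv.tf` and `tv.df` are enabled.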
