Thank you, Ahmet; this is exactly what I was looking for. It looks like
the shingle filter can produce 3+-gram terms as well, which is great.
I'm going to try this with both western and CJK language tokenizers
and see how it turns out.
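To check my understanding of what "shingles" means here, a minimal plain-Python sketch of word n-gram generation (not Lucene's ShingleFilter itself; the function name and the min_size/max_size/sep parameters are my own, chosen to loosely mirror the filter's size and separator settings). For CJK text that a tokenizer has already split into characters, an empty separator is probably more sensible than a space, though that is an assumption on my part:

```python
def shingles(tokens, min_size=1, max_size=3, sep=" "):
    """Hypothetical helper: emit word n-grams of length min_size..max_size,
    position by position, shortest first at each position."""
    out = []
    for i in range(len(tokens)):
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):  # skip n-grams that run past the end
                out.append(sep.join(tokens[i:i + n]))
    return out


print(shingles("banana pie recipe".split()))
print(shingles(list("東京都"), min_size=2, max_size=2, sep=""))
```

With min_size=1 the unigrams are kept alongside the longer shingles, which is roughly what turning unigram output on in the real filter gives you.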
On Tue, Feb 9, 2010 at 5:07 PM, Ahmet Arslan wrote:
Hello,
One of the commercial search platforms I work with has the concept of
'document vectors', which are 1-gram and 2-gram phrases and their
associated tf/idf weights on a 0-1 scale, e.g. ["banana pie", 0.99]
means "banana pie" is very relevant to this document.
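As a rough illustration of such a document vector, here is a minimal sketch that weights every 1-gram and 2-gram by tf-idf and then divides by the document's largest weight so that values fall on a 0-1 scale. The smoothed idf formula and the max-normalisation are assumptions; I don't know how the commercial platform actually computes or scales its weights, and Lucene's own scoring formula differs as well. The function names are hypothetical:

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Hypothetical helper: contiguous word n-grams joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def doc_vectors(docs):
    """docs: list of token lists. Returns one {phrase: weight} dict per doc,
    scaled so that the highest-weighted phrase in each doc gets 1.0."""
    grams = [Counter(ngrams(d, 1) + ngrams(d, 2)) for d in docs]
    df = Counter(g for c in grams for g in c)  # number of docs containing each phrase
    n_docs = len(docs)
    vectors = []
    for c in grams:
        # Assumed smoothed idf, always > 0 so even a single doc gets weights.
        w = {g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1) for g, tf in c.items()}
        top = max(w.values(), default=0.0)
        vectors.append({g: (v / top if top > 0 else 0.0) for g, v in w.items()})
    return vectors


docs = ["banana pie with banana cream".split(), "apple pie recipe".split()]
vecs = doc_vectors(docs)
print(round(vecs[0]["banana pie"], 2))
```

Here "banana" gets 1.0 in the first document because it occurs twice, while "pie" scores lower than "cream" because it also appears in the second document.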
During the ingest/indexing proces