: I have a catchall "text" field and use it for searching. This field
: stores non-unique terms. For example, it stores the following
: terms:
:
:   test test search
:
: Is it possible to store non-unique terms in the following way:
: "term"|"number of terms", i.e. test|2 search?
: I guess it should reduce the size of the index.
: 
: And if yes - is it possible to use this number of terms when
: calculating the relevance?

what you are describing is exactly how an inverted index like Lucene/Solr 
works -- the original raw text can optionally be "stored" for retrieval, 
but the index that is *searched* contains each term a single time, along 
with pointers referring to which documents, and where in those documents, 
the term occurs.  the number of times a term occurs in a document is the 
term frequency (or "tf") and is one of the two primary components used in 
the basic scoring formula (TF/IDF).

https://lucene.apache.org/java/3_5_0/fileformats.html
https://en.wikipedia.org/wiki/Tf%E2%80%93idf
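
to make the idea concrete, here is a minimal sketch in Python of an inverted 
index that keeps each term once per document together with its term 
frequency.  this is only an illustration, not Lucene's actual data structure 
or scoring code: the names build_index and score are made up for this 
example, term positions are left out, and the score is the plain textbook 
tf * log(N / df) rather than the formula Lucene really uses.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to {doc_id: term frequency} (hypothetical helper)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Counter collapses repeated terms into a single entry with a count,
        # i.e. "test test search" becomes test -> 2, search -> 1.
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def score(index, num_docs, term, doc_id):
    """Textbook tf-idf, tf * log(N / df); an assumption, not Lucene's formula."""
    postings = index.get(term, {})
    tf = postings.get(doc_id, 0)
    if tf == 0:
        return 0.0
    # df is the number of documents that contain the term.
    return tf * math.log(num_docs / len(postings))


docs = {1: "test test search", 2: "search engine"}
index = build_index(docs)
print(index["test"])    # {1: 2}
print(index["search"])  # {1: 1, 2: 1}
```

in this toy example "test" is stored once with a count of 2 for document 1, 
which is the test|2 layout from the question, and that count feeds directly 
into the score.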



-Hoss
