: I have catchall "text" field, and use it for searching. This field
: stores the non-unique terms. For example, this field stores the
: following terms: test test search
:
: Is it possible to store non-unique terms in the following way:
: "term"|"number of terms", i.e. test|2 search?
:
: I guess it should reduce the size of index
:
: And if yes - is it possible to use this number of terms when
: calculating the relevance?

What you are describing is exactly how the inverted index in Lucene/Solr works -- the original raw text can optionally be "stored" for retrieval, but the index that is *searched* contains each term a single time, along with pointers referring to which documents contain the term and where in those documents it occurs.

The number of times a term occurs in a document is the term frequency (or "tf"), and it is one of the two primary components used in the basic scoring formula (TF/IDF).

https://lucene.apache.org/java/3_5_0/fileformats.html
https://en.wikipedia.org/wiki/Tf%E2%80%93idf

-Hoss
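
To make the idea above concrete, here is a minimal sketch of an inverted index that keeps each term once along with a per-document term frequency, using the "test test search" example from the question. It is an illustration only, not Lucene's actual data structures or scoring code: the names build_index and tf_idf are hypothetical, the second document is made up for the example, and the weight shown is the textbook tf * log(N/df) form rather than Lucene's similarity formula.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to {doc_id: term frequency}; each term is stored once."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Counter collapses repeated terms into (term, count) pairs.
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def tf_idf(index, num_docs, term, doc_id):
    """Textbook tf * idf weight (an assumption, not Lucene's exact formula)."""
    postings = index.get(term, {})
    tf = postings.get(doc_id, 0)
    if tf == 0:
        return 0.0
    # idf: terms that appear in fewer documents get a higher weight.
    idf = math.log(num_docs / len(postings))
    return tf * idf


# Document 1 is the example from the question; document 2 is hypothetical.
docs = {1: "test test search", 2: "search engine"}
index = build_index(docs)

print(index["test"])    # {1: 2}
print(index["search"])  # {1: 1, 2: 1}
print(tf_idf(index, len(docs), "test", 1))
```

Note that "test" appears once in the index with a count of 2 for document 1, which is essentially the test|2 layout asked about, and that count feeds directly into the score.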