Hi guys,
let's open a discussion:

*Use case*:
A set of fields that I use only for:
- exact search
- faceting

*Field Configuration*

<field name="author" type="string" indexed="true" stored="true" docValues="true" omitTermFreqAndPositions="true" omitNorms="true" />

I don't need norms, term frequencies or positions.
I do need the index for exact search.
I would like to have docValues because faceting on those fields is going to
be heavy.
I also want to store them.

*Faceting approaches*
*1)* Index the human-readable field value.
Facet values are returned human-readable, out of the box.
I can't see any cons in this approach; I would say it is the standard one.

   - When the docValues are built and flushed to disk, good compression
   algorithms are used.
   - When facets are calculated, the in-memory counting works on each term's
   ordinal. So we neither waste memory on the actual terms nor waste time
   looking up the values until the very end of the process, after the counts
   are done.
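To make the ordinal argument concrete, here is a minimal sketch. It is a simplified in-memory model of a single-valued sorted docValues field, not Lucene's actual API or code path, and the function names (`build_sorted_doc_values`, `facet_counts`) are hypothetical. Counting touches only integer ordinals; labels are resolved only for the top entries at the very end.

```python
from bisect import bisect_left


def build_sorted_doc_values(doc_values):
    """Hypothetical model of a single-valued sorted docValues field.

    Builds a sorted, de-duplicated term dictionary and stores one integer
    ordinal per document instead of the term itself.
    """
    terms = sorted(set(v for v in doc_values if v is not None))
    # Ordinal = position of the term in the sorted dictionary; -1 = no value.
    ords = [bisect_left(terms, v) if v is not None else -1 for v in doc_values]
    return terms, ords


def facet_counts(terms, ords, matching_docs, limit=10):
    # Phase 1: count per ordinal in a plain int array; no strings involved.
    counts = [0] * len(terms)
    for doc in matching_docs:
        o = ords[doc]
        if o >= 0:
            counts[o] += 1
    # Phase 2: pick the top ordinals by count (ties broken by term order).
    top = sorted((o for o in range(len(terms)) if counts[o] > 0),
                 key=lambda o: (-counts[o], o))[:limit]
    # Phase 3: only now resolve the surviving ordinals to readable labels.
    return [(terms[o], counts[o]) for o in top]


authors = ["Blake", "Austen", "Blake", None, "Woolf", "Austen", "Blake"]
terms, ords = build_sorted_doc_values(authors)
print(facet_counts(terms, ords, matching_docs=range(len(authors))))
# [('Blake', 3), ('Austen', 2), ('Woolf', 1)]
```

Each label is looked up once per returned facet entry, not once per matching document.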

*2)* Map each term to a custom ID outside the search system and index the
custom ID. After the facets are calculated, resolve each ID and show the
human-readable label.

As far as I know, this approach overcomplicates things.
We basically duplicate the effort of looking up the facet values: we do it
once inside Lucene at the end of the faceting process (from ordinal to
custom ID) and again in the front end (from custom ID to value).
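A minimal sketch of approach 2 shows where the second lookup comes from. `ID_TO_LABEL`, `facet_on_ids` and `resolve_labels` are hypothetical names; the `Counter` merely stands in for the engine, which internally still counts ordinals and resolves them to IDs (lookup #1). Note that ties, and something like `facet.sort=index`, would be ordered by ID rather than by label, which is one more detail the front end would have to handle.

```python
from collections import Counter

# Hypothetical mapping maintained outside the search system.
ID_TO_LABEL = {"a1": "Blake", "a2": "Austen", "a3": "Woolf"}


def facet_on_ids(indexed_ids, matching_docs, limit=10):
    """Stand-in for the engine faceting on the custom-ID field.

    The real engine counts ordinals and then resolves ordinal -> ID
    (lookup #1); Counter hides that step here. Ties are ordered by ID.
    """
    counts = Counter(indexed_ids[d] for d in matching_docs
                     if indexed_ids[d] is not None)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


def resolve_labels(id_counts, id_to_label):
    # Lookup #2, in the application/front end: ID -> human-readable label.
    # Unknown IDs fall back to the raw ID.
    return [(id_to_label.get(i, i), c) for i, c in id_counts]


docs = ["a1", "a2", "a1", None, "a3", "a2", "a1"]
print(resolve_labels(facet_on_ids(docs, range(len(docs))), ID_TO_LABEL))
# [('Blake', 3), ('Austen', 2), ('Woolf', 1)]
```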

The only apparent gain could be in terms of disk space, but even there I am
not 100% sure that compressing a set of IDs brings much benefit over
compressing the real values (which may contain repeated character sequences,
for example).
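On the disk-space point: with sorted docValues, the per-document data is an ordinal whether the value is a label or an ID. Only the term dictionary, stored once per unique value, differs. As far as I know, recent Lucene codecs store that dictionary with some form of shared-prefix compression, and the details vary by version. The sketch below shows plain front coding on a sorted term list. It is a simplified illustration, not Lucene's on-disk format, and `front_code`/`front_decode` are hypothetical names.

```python
def front_code(sorted_terms):
    """Front (prefix) coding of a sorted term list.

    Each entry is (length of prefix shared with the previous term, suffix).
    """
    out, prev = [], ""
    for term in sorted_terms:
        shared = 0
        limit = min(len(prev), len(term))
        while shared < limit and prev[shared] == term[shared]:
            shared += 1
        out.append((shared, term[shared:]))
        prev = term
    return out


def front_decode(entries):
    # Rebuild each term from the previous term's prefix plus the stored suffix.
    terms, prev = [], ""
    for shared, suffix in entries:
        term = prev[:shared] + suffix
        terms.append(term)
        prev = term
    return terms


terms = ["Blake, William", "Blake, Wilma", "Woolf, Virginia"]
print(front_code(terms))
# [(0, 'Blake, William'), (10, 'ma'), (0, 'Woolf, Virginia')]
```

Sorted labels that share prefixes shrink this way, so any saving from swapping labels for IDs is limited to the dictionary and may be smaller than it looks.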

What are your thoughts?
Any additional pros/cons?

Cheers


-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
