Hi.

One of our faceting use cases:
We create trend graphs of how many blog posts contain a certain term,
grouped by day/week/year etc. using the nice DateMathParser
functions.
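
For reference, the request behind such a graph looks roughly like the
sketch below. It assumes date faceting through the facet.date
parameters; the host, the default range of 30 days and the search term
are placeholders for illustration, not our real setup.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption), not our real host.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_trend_query(term, field="publishedDate", days=30, gap="+1DAY"):
    """Build a date-facet request that counts posts matching `term` per gap."""
    params = [
        ("q", term),
        ("rows", "0"),                                   # only facet counts are needed
        ("facet", "true"),
        ("facet.date", field),
        ("facet.date.start", "NOW/DAY-%dDAYS" % days),   # date math, rounded to the day
        ("facet.date.end", "NOW/DAY+1DAY"),
        ("facet.date.gap", gap),                         # "+7DAYS" or "+1YEAR" for other graphs
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_trend_query("solr"))
```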

Performance degrades really fast and memory consumption is high, which
causes OOM errors from time to time.
We think this is due to the fact that the cardinality of the publishedDate
field in our index is huge, almost equal to the number of documents in the index.

We need to address that...

Some questions:

1. Can a date field have other date formats than the default of
yyyy-MM-dd'T'HH:mm:ss'Z'?

2. We are thinking of adding a field to the index which has the format
yyyy-MM-dd to reduce the cardinality. If that field can't be a date, it
could perhaps be a string, but the question then is whether faceting can be
used on it?

3. Since we already have such a huge index, is there a way to add a
field afterwards and apply it to all documents without actually reindexing
the whole shebang?

4. If the field cannot be a string, can we just zero out the
hour/minute/second information to reduce the cardinality and improve
performance? Example: 2009-01-01T00:00:00Z

5. I am afraid that we need to reindex everything to get this to work
(which negates Q3). We currently have 8 shards; what would be the most
efficient way to reindex the whole shebang? Dump the entire database to disk
(sigh), split it into many XML files and post them with curl in a
random/hash(numServers) manner?
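
To make questions 2, 4 and 5 concrete, here is a minimal sketch of what
we have in mind for the reindex: derive a day-resolution value from
publishedDate and pick a target shard from a stable hash of the document
id. The function names and the id format are made up for illustration,
and the input timestamp is assumed to look like 2009-01-01T13:45:10Z.

```python
import hashlib
from datetime import datetime

# Assumed input format of publishedDate (UTC, second resolution).
SOLR_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def day_string(published):
    """String variant (Q2): keep only the yyyy-MM-dd part."""
    return datetime.strptime(published, SOLR_DATE_FORMAT).strftime("%Y-%m-%d")


def day_date(published):
    """Date variant (Q4): same day with the time of day zeroed."""
    return day_string(published) + "T00:00:00Z"


def shard_for(doc_id, num_shards=8):
    """Stable shard index (Q5).

    md5 is used instead of the built-in hash() because string hashes
    differ between Python processes, which would scatter the same
    document to different shards on a rerun.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


if __name__ == "__main__":
    # "post-12345" is a hypothetical document id.
    print(day_string("2009-01-01T13:45:10Z"), shard_for("post-12345"))
```

Either variant gives at most one distinct value per calendar day, no
matter how many documents are indexed.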


Kindly

//Marcus

-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/
