Hi Marcus. You must supply dates in the format you are using now -- ISO-8601 with the trailing Z to indicate that there is no time-zone offset. To reduce cardinality to the day level, instead of the second-level precision you currently index, the date you supply can include DateMathParser operations. So if you supply: 2009-04-01T20:15:01Z/DAY then the value is rounded down to the start of that day, which is what you are after. Of course you then lose the ability to search at a granularity finer than a day. And the date you get back (i.e. the stored value) is the rounded date, not the date prior to rounding.
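To make the effect of /DAY concrete, here is a minimal sketch in Python. It is not Solr's DateMathParser (that lives in Java inside Solr); it only emulates the rounding so you can see why the number of distinct values drops. The function name round_down_to_day and the sample dates are made up for illustration.

```python
from datetime import datetime, timezone

# Solr's canonical date form: ISO-8601, UTC, with a literal trailing Z.
SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def round_down_to_day(solr_date: str) -> str:
    """Emulate what appending /DAY does: truncate to 00:00:00 UTC of that day.

    Hypothetical helper for illustration only; Solr performs this rounding
    itself when it sees the /DAY suffix.
    """
    parsed = datetime.strptime(solr_date, SOLR_FORMAT).replace(tzinfo=timezone.utc)
    floored = parsed.replace(hour=0, minute=0, second=0, microsecond=0)
    return floored.strftime(SOLR_FORMAT)


# Made-up publishedDate values: three documents, three distinct timestamps.
published = [
    "2009-04-01T20:15:01Z",
    "2009-04-01T07:02:44Z",
    "2009-04-02T00:00:01Z",
]

rounded = [round_down_to_day(d) for d in published]

distinct_before = len(set(published))  # one term per document
distinct_after = len(set(rounded))     # one term per day

print(rounded[0])                       # 2009-04-01T00:00:00Z
print(distinct_before, distinct_after)  # 3 2
```

With second-level precision nearly every document contributes its own term to the field; after rounding, the number of terms is bounded by the number of days covered by the index.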
Yes, you will certainly need to re-index. Since you designed your indexing strategy, only you know how best to go about that. By now I'm sure you are aware that you cannot update individual fields. By the way, if your current strategy involves periodic updates, then you could simply wait until all your data eventually gets re-indexed. There's no harm in some of the dates being rounded and some not -- it's just that until most of them are rounded, you will still have your current problem of sporadic OOM.

~ David

________________________________________
From: Marcus Herou [marcus.he...@tailsweep.com]
Sent: Saturday, April 25, 2009 6:54 AM
To: solr-user@lucene.apache.org
Subject: Date faceting - howto improve performance

Hi.

One of our faceting use-cases: We are creating trend graphs of how many blog posts contain a certain term, grouped by day/week/year etc. with the nice DateMathParser functions.

The performance degrades really fast and consumes a lot of memory, which forces OOM from time to time. We think it is due to the fact that the cardinality of the field publishedDate in our index is huge, almost equal to the number of documents in the index. We need to address that...

Some questions:

1. Can a datefield have other date-formats than the default of yyyy-MM-dd HH:mm:ssZ ?

2. We are thinking of adding a field to the index which has the format yyyy-MM-dd to reduce the cardinality. If that field can't be a date, it could perhaps be a string, but the question then is whether faceting can be used ?

3. Since we already have such a huge index, is there a way to add a field afterwards and apply it to all documents without actually reindexing the whole shebang ?

4. If the field cannot be a string, can we just leave out the hour/minute/second information to reduce the cardinality and improve performance ? Example: 2009-01-01 00:00:00Z

5. I am afraid that we need to reindex everything to get this to work (negates Q3). We currently have 8 shards; what would be the most efficient way to reindex the whole shebang ? Dump the entire database to disk (sigh), create many xml file splits and use curl in a random/hash(numServers) manner on them ?

Kindly

//Marcus

--
Marcus Herou
CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/