Aha!

Hmm, googling won't help me here, I see. Any hints on usage?

/M


On Tue, Apr 28, 2009 at 12:29 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:

> Sorry, I'm late in this thread.
>
> Did you try using Trie fields (new in 1.4)? The regular date faceting won't
> work out-of-the-box for trie fields, I think, but you could use facet.query
> to achieve the same effect. In my simple benchmarks I've found trie fields
> to give a huge improvement in range searches.
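
For what it's worth, a minimal sketch of what such a facet.query request
could look like against a trie date field (the field name publishedDate and
the day buckets here are only placeholders, not taken from this thread):

  q=searchterm
  &facet=true
  &facet.query=publishedDate:[NOW/DAY-2DAYS TO NOW/DAY-1DAY]
  &facet.query=publishedDate:[NOW/DAY-1DAY TO NOW/DAY]
  &facet.query=publishedDate:[NOW/DAY TO NOW]

Each facet.query gets its own count back in the facet_queries section of the
response, so a bucket per day/week/etc. can be requested explicitly instead
of relying on facet.date.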
>
> On Sat, Apr 25, 2009 at 4:24 PM, Marcus Herou <marcus.he...@tailsweep.com> wrote:
>
> > Hi.
> >
> > One of our faceting use-cases:
> > We are creating trend graphs of how many blog posts contain a certain
> > term, grouped by day/week/year etc. with the nice DateMathParser
> > functions.
> >
> > The performance degrades really fast and consumes a lot of memory, which
> > forces an OOM from time to time.
> > We think it is due to the fact that the cardinality of the field
> > publishedDate in our index is huge, almost equal to the number of
> > documents in the index.
> >
> > We need to address that...
> >
> > Some questions:
> >
> > 1. Can a date field have date formats other than the default
> > yyyy-MM-dd'T'HH:mm:ssZ ?
> >
> > 2. We are thinking of adding a field to the index which has the format
> > yyyy-MM-dd to reduce the cardinality. If that field can't be a date, it
> > could perhaps be a string, but the question then is whether faceting can
> > be used? (A rough sketch follows after these questions.)
> >
> > 3. Since we already have such a huge index, is there a way to add a field
> > afterwards and apply it to all documents without actually reindexing the
> > whole shebang?
> >
> > 4. If the field cannot be a string, can we just leave out the
> > hour/minute/second information to reduce the cardinality and improve
> > performance? Example: 2009-01-01T00:00:00Z
> >
> > 5. I am afraid that we need to reindex everything to get this to work
> > (which negates Q3). We currently have 8 shards; what would be the most
> > efficient way to reindex the whole shebang? Dump the entire database to
> > disk (sigh), create many XML file splits, and use curl on them in a
> > random/hash(numServers) manner? (A sketch of the posting step follows
> > below.)
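
On questions 2 and 4, a minimal sketch of what an extra day-granularity field
could look like in schema.xml (the field name publishedDay and the trie type
are assumptions, not something already in the index):

  <!-- trie-encoded date type (Solr 1.4+) -->
  <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6"
             omitNorms="true" positionIncrementGap="0"/>

  <!-- publishedDate truncated to midnight: at most one distinct value per
       day instead of roughly one per document -->
  <field name="publishedDay" type="tdate" indexed="true" stored="false"/>

At index time the value would be the original timestamp rounded down to the
day, e.g. 2009-01-01T00:00:00Z. A plain string field holding yyyy-MM-dd would
also be facetable (field faceting works on any indexed field), but range
queries and date math would then be lost.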
> >
> >
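
And on question 5, a rough sketch of the posting step, assuming the dump is
split into plain Solr <add>...</add> XML files (host names, paths and the
sharding rule are placeholders):

  # route each split to a shard, e.g. by hash(docId) % numServers,
  # then post it to that shard's update handler
  curl 'http://shard3:8983/solr/update' -H 'Content-Type: text/xml' \
       --data-binary @split-0042.xml

  # commit on each shard once its splits are done
  curl 'http://shard3:8983/solr/update' -H 'Content-Type: text/xml' \
       --data-binary '<commit/>'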
> > Kindly
> >
> > //Marcus
> >
> >
> >
> >
> >
> >
> >
> > --
> > Marcus Herou CTO and co-founder Tailsweep AB
> > +46702561312
> > marcus.he...@tailsweep.com
> > http://www.tailsweep.com/
> > http://blogg.tailsweep.com/
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/
