Re: Date faceting - howto improve performance

Marcus Herou Thu, 30 Apr 2009 03:40:08 -0700

Thanks, I should have grepped the source of course (as I always seem to end
up doing, haha).



/M

On Wed, Apr 29, 2009 at 10:13 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Some basic documentation is in the example schema.xml. Ask away if you have
> specific questions.
>
> On Thu, Apr 30, 2009 at 1:00 AM, Marcus Herou <
> marcus.he...@tailsweep.com> wrote:
>
> > Aha!
> >
> > Hmm, I see googling won't help me. Any hints on usage?
> >
> > /M
> >
> >
> > On Tue, Apr 28, 2009 at 12:29 AM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> > > Sorry, I'm late to this thread.
> > >
> > > Did you try using Trie fields (new in 1.4)? I think the regular date
> > > faceting won't work out of the box for trie fields, but you could use
> > > facet.query to achieve the same effect. In my simple benchmarks, trie
> > > fields gave a huge improvement in range searches.
> > >
> > > On Sat, Apr 25, 2009 at 4:24 PM, Marcus Herou <
> > > marcus.he...@tailsweep.com> wrote:
> > >
> > > > Hi.
> > > >
> > > > > One of our faceting use cases:
> > > > > We are creating trend graphs of how many blog posts contain a certain
> > > > > term, grouped by day/week/year etc. with the nice DateMathParser
> > > > > functions.
> > > > >
> > > > > Performance degrades really fast and memory consumption is high,
> > > > > which causes an OOM from time to time.
> > > > > We think this is because the cardinality of the publishedDate field
> > > > > in our index is huge, almost equal to the number of documents in the
> > > > > index.
> > > >
> > > > We need to address that...
> > > >
> > > > Some questions:
> > > >
> > > > > 1. Can a date field have date formats other than the default of
> > > > > yyyy-MM-dd HH:mm:ssZ?
> > > >
> > > > > 2. We are thinking of adding a field to the index with the format
> > > > > yyyy-MM-dd to reduce the cardinality. If that field can't be a date,
> > > > > it could perhaps be a string, but then the question is whether
> > > > > faceting can be used on it?
> > > >
> > > > > 3. Since we already have such a huge index, is there a way to add a
> > > > > field afterwards and apply it to all documents without actually
> > > > > reindexing the whole shebang?
> > > >
> > > > > 4. If the field cannot be a string, can we just leave out the
> > > > > hour/minute/second information to reduce the cardinality and improve
> > > > > performance? Example: 2009-01-01 00:00:00Z
> > > >
> > > > > 5. I am afraid that we need to reindex everything to get this to work
> > > > > (which negates Q3). We currently have 8 shards; what would be the
> > > > > most efficient way to reindex the whole shebang? Dump the entire
> > > > > database to disk (sigh), create many XML file splits and use curl on
> > > > > them in a random/hash(numServers) manner?
> > > >
> > > >
> > > > Kindly
> > > >
> > > > //Marcus
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Marcus Herou CTO and co-founder Tailsweep AB
> > > > +46702561312
> > > > marcus.he...@tailsweep.com
> > > > http://www.tailsweep.com/
> > > > http://blogg.tailsweep.com/
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Shalin Shekhar Mangar.
> > >
> >
> >
> >
> > --
> > Marcus Herou CTO and co-founder Tailsweep AB
> > +46702561312
> > marcus.he...@tailsweep.com
> > http://www.tailsweep.com/
> > http://blogg.tailsweep.com/
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/
