Hi Otis,
Thanks for your reply!
You could say I'm lucky (and I totally agree, since I chose to order the data
that way :p).
What you describe is what I had been thinking of doing, and I'm happy to read
that you approve. It is always nice to know that you aren't doing things
completely wrong; that's what I love about this mailing list!
I've implemented a sharded "yellow pages" that builds up the shards
parameter, so it will be easy to search two shards to handle the
beginning-of-year situation. I just thought it might be a bit stupid to
search for 1% of the data in the "latest" shard and the rest in shard n-1.
How much of a performance decrease do you reckon I will get from searching
two shards instead of one?
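A minimal sketch of such a "yellow pages", assuming one Solr core per calendar year. The host names, core paths and the helper name shards_param are hypothetical. It returns the comma-separated host:port/path list that Solr's shards request parameter takes, so a date range that crosses New Year simply yields two entries.

```python
from datetime import date

# Hypothetical "yellow pages": one Solr core per calendar year.
# Host names and core paths are illustrative placeholders.
YELLOW_PAGES = {
    2008: "solr1:8983/solr/y2008",
    2009: "solr2:8983/solr/y2009",
}


def shards_param(start: date, end: date) -> str:
    """Build a value for Solr's shards parameter that covers every
    yearly shard overlapping the date range [start, end]."""
    if start > end:
        raise ValueError("start must not be after end")
    years = range(start.year, end.year + 1)
    missing = [y for y in years if y not in YELLOW_PAGES]
    if missing:
        raise KeyError(f"no shard registered for year(s) {missing}")
    return ",".join(YELLOW_PAGES[y] for y in years)


if __name__ == "__main__":
    # A query for the last few weeks, issued early in January 2009,
    # spans two yearly shards.
    print(shards_param(date(2008, 12, 10), date(2009, 1, 7)))
```

How much the second shard costs is best measured on real data: in distributed search the receiving node sends the query to every shard in the list and merges the results, so each extra shard adds at least one more request and merge step, but the actual overhead depends on index size and hardware.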
Anyway, thanks for confirming things, Otis!
Cheers,
Aleksander
On Wed, 10 Jun 2009 07:51:16 +0200, Otis Gospodnetic
<otis_gospodne...@yahoo.com> wrote:
Aleksander,
In a sense you are lucky to have time-ordered data. That makes it very
easy to shard and cheaper to search: you know exactly which shards you
need to query. The beginning-of-year situation should also be easy. Do
start with the latest shard for the current year, and go to the next shard
only if you have to (e.g. if you don't get enough results from the first
shard).
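The "newest shard first, fall back only if needed" idea can be sketched as below. It assumes results are sorted by date descending and the shards are strictly time-partitioned, so every hit in a newer shard outranks every hit in an older one. The single-shard search function is a hypothetical stand-in for an HTTP request to one Solr core, passed in so the fallback logic stands on its own.

```python
from typing import Callable, List, Sequence

# Hypothetical single-shard query: (shard, max_rows) -> hits, newest first.
SearchFn = Callable[[str, int], List[str]]


def search_newest_first(shards: Sequence[str], rows: int,
                        search: SearchFn) -> List[str]:
    """Query shards newest-first and stop as soon as `rows` hits are found.

    Only valid for date-descending sorts over time-partitioned shards.
    """
    hits: List[str] = []
    for shard in shards:  # shards must be ordered newest first
        hits.extend(search(shard, rows - len(hits)))
        if len(hits) >= rows:
            break  # enough results; older shards are never queried
    return hits[:rows]


if __name__ == "__main__":
    # Fake per-shard results, already sorted newest first.
    fake_index = {
        "y2009": ["2009-01-03", "2009-01-01"],
        "y2008": ["2008-12-30", "2008-12-28"],
    }

    def fake_search(shard: str, limit: int) -> List[str]:
        return fake_index[shard][:limit]

    print(search_newest_first(["y2009", "y2008"], 3, fake_search))
```

This only covers the first page of results; paging deeper would also need per-shard hit counts, which the sketch ignores.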
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
From: Aleksander M. Stensby <aleksander.sten...@integrasco.no>
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Sent: Tuesday, June 9, 2009 7:07:47 AM
Subject: Sharding strategy
Hi all,
I'm trying to figure out how to shard our index, as it is growing rapidly
and we want to make our solution scalable.
So, we have documents that are most commonly sorted by their date. My
initial thought is to shard the index by date, but I wonder if you have any
input on this and how best to solve it...
I know that the most frequent queries will be executed against the "latest"
shard, but let's say we shard by year: how do we best handle the situation
that will occur at the beginning of a new year? (Some of the data will be in
the last shard, but most of it will be in the second-to-last shard.)
Would it be stupid to have a "latest" shard with duplicate data (always
consisting of the last 6 months or something like that) and maintain that
index in addition to the regular yearly shards? Is anyone else facing a
similar situation with a good solution?
Any input would be greatly appreciated :)
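The duplicated "latest" shard can be sketched as a routing rule at indexing time. This assumes a six-month window approximated as 183 days and hypothetical shard names y<year> and latest: every document goes to its yearly shard, recent ones also go to the rolling shard, and documents that age out must be deleted from the rolling shard again.

```python
from datetime import date, timedelta
from typing import List

# Assumptions: yearly shards are named "y<year>", the rolling shard is
# "latest", and "the last 6 months" is approximated as 183 days.
LATEST_SHARD = "latest"
WINDOW = timedelta(days=183)


def target_shards(doc_date: date, today: date) -> List[str]:
    """Shards a document should be indexed into."""
    targets = [f"y{doc_date.year}"]  # every document lives in its yearly shard
    if today - doc_date <= WINDOW:
        targets.append(LATEST_SHARD)  # recent documents are duplicated
    return targets


def expired_from_latest(doc_date: date, today: date) -> bool:
    """True once a document has aged out of the rolling window and should be
    deleted from the "latest" shard (it stays in its yearly shard)."""
    return today - doc_date > WINDOW


if __name__ == "__main__":
    today = date(2009, 1, 5)
    print(target_shards(date(2008, 12, 20), today))      # ['y2008', 'latest']
    print(target_shards(date(2008, 3, 1), today))        # ['y2008']
    print(expired_from_latest(date(2008, 3, 1), today))  # True
```

The price of this design is indexing recent documents twice plus a periodic clean-up (in Solr, for example, a delete-by-query such as date:[* TO NOW-6MONTHS] against the rolling shard, where date is a hypothetical field name). A given query should hit either the rolling shard or the yearly shards for a period, never both, or it will see duplicates.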
Cheers,
Aleksander
--
Aleksander M. Stensby
Lead software developer and system architect
Integrasco A/S
www.integrasco.no
http://twitter.com/Integrasco
Please consider the environment before printing all or any of this e-mail