With Lance's MD5 schema you'd do this:

1 shard: 0-f*
2 shards: 0-8*, 9-f*
3 shards: 0-5*, 6-a*, b-f*
4 shards: 0-3*, 4-7*, 8-b*, c-f*
...
16 shards: 0*, 1*, 2*....... d*, e*, f*
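
For what it's worth, a rough sketch of computing those prefix ranges for any
shard count up to 16 (boundaries fall on the first hex digit only, so a few
splits come out slightly different from the table above, e.g. 0-4/5-9/a-f for
3 shards):

HEX = "0123456789abcdef"

def shard_ranges(n):
    """Split the 16 first-hex-digit buckets across n shards (n <= 16)."""
    ranges = []
    start = 0
    for i in range(n):
        end = (i + 1) * 16 // n - 1   # spread the digits as evenly as possible
        ranges.append(HEX[start] + "-" + HEX[end] + "*")
        start = end + 1
    return ranges

print(shard_ranges(3))   # ['0-4*', '5-9*', 'a-f*']
print(shard_ranges(4))   # ['0-3*', '4-7*', '8-b*', 'c-f*']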

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Marcus Herou <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Cc: [EMAIL PROTECTED]
> Sent: Saturday, June 14, 2008 5:53:35 AM
> Subject: Re: scaling / sharding questions
> 
> Hi.
> 
> We as well use md5 as the uid.
> 
> I guess the reason each slice is 1/16th is that the md5 is hex, right? (0-f).
> Thinking about md5 sharding.
> 1 shard: 0-f
> 2 shards: 0-7:8-f
> 3 shards: problem!
> 4 shards: 0-3....
> 
> This technique would require that you double the number of shards each time
> you split, right?
> 
> Split-by-delete sounds really smart; damn that I didn't think of that :)
> 
> Anyway, over time the technique of moving the whole index to a new shard and
> then deleting would probably be more than challenging.
> 
> 
> I will never ever store the data in Lucene, mainly because of bad experience
> and because I want to create modules which are fast, scalable and flexible,
> and storing the data alongside the index does not match that, for me at least.
> 
> So yes, I will have the need to do a "foreach id in ids get document"
> approach in the searcher code, but at least I can optimize the retrieval of
> docs myself and let Lucene do what it's good at: indexing and searching, not
> storage.
> 
> I am more and more thinking in terms of having different levels of searching
> instead of searching all shards at the same time.
> 
> Let's say you start with 4 shards where each document is replicated 4
> times based on publishdate. Since all shards have the same data you can
> load-balance the query to any of the 4 shards.
> 
> One day you find that 4 shards is not enough because of search performance,
> so you add 4 new shards. Now you index only the 4 new shards with the new
> documents, making the old ones read-only.
> 
> The searcher would then prioritize the new shards, and only if the query
> returns fewer than X results would you start querying the old shards.
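> 
> Roughly like this, as a sketch in Python over plain HTTP against Solr's
> /select handler (the host names, shard lists and the X threshold below are
> just placeholders):
> 
> import requests
> 
> NEW_SHARDS = ["solr-new1:8983/solr", "solr-new2:8983/solr"]  # placeholders
> OLD_SHARDS = ["solr-old1:8983/solr", "solr-old2:8983/solr"]
> MIN_HITS = 20  # the "X results" threshold
> 
> def query_shards(q, shard_list):
>     # Solr 1.3 distributed search: list the shards to fan out to on the request
>     params = {"q": q, "shards": ",".join(shard_list), "wt": "json", "rows": 10}
>     return requests.get("http://solr-new1:8983/solr/select", params=params).json()
> 
> def search(q):
>     """Hit the new shards first; widen to the old tier only if too few hits."""
>     result = query_shards(q, NEW_SHARDS)
>     if result["response"]["numFound"] < MIN_HITS:
>         result = query_shards(q, NEW_SHARDS + OLD_SHARDS)
>     return result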
> 
> This has the nice side effect of keeping the most relevant/recent entries in
> the indexes which are searched the most. Since the old shards will be mostly
> idle you could also convert 2 of the old shards to "new" shards, reducing
> the need for buying new servers.
> 
> What I'm trying to say is that you will end up with an architecture which
> has many nodes at the top, each holding few documents, and fewer and fewer
> nodes as you go down the architecture, but where each node stores more
> documents since search speed gets less and less relevant.
> 
> Something like this:
> 
> xxxxxxxx - Primary: 10M docs per shard, make sure 95% of the results come
> from here.
>    yyyy - Standby: 100M docs per shard - merges of 10 primary indices.
>      zz - Archive: 1000M docs per shard - merges of 10 standby indices.
> 
> Search top-down.
> The numbers are just speculative. The drawback with this architecture is
> that you get no indexing benefit at all if the layout drawn above is
> the same as the one you use for indexing. Personally I think you should use X
> indexers which then merge indices (MapReduce) for max performance and lay
> them out as described above.
> 
> I think Google does something like this.
> 
> 
> Kindly
> 
> //Marcus
> 
> On Sat, Jun 14, 2008 at 2:27 AM, Lance Norskog wrote:
> 
> > Yes, I've done this split-by-delete several times. The halved index still
> > uses as much disk space until you optimize it.
> >
> > As to splitting policy: we use an MD5 signature as our unique ID. This has
> > the lovely property that we can wildcard.  'contentid:f*' denotes 1/16 of
> > the whole index. This 1/16 is a very random sample of the whole index. We
> > use this for several things. If we use this for shards, we have a query
> > that
> > matches a shard's contents.
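> >
> > For example, something like this counts that 1/16th slice (a rough sketch;
> > the host is a placeholder, the field name is the one from above):
> >
> > import requests
> >
> > # Count the ~1/16th random sample whose MD5 id starts with 'f'
> > params = {"q": "contentid:f*", "rows": 0, "wt": "json"}
> > resp = requests.get("http://localhost:8983/solr/select", params=params).json()
> > print(resp["response"]["numFound"], "docs in the f* slice")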
> >
> > The Solr/Lucene syntax does not support modular arithmetic, and so it will
> > not let you query a subset that matches one of your shards.
> >
> > We also found that searching a few smaller indexes via the Solr 1.3
> > Distributed Search feature is actually faster than searching one large
> > index, YMMV. So for us a large pile of shards will be optimal anyway, and
> > we have no need to "rebalance".
> >
> > It sounds like you're not storing the data in a backing store, but are
> > storing all data in the index itself. We have found this "challenging".
> >
> > Cheers,
> >
> > Lance Norskog
> >
> > -----Original Message-----
> > From: Jeremy Hinegardner [mailto:[EMAIL PROTECTED]
> > Sent: Friday, June 13, 2008 3:36 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: scaling / sharding questions
> >
> > Sorry for not keeping this thread alive, let's see what we can do...
> >
> > One option I've thought of for "resharding" would be splitting an index into
> > two by just copying it, then deleting half the documents from one copy, doing
> > a commit, and deleting the other half from the other copy and committing.
> > That is:
> >
> >  1) Take original index
> >  2) copy to b1 and b2
> >  3) delete docs from b1 that match a particular query A
> >  4) delete docs from b2 that do not match a particular query A
> >  5) commit b1 and b2
> >
> > Has anyone tried something like that?
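> >
> > As a rough sketch against Solr's XML update handler, steps 3-5 would look
> > something like this (the core URLs and "query A" below are placeholders):
> >
> > import requests
> >
> > B1 = "http://solr-b1:8983/solr/update"   # copy 1 of the original index
> > B2 = "http://solr-b2:8983/solr/update"   # copy 2 of the original index
> > QUERY_A = "contentid:0*"                 # hypothetical "query A"
> >
> > def post_xml(url, body):
> >     return requests.post(url, data=body, headers={"Content-Type": "text/xml"})
> >
> > # 3) delete docs matching A from b1, 4) delete docs NOT matching A from b2
> > post_xml(B1, "<delete><query>%s</query></delete>" % QUERY_A)
> > post_xml(B2, "<delete><query>*:* -(%s)</query></delete>" % QUERY_A)
> >
> > # 5) commit both halves (an optimize later reclaims the disk space)
> > for url in (B1, B2):
> >     post_xml(url, "<commit/>")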
> >
> > As for how to know where each document is stored, generally we're
> > considering unique_document_id % N.  If we rebalance we change N and
> > redistribute, but that probably will take too much time.  That makes us
> > move more towards a staggered, age-based approach where the most recent
> > docs filter down to "permanent" indexes based upon time.
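> >
> > For illustration, a sketch of that routing, assuming the uid is an MD5 hex
> > string (N is a placeholder); it also shows why changing N forces a
> > near-total redistribution:
> >
> > import hashlib
> >
> > N = 8  # placeholder shard count
> >
> > def shard_for(md5_id, n=N):
> >     """unique_document_id % N, reading the MD5 hex uid as a big integer."""
> >     return int(md5_id, 16) % n
> >
> > # Changing N remaps almost every document, which is the redistribution cost:
> > ids = [hashlib.md5(str(i).encode("utf-8")).hexdigest() for i in range(10000)]
> > moved = sum(shard_for(x, 8) != shard_for(x, 9) for x in ids)
> > print("docs that would move when N goes from 8 to 9: %d of 10000" % moved)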
> >
> > Another thought we've had recently is to have many, many, many physical
> > shards on the indexing (writer) side, but then merge groups of them into
> > logical shards which are snapshotted to reader Solrs on a frequent basis.
> > I haven't done any testing along these lines, but logically it seems like
> > an idea worth pursuing.
> >
> > enjoy,
> >
> > -jeremy
> >
> > On Fri, Jun 06, 2008 at 03:14:10PM +0200, Marcus Herou wrote:
> > > Cool sharding technique.
> > >
> > > We as well are thinking of how to "move" docs from one index to another
> > > because we need to re-balance the docs when we add new nodes to the
> > > cluster. We only store ids in the index, otherwise we could have moved
> > > stuff around with IndexReader.document(x) or so. Luke
> > > (http://www.getopt.org/luke/) is able to reconstruct the indexed Document
> > > data, so it should be doable.
> > > However I'm thinking of actually just deleting the docs from the old
> > > index and adding new Documents to the new node. It would be cool to not
> > > waste cpu cycles by reindexing already indexed stuff, but...
> > >
> > > And we as well will have data amounts in the range you are talking
> > > about. Perhaps we could share ideas?
> > >
> > > How do you plan to store where each document is located? I mean you
> > > probably need to store info about the Document and its location
> > > somewhere, perhaps in a clustered DB? We will probably go for HBase for
> > > this.
> > >
> > > I think the number of documents is less important than the actual data
> > > size (just speculating). We currently search 10M (will get much, much
> > > larger) indexed blog entries on one machine where the JVM has a 1G heap,
> > > the index size is 3G and response times are still quite fast. This is
> > > a read-only node though and is updated every morning with a freshly
> > > optimized index. Someone told me that you probably need twice the RAM
> > > if you plan to both index and search at the same time. If I were you I
> > > would just test indexing X entries of your data and then start searching
> > > the index with lower JVM settings each round; when response times get
> > > too slow or you hit an OOM you get a rough estimate of the bare minimum
> > > RAM needed for Y entries.
> > >
> > > I think we will do fine with something like 2G per 50M docs, but I will
> > > need to test it out.
> > >
> > > If you get an answer in this matter please let me know.
> > >
> > > Kindly
> > >
> > > //Marcus
> > >
> > >
> > > On Fri, Jun 6, 2008 at 7:21 AM, Jeremy Hinegardner wrote:
> > >
> > > > Hi all,
> > > >
> > > > This may be a bit rambling, but let's see how it goes.  I'm not a
> > > > Lucene or Solr guru by any means; I have been prototyping with Solr
> > > > and understanding how all the pieces and parts fit together.
> > > >
> > > > We are migrating our current document storage infrastructure to a
> > > > decent-sized Solr cluster, using 1.3 snapshots right now.  Eventually
> > > > this will be in the billion+ document range, with about 1M new
> > > > documents added per day.
> > > >
> > > > Our main sticking point right now is that a significant number of
> > > > our documents will be updated, at least once, but possibly more than
> > > > once.  The volatility of a document decreases over time.
> > > >
> > > > With this in mind, we've been considering using a cascading series
> > > > of shard clusters.  That is:
> > > >
> > > >  1) A cluster of shards holding recent data (most recent week or two):
> > > >     smaller indexes that take a small amount of time to commit updates
> > > >     and optimise, since these will hold the most volatile documents.
> > > >
> > > >  2) Following that, another cluster of shards that holds relatively
> > > >     recent (3-6 months?), but not super volatile, documents; these are
> > > >     items that could potentially receive updates, but generally do not.
> > > >
> > > >  3) A final set of 'archive' shards holding the final resting place for
> > > >     documents.  These would not receive updates.  These would be online
> > > >     for searching and analysis "forever".
> > > >
> > > > We are not sure if this is the best way to go, but it is the
> > > > approach we are leaning toward right now.  I would like some
> > > > feedback from the folks here if you think that is a reasonable
> > > > approach.
> > > >
> > > > One of the other things I'm wondering about is how to manipulate
> > > > indexes.  We'll need to roll documents around between indexes over
> > > > time, or at least migrate indexes from one set of shards to another as
> > > > the documents 'age', and merge/aggregate them with more 'stable'
> > > > indexes.  I know about merging complete indexes together, but what
> > > > about migrating a subset of documents from one index into another
> > > > index?
> > > >
> > > > In addition, what is generally considered a 'manageable' size for a
> > > > large index?  I was attempting to find some information on the
> > > > relationship between search response times, the amount of memory
> > > > used for a search, and the number of documents in an index, but I
> > > > wasn't having much luck.
> > > >
> > > > I'm not sure if I'm making sense here, but just thought I would
> > > > throw this out there and see what people think.  There is the
> > > > distinct possibility that I am not asking the right questions or
> > > > considering the right parameters, so feel free to correct me, or ask
> > > > questions as you see fit.
> > > >
> > > > And yes, I will report how we are doing things when we get this all
> > > > figured out, and if there are items that we can contribute back to
> > > > Solr we will.  If nothing else there will be a nice article of how
> > > > we manage TB of data with Solr.
> > > >
> > > > enjoy,
> > > >
> > > > -jeremy
> > > >
> > > > --
> > > > ========================================================================
> > > >  Jeremy Hinegardner                              [EMAIL PROTECTED]
> > > >
> > > >
> > >
> > >
> > > --
> > > Marcus Herou CTO and co-founder Tailsweep AB
> > > +46702561312
> > > [EMAIL PROTECTED]
> > > http://www.tailsweep.com/
> > > http://blogg.tailsweep.com/
> >
> > --
> > ========================================================================
> >  Jeremy Hinegardner                              [EMAIL PROTECTED]
> >
> >
> >
> 
> 
> -- 
> Marcus Herou CTO and co-founder Tailsweep AB
> +46702561312
> [EMAIL PROTECTED]
> http://www.tailsweep.com/
> http://blogg.tailsweep.com/
