It sounds bad with a 600GB index, but the techniques in the UMass papers
achieve substantial compression of the in-memory size (remember that only
part of the index needs to be memory resident).

If you assume that you get a 2x reduction in size from compression and
elision, then you only need 3-5 fat-memory machines to handle this load.
These machines will be moderately expensive, but memory is pretty cheap
lately.
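
To make the arithmetic concrete, here is a rough sizing sketch (the 600GB
index and the 2x reduction come from this thread; the per-machine RAM and
the usable-fraction headroom are made-up assumptions):

    # Rough sizing sketch: how many fat-memory machines hold a compressed index.
    # Assumptions (illustrative, not from this thread): 128 GB of RAM per
    # machine, of which ~70% is usable for the index after OS/JVM overhead.
    index_gb = 600          # on-disk index size mentioned in this thread
    reduction = 2.0         # assumed compression + elision factor
    ram_per_machine_gb = 128
    usable_fraction = 0.7

    in_memory_gb = index_gb / reduction                 # 300 GB
    usable_gb = ram_per_machine_gb * usable_fraction    # ~90 GB per machine
    machines = -(-in_memory_gb // usable_gb)            # ceiling division
    print(f"{in_memory_gb:.0f} GB in RAM -> {machines:.0f} machines")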

This configuration has a substantial benefit when the query rate is high
because the cost of memory is much smaller than the cost of sheet metal,
power supplies, and CPUs.  When the query rate is low, spinning rust has
the advantage because it is cheap per bit.
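
A toy cost model makes that crossover explicit (every price and throughput
number below is invented purely for illustration; only the shape of the
comparison matters):

    # Toy cost model: serving a fixed corpus from RAM vs. from spinning disk.
    # All numbers are made-up placeholders. The point: at low query rates the
    # disk tier wins because you pay mostly for capacity, while at high rates
    # the disk tier needs many more whole servers (chassis, power supplies,
    # CPUs) to keep up with the load.
    def servers(qps, qps_per_server, min_for_capacity):
        # enough servers for both the data size and the query load
        return max(min_for_capacity, -(-qps // qps_per_server))

    for qps in (10, 100, 1000, 10000):
        ram_cost = servers(qps, 2000, 4) * 12000   # few, expensive machines
        disk_cost = servers(qps, 50, 2) * 4000     # cheap, but slow per box
        print(f"{qps:>6} qps: RAM ${ram_cost:>7}, disk ${disk_cost:>7}")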

On Fri, Jan 20, 2012 at 7:23 AM, Peter Velikin <pe...@velobit.com> wrote:

> Ted, Otis,
>
> Thanks for the info. I’ll take a stab at answering your question.
>
> RAM:
>
> Both of you are correct that if you were able to keep your index in RAM,
> that would give you the fastest results. This works if you have a small
> enough index. At ZoomInfo, the index was 600 GB (they have multiple types
> of indexed data), so there was no way to keep it in RAM. Due to the size
> of the index, they have elected to "shard" the data across two sets of
> systems for manageability and performance reasons. So, while in theory
> performance would be fastest if you kept the entire index in RAM, this is
> not possible, or at least not practical, if you have a large index.
>
> All SSD:
>
> SSDs are a lot faster, so if you swap your HDDs for SSDs, performance
> will go up. But that’s really expensive and also disruptive. In Zoom’s
> case, they have a cluster of 8-core Dell 2970 servers, each with 6x
> 146GB, 15k rpm SAS drives. Going all SSD would be expensive for them and
> would also require a disruption to running servers.
>
> SSD as a cache only:
>
> Since they wanted to avoid the cost and disruption of upgrading the
> servers, Zoom added one OCZ Vertex 3 to each of the servers (at a cost of
> $230 per SSD) and ran it as an expansion of RAM (the cache was a
> combination of RAM and SSD). All of this was configured on the running
> servers without any disruption to the running application. The result was
> an immediate 4x improvement in performance (responses per second went up
> from 12/sec to 48/sec, bandwidth went up from 500 KB/sec to 2.2 MB/sec).
> The VeloBit software acts as a driver that automatically configures and
> manages the RAM+SSD combo cache; the value of SSD caching software is
> that it makes the whole process plug & play.
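
To illustrate the general shape of such a RAM+SSD cache (a toy two-level
LRU sketch; this is not VeloBit's actual implementation, and the sizes are
arbitrary):

    from collections import OrderedDict

    # Toy two-level read cache: a small "RAM" LRU in front of a larger "SSD"
    # LRU. Blocks evicted from the RAM tier fall into the SSD tier instead of
    # being dropped, so a later read is an SSD hit rather than an HDD seek.
    class TwoTierCache:
        def __init__(self, ram_blocks, ssd_blocks, read_from_disk):
            self.ram, self.ssd = OrderedDict(), OrderedDict()
            self.ram_blocks, self.ssd_blocks = ram_blocks, ssd_blocks
            self.read_from_disk = read_from_disk

        def get(self, block_id):
            if block_id in self.ram:            # RAM hit: fastest path
                self.ram.move_to_end(block_id)
                return self.ram[block_id]
            if block_id in self.ssd:            # SSD hit: still beats the HDD
                data = self.ssd.pop(block_id)
            else:                               # miss: pay for an HDD read
                data = self.read_from_disk(block_id)
            self.ram[block_id] = data
            if len(self.ram) > self.ram_blocks: # demote coldest block to SSD
                old_id, old_data = self.ram.popitem(last=False)
                self.ssd[old_id] = old_data
                if len(self.ssd) > self.ssd_blocks:
                    self.ssd.popitem(last=False)
            return data

    cache = TwoTierCache(ram_blocks=2, ssd_blocks=4,
                         read_from_disk=lambda b: f"block-{b}")
    for b in [1, 2, 3, 1, 2, 3]:                # re-reads hit RAM or SSD
        cache.get(b)
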
>
> So the argument is that adding one SSD to each server and using it as a
> cache (more precisely, as a cache expansion to the cache already in RAM)
> will give you the best price/performance of all the options you have.
>
> Does this clarify things? Was I able to answer your question?
>
> Best regards,
>
> Peter
>
> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunn...@gmail.com]
> Sent: Friday, January 20, 2012 2:42 AM
> To: solr-user@lucene.apache.org
> Subject: Re: How to accelerate your Solr-Lucene application by 4x
>
> Actually, for search applications there is a reasonable amount of
> evidence that holding the index in RAM is more cost effective than SSDs,
> because the throughput is enough higher to make up for the price
> differential.  There are several papers out of UMass that describe this
> trade-off, although they are out-of-date enough to talk about 8GB of
> memory as being big.  One interesting aspect of the work is the way that
> they keep an index highly compressed yet still fast to search.
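
For flavor, here is what gap-and-varint compression of a postings list
looks like (a generic IR technique; this is not the specific UMass scheme
or Lucene's actual encoding):

    # Delta + variable-byte encoding of a postings list (ascending doc IDs).
    # Gaps between consecutive doc IDs are small, and a varint stores small
    # numbers in few bytes, so the list shrinks while staying cheap to decode.
    def encode(postings):
        out, prev = bytearray(), 0
        for doc_id in postings:
            gap, prev = doc_id - prev, doc_id
            while gap >= 128:               # 7 bits per byte, high bit = "more"
                out.append((gap & 0x7F) | 0x80)
                gap >>= 7
            out.append(gap)
        return bytes(out)

    def decode(data):
        postings, doc_id, gap, shift = [], 0, 0, 0
        for byte in data:
            gap |= (byte & 0x7F) << shift
            if byte & 0x80:                 # continuation bit: more bytes follow
                shift += 7
            else:                           # final byte of this gap
                doc_id += gap
                postings.append(doc_id)
                gap, shift = 0, 0
        return postings

    docs = [5, 8, 9, 300, 302, 100000]
    blob = encode(docs)
    assert decode(blob) == docs
    print(f"{4 * len(docs)} bytes as int32 -> {len(blob)} bytes as varints")
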
>
> As a point of reference, most of Google's searches are served out of
> memory in pretty much just this way.  Using SSDs would just slow them
> down.
>
> On Fri, Jan 20, 2012 at 5:16 AM, Fuad Efendi <f...@efendi.ca> wrote:
>
> > I agree that SSD boosts performance... In some rare, not-real-life
> > scenario:
> > - super frequent commits
> > That's it, nothing more, except the fact that a Lucene compile,
> > including tests, takes up to two minutes on a MacBook with an SSD, or
> > forty to fifty minutes on Windows with an HDD. Of course, with a
> > non-empty Maven repository in both scenarios, to be fair.
> >
> > Another scenario: imagine the Google File System powered by SSDs
> > instead of the cheapest HDDs... HAHAHA!!!
> >
> > Can we expect a response time of 0.1 milliseconds instead of 30-50?
> >
> > And a final question... Will SSDs improve the performance of fuzzy
> > search? Range queries? Etc.
> >
> > I just want to say that SSD is faster than HDD but it doesn't mean
> > anything...
> >
> > -Fuad
> >
> > Sent from my iPad
> >
> > On 2012-01-19, at 9:40 AM, "Peter Velikin" <pe...@velobit.com> wrote:
> >
> > > All,
> > >
> > > Point taken: my message should have been written more succinctly and
> > > just stuck to the facts. Sorry for the sales pitch!
> > >
> > > However, I believe that adding SSD as a means to accelerate the
> > > performance of your Solr cluster is an important topic to discuss on
> > > this forum. There are many options for you to consider. I believe
> > > VeloBit would be the best option for many, but you have choices, some
> > > of them completely free. If interested, send me a note and I'll be
> > > happy to tell you about the different options (free or paid) you can
> > > consider.
> > >
> > > Solr clusters are I/O bound. I am arguing that before you buy
> > > additional servers, replace your existing servers with new ones, or
> > > swap your hard disks, you should try adding SSD as a cache. If the
> > > promise is that adding 1 SSD could save you the cost of 3 additional
> > > servers, you should try it.
> > >
> > > Has anyone else tried adding SSDs as a cache to boost the performance
> > > of Solr clusters? Can you share your results?
> > >
> > > Best regards,
> > >
> > > Peter Velikin
> > > VP Online Marketing, VeloBit, Inc.
> > > pe...@velobit.com
> > > tel. 978-263-4800
> > > mob. 617-306-7165
> > >
> > > VeloBit provides plug & play SSD caching software that dramatically
> > > accelerates applications at a remarkably low cost. The software
> > > installs seamlessly in less than 10 minutes and automatically tunes
> > > for fastest application speed. Visit www.velobit.com for details.
> >
>
