RE: How to accelerate your Solr-Lucene application by 4x

2012-01-19 Thread Peter Velikin
All,

Point taken: my message should have been written more succinctly and just stuck 
to the facts. Sorry for the sales pitch!

However, I believe that adding SSD as a means to accelerate the performance of 
your Solr cluster is an important topic to discuss on this forum. There are 
many options for you to consider. I believe VeloBit would be the best option 
for many, but you have choices, some of them completely free. If interested, 
send me a note and I'll be happy to tell you about the different options (free 
or paid) you can consider.

Solr clusters are often I/O bound. I am arguing that before you buy additional 
servers, replace your existing servers with new ones, or swap out your hard 
disks, you should try adding an SSD as a cache. If the promise is that adding 
one SSD could save you the cost of three additional servers, it is worth trying.

Has anyone else tried adding SSDs as a cache to boost the performance of Solr 
clusters? Can you share your results?


Best regards,

Peter Velikin
VP Online Marketing, VeloBit, Inc.
pe...@velobit.com
tel. 978-263-4800
mob. 617-306-7165

VeloBit provides plug & play SSD caching software that dramatically accelerates 
applications at a remarkably low cost. The software installs seamlessly in less 
than 10 minutes and automatically tunes for fastest application speed. Visit 
www.velobit.com for details.





RE: How to accelerate your Solr-Lucene application by 4x

2012-01-20 Thread Peter Velikin
Ted, Otis,

Thanks for the info. I'll take a stab at answering your question.

RAM:

Both of you are correct: if you could keep the index in RAM, that would give 
you the fastest results. This works if your index is small enough. At ZoomInfo, 
the index was 600 GB (they have multiple types of indexed data), so there was 
no way to keep it in RAM. Because of the index size, they elected to "shard" 
the data across two sets of systems for manageability and performance reasons. 
So, while in theory performance would be fastest with the entire index in RAM, 
this is not possible, or at least not practical, with a large index.

All SSD:

SSDs are a lot faster, so if you swap your HDDs for SSDs, performance will go 
up. But that's really expensive and also disruptive. In Zoom's case, they have 
a cluster of 8-core Dell 2970 servers, each with 6x 146 GB, 15k RPM SAS drives. 
Going all-SSD would be expensive for them and would also disrupt running 
servers.

SSD as a cache only:

Since they wanted to avoid the cost and disruption of upgrading the servers, 
Zoom added one OCZ Vertex 3 to each server (at a cost of $230 per SSD) and ran 
it as an expansion of RAM (the cache was a combination of RAM and SSD). All of 
this was configured on the running servers without any disruption to the 
running application. The result was an immediate 4x improvement in performance: 
queries served went up from 12/sec to 48/sec, and bandwidth went up from 
500 KB/sec to 2.2 MB/sec. The VeloBit software acts as a driver that 
automatically configures and manages the combined RAM+SSD cache; the value of 
SSD caching software is that it makes the whole process plug and play.


So the argument is that adding one SSD to each server and using it as a cache 
(more precisely, as an expansion of the cache already in RAM) gives you the 
best price/performance of all the options you have.
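As a quick sanity check on the numbers above (back-of-envelope only; the figures are the ones reported for ZoomInfo):

```python
# Reported before/after figures from the ZoomInfo example.
qps_before, qps_after = 12, 48          # queries served per second
bw_before_kb, bw_after_kb = 500, 2200   # throughput in KB/sec (2.2 MB/sec)

qps_speedup = qps_after / qps_before    # 4.0
bw_speedup = bw_after_kb / bw_before_kb # 4.4

# Both metrics agree on roughly a 4x improvement.
print(f"QPS speedup: {qps_speedup:.1f}x, bandwidth speedup: {bw_speedup:.1f}x")
```

Both metrics land within about 10% of each other, which is consistent with the workload being I/O bound.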


Does this clarify things? Was I able to answer your question?

Best regards,

Peter

-Original Message-
From: Ted Dunning [mailto:ted.dunn...@gmail.com] 
Sent: Friday, January 20, 2012 2:42 AM
To: solr-user@lucene.apache.org
Subject: Re: How to accelerate your Solr-Lucene application by 4x

Actually, for search applications there is a reasonable amount of evidence that 
holding the index in RAM is more cost-effective than SSDs, because the 
throughput is enough faster to make up for the price differential. There are 
several papers out of UMass that describe this trade-off, although they are 
out-of-date enough to talk about 8 GB of memory as big. One interesting aspect 
of the work is the way they keep an index highly compressed yet still fast to 
search.

As a point of reference, most of Google's searches are served out of memory in 
pretty much just this way. Using SSDs would just slow them down.

On Fri, Jan 20, 2012 at 5:16 AM, Fuad Efendi <f...@efendi.ca> wrote:

> I agree that SSD boosts performance... in some rare, not-real-life scenarios:
> - super frequent commits
> That's it, nothing more, except the fact that a Lucene compile including
> tests takes up to two minutes on a MacBook with SSD, or forty to fifty
> minutes on Windows with HDD. Of course, with a non-empty Maven repository in
> both scenarios, to be fair.
>
> Another scenario: imagine the Google File System were powered by SSD instead
> of the cheapest HDD... HAHAHA!!! Can we expect a response time of 0.1
> milliseconds instead of 30-50?
>
> And a final question... will SSD improve the performance of fuzzy search?
> Range queries? Etc.
>
> I just want to say that SSD is faster than HDD, but that doesn't mean
> anything by itself...
>
> -Fuad
>
> Sent from my iPad

RE: How to accelerate your Solr-Lucene application by 4x

2012-01-20 Thread Peter Velikin
Hi Erick,

This is correct. An additional benefit of configuring the SSD as a cache rather
than primary storage is that you don't have to change anything in your existing
indexes; the cache just gives a performance boost.

In addition to configuring the system to use SSDs as the location where pages
go when swapped out of RAM, VeloBit does a few more performance optimization
tricks, which I will explain below.

(I am wary of the sensitivity to commercial messages: the following explains
some of VeloBit's differentiators. So, if you have an aversion to vendors
promoting themselves, please do not read further.)

VeloBit

-  configures SSDs to appear as cache expansion

-  compresses data at the block level so you can hold much more data in cache
(the cache appears much larger than the physical size of the SSD and RAM, and
you get higher performance because you read from the slow HDD less frequently)

-  decides which data goes into the cache based on the popularity of the
contents of each block (increasing cache hit rates and system performance)

-  optimizes how data is placed and managed on the SSD, which takes care of the
write, erase, and garbage-collection limitations inherent to flash-based SSDs
(increasing SSD performance and extending SSD life; this lets you use a
commodity SSD from Best Buy for enterprise workloads instead of having to buy
really expensive high-end SSDs)

-  automates the whole process, making everything plug and play (no need to
deal with storage and server architecture issues)
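To illustrate the compression and popularity-based admission ideas above, here is a toy sketch (this is not VeloBit's actual implementation; the zlib compression, block granularity, and two-access admission threshold are illustrative assumptions):

```python
import zlib
from collections import defaultdict

class CompressedBlockCache:
    """Toy SSD-style cache: admit blocks only once they prove popular,
    and store them compressed so the cache holds more logical data."""

    def __init__(self, capacity_bytes, admit_after=2):
        self.capacity = capacity_bytes
        self.admit_after = admit_after  # accesses before a block is cached
        self.used = 0
        self.store = {}                 # block_id -> compressed bytes
        self.hits = defaultdict(int)    # block_id -> access count

    def get(self, block_id, read_from_disk):
        self.hits[block_id] += 1
        if block_id in self.store:
            # Cache hit: decompress from the fast tier.
            return zlib.decompress(self.store[block_id])
        data = read_from_disk(block_id)  # slow path: HDD
        # Admit only blocks that have been requested repeatedly.
        if self.hits[block_id] >= self.admit_after:
            packed = zlib.compress(data)
            if self.used + len(packed) <= self.capacity:
                self.store[block_id] = packed
                self.used += len(packed)
        return data
```

With the default threshold, the first access goes to disk, the second goes to disk and admits the block, and the third is served from the cache.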


(end of commercial)

Best regards,

Peter

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, January 20, 2012 11:45 AM
To: solr-user@lucene.apache.org; pe...@velobit.com
Subject: Re: How to accelerate your Solr-Lucene application by 4x

Peter:

I admit I've just scanned the thread, but it sounds like what you're really
doing under the covers is configuring your system to use the SSDs as the place
pages go when they're swapped out of RAM. Is this correct?

Which would certainly speed things up substantially if swapping was
happening...

Best
Erick

RE: Solr Warm-up performance issues

2012-01-27 Thread Peter Velikin
Dan,

I can suggest a solution that should help. VeloBit enables you to add SSDs to
your servers as a cache (one SSD per server, at about $200 each, should be
enough). Then, assuming a 100 MB/s read speed from your SAS disks, you can read
50 GB of data into the VeloBit HyperCache cache in about 9 minutes (this
happens automatically; all you need to do is add the SSD to your server and
install VeloBit once, which takes 2 minutes). Solr should run much faster after
that. An added benefit is that you will have also boosted steady-state
performance by 4x.
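The 9-minute figure is straight transfer-rate arithmetic: 50 GB at 100 MB/s is 50 * 1024 / 100 ≈ 512 seconds, or roughly 8.5 minutes. Even without an SSD, the same warm-up effect can be approximated by reading the index files once so the OS page cache is populated; a minimal sketch (the index path and 1 MB chunk size are illustrative):

```python
import os

def warm_page_cache(index_dir, chunk=1 << 20):
    """Sequentially read every file under index_dir so the OS page
    cache is populated before the first queries arrive."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                while True:
                    buf = f.read(chunk)  # 1 MB sequential reads
                    if not buf:
                        break
                    total += len(buf)
    return total  # bytes touched

# e.g. warm_page_cache("/var/solr/data/index")  # hypothetical path
```

Sequential reads run at the disks' full streaming rate, which is usually much faster than letting the cache fill via random query-time I/O.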

Let me know if you are interested in trying it out and I'll set you up to
talk with my engineers.


Best regards,

Peter Velikin
VP Online Marketing, VeloBit, Inc.
pe...@velobit.com
tel. 978-263-4800
mob. 617-306-7165




-Original Message-
From: dan sutton [mailto:danbsut...@gmail.com] 
Sent: Friday, January 27, 2012 9:44 AM
To: solr-user
Subject: Solr Warm-up performance issues

Hi List,

We use Solr 4.0.2011.12.01.09.59.41 and have a dataset of roughly 40 GB.
Every day we produce a new dataset of 40 GB and have to switch one for the
other.

Once the index switch-over has taken place, it takes roughly 30 minutes for
Solr to reach maximum performance. Are there any hardware or software solutions
to reduce the warm-up time? We tried warm-up queries, but they didn't change
much.
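For reference, warm-up queries in Solr are registered in solrconfig.xml as firstSearcher/newSearcher event listeners; they only help if they touch the same fields (including the sort and facet fields that trigger un-inverting) that production traffic uses. A sketch with placeholder field names:

```xml
<!-- In solrconfig.xml, inside the <query> section. The field names
     ("price", "category") are placeholders: substitute the sort and
     facet fields your production queries actually use. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">price asc</str>
    </lst>
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
    </lst>
  </arr>
</listener>
```

The same listener body can be registered under event="newSearcher" so each daily index switch warms the new searcher before it serves traffic.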

Our hardware specs are:
   * Dell Poweredge 1950
   * 2 x Quad-Core Xeon E5405 (2.00GHz)
   * 48 GB RAM
   * 2 x 146 GB SAS 3 Gb/s 15K RPM disk configured in RAID mirror

One thing that does seem to take a long time is un-inverting a set of
multivalued fields; are there any optimizations we might be able to use here?

Thanks for your help.
Dan