RE: How to accelerate your Solr-Lucene application by 4x
All,

Point taken: my message should have been written more succinctly and just stuck to the facts. Sorry for the sales pitch!

However, I believe that adding SSD as a means to accelerate the performance of your Solr cluster is an important topic to discuss on this forum. There are many options for you to consider. I believe VeloBit would be the best option for many, but you have choices, some of them completely free. If interested, send me a note and I'll be happy to tell you about the different options (free or paid) you can consider.

Solr clusters are I/O bound. I am arguing that before you buy additional servers, replace your existing servers with new ones, or swap your hard disks, you should try adding SSD as a cache. If the promise is that adding 1 SSD could save you the cost of 3 additional servers, you should try it.

Has anyone else tried adding SSDs as a cache to boost the performance of Solr clusters? Can you share your results?

Best regards,

Peter Velikin
VP Online Marketing, VeloBit, Inc.
pe...@velobit.com
tel. 978-263-4800
mob. 617-306-7165

VeloBit provides plug & play SSD caching software that dramatically accelerates applications at a remarkably low cost. The software installs seamlessly in less than 10 minutes and automatically tunes for fastest application speed. Visit www.velobit.com for details.
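For anyone who wants to verify the "I/O bound" claim on their own cluster before buying anything, here is a minimal, vendor-neutral Python sketch (assuming the third-party psutil package is installed) that samples disk-busy time against CPU load. Sustained high disk-busy milliseconds with modest CPU is the classic I/O-bound signature:

# io_check.py - rough check of whether a Solr box is I/O bound (illustrative).
# Requires the third-party psutil package; samples for about one minute.
import time
import psutil

INTERVAL = 5  # seconds per sample

print("cpu%%  read_MB/s  disk_busy_ms/s")
prev = psutil.disk_io_counters()
prev_t = time.time()
for _ in range(12):
    cpu = psutil.cpu_percent(interval=INTERVAL)  # blocks for INTERVAL seconds
    cur = psutil.disk_io_counters()
    now = time.time()
    dt = now - prev_t
    read_mb = (cur.read_bytes - prev.read_bytes) / dt / 2**20
    busy_ms = (cur.read_time + cur.write_time
               - prev.read_time - prev.write_time) / dt
    # High busy_ms with low cpu% suggests the workload is waiting on disk.
    print("%5.1f %10.2f %14.1f" % (cpu, read_mb, busy_ms))
    prev, prev_t = cur, now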
RE: How to accelerate your Solr-Lucene application by 4x
Ted, Otis,

Thanks for the info. I'll take a stab at answering your question.

RAM:
Both of you are correct that if you were able to keep your index in RAM, that would give you the fastest results. This works if you have a small enough index. At ZoomInfo, the index was 600 GB (they have multiple types of indexed data), so there was no way to keep it in RAM. Due to the size of the index, they have elected to "shard" the data across two sets of systems for manageability and performance reasons. So, while in theory performance would be fastest if you keep the entire index in RAM, this is not possible, or at least not practical, if you have a large index.

All SSD:
SSDs are a lot faster, so if you swap your HDDs with SSDs, performance will go up. But that's really expensive and is also disruptive. In Zoom's case, they have a cluster of Dell 2970 servers with 8 cores, each with 6x 146 GB, 15k rpm SAS drives. Going all SSD would be expensive for them and would also require a disruption to running servers.

SSD as a cache only:
Since they wanted to avoid the cost and disruption of upgrading the servers, Zoom added one OCZ Vertex 3 to each of the servers (at a cost of $230 per SSD) and ran it as an expansion of RAM (the cache was a combination of RAM and SSD). All was configured on the running servers without any disruption to the running application. The result was an immediate 4x improvement in performance (responses per second went up from 12/sec to 48/sec, bandwidth went up from 500 KB/sec to 2.2 MB/sec). The VeloBit software acts as a driver that automatically configures and manages the RAM+SSD combo cache; the value of SSD caching software is that it makes the whole process plug & play.

So the argument is that adding 1 SSD to each server and using it as a cache (more precisely, as a cache expansion to the cache already in RAM) will give you the best price/performance benefit of all the options you have.

Does this clarify things? Was I able to answer your question?

Best regards,

Peter

-----Original Message-----
From: Ted Dunning [mailto:ted.dunn...@gmail.com]
Sent: Friday, January 20, 2012 2:42 AM
To: solr-user@lucene.apache.org
Subject: Re: How to accelerate your Solr-Lucene application by 4x

Actually, for search applications there is a reasonable amount of evidence that holding the index in RAM is actually more cost-effective than SSDs, because the throughput is enough faster to make up for the price differential. There are several papers out of UMass that describe this trade-off, although they are out-of-date enough to talk about 8 GB of memory as being big. One interesting aspect of the work is the way that they keep an index highly compressed yet still fast to search.

As a point of reference, most of Google's searches are served out of memory in pretty much just this way. Using SSDs would just slow them down.

On Fri, Jan 20, 2012 at 5:16 AM, Fuad Efendi <f...@efendi.ca> wrote:

> I agree that SSD boosts performance... In some rare not-real-life scenario:
> - super frequent commits
> That's it, nothing more except the fact that Lucene compile time
> including tests takes up to two minutes on a MacBook with SSD, or
> forty-fifty minutes on Windows with HDD.
> Of course, with a non-empty maven repository in both scenarios, to be fair.
>
> Another scenario: imagine Google File System is powered by SSD instead
> of the cheapest HDD... HAHAHA!!!
>
> Can we expect a response time of 0.1 milliseconds instead of 30-50?
>
> And a final question... Will SSD improve performance of fuzzy search?
> Range queries? Etc.
>
> I just want to say that SSD is faster than HDD, but it doesn't mean
> anything...
>
> -Fuad
>
> Sent from my iPad
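For reference, the 12/sec vs. 48/sec numbers above are easy to reproduce on your own cluster with a simple loop. A minimal Python sketch follows; the Solr URL and queries are placeholders, and a single thread will understate a cluster's ceiling, so run several copies in parallel for a fair before/after comparison:

# solr_qps.py - measure sustained responses/sec against a Solr core.
# The host, core name, and queries are placeholders; point them at your own cluster.
import time
import urllib.parse
import urllib.request

SOLR = "http://localhost:8983/solr/collection1/select"
QUERIES = ["*:*", "name:foo", "price:[10 TO 100]"]
DURATION = 60  # seconds to run

done = 0
start = time.time()
while time.time() - start < DURATION:
    params = urllib.parse.urlencode(
        {"q": QUERIES[done % len(QUERIES)], "rows": 10, "wt": "json"})
    with urllib.request.urlopen(SOLR + "?" + params) as resp:
        resp.read()  # drain the body so timing includes the transfer
    done += 1

elapsed = time.time() - start
print("%d responses in %.1f s = %.1f responses/sec" % (done, elapsed, done / elapsed))

Run it once against the current configuration and once after adding the cache, under the same load; comparing the two runs gives the kind of before/after figure quoted above.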
RE: How to accelerate your Solr-Lucene application by 4x
Hi Erick,

This is correct. An additional benefit to configuring the SSD as cache vs. primary storage is that you don't have to change anything in your existing indexes (the cache will just give a performance boost).

In addition to configuring the system to utilize SSDs as the location where pages go when swapped out of RAM, VeloBit does a few more performance optimization tricks, which I will explain below. (I am wary of the sensitivity to commercial messages: the following will explain some of the differentiators of VeloBit. So, if you have an aversion to vendors promoting themselves, please do not read further.)

VeloBit:
- configures SSDs to appear as cache expansion
- compresses data at the block level so you can hold much more data in cache (the cache will appear much bigger than the physical size of the SSD and RAM; you'll get higher performance since you'll less frequently have to read data from slow HDD)
- makes decisions on what data goes into cache based on the popularity of the contents of each block (increases cache hit rates and system performance; see the sketch after this message)
- optimizes how data is placed and managed on the SSD, which takes care of the write, erase, and garbage collection limitations inherent to flash-based SSDs (increases the performance of the SSD and extends its life; this enables you to use a commodity SSD from Best Buy for enterprise workloads instead of having to buy really expensive high-end SSDs)
- automates the whole process, making everything plug & play (no need to deal with storage and server architecture issues)

(end of commercial)

Best regards,

Peter

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, January 20, 2012 11:45 AM
To: solr-user@lucene.apache.org; pe...@velobit.com
Subject: Re: How to accelerate your Solr-Lucene application by 4x

Peter:

I admit I've just scanned the thread, but it sounds like what you're really doing under the covers is configuring your system to utilize the SSDs as where your pages go when they're swapped out of RAM, is this correct? Which would certainly speed things up substantially if swapping was happening...

Best
Erick
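As a toy illustration of the popularity-based admission idea mentioned above (a generic sketch of the concept only, not VeloBit's actual algorithm): blocks are keyed by a hash of their contents, and contents are admitted to the cache only after they have been seen a few times:

# popularity_cache.py - toy content-popularity cache admission (illustration
# only, not VeloBit's actual algorithm). Blocks are identified by a hash of
# their bytes, so "popularity" follows the contents, not the disk address.
import hashlib
from collections import Counter, OrderedDict

ADMIT_THRESHOLD = 3    # a block's contents must be seen this often before caching
CACHE_CAPACITY = 1024  # maximum number of cached blocks

popularity = Counter()   # content hash -> times seen
addr_to_key = {}         # disk address -> content hash of the block last read there
cache = OrderedDict()    # content hash -> block bytes, kept in LRU order

def read_block(address, read_from_disk):
    """Return the block at `address`, serving popular contents from cache."""
    key = addr_to_key.get(address)
    if key is not None and key in cache:
        cache.move_to_end(key)        # cache hit: refresh LRU position
        return cache[key]
    block = read_from_disk(address)   # cache miss: fall back to the slow path
    key = hashlib.sha1(block).digest()
    addr_to_key[address] = key
    popularity[key] += 1
    if popularity[key] >= ADMIT_THRESHOLD:
        cache[key] = block            # contents are popular enough: admit
        if len(cache) > CACHE_CAPACITY:
            cache.popitem(last=False) # evict the least-recently-used block
    return block

Keying on the content hash rather than the disk address also means identical blocks at different addresses share one cache entry, which is one reason a content-aware cache can behave as if it were larger than its physical size.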
RE: Solr Warm-up performance issues
Dan,

I can suggest a solution that should help. VeloBit enables you to add SSDs to your servers as a cache (an SSD will cost you $200; one per server should be enough). Then, assuming a 100 MB/s read speed from your SAS disks, you can read 50 GB of data into the VeloBit HyperCache cache in about 9 minutes (this happens automatically; all you need to do is add the SSD to your server and install VeloBit once, which takes 2 minutes). Solr should run much faster after that. The added benefit of the solution is that you would have also boosted the steady-state performance by 4x.

Let me know if you are interested in trying it out and I'll set you up to talk with my engineers.

Best regards,

Peter Velikin
VP Online Marketing, VeloBit, Inc.
pe...@velobit.com
tel. 978-263-4800
mob. 617-306-7165

VeloBit provides plug & play SSD caching software that dramatically accelerates applications at a remarkably low cost. The software installs seamlessly in less than 10 minutes and automatically tunes for fastest application speed. Visit www.velobit.com for details.

-----Original Message-----
From: dan sutton [mailto:danbsut...@gmail.com]
Sent: Friday, January 27, 2012 9:44 AM
To: solr-user
Subject: Solr Warm-up performance issues

Hi List,

We use Solr 4.0.2011.12.01.09.59.41 and have a dataset of roughly 40 GB. Every day we produce a new dataset of 40 GB and have to switch one for the other. Once the index switch-over has taken place, it takes roughly 30 min for Solr to reach maximum performance. Are there any hardware or software solutions to reduce the warm-up time? We tried warm-up queries but it didn't change much.

Our hardware specs are:
* Dell PowerEdge 1950
* 2 x Quad-Core Xeon E5405 (2.00 GHz)
* 48 GB RAM
* 2 x 146 GB SAS 3 Gb/s 15K RPM disks configured in a RAID mirror

One thing that does seem to take a long time is un-inverting a set of multivalued fields; are there any optimizations we might be able to use here?

Thanks for your help,
Dan
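As a vendor-neutral data point on Dan's question: much of a cold-start penalty on a 40 GB index is simply the OS page cache being empty after the switch-over, and with 48 GB of RAM the whole index fits in memory. A minimal Python sketch that pre-reads the new index right after the swap (the index path is a placeholder):

# prewarm_index.py - pre-read a Lucene/Solr index directory into the OS page
# cache after an index swap, so the first queries don't pay the cold-disk penalty.
# The index path below is a placeholder; point it at your live index directory.
import os
import time

INDEX_DIR = "/var/solr/data/index"
CHUNK = 8 * 2**20  # read 8 MB at a time

start = time.time()
total = 0
for root, _dirs, files in os.walk(INDEX_DIR):
    for name in files:
        path = os.path.join(root, name)
        with open(path, "rb") as f:
            while True:
                chunk = f.read(CHUNK)  # data lands in the page cache as a side effect
                if not chunk:
                    break
                total += len(chunk)

print("read %.1f GB in %.0f s" % (total / 2**30, time.time() - start))

Note that this only warms the file-system cache. The un-inverting of multivalued fields happens inside Solr, so it still calls for firstSearcher/newSearcher warming queries that facet or sort on exactly those fields, so the un-inverted structures are built before the new searcher serves traffic.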