RE: Solr performance issues

2014-12-29 Thread Toke Eskildsen
Mahmoud Almokadem [prog.mahm...@gmail.com] wrote: > I have the same index with a slightly different schema and 200M documents, installed on 3 r3.xlarge (30GB RAM, and 600 General Purpose SSD). The size of the index is about 1.5TB; it has many updates every 5 minutes, complex queries and faceting with resp

Re: Solr performance issues

2014-12-29 Thread Shawn Heisey
On 12/29/2014 12:07 PM, Mahmoud Almokadem wrote: > What do you mean by "important parts of index", and how do I calculate their size? I have no formal education in what's important when it comes to doing a query, but I can make some educated guesses. Starting with this as a reference: http://
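Shawn's reply is cut off before he names any files, so the following is only a sketch of the "how to calculate their size" half of the question. It totals the size of each file type in one Lucene index directory. The set of "hot" extensions is an assumption for illustration and is not Shawn's list. It covers the terms index (.tip), the term dictionary (.tim), postings (.doc) and doc values (.dvd/.dvm). Segments written as compound files (.cfs) will not break down this way.

```python
import os
from collections import defaultdict

# Hypothetical "hot" set, chosen for illustration: terms index (.tip),
# term dictionary (.tim), postings (.doc) and doc values (.dvd/.dvm).
# Stored fields (.fdt) are often large but are only read for returned documents.
HOT_EXTENSIONS = {".tip", ".tim", ".doc", ".dvd", ".dvm"}


def size_by_extension(index_dir):
    """Sum the sizes of the files in one index directory, keyed by extension."""
    totals = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            ext = os.path.splitext(name)[1] or "(none)"  # e.g. segments_N
            totals[ext] += os.path.getsize(path)
    return dict(totals)


def hot_bytes(totals, hot=HOT_EXTENSIONS):
    """Bytes taken by the extensions assumed to matter most for queries."""
    return sum(size for ext, size in totals.items() if ext in hot)
```

Usage: call `hot_bytes(size_by_extension(path))` with the path of a core's `data/index` directory. Then compare the result with the RAM left over after the Java heap.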

Re: Solr performance issues

2014-12-29 Thread Mahmoud Almokadem
Thanks, Shawn. What do you mean by "important parts of index", and how do I calculate their size? Thanks, Mahmoud > On Dec 29, 2014, at 8:19 PM, Shawn Heisey wrote: >> On 12/29/2014 2:36 AM, Mahmoud Almokadem wrote: >> I have the same index with a slightly different schema and

Re: Solr performance issues

2014-12-29 Thread Shawn Heisey
On 12/29/2014 2:36 AM, Mahmoud Almokadem wrote: > I have the same index with a slightly different schema and 200M documents, installed on 3 r3.xlarge (30GB RAM, and 600 General Purpose SSD). The size of the index is about 1.5TB; it has many updates every 5 minutes, complex queries and faceting with respon

Re: Solr performance issues

2014-12-29 Thread Mahmoud Almokadem
Thanks, all. I have the same index with a slightly different schema and 200M documents, installed on 3 r3.xlarge (30GB RAM, and 600 General Purpose SSD). The size of the index is about 1.5TB; it has many updates every 5 minutes, complex queries and faceting, with a response time of 100ms, which is acceptable for us.

RE: Solr performance issues

2014-12-28 Thread Toke Eskildsen
Mahmoud Almokadem [prog.mahm...@gmail.com] wrote: > We've installed a cluster with one collection of 350M documents on 3 r3.2xlarge (60GB RAM) Amazon servers. The size of the index on each shard is about 1.1TB and the maximum storage on Amazon is 1TB, so we added 2 SSD EBS General Purpose volumes (1x1TB + 1x500G

Re: Solr performance issues

2014-12-28 Thread Shawn Heisey
On 12/26/2014 7:17 AM, Mahmoud Almokadem wrote: > We've installed a cluster with one collection of 350M documents on 3 r3.2xlarge (60GB RAM) Amazon servers. The size of the index on each shard is about 1.1TB and the maximum storage on Amazon is 1TB, so we added 2 SSD EBS General Purpose volumes (1x1TB + 1x500GB)
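The thread's numbers make the memory concern concrete. This is a rough sketch that assumes one shard per node. It also assumes that all RAM not given to the Java heap is available to the OS page cache, which is slightly optimistic. The 8GB heap is hypothetical, because the heap size does not appear in the visible text.

```python
def cache_coverage(ram_gb, heap_gb, index_gb_per_node):
    """Fraction of one node's on-disk index that fits in the OS page cache.

    Assumes every byte of RAM outside the JVM heap can cache index files,
    which slightly overstates it (the OS and other processes need memory too).
    """
    cacheable = max(ram_gb - heap_gb, 0)
    return min(cacheable / index_gb_per_node, 1.0)


# r3.2xlarge has 60GB RAM; each node holds about 1.1TB (~1100GB) of index.
# The 8GB heap is a hypothetical figure, not taken from the thread.
print(f"{cache_coverage(60, 8, 1100):.1%}")  # prints 4.7%
```

Under these assumptions, less than 5% of each node's index can stay cached. Most queries that touch cold parts of the index must therefore go to EBS.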

Re: Solr performance issues

2014-12-26 Thread Otis Gospodnetic
Likely lots of disk + network IO, yes. Put SPM for Solr on your nodes to double-check. Otis > On Dec 26, 2014, at 09:17, Mahmoud Almokadem wrote: > Dear all, > We've installed a cluster with one collection of 350M documents on 3 r3.2xlarge (60GB RAM) Amazon servers. The size of the index on eac

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Dmitry Kan
Thanks. The only question is how to transition to this model. Our facet (string) fields contain timestamp prefixes that are reverse-ordered, starting from the freshest value. In theory, we could try computing the filter queries for those. But before doing so, we would need the matched ids from Solr,

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Michael Della Bitta
I guess so, you'd have to use a filter query to page through the set of documents you were faceting against and sum them all at the end. It's not quite the same operation as paging through results, because facets are aggregate statistics, but if you're willing to go through the trouble, I bet it wo
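The "sum them all at the end" step could look like the minimal sketch below. It assumes each slice's response uses Solr's default JSON layout for facet_fields, which is a flat [term, count, term, count, ...] list. It also assumes each request used facet.limit=-1. With a positive facet.limit, a term outside one slice's top N is missing from that slice, so its total can undercount.

```python
from collections import Counter


def parse_flat_facets(flat):
    """Convert Solr's flat facet_fields list [term, count, ...] into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


def merge_facet_counts(slice_responses):
    """Add up one field's facet counts from several filter-query slices.

    Each element is the flat list for the same field from one response.
    The slices must not overlap, or documents get counted twice.
    """
    total = Counter()
    for flat in slice_responses:
        total.update(parse_flat_facets(flat))
    return total
```

This works because facet counts are plain sums. Non-overlapping slices can be added term by term, unlike ranked result pages.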

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Dmitry Kan
Michael, interesting! Do (or can) you apply this to facet searches as well? Dmitry On Mon, Apr 29, 2013 at 4:02 PM, Michael Della Bitta <michael.della.bi...@appinions.com> wrote: > We've found that you can do a lot for yourself by using a filter query to page through your data if it has a natu

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Michael Della Bitta
We've found that you can do a lot for yourself by using a filter query, instead of start and rows, to page through your data if it has a natural range to do so. Michael Della Bitta
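Below is a minimal sketch of paging by filter query instead of start/rows. It assumes a hypothetical integer field (`ts` here, e.g. epoch seconds) whose values give the data a natural order. Every request asks for start=0 and lets the fq select the slice, so no request has to rank and skip earlier pages. The step must be small enough that no slice holds more than `rows` documents; otherwise part of that slice is silently dropped.

```python
from urllib.parse import urlencode


def range_filters(field, lo, hi, step):
    """Yield non-overlapping, inclusive fq clauses that cover [lo, hi]."""
    if step < 1:
        raise ValueError("step must be at least 1")
    start = lo
    while start <= hi:
        end = min(start + step - 1, hi)
        yield f"{field}:[{start} TO {end}]"
        start = end + 1


def select_query_string(fq, rows=1000):
    """Query string for one slice: always start=0, the filter does the paging."""
    return urlencode({"q": "*:*", "fq": fq, "start": 0, "rows": rows})


for fq in range_filters("ts", 0, 25, 10):
    print(fq)
# ts:[0 TO 9]
# ts:[10 TO 19]
# ts:[20 TO 25]
```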

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Dmitry Kan
Abhishek, there is a wiki page about this: http://wiki.apache.org/solr/CommonQueryParameters (search for "pageDoc and pageScore"). On Mon, Apr 29, 2013 at 1:17 PM, Abhishek Sanoujam wrote: > We have a single shard, and all the data is on a single box. Definitely looks like "deep-paging" is ha

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Abhishek Sanoujam
We have a single shard, and all the data is on a single box. Definitely looks like "deep-paging" is having problems. Just to understand: is the searcher looping over the result set every time and skipping the first "start" count? This will definitely take a toll when we reach higher "start
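Broadly, yes. The sketch below is a simplified model of the behavior Abhishek describes, not Lucene's actual collector code. To serve a start/rows request, the searcher ranks the top start+rows hits and then discards the first start of them. Cost therefore grows with start, even though only rows documents come back. In a distributed request, each shard must return its own top start+rows hits, and that is where the multi-shard version of the problem comes from.

```python
import heapq


def page(scored_docs, start, rows):
    """Model of a start/rows request over (doc_id, score) pairs.

    Keeps the best start+rows hits (a bounded priority queue, as a top-N
    collector does) and then throws away the first `start` of them.
    """
    top = heapq.nlargest(start + rows, scored_docs, key=lambda d: d[1])
    return top[start:start + rows]


def hits_sent_to_aggregator(num_shards, start, rows):
    """Each shard must send its own top start+rows, since any of them might
    land on the requested page once all shards are merged."""
    return num_shards * (start + rows)


docs = [(i, float(i)) for i in range(100)]
print([doc_id for doc_id, _ in page(docs, 10, 5)])  # [89, 88, 87, 86, 85]
print(hits_sent_to_aggregator(4, 100000, 10))        # 400040
```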

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Dmitry Kan
Jan, would the same distrib=false help with distributed faceting? We are running into a similar issue with facet paging. Dmitry On Mon, Apr 29, 2013 at 11:58 AM, Jan Høydahl wrote: > Hi, > How many shards do you have? This is a known issue with deep paging across multiple shards; see https://is

Re: Solr performance issues for simple query - q=*:* with start and rows

2013-04-29 Thread Jan Høydahl
Hi, How many shards do you have? This is a known issue with deep paging across multiple shards; see https://issues.apache.org/jira/browse/SOLR-1726. You may have more success going to each shard one at a time (with &distrib=false) to avoid this issue. -- Jan Høydahl, search solution architect
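Jan's workaround amounts to querying each shard's core directly. The minimal sketch below only builds the URLs, and the host and core names are hypothetical placeholders. With distrib=false, each core answers from its own index alone, so no shard ships start+rows hits to an aggregator. The caller then receives one result list per shard and must do any merging or ordering itself.

```python
from urllib.parse import urlencode


def per_shard_select_urls(core_urls, params):
    """One non-distributed /select URL per shard core.

    distrib=false makes each core answer from its local index only.
    """
    query = urlencode({**params, "distrib": "false"})
    return [f"{url.rstrip('/')}/select?{query}" for url in core_urls]


# Hypothetical core URLs; substitute your own shard replicas.
cores = [
    "http://solr1:8983/solr/collection1_shard1_replica1",
    "http://solr2:8983/solr/collection1_shard2_replica1/",
]
for url in per_shard_select_urls(cores, {"q": "*:*", "start": 0, "rows": 100}):
    print(url)
```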

Re: Solr Performance Issues

2010-03-17 Thread Lance Norskog
Try cutting back Solr's memory: the OS knows how to manage disk caches better than Solr does. Another approach is to raise and lower the queryResultCache size and see whether the hitratio changes. On Wed, Mar 17, 2010 at 9:44 AM, Siddhant Goel wrote: > Hi, > Apparently the bottleneck seems to be the tim

Re: Solr Performance Issues

2010-03-17 Thread Siddhant Goel
Hi, Apparently the bottleneck seems to be the time periods when the CPU is waiting on I/O. Of all the numbers I can see, the CPU wait times for I/O seem to be the highest. I've allotted 4GB to Solr out of the 8GB available. There's only 47MB free on the machine, so I assume the rest of

Re: Solr Performance Issues

2010-03-12 Thread Erick Erickson
Sounds like you're well on your way, then. This is typical of multi-threaded situations: threads 1 to n wait around on I/O, and increasing the number of threads increases throughput without changing the individual response time (much). Threads n+1 to p don't change throughput much, but i

Re: Solr Performance Issues

2010-03-12 Thread Siddhant Goel
Hi, Thanks for your responses. It actually feels good to be able to locate where the bottlenecks are. I've created two sets of data: in the first one I'm measuring the time taken purely on Solr's end, and in the other one I'm including network latency (just for reference). The data that I'm posti

Re: Solr Performance Issues

2010-03-12 Thread Erick Erickson
You've probably already looked at this, but here goes anyway. The first question probably should have been "what are you measuring"? I've been fooled before by looking at, say, average response time and extrapolating. You're getting 20 qps if your response time is 1 second, but you have 20 threads
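Erick's 20-threads example is Little's law. With a fixed number of requests in flight, throughput equals concurrency divided by mean response time. The small sketch below assumes a closed-loop test, in which each thread waits for a response before sending its next query. The 20 qps ceiling in the second call is a made-up number. It illustrates his point that once the server saturates, extra threads only stretch response times.

```python
def throughput_qps(concurrency, mean_latency_s):
    """Little's law: requests in flight / mean time per request."""
    return concurrency / mean_latency_s


def latency_at_saturation(concurrency, max_qps):
    """Once throughput is capped at max_qps, each added thread just waits longer."""
    return concurrency / max_qps


print(throughput_qps(20, 1.0))          # 20.0 -> 20 threads at 1 s each
print(latency_at_saturation(40, 20.0))  # 2.0  -> hypothetical 20 qps ceiling, 40 threads
```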

Re: Solr Performance Issues

2010-03-12 Thread Siddhant Goel
I've allocated 4GB to Solr, so the remaining 4GB is free for OS disk caching. I think that at any point in time, there can be a maximum of concurrent requests, which happens to make sense, by the way (does it?). As I increase the number of threads, the load average shown by top goes up to as high

Re: Solr Performance Issues

2010-03-11 Thread Mike Malloy
I don't mean to turn this into a sales pitch, but there is a tool for Java app performance management that you may find helpful. It's called New Relic (www.newrelic.com), and it can be installed in 2 minutes. It can give you very deep visibility inside Solr and other Java apps. (Full disclosure

Re: Solr Performance Issues

2010-03-11 Thread Tom Burton-West
How much of your memory are you allocating to the JVM, and how much are you leaving free? If you don't leave enough free memory for the OS, the OS won't have a large enough disk cache, and you will be hitting the disk for lots of queries. You might want to monitor your disk I/O using iostat an

Re: Solr Performance Issues

2010-03-11 Thread Siddhant Goel
Hi Erick, The way the load test works is that it picks up 5000 queries and splits them according to the number of threads (so if we have 10 threads, it schedules 10 threads, each one sending 500 queries). So it is possible that the number of queries at a later point in time is greater than the
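The sketch below shows a closed-loop version of the test Siddhant describes, which also bears on Erick's question about outstanding queries. Each thread sends its share of the queries one after another and waits for every response, so no more than num_threads requests are ever in flight. `send` is a hypothetical placeholder for whatever issues one Solr request. If queries were instead fired on a timer without waiting, the number outstanding could keep growing as the server slows down.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def split_evenly(queries, num_threads):
    """Contiguous, near-equal shares: 5000 queries over 10 threads -> 500 each."""
    size, extra = divmod(len(queries), num_threads)
    chunks, start = [], 0
    for i in range(num_threads):
        end = start + size + (1 if i < extra else 0)
        chunks.append(queries[start:end])
        start = end
    return chunks


def run_load_test(queries, num_threads, send):
    """Closed-loop load test; returns per-request latencies in seconds.

    Each worker waits for one response before sending its next query, so at
    most num_threads requests are outstanding at any moment.
    """
    def worker(chunk):
        latencies = []
        for q in chunk:
            t0 = time.perf_counter()
            send(q)
            latencies.append(time.perf_counter() - t0)
        return latencies

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        per_thread = list(pool.map(worker, split_evenly(queries, num_threads)))
    return [lat for chunk in per_thread for lat in chunk]


# Stand-in for a real Solr call (hypothetical): sleep 10 ms per query.
latencies = run_load_test([f"q{i}" for i in range(50)], 10, lambda q: time.sleep(0.01))
print(len(latencies), f"{sum(latencies) / len(latencies) * 1000:.1f} ms mean")
```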

Re: Solr Performance Issues

2010-03-11 Thread Erick Erickson
How many outstanding queries do you have at a time? Is it possible that when you start, you have only a few queries executing concurrently but as your test runs you have hundreds? This really is a question of how your load test is structured. You might get a better sense of how it works if your te

Re: Solr performance issues

2008-06-20 Thread Sébastien Rainville
On Fri, Jun 20, 2008 at 8:32 AM, Erik Hatcher <[EMAIL PROTECTED]> wrote: > On Jun 19, 2008, at 6:28 PM, Yonik Seeley wrote: >> 2. I use acts_as_solr and by default they only make "post" requests, even for /select. With that setup the response time for most queries, simple or comple

Re: Solr performance issues

2008-06-20 Thread Erik Hatcher
On Jun 19, 2008, at 6:28 PM, Yonik Seeley wrote: 2. I use acts_as_solr and by default they only make "post" requests, even for /select. With that setup the response time for most queries, simple or complex ones, ranged from 150ms to 600ms, with an average of 250ms. I changed the sele
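The message is cut off before it says what changed or why, so this is only an illustration of the two request shapes. It uses Python's standard library rather than acts_as_solr. The same /select parameters are sent either as a GET query string or as a form-encoded POST body. The base URL is a placeholder, and nothing is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def select_request(base_url, params, use_get=True):
    """Build, but do not send, a /select request as GET or as a form POST."""
    query = urlencode(params)
    if use_get:
        return Request(f"{base_url}/select?{query}")
    return Request(
        f"{base_url}/select",
        data=query.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


base = "http://localhost:8983/solr"  # placeholder
print(select_request(base, {"q": "title:solr"}).get_method())                 # GET
print(select_request(base, {"q": "title:solr"}, use_get=False).get_method())  # POST
```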

Re: Solr performance issues

2008-06-19 Thread Yonik Seeley
On Thu, Jun 19, 2008 at 6:11 PM, Sébastien Rainville <[EMAIL PROTECTED]> wrote: > I've been using Solr for a little while without worrying too much about how it works, but now it's becoming a bottleneck in my application. I have a couple of issues with it: > 1. My index always gets slower and slower wh