SolrJ Socket Leak
I am using solr/solrj 4.6.1 along with the apache httpclient 4.3.2 as part of a web application which connects to the solr server via solrj using CloudSolrServer. The web application is wired up with Guice, and there is a single instance of the CloudSolrServer class used by all inbound requests. All this is running on Amazon.

Basically, everything looks and runs fine for a while, but even with moderate concurrency, solrj starts leaving sockets open. We are handling only about 250 connections to the web app per minute, and each of these issues 3 to 7 requests to solr. Over a 30 minute period of this type of use, we end up with many thousands of lingering sockets. I can see these when running netstat:

tcp 0 0 ip-10-80-14-26.ec2.in:41098 ip-10-99-145-47.ec2.i:glrpc TIME_WAIT

All to the same target host, which is my solr server. There are no other pieces of infrastructure on that box, just solr. Eventually, the server just dies, as no further sockets can be opened and the opened ones are not reused.

The solr server itself is unfazed and running like a champ. Average time per request is 0.126, as seen in the query handler stats of the solr admin UI.

Apache httpclient had a bunch of leakage in version 4.2.x that they cleaned up and refactored in 4.3.x, which is why I upgraded. Currently, solrj makes use of the old leaky 4.2 classes for establishing connections and using a connection pool.

http://www.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES-4.3.x.txt

-- Jared Rodriguez
Re: SolrJ Socket Leak
Thanks Shawn,

I just regressed to Solrj 4.6.1 with http client 4.2.6 and am trying to reproduce the problem. Using YourKit to profile, and even just manually simulating a few users at once, I see the same problem of open sockets: 6 sockets opened to the solr server, and 2 of them still open after all is done and there is no server activity. This could, however, be sockets kept in a connection pool.

On Thu, Feb 13, 2014 at 4:11 PM, Shawn Heisey wrote:

> On 2/13/2014 1:38 PM, Jared Rodriguez wrote:
>> Currently, solrj makes use of the old leaky 4.2 classes for
>> establishing connections and using a connection pool.
>
> This is something that I can look into.
>
> I have a SolrJ program with SolrJ 4.5.1 and HttpClient 4.3.1 that does not
> leak anything. I thought it was migrated already to SolrJ 4.6.1, but now
> that I know it's not, I will upgrade SolrJ first and then HttpClient, and
> see whether I have the same problem with either upgrade.
>
> I am using HttpSolrServer, not CloudSolrServer, because the Solr servers
> are not running SolrCloud. CloudSolrServer ultimately uses HttpSolrServer
> for its communication, so my initial thought is that this is not important,
> but we'll see.
>
> In version 4.7, Solr will include HttpClient 4.3.1. See SOLR-5590.
>
> https://issues.apache.org/jira/browse/SOLR-5590
>
> A question for committers with a lot of experience: Do we have any tests
> that check for connection leaks?
>
> Thanks,
> Shawn

-- Jared Rodriguez
Re: SolrJ Socket Leak
Thanks for the info. I will look into the open file count and try to provide more info on how this is occurring.

Just to make sure that our scenarios were the same: in your tests, did you simulate many concurrent inbound connections to your web app, with each connection sharing the same instance of HttpSolrServer for queries?

On Thu, Feb 13, 2014 at 6:58 PM, Shawn Heisey wrote:

> On 2/13/2014 3:17 PM, Jared Rodriguez wrote:
>> I just regressed to Solrj 4.6.1 with http client 4.2.6 and am trying to
>> reproduce the problem. Using YourKit to profile and even just manually
>> simulating a few users at once, I see the same problem of open sockets. 6
>> sockets opened to the solr server and 2 of them still open after all is
>> done and there is no server activity. Although this could be sockets kept
>> in a connection pool.
>
> I did two separate upgrade steps, SolrJ 4.5.1 to 4.6.1, and HttpClient
> 4.3.1 to 4.3.2, and I'm not seeing any evidence of connection leaks.
>
> On your connections, if they are in TIME_WAIT, I'm pretty sure that means
> that the program is done with them: it has closed the connection, and
> it's the operating system that is in charge. See the answer with the green
> checkmark here:
>
> http://superuser.com/questions/173535/what-are-close-wait-and-time-wait-states
>
> I think the default timeout for WAIT states on a modern Linux system is 60
> seconds, not four minutes as described in that answer.
>
> With your connection rate and the default 60 second timeout for WAIT
> states, another resource that might be in short supply is file descriptors.
>
> Thanks,
> Shawn

-- Jared Rodriguez
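
Shawn's point about the connection rate and the WAIT timeout can be made concrete with a little arithmetic. The sketch below is only an estimate under stated assumptions: it takes the 250 web requests per minute and 3 to 7 solr requests each from the original report, assumes the worst case in which every solr request opens a fresh socket that is closed by the client and never reused, and assumes a 60 second TIME_WAIT. The class and method names are made up for illustration.

```java
class TimeWaitEstimate {

    /**
     * Steady-state number of sockets sitting in TIME_WAIT, assuming every
     * Solr request uses a new socket that the client closes afterwards
     * (worst case: no connection reuse at all).
     */
    static long socketsInTimeWait(int webRequestsPerMinute,
                                  int solrRequestsPerWebRequest,
                                  int timeWaitSeconds) {
        long closedPerMinute = (long) webRequestsPerMinute * solrRequestsPerWebRequest;
        // Each closed socket lingers for timeWaitSeconds, so the number
        // lingering at any moment is (close rate) x (linger time).
        return closedPerMinute * timeWaitSeconds / 60;
    }

    public static void main(String[] args) {
        int webRequestsPerMinute = 250; // figure from the original report
        int timeWaitSeconds = 60;       // assumed Linux default, per Shawn's message

        System.out.println("low:  " + socketsInTimeWait(webRequestsPerMinute, 3, timeWaitSeconds));
        System.out.println("high: " + socketsInTimeWait(webRequestsPerMinute, 7, timeWaitSeconds));
    }
}
```

Under these assumptions the estimate is 750 to 1750 sockets in TIME_WAIT at any moment with a 60 second wait, and four times that with a four minute wait. Any connection reuse by the pool lowers the figure.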
Re: SolrJ Socket Leak
Kiran & Shawn,

Thank you both for the info; you are both absolutely correct. The issue was not that sockets were leaked, but that the TIME_WAIT period is a killer. I ended up fixing the problem by changing the system property "http.maxConnections", which Apache httpclient uses internally to set up the PoolingClientConnectionManager. Previously, this had no value and was defaulting to 5. That meant that any time there were more than 50 (maxConnections * maxperroute) concurrent connections to the Solr server, non-reusable connections were opening and closing and then sitting in that wait state, leaving too many sockets.

The fix was simply tuning the pool and setting "http.maxConnections" to a higher value representing the number of concurrent users that I expect. Problem fixed, and a modest speed improvement simply from higher socket reuse.

Thank you both for the help!

Jared

On Mon, Feb 17, 2014 at 3:03 AM, Kiran Chitturi <kiran.chitt...@lucidworks.com> wrote:

> Jared,
>
> I faced a similar issue when using CloudSolrServer with Solr. As Shawn
> pointed out, the 'TIME_WAIT' status happens when the connection is closed
> by the http client. HTTP client closes a connection whenever it thinks the
> connection is stale
> (https://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html#d5e405).
> Even the docs point out that stale connection checking cannot be
> fully reliable.
>
> I see two ways to get around this:
>
> 1. Enable 'SO_REUSEADDR'
> 2. Disable stale connection checks.
>
> Also, by default, when we create CSS (CloudSolrServer) it does not
> explicitly configure any http client parameters
> (https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrServer.java#L124).
> In this case, the default configuration parameters (max connections, max
> connections per host) are used for a http connection. You can explicitly
> configure these params when creating CSS using HttpClientUtil:
>
>     ModifiableSolrParams params = new ModifiableSolrParams();
>     params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 128);
>     params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 32);
>     params.set(HttpClientUtil.PROP_FOLLOW_REDIRECTS, false);
>     params.set(HttpClientUtil.PROP_CONNECTION_TIMEOUT, 3); // milliseconds
>
>     final HttpClient client = HttpClientUtil.createClient(params);
>     LBHttpSolrServer lb = new LBHttpSolrServer(client);
>     CloudSolrServer server = new CloudSolrServer(zkConnect, lb);
>
> Currently, I am using http client 4.3.2 and building the client when
> creating the CSS. I also use the 'SO_REUSEADDR' option and I haven't seen
> 'TIME_WAIT' after this (maybe because of better handling of stale
> connections in 4.3.2, or because the 'SO_REUSEADDR' param is enabled). My
> current http client code looks like this (works only with http client
> 4.3.2):
>
>     HttpClientBuilder httpBuilder = HttpClientBuilder.create();
>
>     SocketConfig.Builder socketConfig = SocketConfig.custom();
>     socketConfig.setSoReuseAddress(true);
>     socketConfig.setSoTimeout(1); // milliseconds
>     httpBuilder.setDefaultSocketConfig(socketConfig.build());
>     httpBuilder.setMaxConnTotal(300);
>     httpBuilder.setMaxConnPerRoute(100);
>
>     httpBuilder.disableRedirectHandling();
>     httpBuilder.useSystemProperties();
>     HttpClient httpClient = httpBuilder.build();
>     // parser is a ResponseParser created elsewhere
>     LBHttpSolrServer lb = new LBHttpSolrServer(httpClient, parser);
>     CloudSolrServer server = new CloudSolrServer(zkConnect, lb);
>
> There should be a way to configure socket reuse with 4.2.3 too. You can
> try different configurations. I am surprised you have 'TIME_WAIT'
> connections even after 30 minutes, because a 'TIME_WAIT' connection should
> be closed by the OS within 2 minutes by default, I think.
>
> HTH,
>
> --
> Kiran Chitturi
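
The fix Jared describes, raising "http.maxConnections", can be sketched in plain Java without any Solr or HttpClient classes. This is a minimal illustration only: the property name and the fallback of 5 come from the message above, the value 200 and the class and method names are hypothetical, and the sketch assumes the property has to be set before the HTTP client is constructed, since that is presumably when it is read.

```java
class PoolTuning {

    // Property name as given in the message above.
    static final String MAX_CONNECTIONS_PROP = "http.maxConnections";

    // Fallback reported in the message when the property is not set.
    static final int REPORTED_DEFAULT = 5;

    /** Value a client would see if it read the property right now. */
    static int effectiveMaxConnections() {
        return Integer.getInteger(MAX_CONNECTIONS_PROP, REPORTED_DEFAULT);
    }

    public static void main(String[] args) {
        System.out.println("before tuning: " + effectiveMaxConnections());

        // Hypothetical value: size it to the expected number of concurrent users.
        // Set it early, before CloudSolrServer (and its HTTP client) is created.
        System.setProperty(MAX_CONNECTIONS_PROP, "200");

        System.out.println("after tuning:  " + effectiveMaxConnections());
    }
}
```

The same effect is available without code by passing -Dhttp.maxConnections=200 on the JVM command line. Kiran's approach of building the client explicitly makes the pool limits visible in the application code instead.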
Re: Using the Collections API
I have used both and they seem to work well for basic operations such as create and delete. Newer operations like reload, however, do not function as they should: the cores in the collection stay offline even if there are no material changes.

On Wed, May 15, 2013 at 6:53 AM, A.Eibner wrote:

> Hi,
>
> I just wanted to ask if anyone is using the collections API to create
> collections, or if not, how they use the coreAPI to create a collection
> with replication?
>
> Because I run into errors when creating a collection on an empty solr.
>
> Kind regards
> Alexander

-- Jared Rodriguez
Re: Using the Collections API
Hi Mark,

Yes, I am using reload. Here is the jira that I filed.

https://issues.apache.org/jira/browse/SOLR-4805

Please let me know if there is any additional data that you need.

On Wed, May 15, 2013 at 12:53 PM, Mark Miller wrote:

> On May 15, 2013, at 12:26 PM, Jared Rodriguez wrote:
>
> > the cores in the collection stay offline even if there are no
> > material changes.
>
> I've used reload - if you are having trouble with it, please post more
> details or file a JIRA issue.
>
> - Mark

-- Jared Rodriguez
Re: Using the Collections API
Hi Alexander,

So it sounds like you want the collection created with a master and a replica, one on each node? If so, I believe that you can get that effect by specifying maxShardsPerNode=1 as part of your url. This tells solr to create the master and replica that you want but not to put them on the same node. Your url would look like:

http://app02:9985/solr/admin/collections?action=CREATE&name=storage&numShards=1&replicationFactor=2&collection.configName=storage-conf&maxShardsPerNode=1

The SolrCloud wiki does a good job of explaining the params and how they function.

http://wiki.apache.org/solr/SolrCloud

Jared

On Fri, May 17, 2013 at 4:57 AM, A.Eibner wrote:

> Hi, sorry for the delay.
>
> I have two live nodes (zookeeper also knows these two:
> [app02:9985_solr, app03:9985_solr])
>
> But when I want to create a collection via:
>
> http://app02:9985/solr/admin/collections?action=CREATE&name=storage&numShards=1&replicationFactor=2&collection.configName=storage-conf
>
> Both replicas will be created on app02.
>
> Any clues?
> Should I post anything else?
>
> Regards
> Alexander
>
> On 2013-05-15 14:48, Mark Miller wrote:
>
>> Yeah, I use both on an empty Solr - what is the error?
>>
>> - Mark
>>
>> On May 15, 2013, at 6:53 AM, A.Eibner wrote:
>>
>>> Hi,
>>>
>>> I just wanted to ask, if anyone is using the collections API to create
>>> collections,
>>> or if not how they use the coreAPI to create a collection with
>>> replication ?
>>>
>>> Because I run into errors when creating a collection on an empty solr.
>>>
>>> Kind regards
>>> Alexander

-- Jared Rodriguez
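
For anyone issuing this call from code rather than a browser, the CREATE url above can be assembled from its parameters. The following is a small sketch using only the JDK: the host, port and parameter values are the ones from this thread, while the class and method names are made up for illustration. It only builds the url string and does not send the request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class CreateCollectionUrl {

    /** Builds a Collections API CREATE url from a base url and ordered params. */
    static String build(String solrBaseUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(solrBaseUrl)
                .append("/admin/collections?action=CREATE");
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append('&').append(encode(p.getKey()))
               .append('=').append(encode(p.getValue()));
        }
        return url.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** The url from this thread, with maxShardsPerNode=1 added. */
    static String exampleUrl() {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("name", "storage");
        params.put("numShards", "1");
        params.put("replicationFactor", "2");
        params.put("collection.configName", "storage-conf");
        // Keeps the two replicas of the single shard on different nodes.
        params.put("maxShardsPerNode", "1");
        return build("http://app02:9985/solr", params);
    }

    public static void main(String[] args) {
        System.out.println(exampleUrl());
    }
}
```

A LinkedHashMap is used so the parameters come out in insertion order, which makes the result easy to compare with the hand-written url.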