If this is reproducible, I would run the comparison under Wireshark (formerly called Ethereal), https://www.wireshark.org/ . It captures the full network traffic and can even be run on a machine separate from both the client and the server (in promiscuous mode).
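To put quick numbers on what the capture shows, a small script that tallies socket states per remote peer from netstat output works well for comparing an HTTP run against an HTTPS run. A rough sketch (assuming Python 3 and the Linux "netstat -tan" column layout; none of this is Solr-specific, and the names are illustrative):

#!/usr/bin/env python3
# Rough helper: count sockets per TCP state and, for CLOSE_WAIT, per remote
# peer, by parsing `netstat -tan` output. Assumes the Linux column layout
# (proto, recv-q, send-q, local address, foreign address, state).
# Run it once during the HTTP test and once during the HTTPS test, then
# compare the numbers. Illustrative only, not tied to Solr itself.
import subprocess
from collections import Counter

def snapshot():
    out = subprocess.run(["netstat", "-tan"],
                         capture_output=True, text=True).stdout
    states = Counter()
    close_wait_peers = Counter()
    for line in out.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip headers and non-TCP lines
        state = fields[-1]
        states[state] += 1
        if state == "CLOSE_WAIT":
            # strip the ephemeral port, keep only the peer host
            close_wait_peers[fields[-2].rsplit(":", 1)[0]] += 1
    return states, close_wait_peers

if __name__ == "__main__":
    states, peers = snapshot()
    for state, n in states.most_common():
        print(f"{state:15s} {n}")
    print()
    print("CLOSE_WAIT per remote peer:")
    for peer, n in peers.most_common():
        print(f"{peer:20s} {n}")
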
Then, I would compare the number of connections between HTTP and HTTPS for
the same test. Perhaps HTTP is doing request pipelining and HTTPS is not;
that would lead to more sockets (and more CLOSE_WAITs) for the same content.
If the number of connections is the same, I would pick a similar transaction
and look at the delays between the packets of the closing sequence (FIN/ACK).
If, after the server sends its closing packet, the client does not reply with
its own closing packet as quickly under HTTPS, then the problem is in the
socket-closing code. Obviously, SSL connection establishment is more
expensive than non-SSL, but the issue here is the closing of one. This is how
I troubleshot these scenarios many years ago in WebLogic senior tech support.
I still think approaching this from the network up is the most viable
approach.

Regards,
   Alex.
----
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 10 July 2016 at 17:05, Shai Erera <ser...@gmail.com> wrote:
> There is no firewall, and the CLOSE_WAITs are between Solr-to-Solr nodes
> (the origin and destination IP:PORT belong to Solr).
>
> Also, note that the same test runs fine on 5.4.1, even though there are
> still a few hundred CLOSE_WAITs. I'm looking at what has changed in the
> code between 5.4.1 and 5.5.1. It's also only reproducible when Solr is run
> in SSL mode, so the problem might lie in HttpClient/Jetty too.
>
> Shai
>
> On Fri, Jul 8, 2016 at 11:59 AM Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
>
>> Is there a firewall between the client and the server, by any chance?
>>
>> CLOSE_WAIT is not a leak, but a standard TCP step at the end of a
>> connection. So the question is why sockets are reopened that often, or
>> why the other side does not acknowledge the TCP termination packet
>> quickly.
>>
>> I would run Ethereal to troubleshoot that. And truss/strace.
>>
>> Regards,
>>    Alex
>>
>> On 8 Jul 2016 4:56 PM, "Mads Tomasgård Bjørgan" <m...@dips.no> wrote:
>>
>> FYI - we're using Solr 6.1.0, and the leak seems to be consistent (it
>> occurs every single time when running with SSL).
>>
>> -----Original Message-----
>> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
>> Sent: torsdag 7. juli 2016 18.14
>> To: solr-user@lucene.apache.org
>> Subject: Re: File Descriptor/Memory Leak
>>
>> I've created a JIRA to track this:
>> https://issues.apache.org/jira/browse/SOLR-9290
>>
>> On Thu, Jul 7, 2016 at 8:00 AM, Shai Erera <ser...@gmail.com> wrote:
>>
>> > Shalin, we're seeing that issue too (and are actively debugging it
>> > these days). So far I can confirm the following (on a 2-node cluster):
>> >
>> > 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
>> > 2) It does not reproduce when SSL is disabled
>> > 3) Restarting the Solr process (sometimes both need to be restarted)
>> >    drops the count to 0, but if indexing continues, it climbs up again
>> >
>> > When it does happen, Solr seems stuck. The leader cannot talk to the
>> > replica, or vice versa, the replica is usually put in DOWN state, and
>> > there's no way to fix it besides restarting the JVM.
>> >
>> > Reviewing the changes from 5.4.1 to 5.5.1, I tried reverting some that
>> > looked suspicious (SOLR-8451 and SOLR-8578), even though the changes
>> > look legit. That did not help, and honestly I had done that before we
>> > suspected it might be the SSL. Therefore I think those are "safe", but
>> > just FYI.
>> >
>> > When it does happen, the number of CLOSE_WAITs climbs very high, on the
>> > order of 30K+ entries in 'netstat'.
>> >
>> > When I say it does not reproduce on 5.4.1, I really mean the numbers
>> > don't go as high as they do in 5.5.1. Meaning, when running without
>> > SSL, the number of CLOSE_WAITs is small, usually less than 10 (I would
>> > separately like to understand why we have any in that state at all).
>> > When running with SSL and 5.4.1, they stay low, on the order of
>> > hundreds at most.
>> >
>> > Unfortunately, running without SSL is not an option for us. We will
>> > likely roll back to 5.4.1, even if the problem exists there too, though
>> > to a lesser degree.
>> >
>> > I will post back here when/if we have more info about this.
>> >
>> > Shai
>> >
>> > On Thu, Jul 7, 2016 at 5:32 PM Shalin Shekhar Mangar <
>> > shalinman...@gmail.com> wrote:
>> >
>> > > I have myself seen this CLOSE_WAIT issue at a customer. I am running
>> > > some tests with different versions, trying to pinpoint the cause of
>> > > this leak. Once I have some more information and a reproducible test,
>> > > I'll open a JIRA issue. I'll keep you posted.
>> > >
>> > > On Thu, Jul 7, 2016 at 5:13 PM, Mads Tomasgård Bjørgan <m...@dips.no>
>> > > wrote:
>> > >
>> > > > Hello there,
>> > > > Our SolrCloud is experiencing an FD leak while running with SSL.
>> > > > This occurs on the one machine that our program is sending data to.
>> > > > We have a total of three servers running as an ensemble.
>> > > >
>> > > > While running without SSL, the FD count remains quite constant at
>> > > > around 180 while indexing. Performing a garbage collection also
>> > > > clears almost the entire JVM memory.
>> > > >
>> > > > However, when indexing with SSL, the FD count grows polynomially. It
>> > > > increases by a few hundred every five seconds or so, and easily
>> > > > reaches 50,000 within three to four minutes. Performing a GC sweeps
>> > > > most of the memory on the two machines our program isn't
>> > > > transmitting the data directly to. The last machine is unaffected
>> > > > by the GC, and neither the memory nor the FD count resets until
>> > > > Solr is restarted on that machine.
>> > > >
>> > > > Running netstat reveals that the FD count mostly consists of TCP
>> > > > connections in the "CLOSE_WAIT" state.
>> > > >
>> > >
>> > > --
>> > > Regards,
>> > > Shalin Shekhar Mangar.
>> > >
>> >
>>
>>
>> --
>> Anshum Gupta
>>
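For reference, the CLOSE_WAIT state discussed throughout this thread simply means the remote side has sent its FIN and the local application has not yet closed its end of the socket. A small self-contained sketch (plain Python on localhost, using a hypothetical port 9999, nothing Solr-specific) reproduces the state:

import socket
import subprocess
import time

# The "server" accepts a connection and immediately closes it, sending a FIN.
# Because the "client" never calls close() on its socket, the kernel leaves
# that socket in CLOSE_WAIT until the application closes it or the process
# exits. Port 9999 is an arbitrary choice for this demo.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 9999))
server.listen(1)

client = socket.create_connection(("127.0.0.1", 9999))
conn, _ = server.accept()
conn.close()          # server side sends FIN
time.sleep(0.5)       # give the kernel a moment

# The client socket now shows up as CLOSE_WAIT in netstat.
out = subprocess.run(["netstat", "-tan"], capture_output=True, text=True).stdout
print("\n".join(line for line in out.splitlines() if "CLOSE_WAIT" in line))

client.close()        # closing it lets the connection finish tearing down

If the reports above are right, the HttpClient/Jetty connection handling under SSL would be playing the role of the client here, holding sockets open after the other Solr node has already closed its end.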