Re: Getting False Tomcat Down Alert
On 03.01.24 07:55, Chaudhary, Mohit wrote:
> Hello Team,
>
> We have a RHEL 6.10 server and have configured a custom script in crontab
> to check whether port 8080 is up or not; if 8080 is down, we get an email
> alert. But sometimes we face a false alert for 2 to 3 minutes. When we
> check the Tomcat service, it is up and running fine, and nothing was
> written in the logs either.
>
> So is it possible that port 8080 would be down for a few minutes when
> Tomcat is facing heavy traffic? Or what else could be the reason for the
> false alerts?

To me, the question is rather: *how* does your custom script check that
port 8080 is up? Because you say that you're checking as well, and this
result disagrees with the result of the manual check.

With two different results, I'd first inspect the custom script for its
behavior, and suspect it to generate a false positive.

It being a script implies that you might be able to share it (or the
relevant parts of it) here (?)

Olaf
RE: Getting False Tomcat Down Alert
Hi,

Please find below the script code that has been written:

STAT=`netstat -luptn | grep 8080 | awk '{print $6}'`
if [[ "$STAT" != "LISTEN" ]]; then
    echo "Tomcat instance down" >> $MESSAGE
    mail -s "Tomcat Instance Down on $HOSTNAME" $mailto < $MESSAGE

Thanks & Regards,
Mohit Chaudhary
Re: Getting False Tomcat Down Alert
Here's an option:

On 03.01.24 09:41, Chaudhary, Mohit wrote:
> STAT=`netstat -luptn | grep 8080 | awk '{print $6}'`
> if [[ "$STAT" != "LISTEN" ]]; then
>     echo "Tomcat instance down" >> $MESSAGE
>     mail -s "Tomcat Instance Down on $HOSTNAME" $mailto < $MESSAGE

netstat -luptn contains the "p" option (which lists the PID/program name).
It also contains -u, which includes UDP ports. Both are most likely not
helpful in your case, and a sign that this script was quickly cobbled
together and not well designed.

grep 8080 does not just match a port, but will also trigger on any listed
PID (or other output) containing those 4 characters. As you'll also see
numeric IPv6 addresses (local or foreign ones) - and even local IPv6
addresses can change over time - there's another possibility for
unintended matches.

So, assuming that there is some other output from netstat that contains
"8080" somewhere but not "LISTEN" (or that the matched output is
multi-line), you'll get a false positive.

To validate that you're running into such an issue, you can add the
grepped netstat output to the mail (before applying awk) - so either cache
that output, or simply execute it again, piping it to $MESSAGE.

Olaf
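A minimal sketch of that debugging variant could look like the following. It deliberately keeps the original netstat/grep invocation (so the false positive can still occur) and assumes, as in the original script, that $MESSAGE and $mailto are defined elsewhere:

# capture the raw grep matches once, so the alert mail shows exactly what matched
RAW=`netstat -luptn | grep 8080`
STAT=`echo "$RAW" | awk '{print $6}'`
if [[ "$STAT" != "LISTEN" ]]; then
    echo "Tomcat instance down" >> $MESSAGE
    echo "Matched netstat output was:" >> $MESSAGE
    echo "$RAW" >> $MESSAGE
    mail -s "Tomcat Instance Down on $HOSTNAME" $mailto < $MESSAGE
fi

The next false alert would then carry the offending netstat lines in its body.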
Re: Getting False Tomcat Down Alert
On 03.01.24 10:18, Olaf Kock wrote:
> To validate that you're running into such an issue, you can add the
> grepped netstat output to the mail (before applying awk) - so either
> cache that output, or simply execute it again, piping it to $MESSAGE.

One more nail in the coffin:

Validate by just executing "netstat -luptn | grep 8" on the command line
(to simulate multiple hits), and look at the output that you (would) get
from awk: the -u option doesn't even make netstat print LISTEN in column
6, so awk prints the PID column instead. And indeed, if multiple lines
match your grep, you're comparing (e.g.) "LISTEN LISTEN 34523/java" to
"LISTEN", which obviously does not match.

Which means: your script is definitely wrong. Tomcat is not to blame.

Best,
Olaf
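To make that failure mode concrete, here is a hypothetical transcript (the netstat lines and the dhclient process that happens to have PID 8080 are invented purely for illustration):

$ netstat -luptn | grep 8080
tcp6   0   0 :::8080      :::*                 LISTEN   1234/java
udp    0   0 0.0.0.0:68   0.0.0.0:*                     8080/dhclient
$ STAT=`netstat -luptn | grep 8080 | awk '{print $6}'`
$ [[ "$STAT" != "LISTEN" ]] && echo "false alarm, STAT was: $STAT"
false alarm, STAT was: LISTEN
8080/dhclient

The UDP line has no State column, so awk picks up the PID/program name instead, and the resulting multi-line $STAT can never equal the single word "LISTEN".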
Re: Tomcat/Java starts using too much memory and not by the heap or non-heap memory
Brian,

On 12/30/23 15:42, Brian Braun wrote:
> At the beginning, this was the problem: the OOM-killer (something that I
> never knew existed) killing Tomcat unexpectedly and without any
> explanation.

The explanation is always the same: some application requests memory from
the kernel, which always grants the request(!). When the application
tries to use that memory, the kernel scrambles to physically allocate the
memory on-demand and, if all the memory is gone, it will pick a process
and kill it. There are ways to prevent this from happening, but the best
way is not to over-commit your memory.

> Not knowing how much memory I would need to satisfy the JVM, and not
> willing to migrate to more expensive Amazon instances just because I
> don't know why this is happening. And not knowing if the memory
> requirement would keep growing and growing and growing.

It might. But if your symptom is the Linux oom-killer and not a JVM OOME,
then the better technique is to *reduce* your heap space in the JVM.

> Then I activated the SWAP file, and I discovered that this problem stops
> at 1.5GB of memory used by the JVM. At least I am not getting more
> crashes anymore. But I consider the SWAP file a palliative and I really
> want to know the root of this problem. If I don't, then maybe I should
> consider another career. I don't enjoy giving up.

Using a swap file is probably going to kill your performance. What
happens if you make your heap smaller?

> Yes, the memory used by the JVM started to grow suddenly one day, after
> several years running fine. Since I had not made any changes to my app,
> I really don't know the reason, and I really think this should not be
> happening without an explanation. I don't have any Java OOME exceptions,
> so it is not that my objects don't fit, even if I supply 300MB to the
> -Xmx parameter. In fact, as I wrote, I don't think the heap and non-heap
> usage is the problem. I have been inspecting those and their usage seems
> to be normal/modest and steady. I can see that using the Tomcat Manager
> as well as several other tools (New Relic, VisualVM, etc).

Okay, so what you've done then is to allow a very large heap that you
mostly don't need. If/when the heap grows a lot -- possibly suddenly --
the JVM is lazy and just takes more heap space from the OS, and
ultimately you run out of main memory. The solution is to reduce the heap
size.

> Regarding the 1GB I am giving now to the -Xms parameter: I was giving
> just a few hundred MB and I already had the problem. Actually I think it
> is the same whether I give a few hundred MB or 1GB; the JVM still starts
> using more memory after 3-4 days of running until it takes 1.5GB. But
> during the first 1-4 days it uses just a few hundred MB. My app has been
> "static" as you say, but I have probably upgraded Tomcat and/or Java
> recently. I don't really remember. Maybe one of those upgrades brought
> this issue as a result. Actually, if I knew that one of those upgrades
> causes this huge spike in memory consumption and there is no way to
> avoid it, then I would accept it as a fact of life and move on. But
> since I don't know, it really bugs me. I have the same amount of users
> and traffic as before. I also know how much memory a session takes and
> it is fine. I have also checked the HTTP(S) requests to see if somehow I
> am getting any attempts to hack my instance that could be the root of
> this problem. Yes, I get hacking attempts by those bots all the time,
> but I don't see anything relevant there. No news.
>
> I agree with what you say now regarding the GC. I should not need to use
> those switches since I understand it should work fine without them. And
> I don't know how to use them. And since I have never cared about using
> them for about 15 years of using Java+Tomcat, why should I start now? I
> have also checked all my long-lasting objects. I have optimized my DB
> queries recently, as you suggest now, so they don't create huge amounts
> of objects in a short period of time that the GC would have to deal
> with. The same applies to my scheduled tasks. They all run very quickly
> and use modest amounts of memory. All the other default Tomcat threads
> create far more objects.
>
> I have already activated the GC log. Is there a tool that you would
> suggest to analyze it? I haven't even opened it. I suspect that the root
> of my problem comes from the GC process indeed.

The GC logs are just text, so you can eyeball them if you'd like, but to
really get a sense of what's happening you should use some kind of
visualization tool. It's not pretty, but gcviewer
(https://github.com/chewiebug/GCViewer) gets the job done.

If you run with a 500MiB heap and everything looks good and you have no
crashes (Linux oom-killer or Java OOME), I'd stick with that. Remember
that your total OS memory requirement will be Java heap + JVM overhead +
whatever native memory is required by native libraries. In production, I
have an application with a 2048MiB heap whose "resident size" in `ps`
shows as
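For reference, a sketch of the kind of configuration being discussed: a deliberately reduced heap plus a GC log that gcviewer can read, typically placed in Tomcat's bin/setenv.sh. The flag names are standard JVM options, but the specific sizes, the log path, and the Java 9+ unified-logging syntax are assumptions, not values tuned to this application:

# illustrative values only: cap the heap well below the instance's RAM and
# rotate a GC log for later analysis with gcviewer (Java 9+ -Xlog syntax assumed)
export CATALINA_OPTS="$CATALINA_OPTS \
  -Xms256m -Xmx512m \
  -Xlog:gc*:file=/var/log/tomcat/gc.log:time,uptime,level,tags:filecount=5,filesize=10M"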
Re: Getting False Tomcat Down Alert
Olaf,

On 1/3/24 04:18, Olaf Kock wrote:
> grep 8080 does not just match a port, but will also trigger on any
> listed PID (or other output) containing those 4 characters. [...]
> To validate that you're running into such an issue, you can add the
> grepped netstat output to the mail (before applying awk).

+1

The regular expression used with grep should be improved a lot. I would
recommend at least the following:

STAT=`netstat -luptn 2>/dev/null | grep '^tcp.*:8080[^:0-9]' | awk '{print $6}'`

In my Linux environment, if this check isn't being run by root, it will
print this message on stderr:

(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)

This is why I've added the 2>/dev/null to the netstat command.

The improved regexp will ignore non-TCP ports and will only match a
proper port number by requiring the presence of a : followed by anything
other than another : (which would indicate it's an IPv6 address) or more
digits (which could be a longer port number like 80800, or more of an
IPv6 address).

netstat is a pretty crude tool to be using here. Why not just connect to
the service on port 8080 and see if it responds? A process listening on
the port doesn't guarantee it's actually able to serve any requests...

-chris
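A minimal connect-level check along those lines could be sketched with bash's /dev/tcp feature. The host, port, and five-second timeout here are assumptions, and this only verifies that the port accepts a TCP connection, not that the application actually answers:

if timeout 5 bash -c 'exec 3<>/dev/tcp/localhost/8080' 2>/dev/null; then
    echo "port 8080 accepts connections"
else
    echo "port 8080 refused the connection or timed out"
fi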
Re: Getting False Tomcat Down Alert
On 03.01.24 15:34, Christopher Schultz wrote:
> +1
>
> The regular expression used with grep should be improved a lot. I would
> recommend at least the following:
>
> STAT=`netstat -luptn 2>/dev/null | grep '^tcp.*:8080[^:0-9]' | awk '{print $6}'`

...or omit the UDP output by using "netstat -lptn" to begin with ;)

> The improved regexp will ignore non-TCP ports and will only match a
> proper port number by requiring the presence of a : followed by anything
> other than another : (which would indicate it's an IPv6 address) or more
> digits (which could be a longer port number like 80800, or more of an
> IPv6 address).

Wouldn't the :8080[^:0-9] still hit on an IPv6 address /ending/ in 8080?
Still an improvement, as it'd be 1 false alarm per 4294967296 IPv6
addresses ;)

> netstat is a pretty crude tool to be using here. Why not just connect to
> the service on port 8080 and see if it responds? A process listening on
> the port doesn't guarantee it's actually able to serve any requests...

Now I'm +1'ing, and can add: as you said this, you triggered my memory.
My toolbox has this gem, checking for http-status 200:

#!/bin/bash
status_code=$(curl --write-out %{http_code} --silent --output /dev/null http://localhost:8080/)
if [[ "$status_code" -ne 200 ]] ; then
    echo "Tomcat Status: $status_code" | mail -s "Tomcat Down?" "someb...@example.com" -r "STATUS_CHECKER"
else
    exit 0
fi

Olaf
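Since the original check runs from cron, a script like the one above would typically be scheduled roughly as follows (the script path and the one-minute interval are assumptions, not taken from the thread):

# run the status check every minute; the script itself decides whether to send mail
* * * * * /usr/local/bin/check_tomcat_status.sh >/dev/null 2>&1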
Re: Getting False Tomcat Down Alert
Olaf,

On 1/3/24 09:52, Olaf Kock wrote:
> Wouldn't the :8080[^:0-9] still hit on an IPv6 address /ending/ in 8080?
> Still an improvement, as it'd be 1 false alarm per 4294967296 IPv6
> addresses ;)

Unlikely, as a LISTENing socket must be listening on a port number. The
port number is always at the end, so you wouldn't see a bare IPv6 address
ending in :8080 with no trailing port number. :)

> Now I'm +1'ing, and can add: as you said this, you triggered my memory.
> My toolbox has this gem, checking for http-status 200:

Yeah, HTTP-based status-checks are indeed pretty basic. You don't have to
over-think them ;)

-chris