Re: Getting False Tomcat Down Alert

2024-01-03 Thread Olaf Kock


On 03.01.24 07:55, Chaudhary, Mohit wrote:


Hello Team,

We have a RHEL 6.10 server and have configured a custom script in crontab to 
check whether port 8080 is up; if 8080 is down, an email alert is sent. But 
sometimes we get a false alert for 2 to 3 minutes. When we check, the Tomcat 
service is up and running fine, and nothing was written to the logs either.


So is it possible that port 8080 goes down for a few minutes when Tomcat is 
facing heavy traffic? Or what else could be the reason for the false alert?


To me, the question is rather: *How* does your custom script check whether port 
8080 is up? Because you say that you're checking as well, and that result 
disagrees with the result of the manual check. With two different results, I'd 
first inspect the custom script's behavior and suspect it of generating a 
false positive.


It being a script implies that you might be able to share it (or the 
relevant parts of it) here (?)


Olaf


RE: Getting False Tomcat Down Alert

2024-01-03 Thread Chaudhary, Mohit
Hi,

Please find below script code which has been written.

STAT=`netstat -luptn | grep 8080 | awk '{print $6}'`
if [[ "$STAT" != "LISTEN" ]];
then
echo "Tomcat instance down" >> $MESSAGE
mail -s "Tomcat Instance Down on $HOSTNAME" $mailto < $MESSAGE

Thanks & Regards,
Mohit Chaudhary








Re: Getting False Tomcat Down Alert

2024-01-03 Thread Olaf Kock

Here's an option:

On 03.01.24 09:41, Chaudhary, Mohit wrote:

Hi,

Please find below script code which has been written.

STAT=`netstat -luptn | grep 8080 | awk '{print $6}'`
if [[ "$STAT" != "LISTEN" ]];
then
echo "Tomcat instance down" >> $MESSAGE
mail -s "Tomcat Instance Down on $HOSTNAME" $mailto < $MESSAGE

Thanks & Regards,
Mohit Chaudhary


netstat -luptn contains the "p" option (which lists the PID/program 
name). It also contains -u, which includes UDP ports. Both are most likely 
not helpful in your case, and a sign that this script was quickly 
cobbled together rather than carefully designed.


grep 8080 does not just match a port, but will also trigger on any 
listed PID (or other output) containing those 4 characters. As you'll 
also see numeric IPv6 addresses (local or foreign ones) - and even local 
IPv6 addresses can change over time - there's another possibility for 
unintended matches.


So, assuming that there is some other output from netstat that somewhere 
contains "8080", but not "LISTEN" (or maybe if the output is 
multi-line), you'll get a false positive hit.


To validate that you're running into such an issue, you can add the 
grepped netstat output to the mail (before applying awk) - so either 
cache that output, or simply execute it again, piping it to $MESSAGE.
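
For illustration, a minimal sketch of such a debugging variant (assuming the 
$MESSAGE and $mailto variables from your script) could look like this:

RAW=$(netstat -luptn | grep 8080)
STAT=$(echo "$RAW" | awk '{print $6}')
if [[ "$STAT" != "LISTEN" ]]; then
    {
        echo "Tomcat instance down"
        echo "--- raw netstat output that triggered this alert ---"
        echo "$RAW"
    } >> "$MESSAGE"
    mail -s "Tomcat Instance Down on $HOSTNAME" "$mailto" < "$MESSAGE"
fi

That way every false alert carries the exact netstat lines that caused it.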


Olaf






Re: Getting False Tomcat Down Alert

2024-01-03 Thread Olaf Kock






One more nail in the coffin:

Validate by just executing "netstat -luptn | grep 8" on the command line 
(to simulate multiple hits) and look at the output that awk gives (or would 
give) you: for UDP sockets, the -u option doesn't even make netstat print 
LISTEN in column 6, so awk prints the PID column instead. And indeed, if 
multiple lines match your grep, you end up comparing something like 
"LISTEN LISTEN 34523/java" to "LISTEN", which obviously does not match.


Which means: Your script is definitely wrong. Tomcat is not to blame.

Best,

Olaf






Re: Tomcat/Java starts using too much memory and not by the heap or non-heap memory

2024-01-03 Thread Christopher Schultz

Brian,

On 12/30/23 15:42, Brian Braun wrote:

At the beginning, this was the problem: The OOM-killer (something that I
never knew existed) killing Tomcat unexpectedly and without any
explanation


The explanation is always the same: some application requests memory 
from the kernel, which always grants the request(!). When the 
application tries to use that memory, the kernel scrambles to physically 
allocate the memory on-demand and, if all the memory is gone, it will 
pick a process and kill it.


There are ways to prevent this from happening, but the best way is not to 
over-commit your memory in the first place.
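
For example (not something from this thread, just the standard Linux knobs), 
the kernel's overcommit policy can be inspected and tightened via sysctl, 
though strict accounting has its own trade-offs:

# Show the current policy: 0 = heuristic, 1 = always allow, 2 = strict accounting
sysctl vm.overcommit_memory
# Strict mode (run as root): refuse allocations beyond swap + overcommit_ratio% of RAM
sysctl -w vm.overcommit_memory=2
sysctl -w vm.overcommit_ratio=80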



Not knowing how much memory would I need to satisfy the JVM, and not
willing to migrate to more expensive Amazon instances just because I
don't know why this is happening. And not knowing if the memory
requirement would keep growing and growing and growing.

It might. But if your symptom is Linux oom-killer and not JVM OOME, then 
the better technique is to *reduce* your heap space in the JVM.
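
As a sketch (the values are placeholders, not a recommendation), the cap is 
usually set in Tomcat's bin/setenv.sh:

# Hypothetical setenv.sh fragment: pin the heap well below the instance's RAM
export CATALINA_OPTS="$CATALINA_OPTS -Xms256m -Xmx512m"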



Then I activated the SWAP file, and I discovered that this problem stops at
1.5GB of memory used by the JVM. At least I am not getting more crashes
anymore. But I consider the SWAP file as a palliative and I really want to
know what is the root of this problem. If I don't, then maybe I should
consider another career. I don't enjoy giving up.


Using a swap file is probably going to kill your performance. What 
happens if you make your heap smaller?



Yes, the memory used by the JVM started to grow suddenly one day, after
several years running fine. Since I had not made any changes to my app, I
really don't know the reason. And I really think this should not be
happening without an explanation.

I don't have any Java OOME exceptions, so it is not that my objects don't
fit. Even if I supply 300MB to the -Xmx parameter. In fact, as I wrote, I
don't think the Heap and non-heap usage is the problem. I have been
inspecting those and their usage seems to be normal/modest and steady. I
can see that using the Tomcat Manager as well as several other tools (New
Relic, VisualVM, etc).


Okay, so what you've done then is to allow a very large heap that you 
mostly don't need. If/when the heap grows a lot -- possibly suddenly -- 
the JVM is lazy and just takes more heap space from the OS and 
ultimately you run out of main memory.


The solution is to reduce the heap size.


Regarding the 1GB I am giving now to the -Xms parameter: I was giving just a 
few hundred MB and I already had the problem. Actually I think it is the same 
whether I give a few hundred MB or 1GB; the JVM still starts using more memory 
after 3-4 days of running until it takes 1.5GB. But during the first 1-4 days 
it uses just a few hundred MB.

My app has been "static" as you say, but probably I have upgraded Tomcat
and/or Java recently. I don't really remember. Maybe one of those upgrades
brought this issue as a result. Actually, if I knew that one of those 
upgrades caused this huge spike in memory consumption and there is no way to
avoid it, then I would accept it as a fact of life and move on. But since I
don't know, it really bugs me.

I have the same amount of users and traffic as before. I also know how much
memory a session takes and it is fine.  I have also checked the HTTP(S)
requests to see if somehow I am getting any attempts to hack my instance
that could be the root of this problem. Yes, I get hacking attempts by
those bots all the time, but I don't see anything relevant there. No news.

I agree with what you say now regarding the GC. I should not need to use
those switches since I understand it should work fine without using them.
And I don't know how to use them. And since I have never cared about using
them for about 15 years using Java+Tomcat, why should I start now?

I have also checked all my long-lasting objects. I have optimized my DB
queries recently as you suggest now, so they don't create huge amounts of
objects in a short period of time that the GC would have to deal with. The
same applies to my scheduled tasks. They all run very quickly and use
modest amounts of memory. All the other default Tomcat threads create far
more objects.

I have already activated the GC log. Is there a tool that you would suggest
to analyze it? I haven't even opened it. I suspect that the root of my
problem comes from the GC process indeed.


The GC logs are just text, so you can eyeball them if you'd like, but to 
really get a sense of what's happening you should use some kind of 
visualization tool.


It's not pretty, but gcviewer (https://github.com/chewiebug/GCViewer) 
gets the job done.
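
For reference (a sketch; paths and the GCViewer version are placeholders), GC 
logging can be enabled and the resulting log fed to GCViewer like this:

# Java 8 and earlier
CATALINA_OPTS="$CATALINA_OPTS -verbose:gc -XX:+PrintGCDetails -Xloggc:/var/log/tomcat/gc.log"
# Java 9 and later (unified logging)
CATALINA_OPTS="$CATALINA_OPTS -Xlog:gc*:file=/var/log/tomcat/gc.log:time,uptime:filecount=5,filesize=10m"

# GCViewer is a desktop app; point it at the log file
java -jar gcviewer-1.36.jar /var/log/tomcat/gc.log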


If you run with a 500MiB heap and everything looks good and you have no 
crashes (Linux oom-killer or Java OOME), I'd stick with that. Remember 
that your total OS memory requirements will be Java heap + JVM overhead 
+ whatever native memory is required by native libraries.
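
A rough way to watch that total from the OS side (a sketch; the pgrep pattern 
assumes Tomcat was started via the standard catalina.sh scripts):

# Resident (RSS) and virtual (VSZ) size of the Tomcat JVM, in KiB
ps -o pid,rss,vsz,comm -p "$(pgrep -d, -f org.apache.catalina.startup.Bootstrap)"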


In production, I have an application with a 2048MiB heap whose "resident 
size" in `ps` shows as 

Re: Getting False Tomcat Down Alert

2024-01-03 Thread Christopher Schultz

Olaf,

On 1/3/24 04:18, Olaf Kock wrote:

Here's an option:

On 03.01.24 09:41, Chaudhary, Mohit wrote:

Hi,

Please find below script code which has been written.

STAT=`netstat -luptn | grep 8080 | awk '{print $6}'`
if [[ "$STAT" != "LISTEN" ]];
then
echo "Tomcat instance down" >> $MESSAGE
mail -s "Tomcat Instance Down on $HOSTNAME" $mailto < $MESSAGE

Thanks & Regards,
Mohit Chaudhary


netstat -luptn contains the "p" option (which lists the PID/program 
name). It also contains -u, which includes UDP ports. Both are most likely 
not helpful in your case, and a sign that this script was quickly 
cobbled together rather than carefully designed.


grep 8080 does not just match a port, but will also trigger on any 
listed PID (or other output) containing those 4 characters. As you'll 
also see numeric IPv6 addresses (local or foreign ones) - and even local 
IPv6 addresses can change over time - there's another possibility for 
unintended matches.


So, assuming that there is some other output from netstat that somewhere 
contains "8080", but not "LISTEN" (or maybe if the output is 
multi-line), you'll get a false positive hit.


To validate that you're running into such an issue, you can add the 
grepped netstat output to the mail (before applying awk) - so either 
cache that output, or simply execute it again, piping it to $MESSAGE.


+1

The regular expression used with grep should be improved a lot.

I would recommend at least the following:

STAT=`netstat -luptn 2>/dev/null | grep '^tcp.*:8080[^:0-9]' | awk 
'{print $6}'`


In my Linux environment, if this check isn't being run by root, it will 
print this message on stderr:


(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)

This is why I've added the 2>/dev/null to the netstat command.

The improved regexp will ignore non-TCP ports and will only match on a 
proper port-number by requiring the presence of a : and being followed 
by anything other than a : (which would indicate it's an IPv6 address) 
or more numbers (which could be a port number like 80800 or more of an 
IPv6 address).
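
Dropped back into the original alert logic, a sketch could look like this 
(using grep -q against the LISTEN state directly, to sidestep the multi-line 
comparison problem; $MESSAGE and $mailto are assumed to be defined as before):

if ! netstat -luptn 2>/dev/null | grep -q '^tcp.*:8080[^:0-9].*LISTEN'; then
    echo "Tomcat instance down" >> "$MESSAGE"
    mail -s "Tomcat Instance Down on $HOSTNAME" "$mailto" < "$MESSAGE"
fi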


netstat is a pretty crude tool to be used, here. Why not just connect to 
the service on port 8080 and see if it responds? The process listening 
on the port doesn't guarantee it's actually able to serve any requests...


-chris




Re: Getting False Tomcat Down Alert

2024-01-03 Thread Olaf Kock


On 03.01.24 15:34, Christopher Schultz wrote:

Olaf,

+1

The regular expression used with grep should be improved a lot.

I would recommend at least the following:

STAT=`netstat -luptn 2>/dev/null | grep '^tcp.*:8080[^:0-9]' | awk 
'{print $6}'`


...or omit the UDP output by using "netstat -lptn" to begin with ;)

The improved regexp will ignore non-TCP ports and will only match on a 
proper port-number by requiring the presence of a : and being followed 
by anything other than a : (which would indicate it's an IPv6 address) 
or more numbers (which could be a port number like 80800 or more of an 
IPv6 address).


wouldn't the :8080[^:0-9] still hit on an IPv6 address /ending/ in 8080? 
Still an improvement, as it'd be 1 false alarm per 4294967296 IPv6 
addresses ;)


netstat is a pretty crude tool to be used, here. Why not just connect 
to the service on port 8080 and see if it responds? The process 
listening on the port doesn't guarantee it's actually able to serve 
any requests...


Now I'm +1'ing, and can add:

As you said this, you triggered my memory: My toolbox has this gem, 
checking for http-status 200:


#!/bin/bash
status_code=$(curl --write-out %{http_code} --silent --output /dev/null http://localhost:8080/)
if [[ "$status_code" -ne 200 ]] ; then
    echo "Tomcat Status: $status_code" | mail -s "Tomcat Down?" "someb...@example.com" -r "STATUS_CHECKER"
else
    exit 0
fi


Olaf

Re: Getting False Tomcat Down Alert

2024-01-03 Thread Christopher Schultz

Olaf,

On 1/3/24 09:52, Olaf Kock wrote:

On 03.01.24 15:34, Christopher Schultz wrote:

Olaf,

+1

The regular expression used with grep should be improved a lot.

I would recommend at least the following:

STAT=`netstat -luptn 2>/dev/null | grep '^tcp.*:8080[^:0-9]' | awk 
'{print $6}'`


...or omit the UDP output by using "netstat -lptn" to begin with ;)

The improved regexp will ignore non-TCP ports and will only match on a 
proper port-number by requiring the presence of a : and being followed 
by anything other than a : (which would indicate it's an IPv6 address) 
or more numbers (which could be a port number like 80800 or more of an 
IPv6 address).


wouldn't the :8080[^:0-9] still hit on an IPv6 address /ending/ in 8080? 
Still an improvement, as it'd be 1 false alarm per 4294967296 IPv6 
addresses ;)


Unlikely, as a LISTENing socket must be listening on a port number. The 
port number is always at the end, so you wouldn't see a bare IPv6 
address ending in :8080 with no trailing port number. :)


netstat is a pretty crude tool to be used, here. Why not just connect 
to the service on port 8080 and see if it responds? The process 
listening on the port doesn't guarantee it's actually able to serve 
any requests...


Now I'm +1'ing, and can add:

As you said this, you triggered my memory: My toolbox has this gem, 
checking for http-status 200:


#!/bin/bash
status_code=$(curl --write-out %{http_code} --silent --output /dev/null http://localhost:8080/)
if [[ "$status_code" -ne 200 ]] ; then
    echo "Tomcat Status: $status_code" | mail -s "Tomcat Down?" "someb...@example.com" -r "STATUS_CHECKER"
else
    exit 0
fi


Yeah, HTTP-based status-checks are indeed pretty basic. You don't have 
to over-think them ;)


-chris
