Subject: Issue with Non-Printable Characters in Java API Response

2025-02-20 Thread Ramavtar Pareek
Hello Tomcat Community,

I am facing an issue where some specific keys in my API response contain
non-printable characters instead of the expected Hindi characters. The issue
occurs in our production environment, which has the following architecture:
System Flow:

   1. A Varnish server receives the request.
   2. Varnish forwards the request to our Ensemble API (hosted on Tomcat).
   3. Ensemble API calls Core API, which returns a response.
   4. Ensemble API processes the response and sends it back to Tomcat.
   5. Tomcat returns the final response to Varnish, which then sends it to the
      client.

Issue Observed:

   - Some keys in the Ensemble API’s response contain non-printable characters
     instead of Hindi text.
   - This is not happening for all Hindi characters, only for some specific
     keys.
   - The issue is not reproducible in local environments; it occurs only on
     production servers.

What Has Been Configured:

Tomcat Response Settings in Ensemble API:
res.setContentType("text/html");

res.setCharacterEncoding("UTF-8");


Tomcat setenv.sh Configuration:

export CATALINA_OPTS="$CATALINA_OPTS -Dfile.encoding=UTF-8"

Core API is correctly returning Hindi characters when tested independently.

No issues found in JSON serialization (Jackson) when logging the response
in Ensemble API before sending it.

Possible Causes & Questions:

   - Could Tomcat's encoding settings still be affecting this, even though
     file.encoding=UTF-8 is set?
   - The only thing missing on the production Tomcat server is
     URIEncoding="UTF-8" in server.xml. But that only applies to correctly
     encoding/decoding input params. Can it affect the response too?
   - Could there be an issue with how Jackson serializes the JSON, causing
     certain Hindi characters to break?
   - Could the underlying OS locale affect this behavior? (Checked with locale
     but didn’t find any obvious issues. The locale of both stage and
     production servers is LANG=en_US.UTF-8.)
   - Are there specific headers we need to check to ensure UTF-8 is
     maintained throughout the request/response cycle?

Next Steps:

I would really appreciate any insights into what might be causing this
issue. Please let me know if there are any specific logs, tests, or
additional configurations I should check.

Thanks


Re: Subject: Issue with Non-Printable Characters in Java API Response

2025-02-20 Thread Christopher Schultz

Ramavtar,

On 2/20/25 6:46 AM, Ramavtar Pareek wrote:

I am facing an issue where some specific keys in my API response contain
non-printable characters instead of the expected Hindi characters. The issue
occurs in our production environment, which has the following architecture:
System Flow:

1. A Varnish server receives the request.
2. Varnish forwards the request to our Ensemble API (hosted on Tomcat).
3. Ensemble API calls Core API, which returns a response.
4. Ensemble API processes the response and sends it back to Tomcat.
5. Tomcat returns the final response to Varnish, which then sends it to the
   client.

Issue Observed:

- Some keys in the Ensemble API’s response contain non-printable characters
  instead of Hindi text.
- This is not happening for all Hindi characters, only for some specific
  keys.
- The issue is not reproducible in local environments; it occurs only on
  production servers.

What Has Been Configured:

Tomcat Response Settings in Ensemble API:
res.setContentType("text/html");

res.setCharacterEncoding("UTF-8");


Tomcat setenv.sh Configuration:

export CATALINA_OPTS="$CATALINA_OPTS -Dfile.encoding=UTF-8"


Note that file.encoding doesn't change how Tomcat encodes any responses. 
It may change how the JVM reads files where no specific character 
encoding has been specified by the code reading the file.
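
To illustrate the point, here is a minimal sketch, assuming a standalone class (the name ExplicitCharsetDemo and the sample string are made up for this example). It shows that code which names its charset explicitly produces the same bytes regardless of the JVM default, which is why the file.encoding setting matters only for code that omits the charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Encodes text with a caller-chosen charset; the JVM default is not consulted.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // Hypothetical Devanagari sample, written with escapes so the
        // source file's own encoding cannot interfere.
        String hindi = "\u0928\u092E\u0938\u094D\u0924\u0947";

        byte[] utf8 = encode(hindi, StandardCharsets.UTF_8);
        byte[] latin1 = encode(hindi, StandardCharsets.ISO_8859_1);

        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8 byte count: " + utf8.length);
        // ISO-8859-1 cannot represent Devanagari, so each character
        // is replaced by a single '?' byte.
        System.out.println("ISO-8859-1 byte count: " + latin1.length);
        System.out.println("UTF-8 round trip intact: "
                + new String(utf8, StandardCharsets.UTF_8).equals(hindi));
    }
}
```

Any place in the application that calls getBytes() or new String(bytes) without a charset argument is where the JVM default (and so file.encoding) could still matter.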



Core API is correctly returning Hindi characters when tested independently.


What do you mean "Core API"? Is this your Java-based code responding to 
API requests from Tomcat, or is this confirmed using debugging/logging 
within your own application?
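
One pattern worth ruling out on that hop, offered only as a hypothesis and not as a confirmed cause here: if the calling code decodes the upstream response chunk by chunk, a multi-byte UTF-8 character that straddles a chunk boundary is corrupted, which would affect only some characters and could depend on response size. The sketch below is self-contained; the class and method names are invented for illustration and are not taken from the application.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ChunkedDecodeDemo {
    // Fragile pattern: each chunk is decoded on its own, so a character
    // split across two chunks turns into replacement characters.
    static String decodePerChunk(InputStream in, int chunkSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] buf = new byte[chunkSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(new String(buf, 0, n, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Safer pattern: the Reader keeps decoder state between reads.
    static String decodeWithReader(InputStream in, int chunkSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[chunkSize];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String hindi = "\u0928\u092E\u0938\u094D\u0924\u0947"; // hypothetical sample
        byte[] bytes = hindi.getBytes(StandardCharsets.UTF_8);

        // A 4-byte chunk cannot align with 3-byte Devanagari characters.
        String perChunk = decodePerChunk(new ByteArrayInputStream(bytes), 4);
        String viaReader = decodeWithReader(new ByteArrayInputStream(bytes), 4);

        System.out.println("per-chunk decode intact: " + perChunk.equals(hindi));
        System.out.println("reader decode intact:    " + viaReader.equals(hindi));
    }
}
```

With 3-byte characters and a 4-byte chunk, the per-chunk version should report a mismatch while the Reader version round-trips the text.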



No issues found in JSON serialization (Jackson) when logging the response
in Ensemble API before sending it.

Possible Causes & Questions:

Could Tomcat's encoding settings still be affecting this, even though
file.encoding=UTF-8 is set?


Unlikely.


The only thing missing on the production Tomcat server is
URIEncoding="UTF-8" in server.xml. But that only applies to correctly
encoding/decoding input params. Can it affect the response too?


This will only affect the character encoding when reading request 
parameters from a URL. Note that some web browsers do not provide a 
Content-Type when sending HTTP POST parameters, and the URIEncoding can 
be used as a default for this.



Could there be an issue with how Jackson serializes the JSON, causing
certain Hindi characters to break?


Unlikely, but possible.
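
One way to take serialization and transport encoding out of the picture while debugging is to emit the JSON as pure ASCII, with every non-ASCII character written as a \uXXXX escape. Jackson has a generator feature for this (ESCAPE_NON_ASCII), though how it is enabled depends on the Jackson version. The sketch below is plain Java that does the equivalent on already-serialized JSON text; the class and method names are hypothetical.

```java
import java.util.Locale;

public class AsciiEscapeDemo {
    // Rewrites every non-ASCII UTF-16 code unit as a \\uXXXX escape.
    // Intended for already-serialized JSON text: the ASCII structure
    // (quotes, braces, existing escapes) is left untouched.
    static String escapeNonAscii(String json) {
        StringBuilder sb = new StringBuilder(json.length());
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format(Locale.ROOT, "\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical payload with a Devanagari value.
        String json = "{\"title\":\"\u0928\u092E\u0938\u094D\u0924\u0947\"}";
        System.out.println(escapeNonAscii(json));
    }
}
```

If the client renders the escaped output correctly while the unescaped output is corrupted, the damage is happening to the bytes after serialization, not in the serializer.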


Could the underlying OS locale affect this behavior? (Checked with locale
but didn’t find any obvious issues. The locale of both stage and production
servers is LANG=en_US.UTF-8.)


Unlikely.


Are there specific headers we need to check to ensure UTF-8 is
maintained throughout the request/response cycle?


You should first verify that your services through Tomcat are working 
(or not) as you expect. Make your API calls directly to Tomcat without 
Varnish in the mix and report back.
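
When comparing the direct-to-Tomcat response with the one that comes through Varnish, it helps to look at the raw bytes instead of rendered text. A minimal sketch, assuming the response body has been saved to a file first (the class name and the file argument are hypothetical): it checks whether the bytes are well-formed UTF-8 and counts U+FFFD replacement characters, which would indicate that the text was already damaged before it was encoded.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8BodyCheck {
    // Strict decode: any malformed or truncated sequence fails the check.
    static boolean isValidUtf8(byte[] body) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(body));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Counts U+FFFD after a lenient decode; a non-zero count on valid
    // UTF-8 input means the replacement happened upstream.
    static int countReplacementChars(byte[] body) {
        String text = new String(body, StandardCharsets.UTF_8);
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\uFFFD') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java Utf8BodyCheck <saved-response-body-file>");
            return;
        }
        byte[] body = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("bytes: " + body.length);
        System.out.println("valid UTF-8: " + isValidUtf8(body));
        System.out.println("U+FFFD count: " + countReplacementChars(body));
    }
}
```

Running it on both captures, along with a comparison of the Content-Type header on each, should show at which hop the bytes change.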



I would really appreciate any insights into what might be causing this
issue. Please let me know if there are any specific logs, tests, or
additional configurations I should check.


What version of Tomcat are you using?

It looks like you may already have read this, but I'll post it here 
anyway just in case you haven't seen it:

https://cwiki.apache.org/confluence/display/TOMCAT/Character+Encoding

-chris

