Mitch,
First, some general comments.
Projects at the ASF generally operate using lazy consensus, meaning that
if no one objects after a reasonable amount of time (72 hours is a good
starting point for reasonable) then you can assume you have agreement to
proceed. Note that it is ApacheCon NA this week, so a number of the
committers may be distracted and/or travelling.
It sounds like a good next step would be to create a Bugzilla
enhancement request and attach your patch.
Mark
On 27/09/2018 11:41, Mitch Claborn wrote:
Any further thoughts or comments on this? I think my patch is ready for
prime time now.
Mitch
On 09/22/2018 11:23 AM, Mitch Claborn wrote:
See below for answers to your questions.
Status update: I've been running my patch in production for about 16
hours with no problems. I've restarted each Tomcat (3) once and had no
problems, but also detected no errors, either on send or receive. I
have some code that I used in dev to force an error on a specific
combination of session attribute name and value. I'm going to put
that in prod so that I can test how it behaves with a large volume of
sessions and at least one error.
Mitch
On 09/21/2018 05:00 PM, Mark Thomas wrote:
On 21/09/18 18:02, Mitch Claborn wrote:
Please forgive me if this is the incorrect place or format for
discussing this. I'm new to trying to develop for Tomcat.
This is the right place. Welcome to the Tomcat community.
I'm developing a patch for DeltaManager and I'd like to discuss with
you developers if it could be considered for inclusion in the base
code. Please see details below and comment.
Will do. Please note that session replication is not an area I am
particularly familiar with so if some of my comments are a little
off-base I apologise.
Problem: When the "all sessions" message is sent from one node to
another, when the receiving node is first starting up, I often run into
various errors with one of the sessions and it fails to deserialize.
This causes all the remaining sessions in that chunk
(sendAllSessionsSize) to be lost by the receiver.
Oops.
The problem with the
sessions is totally an application problem, but until I can figure
those problems out and solve them I need a way to limit the impact of
these problems to just the one session that is in error. I could set
sendAllSessionsSize="1" but that would take a LONG time to transmit,
and we have many thousands of sessions at any given time.
That seems like a reasonable problem to try and solve.
Change details:
1. Update
org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
and
org.apache.catalina.ha.session.DeltaSession.doReadObject(ObjectInput)
to produce a more detailed error message when a session is in
error. The new error message includes the session index in the list
of sessions, the session ID, and the last field or attribute that
was being read.
I'm not sure how useful the index will be but the other information
makes sense to me.
The index gives me an indication of how many sessions were discarded
because of the error.
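The kind of error context described in item 1 can be sketched roughly as follows. This is not the patch itself and does not use DeltaManager or DeltaSession; the class DetailedSessionReader, its simplified stream layout (session ID, attribute count, then name/value pairs) and the BrokenOnRead fixture are all hypothetical, chosen only to show how the reader can remember the last thing it tried to read and report it together with the session index and ID.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the real session reader; the layout is an assumption.
class DetailedSessionReader {

    /** Fixture: serializes fine but always fails when read back. */
    static class BrokenOnRead implements Serializable {
        private static final long serialVersionUID = 1L;

        private void readObject(ObjectInputStream in) throws IOException {
            throw new InvalidObjectException("simulated corrupt attribute");
        }
    }

    /** Writes one session as: id, attribute count, then (name, value) pairs. */
    static byte[] write(String sessionId, Map<String, Serializable> attributes) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeUTF(sessionId);
            out.writeInt(attributes.size());
            for (Map.Entry<String, Serializable> e : attributes.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeObject(e.getValue());
            }
        }
        return buffer.toByteArray();
    }

    /** Reads one session; on failure reports index, session ID and the last item attempted. */
    static Map<String, Object> read(byte[] bytes, int index) throws IOException {
        String sessionId = "<unknown>";
        String lastRead = "<session id>";
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            sessionId = in.readUTF();
            lastRead = "<attribute count>";
            int count = in.readInt();
            Map<String, Object> attributes = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                lastRead = "<attribute name #" + i + ">";
                String name = in.readUTF();
                lastRead = name; // remember the name before attempting the value
                attributes.put(name, in.readObject());
            }
            return attributes;
        } catch (IOException | ClassNotFoundException | RuntimeException e) {
            throw new IOException("Session index " + index + ", id " + sessionId
                    + ": failed while reading " + lastRead, e);
        }
    }
}
```

With a message of this shape, the index tells the operator how far into the chunk the failure occurred, which is the point made about discarded sessions above.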
2. Introduce new XML attribute verifySerializedSessions for
DeltaManager.
Why would a user not want to enable this feature? The performance hit of
the additional deserialization on send?
That is the only reason I can think of.
3. If verifySerializedSessions="true",
org.apache.catalina.ha.session.DeltaManager.serializeSessions(Session[])
will first serialize each session then immediately deserialize it.
If all is good, send the session as usual. If any errors are
encountered, create and send a dummy session with a known session ID
instead. (This keeps the session count, which has already been put
in the output stream, correct for the receiving node.)
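The verify-on-send idea in item 3 can be illustrated with a small sketch. It uses plain Java serialization as a stand-in for the real session format, so it is not the DeltaManager code; RoundTripVerifier, DUMMY_ID and BrokenOnRead are hypothetical names introduced only for this example. Each entry is serialized, read straight back, and replaced by a placeholder if either step fails, so the number of entries sent always matches the count.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; plain Java serialization stands in for the session format.
class RoundTripVerifier {

    /** Assumed placeholder value sent instead of a session that fails verification. */
    static final String DUMMY_ID = "DUMMY-SESSION";

    /** Fixture: serializes fine but always fails when read back. */
    static class BrokenOnRead implements Serializable {
        private static final long serialVersionUID = 1L;

        private void readObject(ObjectInputStream in) throws IOException {
            throw new InvalidObjectException("simulated corrupt session");
        }
    }

    static byte[] serialize(Serializable value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    /** Returns one byte array per input entry; failing entries become the dummy. */
    static List<byte[]> serializeAll(List<? extends Serializable> sessions) throws IOException {
        byte[] dummy = serialize(DUMMY_ID);
        List<byte[]> result = new ArrayList<>(sessions.size());
        for (Serializable session : sessions) {
            try {
                byte[] bytes = serialize(session);
                deserialize(bytes); // verification: read back what was just written
                result.add(bytes);
            } catch (IOException | ClassNotFoundException | RuntimeException e) {
                result.add(dummy); // keep the entry count unchanged
            }
        }
        return result;
    }
}
```

The cost is one extra deserialization per session on the sending node, which is the trade-off raised in the discussion of item 2.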
Ah. Is the issue that serialization works but deserialization does not?
That seems a little odd. Can you give an example of how this might go
wrong? I am trying to understand the root cause(s) of the problem to
determine if the proposed solution is appropriate. I thought
DeltaSession simply skipped over attributes that it could not
deserialize.
DeltaSession does skip attributes that are not serializable. I've had
three identifiable errors, none of which I could reproduce at will.
1. A session with a Vector<Long> that might have contained nulls.
This should not be an issue, but I fixed my code to eliminate nulls in
that Vector, since they should not be there anyway.
2. In some of my own objects where I do my own serialization with
JSON, there were some fields that I don't serialize that were not
marked transient that should have been. Some of those embedded objects
were thus serialized by the native serialization and caused some
problems. I fixed those.
3. In another of my objects that I serialize with JSON, the JSON
string in the serialized session was obviously corrupted and was not a
valid JSON hash. I went over the serialization code with a fine-tooth
comb and it appears to be correct. That same code works hundreds of
thousands of times a day without error.
Especially in the case of #3, I suspect that there might be a
concurrency issue - a session being modified in one request while it
is being serialized in another.
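The second error above (fields that should have been marked transient) can be illustrated in isolation. This is a generic Java sketch, not the application code from this thread; the classes TransientFieldDemo, Helper, Leaky and Fixed are invented for the example. It shows that a non-transient field is written by native serialization and restored on the other side, while a transient field is left out and comes back as null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical classes, used only to show the effect of the transient keyword.
class TransientFieldDemo {

    /** Embedded object that the owner does not intend to serialize. */
    static class Helper implements Serializable {
        private static final long serialVersionUID = 1L;
        String state = "derived";
    }

    /** Helper is not transient, so native serialization writes it too. */
    static class Leaky implements Serializable {
        private static final long serialVersionUID = 1L;
        String json = "{\"n\":1}";
        Helper helper = new Helper();
    }

    /** Helper is transient, so only the json field is written. */
    static class Fixed implements Serializable {
        private static final long serialVersionUID = 1L;
        String json = "{\"n\":1}";
        transient Helper helper = new Helper();
    }

    /** Serializes the value and reads it straight back. */
    static Object roundTrip(Serializable value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            return in.readObject();
        }
    }
}
```

After a round trip, the helper field of a Leaky copy is populated, while the helper field of a Fixed copy is null and has to be rebuilt by the application if it is needed.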
FYI, bordering on TMI: I just recently switched to DeltaManager from a
custom session sharing solution where I was doing my own persistence
to a database, with no in-memory storage. Concurrency was not an issue
in that setup because each request received an independent copy of the
session content. I could have had concurrency issues all along and not
known it.
4. Update
org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
to discard any received session that has the known dummy session ID.
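The receiving side of item 4 might look roughly like the sketch below. It is an illustration under assumptions, not the actual deserializeSessions code: DummySessionFilter, DUMMY_ID and the simple wire layout (an int count followed by ID/payload string pairs) are hypothetical. The point it shows is that the receiver still reads exactly as many entries as the count announces, and simply drops the placeholders.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the wire layout here is an assumption for illustration.
class DummySessionFilter {

    /** Assumed well-known ID that marks a placeholder entry. */
    static final String DUMMY_ID = "DUMMY-SESSION";

    /** Writes the count first, then each entry as {id, payload}. */
    static byte[] send(List<String[]> entries) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(entries.size());
            for (String[] entry : entries) {
                out.writeUTF(entry[0]);
                out.writeUTF(entry[1]);
            }
        }
        return buffer.toByteArray();
    }

    /** Reads every announced entry but keeps only the real sessions. */
    static Map<String, String> receive(byte[] bytes) throws IOException {
        Map<String, String> sessions = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String id = in.readUTF();
                String payload = in.readUTF(); // always consume the entry
                if (DUMMY_ID.equals(id)) {
                    continue; // placeholder for a session that failed on the sender
                }
                sessions.put(id, payload);
            }
        }
        return sessions;
    }
}
```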
This certainly looks like a problem that needs solving. I don't see any
obvious issues with the approach taken but I would like a better
understanding of the root causes of the deserialization failures as I am
wondering if there are alternative solutions that are worth considering.
Understood. My goal with this patch is to a) limit the negative effects
of a serialization/deserialization error, and b) give more information
about those errors so that the application can be fixed.
Mark
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org