Hi Rainer & Mark,

> -----Original Message-----
> From: Rainer Jung [mailto:rainer.j...@kippdata.de]
> Sent: Friday, February 14, 2014 4:32 PM
> To: Tomcat Developers List
> Subject: Re: Special requirements on session id generator
> 
> On 14.02.2014 14:27, Mark Thomas wrote:
> > On 14/02/2014 13:15, Rainer Jung wrote:
> >> I ran into a special requirement for the session id generator
> >> (uniqueness of session id for a longer time interval). While I think
> >> that the requirement needed isn't a useful general purpose extension, I
> >> stumbled over the fact, that the SessionIdGenerator class is not
> pluggable.
> >>
> >> Would it make sense to introduce a new interface for the session Id
> >> generation, probably including setJvmRoute(), setSessionIdLength() and
> >> generateSessionId(), letting the current implementation be the base
> >> implementation, probably switching getRandomBytes() to protected, and
> >> allowing to set the implementation class in ManagerBase - or the Manager
> >> interface (trunk)?
> >>
> >> I haven't worked it out in detail, but it looks easy to do and useful.
> >>
> >> WDYT?
> >
> > I'm struggling to understand the use case. Are you saying the current
> > implementation generates collisions?
> 
> Yes. But only over time, not concurrently. So some session ID used some
> time were again generated on later days. The rate was low, about one out
> of 100.000 IDs were generated again during the next 30 days. I don't see
> a problem here per se, but the application required the IDs to be unique
> over an extended period of time.
> 
> > That would be bad given that
> > SecureRandom is being used.
> 
> Not sure, whether the above rate is totally unexpected. It could have
> used /dev/urandom, so might depend on total system behavior.

I don't think that this is unexpected, even with a "secure" Random generator. 
My understanding is that a secure random generator is guaranteed to produce 
non-predictable output (so it is OK for security purposes like generating 
Session IDs), but that shouldn't mean anything about the "randomness" of the 
generated bytes, as e.g. the Javadoc says, most implementations are still 
pseudo-random, but use a true random seed.

However, even with a true random generator, one can expect to get collisions 
over time, even with a 16 byte number (Session-ID). I think this is an 
application of the birthday problem / birthday paradox:
If you chose 23 random people, then the likehood that two of them have the same 
birthday is > 50 % (when not considering the birth year).

The Wikipedia article also talks about the cast as a collision problem [2]:
"The birthday problem can be generalized as follows: given n random integers 
drawn from a discrete uniform distribution with range [1,d], what is the 
probability p(n;d) that at least two numbers are the same? (d=365 gives the 
usual birthday problem.)"

In the case of a 16 byte session id, "d" would be 256^16 and "n" would be the 
number of generated Session-IDs which one wants to monitor for collisions (e.g. 
if the application produces 10 million session-IDs over 30 days, one could take 
10 million for "n" to get the likehood of a collision of the session ID in 30 
days).


> > Ignoring the previous question, why can't the requirement be met with a
> > custom implementation of SecureRandom?
> 
> Just because I haven't thought of that :) Will check. One thing that
> comes to my mind is that the session id generator encodes the random
> bytes. Making IDs more unique for an extended period of time will end up
> replacing some of the random bytes by other data. That limits the
> randomness. I was hoping to compensate for that by using an extended
> setof characters to encode the ID instead of simply hex like it is now.
> So at the end, I can squeeze the additional info plus enough random
> bytes in it. AFAIR that changed encoding would mean I have to change the
> session id generator.

>From my understanding, if you don't watch for collisions, then the number of 
>possible values for a session-ID (consisting of 16 random bytes) would be 
>256^16 at any time. However, if you would like to prevent collisions, then you 
>would need to memorize each previously generated session-ID, and remove them 
>from the pool of available session-IDs that can be generated. E.g., for the 
>first Session-ID, the amount of possible values is 256^16, for the second it's 
>(256^16-1), for the third it's (256^16-2) and so on, until you include some 
>older session-IDs back into the pool of available session-IDs.

Can you explain what you mean with "using an extended setof characters to 
encode the ID instead of simply hex"? AFAIK, a Random Number generator produces 
random bytes (as the simplest random value is a bit, so it can produce a series 
of 2^n random bits). Using hex characters is just another way to describe those 
generated bytes.

A simple way that I would use to prevent collisions between a number of 
generated session-IDs would be to remember the already generated session-ID, 
and when a new one is generated that is the same as an already generated 
session-ID, just create a new one until it is a really new one.
That does not quite fit into the mathematical way of generating random numbers 
without "putting it back" (I don't know the correct English term for this), as 
you would have to reduce the number of possible Session-ID values that will be 
generated by the RNG, so that instead of 256^16, you can only get 256^16-2000 
different session-IDs where the already generated IDs are not included, but I 
would think this is far more expensive than just re-generating a session-ID if 
the previous was a collision.


Regards,
Konstantin Preißer

> 
> > If the requirement is to be certain that a duplicate session ID is not
> > generated I'd use a custom Manager and check the returned ID against
> > those currently in use request a new one in the highly unlikely event of
> > a duplicate being returned.
> 
> That's already done by our StandardManager. It is not about the ones
> currently in use but about ones that had been used during the last say
> 30 days. There's a database I could check against, but since very few
> "duplicates" seem acceptable I prefer to reduce the number of duplicates
> by adding time information to the custom session id shrinking the window
> of possible duplicates.
> 
> Regards,
> 
> Rainer
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: dev-h...@tomcat.apache.org


[1] http://en.wikipedia.org/wiki/Birthday_problem
[2] http://en.wikipedia.org/wiki/Birthday_problem#Cast_as_a_collision_problem


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org

Reply via email to