Hi Konstantin,

On 14.02.2014 17:19, Konstantin Preißer wrote:
> Hi Rainer & Mark,
> 
>> -----Original Message-----
>> From: Rainer Jung [mailto:rainer.j...@kippdata.de]
>> Sent: Friday, February 14, 2014 4:32 PM
>> To: Tomcat Developers List
>> Subject: Re: Special requirements on session id generator
>>
>> On 14.02.2014 14:27, Mark Thomas wrote:
>>> On 14/02/2014 13:15, Rainer Jung wrote:
>>>> I ran into a special requirement for the session id generator
>>>> (uniqueness of session id for a longer time interval). While I think
>>>> that the requirement needed isn't a useful general purpose extension, I
>>>> stumbled over the fact, that the SessionIdGenerator class is not
>> pluggable.
>>>>
>>>> Would it make sense to introduce a new interface for the session Id
>>>> generation, probably including setJvmRoute(), setSessionIdLength() and
>>>> generateSessionId(), letting the current implementation be the base
>>>> implementation, probably switching getRandomBytes() to protected, and
>>>> allowing to set the implementation class in ManagerBase - or the Manager
>>>> interface (trunk)?
>>>>
>>>> I haven't worked it out in detail, but it looks easy to do and useful.
>>>>
>>>> WDYT?
>>>
>>> I'm struggling to understand the use case. Are you saying the current
>>> implementation generates collisions?
>>
>> Yes. But only over time, not concurrently. So some session ID used some
>> time were again generated on later days. The rate was low, about one out
>> of 100.000 IDs were generated again during the next 30 days. I don't see
>> a problem here per se, but the application required the IDs to be unique
>> over an extended period of time.
>>
>>> That would be bad given that
>>> SecureRandom is being used.
>>
>> Not sure, whether the above rate is totally unexpected. It could have
>> used /dev/urandom, so might depend on total system behavior.
> 
> I don't think that this is unexpected, even with a "secure" Random generator. 
> My understanding is that a secure random generator is guaranteed to produce 
> non-predictable output (so it is OK for security purposes like generating 
> Session IDs), but that shouldn't mean anything about the "randomness" of the 
> generated bytes, as e.g. the Javadoc says, most implementations are still 
> pseudo-random, but use a true random seed.
> 
> However, even with a true random generator, one can expect to get collisions 
> over time, even with a 16 byte number (Session-ID). I think this is an 
> application of the birthday problem / birthday paradox:
> If you chose 23 random people, then the likehood that two of them have the 
> same birthday is > 50 % (when not considering the birth year).
> 
> The Wikipedia article also talks about the cast as a collision problem [2]:
> "The birthday problem can be generalized as follows: given n random integers 
> drawn from a discrete uniform distribution with range [1,d], what is the 
> probability p(n;d) that at least two numbers are the same? (d=365 gives the 
> usual birthday problem.)"
> 
> In the case of a 16 byte session id, "d" would be 256^16 and "n" would be the 
> number of generated Session-IDs which one wants to monitor for collisions 
> (e.g. if the application produces 10 million session-IDs over 30 days, one 
> could take 10 million for "n" to get the likehood of a collision of the 
> session ID in 30 days).

Nice article.

The table in the article says for 16 bytes and 500000 session ids the
probability of a duplicate is far below 10^-18. In my case they used 13
bytes, but even with only 8 bytes the table says the probability is
between 10^-6 and 10^-9 so still pretty small. I expect the higher
duplicate occurance on the target system being strongly influenced by
the quality of /dev/urandom. But I must confess that I don't have enough
data about the original situation.

>>> Ignoring the previous question, why can't the requirement be met with a
>>> custom implementation of SecureRandom?
>>
>> Just because I haven't thought of that :) Will check. One thing that
>> comes to my mind is that the session id generator encodes the random
>> bytes. Making IDs more unique for an extended period of time will end up
>> replacing some of the random bytes by other data. That limits the
>> randomness. I was hoping to compensate for that by using an extended
>> setof characters to encode the ID instead of simply hex like it is now.
>> So at the end, I can squeeze the additional info plus enough random
>> bytes in it. AFAIR that changed encoding would mean I have to change the
>> session id generator.
> 
> From my understanding, if you don't watch for collisions, then the number of 
> possible values for a session-ID (consisting of 16 random bytes) would be 
> 256^16 at any time. However, if you would like to prevent collisions, then 
> you would need to memorize each previously generated session-ID, and remove 
> them from the pool of available session-IDs that can be generated. E.g., for 
> the first Session-ID, the amount of possible values is 256^16, for the second 
> it's (256^16-1), for the third it's (256^16-2) and so on, until you include 
> some older session-IDs back into the pool of available session-IDs.

> Can you explain what you mean with "using an extended setof characters to 
> encode the ID instead of simply hex"? AFAIK, a Random Number generator 
> produces random bytes (as the simplest random value is a bit, so it can 
> produce a series of 2^n random bits). Using hex characters is just another 
> way to describe those generated bytes.

We have to encode the id as a cookie or url encoded value. The current
session id generator always does it using hex digits. In my case the
total length of the session id is limited to 26 encoded chars. So if I
have to remove some of them for the new additional data I want to add to
the ID, the remaining chars might be to few for good
randomness/security. If I instead of using hex digits use all chars
(lower plus upper case) and digits, I can encode 60 values in one byte
instead of only 16. Thus more randomness fits into the same encoded id
length.

> A simple way that I would use to prevent collisions between a number of 
> generated session-IDs would be to remember the already generated session-ID, 
> and when a new one is generated that is the same as an already generated 
> session-ID, just create a new one until it is a really new one.
> That does not quite fit into the mathematical way of generating random 
> numbers without "putting it back" (I don't know the correct English term for 
> this), as you would have to reduce the number of possible Session-ID values 
> that will be generated by the RNG, so that instead of 256^16, you can only 
> get 256^16-2000 different session-IDs where the already generated IDs are not 
> included, but I would think this is far more expensive than just 
> re-generating a session-ID if the previous was a collision.

Fortunately I don't have to prevent any long term collisions, just
reduce the rate without increasing session id length. Therefore I prefer
not to keep state for a long time including tomcat restarts, or do
remote (database) calls to check ids whenever a new one is generated.

Regards,

Rainer

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org

Reply via email to