Thanks a lot for the detailed explanation Robert - much appreciated.
Please see my comments in line, some stuff deleted for brevity.
On 01/12/2011 11:38 PM, Robert Relyea wrote:
--- snip ---
331569088[1bd1610]: C_UnwrapKey 331569088[1bd1610]: hSession =
0x6 331569088[1bd1610]: pMechanism = 0x7fffcd592ea0
331569088[1bd1610]: hUnwrappingKey = 0x8 331569088[1bd1610]:
pWrappedKey = 0x36ac9bc 331569088[1bd1610]: ulWrappedKeyLen = 48
331569088[1bd1610]: pTemplate = 0x7fffcd592cf0
331569088[1bd1610]: ulAttributeCount = 6 331569088[1bd1610]:
phKey = 0x36bfd58 331569088[1bd1610]: CKA_SIGN = CK_TRUE [1]
331569088[1bd1610]: CKA_VERIFY = CK_TRUE [1]
331569088[1bd1610]: CKA_CLASS = CKO_SECRET_KEY [8]
331569088[1bd1610]: CKA_KEY_TYPE = 0x10 [8] 331569088[1bd1610]:
CKA_DERIVE = CK_TRUE [1] 331569088[1bd1610]: CKA_VALUE_LEN =
3000000000000000 [8] 331569088[1bd1610]: mechanism =
CKM_DES3_ECB 331569088[1bd1610]: *phKey = 0x54
The extra data here is dumping out interesting values inside the
parameters passed to C_UnwrapKey. The list of CKA_XXXX = XXXXX are
dumping out the template sent in pTemplate. CKA_VALUE_LEN looks wrong
to me, I don't know if tehre is some byteswap issue here.
I've checked nearly every C_UnwrapKey function trace in the log ... each
of them is showing 'CKA_VALUE_LEN = 3000000000000000 [8]'.
Every C_DeriveKey function trace shows 'CKA_VALUE_LEN = 1000000000000000
[8]'
mechansim CKM_DES3_ECB means you are using DES to do unwrap, so this
appears to be a restart operation, rather than a full handshake.
Looking at the SSLTrace file this seems to be the case. I don't know if
I can correlate the SSLTRACE file with the PKCS#11 log file, but it
seems the issue only appears in restart operations. Full handshakes
don't show the issue.
from SSLTRACE log
=================
10891: SSL[107778448]: sending client-hello
SSL: Lookup1: sid=0x36a9080
10891: SSL[107778448]: client, found session-id: [Len: 16]
20 4d 2d a6 3a 3c 67 1a 60 c8 e5 18 0d ea d4 e2 M-.:<g.`.......
10891: SSL3[107778448]: send client_hello handshake
10891: SSL3[107778448]: start handshake hashes
10891: SSL[107778448]: SecureSend: returning -1 count, error -8023
10891: SSL[107778448]: closing, rv=0 errno=-8023
10891: SSL[107778448]: sending client-hello
SSL: Lookup1: sid=0x36a9080
10891: SSL[107778448]: client, found session-id: [Len: 16]
20 4d 2d a6 3a 3c 67 1a 60 c8 e5 18 0d ea d4 e2 M-.:<g.`.......
10891: SSL3[107778448]: send client_hello handshake
10891: SSL3[107778448]: start handshake hashes
10891: SSL[107778448]: SecureSend: returning -1 count, error -8023
10891: SSL[107778448]: closing, rv=0 errno=-5966
This looks also interesting: PR_NO_ACCESS_RIGHTS_ERROR ??
It might be helpfull if SSLTRACE and PKCS#11 could log a timestamp to
help in correlation. Or is there already a way to correlate it?
331569088[1bd1610]: rv = CKR_OK 331569088[1bd1610]: C_DeriveKey
331569088[1bd1610]: slotID = 0x1 331569088[1bd1610]: flags =
0x4 331569088[1bd1610]: pApplication = 0331569088
331569088[1bd1610]: Notify = 0x13231f333 331569088[1bd1610]:
phSession = 0x7fffc 331569088[1bd1610]: phKey = 0x36c1618
331569088[1bd1610]: CKA_CLASS = CKO_SECRET_KEY [8]
331569088[1bd1610]: CKA_KEY_TYPE = 0x12 [8] 331569088[1bd1610]:
*phSession = 0x132317f8 331569088[1bd1610]: rv =
CKR_DEVICE_ERROR 331569088[1bd1610]: mechanism =
CKM_TLS_KEY_AND_MAC_DERIVE
Here you are deriving the SSL session keys. A session handle of
0x132317f8 is pretty large, but not unexpected in a server that has
been running a while.
Well the server is only running 22 minutes until the issue arises ...
might this be 'a while'?
During this time only 28 'send client_hello handshake' messages appear
in the SSLTRACE log. 19 of them show 'closing, rv=0 errno=0', 1
'closing, rv=0 errno=-5966', the others 'closing, rv=0 errno=-8023'.
So no much connection at all.
CKR_DEVICE_ERROR can happen in a CKM_TLS_KEY_AND_MAC_DERIVE in a
couple of pathological cases:
1) if the underlying object structure is hosed. 2) if the key object
looks bogus (a token Object that isn't known or stored, for
instance).
This object should be a token object. Unfortunately it looks like
the logging program did not out put hKey (or it did, but we ran into
some overwrite do to threading... the missing new lines after the
pApplication, Nofiy, and phSession) seems to indicate that. (In fact
Notify should be pNotify, and that looks like part of a
C_OpenSession() call, not a C_DeriveKey()... which makes some of the
conclusions we are trying to draw questionable.
331569088[1bd1610]: C_DigestInit 331569088[1bd1610]: hSession =
0x1 331569088[1bd1610]: pMechanism = 0x7fffcd593290
331569088[1bd1610]: mechanism = CKM_MD5 331569088[1bd1610]:
rv = CKR_DEVICE_ERROR
I can see no place where NSC_Digest can even return CKR_DEVICE_ERROR
So I can not trust the log?
331569088[1bd1610]: C_OpenSession 331569088[1bd1610]: slotID =
0x1 331569088[1bd1610]: flags = 0x4 331569088[1bd1610]:
pApplication = 0x4bdff70 331569088[1bd1610]: Notify = 0x13231f30
331569088[1bd1610]: phSession = 0x7fffcd593228
331569088[1bd1610]: *phSession = 0x132317f8 331569088[1bd1610]:
rv = CKR_DEVICE_ERROR
This looks like a normal C_OpenSession call...).
OS is CentOS 5.5 64bit.
Arg, I just remembered. CentOS 5.5 will likely be following RHEL 5.5
in having an NSS 3.11 softoken.
This is only the OS ... the library is not using NSS/NSPR from the OS.
It is using a bundled one.
strings libsoftokn3.so | grep -i header
$Header: NSS 3.12.4.5 (debug) Jan 10 2011 17:14:10
I've just checked and NSC_DigestInit
still cannot return CKR_DEVICE_ERROR in NSS 3.11 -- unless you are
running in FIPS mode. This is possible, but seem extremely unlikely
since you are using slotid=1 (FIPS slot ID is 3). In FIPS mode, any
call can fail with CKR_DEVICE_ERROR if the token is in 'fatalError'
mode. That happens if any of the following fail: 1) the Random number
generator. 2) the Seeding the random number generator. 3) generating
a key pair fails with CKR_GENERAL_ERROR 4) power on self check fails
at init time. It can be reset by reinitializing the token. If this is
what is happening, you should be getting NSS messages in your syslog.
I don't think FIPS is your issue here looking at the logs, however.
I don't think it's running in FIPS mode .. this has to be configured
AFAIK and noone did it.
What could be a next step to narrow it down? Give 3.12.9 a try?
TIA,
Bernhard
--
dev-tech-crypto mailing list
dev-tech-crypto@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-crypto