Once again, you snipped most of what I wrote.

Also not sure why your post has double reply marking.

On 13/11/2018 18:20, Ryan Sleevi wrote:
>>
>>
>>
>> On Tue, Nov 13, 2018 at 11:26 AM Jakob Bohm via dev-security-policy <
>> [email protected]> wrote:
>>
>>> Furthermore the start of the thread was off-list.  Also neither I, nor
>>> some other participants have access to the audit reports etc. in CCADB.
>>>
>>
>> Sure you do. That information is publicly available through
>> https://wiki.mozilla.org/CA/Included_Certificates
>>

I am quite surprised that a supposed report of included certificates 
is where access to the audit report links is hidden.  The CCADB entry 
points I had previously looked at did not provide those deep details.  
But thanks for the link.

Although, at least for the first T-Systems CA in the table, the PDF was 
just a summary attestation, apparently covering the period from before 
Bug #1391074 was reported until a time overlapping when the QcStatements 
were misencoded (though not necessarily under that root; once again, the 
lack of specifics makes it hard to check things).

One thing not provided by the public report is the history of the 
T-Systems problem-reporting contact point, and thus whether it included 
(at the time) the specific contact points used in issue U2.

>>
>>> This basic combination of noise and missing data is why I asked for a
>>> one-stop overview of your complaints against TUVIT, similar to the lists
>>> compiled for previous situations with multiple complaints against one
>>> party.
>>>
>>
>> Those are the output of these discussions, not the input or structure to
>> them. There are certainly broader complaints, but if you'll note, my focus
>> has been on attempting to satisfactorily resolve the current set of issues.
>> Several times you've attempted to move it to the meta discussion, while
>> I've tried to again focus on the specific lack of resolution for those
>> initially identified issues. The reference to the other issues is precisely
>> because the explanation and resolution of *these* issues can inform or be
>> compared with the *past* issues, which would be used to build the list
>> seemingly so desired.
>>

However, you seem to consistently refuse to state clearly, in one place, 
what those current issues are.

>>
>>> "Misconfiguration and misapplication of the relevant rules..." is so
>>> broad as to describe the majority of CA failures without giving any
>>> useful specifics to assess the situation.  It's like saying someone's
>>> crime was to "violate and break the relevant laws" (which would apply to
>>> anything from jaywalking to mass murder).
>>>
>>
>> While sympathetic to your frustration, I think that's a rather extreme
>> interpretation. For example, CAs seem to believe that the majority of their
>> failures are "human error" and that human error is corrected by "additional
>> training". Perhaps you would like to propose a better wording to
>> distinguish between the "Guaranteed to produce the wrong result, 100% of
>> the time" configuration issues, in which a certificate profile is
>> functionally unable to meet the stated configuration, and those which are
>> tied to, for example, data validation issues (or lack thereof). My intent
>> was to capture the former, while acknowledging that the latter is something
>> that is primarily accounted for through design review, sampling, and
>> testing.
>>

This was in direct reply to where you used that meaningless phrase.  I 
was simply telling you it was meaningless.  Congrats on once again 
removing context.

>>
>>> It would also be useful to quantify the word substantial: Of all the
>>> certificates issued by the audited CA organization, how large a
>>> percentage suffered from each flaw, how many from none.  This is a key
>>> number when assessing if statistical sampling by the auditor should have
>>> caught an issue.  It is also a key number when assessing the level of
>>> incompetence of the CA (but the CA is not the subject of this thread).
>>>
>>
>> I already responded to this previously, and again in my more recent
>> messages. In the issue that started this thread, we can see it's 100%. In
>> the past issuance examples, we can see that it was 100% of certificates
>> going through certain systems. While that is less than 100% of total
>> volume, sampling methodology also must consider variances and other
>> factors. For example, if a CA issues DV, OV, and EV, a sampling methodology
>> would approach each profile distinctly for sample selection, rather than
>> overall issuance. A sampling method for a CA may involve 100+ such samples
>> (each representing a percentage), based on the design review that
>> identifies variations and permutations relevant to the service provided.
>> Similarly, the selection of 3% is relevant to CA self-audits primarily.
>>

The issue in this thread is TUVIT's competence and trustworthiness, which 
cannot possibly be judged solely on ONE small CA incident (the U1 
QcStatements misencoding incident).  I must therefore operate on the 
basis that this applies to TUVIT's auditing of the totality of the 
T-Systems failures mentioned so far.

So far, only issue U1 has been claimed to affect 100% of a certificate 
category for an issuing CA, unless you create so many categories as to 
cause a combinatorial explosion in the presumed audit work.

As for the 3% number, I thought it also applied to external audits, but 
this is less important until there is a specific accusation of TUVIT 
using too low a percentage for a specific audit activity.

For U1, the audit issue was not the sample percentage, but that TUVIT 
did not check the QcStatements extension closely enough in the samples 
collected until the bug was reported.
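For what it is worth, the basic well-formedness of such an extension value can be checked mechanically.  Below is a minimal sketch in pure Python (illustrative only: it walks top-level DER TLVs and rejects truncated or indefinite-length encodings, but does not validate the QcStatements structure against ETSI EN 319 412-5, and does not handle multi-octet tags):

```python
# Minimal DER TLV walker, usable to check "from first principles" that an
# extension value such as QcStatements is at least well-formed DER.
# Illustrative sketch only; a real audit check would decode the full
# QcStatements structure against ETSI EN 319 412-5.

def der_tlvs(data: bytes):
    """Yield (tag, value) pairs for the top-level DER TLVs in data.

    Raises ValueError on truncated or indefinite-length encodings,
    which DER forbids.  High-tag-number (multi-octet) tags are not handled.
    """
    i = 0
    while i < len(data):
        tag = data[i]
        i += 1
        if i >= len(data):
            raise ValueError("truncated after tag")
        length = data[i]
        i += 1
        if length == 0x80:
            raise ValueError("indefinite length is not valid DER")
        if length & 0x80:  # long form: low 7 bits give number of length octets
            n = length & 0x7F
            if i + n > len(data):
                raise ValueError("truncated length field")
            length = int.from_bytes(data[i:i + n], "big")
            i += n
        if i + length > len(data):
            raise ValueError("value runs past end of data")
        yield tag, bytes(data[i:i + length])
        i += length
```

For example, feeding it the bytes of a SEQUENCE containing one INTEGER succeeds, while a length field that overruns the data raises immediately.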


>> This is where the initial request for the discussion about methodology - a
>> discussion about how a CAB can miss 100% of certificates being misissued -
>> is relevant. And, as of yet, unaddressed.
>>

There were a few responses to this by TUVIT earlier in the thread, but 
you dismissed their explanations (as you do again below).

>>> Issue U1 (Qc-statement misencoding) apparently affected all certificates
>>> from one issuing CA, and should thus have been caught by sampling by the
>>> auditor.  The auditor has (according to earlier posts) admitted that the
>>> bug was present in the sampled certificates from that issuing CA, but
>>> that this was overlooked because that particular extension was not one
>>> they had specific experience looking at.  Once the problem was pointed
>>> out the auditor looked at the previously collected evidence and
>>> confirmed the problem by checking that detail from first principles
>>> (similar to software developer hand-executing a function with pen and
>>> paper to confirm a bug).
>>>
>>
>> I don't believe that is a correct summary. The auditor reported things
>> were correct - i.e. no bug - and only after pushing further to state very
>> clearly that there was a bug did the auditor confirm that, oh yes, there
>> was a bug, we just overlooked it. Now, I can understand that the favorable
>> reading for the auditor was simply that they were busy and on the road and
>> favoring expediency over correctness - but we've seen CAs using this same
>> reasoning for years. Multiple times now, CAs have faced serious
>> misissuances, confidently and repeatedly stated they've identified all of
>> them, and then been presented with an example certificate not identified by
>> the CA that demonstrates the exact problem. Do you disagree that auditors
>> should be aware of the perception of such responses and the harm to trust?
>> Do you believe auditors should be held to a different standard?
>>

While there are vaguely similar cases of incorrect denials by CAs, the 
message in which a TUVIT auditor said the wrong thing about the U1 
incident is in the non-public part of the thread, so we as a community 
cannot see whether the auditor made it clear that this was a preliminary 
answer to be completed later.  Nor can we see whether the second look, 
once back in the office, required additional prompting or was 
self-initiated as soon as practical.


>>
>>> S10. Then there is the issue if TUVIT was right or wrong in accepting a
>>>    slow phase out of the certificates affected by U1.  This involves both
>>>    the principal issue if there should be zero tolerance for incorrect
>>>    certificates, the practical issue of how much harm this specific
>>>    standards validation can cause, and how much time should be allowed
>>>    for an orderly replacement process.  Multiple months seems a quite
>>>    long time.  1 day quite a short time.
>>>
>>
>> I've snipped a majority of your statements, mostly because I don't find
>> them correct or helpful framing. To the extent I'm signalling specific
>> things, it's because they are particularly egregious. I think Wayne has
>> already responded in a way that conflicts with your statement of S10; you
>> pose several extremes or absolutes, but that's not the issue. If we take a
>> step back, the issue here is that TUVIT has taken the view that the ETSI
>> specification supersedes the requirements of the Baseline Requirements;
>> where the BRs are quite prescriptive in their requirements (24 hours - 5
>> days), TUVIT has taken a position that any period not exceeding three
>> months is acceptable. This, combined with the lack of reporting - which is
>> not supported, in practice, by other CABs - creates a situation where, for
>> audits conducted by TUVIT, there is zero community assurance that the
>> provisions of 4.9.1.1 have been followed.
>>

Your massive snipping seems aimed more at creating false narratives and 
giving meaningless responses.

The one paragraph above is my only specific statement of S10 and 
contains no extremes or absolutes, just questions of balance and 
opinion.  It is a statement that there is a disagreement on that issue 
of interpretation between you, as a Mozilla representative, and a TUVIT 
representative.  It is also an assignment of a number to the issue so it 
can be referenced separately from other related issues, such as T2, T3, 
T7, T8 and S9.

In terms of the BRs, issue S10 is a disagreement with TUVIT over whether 
a misencoding with (concretely) no severe technical effects should fall 
under BR 4.9.1.1 #9 (24-hour revocation for BR violations), or whether 
this requirement should be softened in light of BR 4.9.5 #1 (considering 
the nature of the problem when determining the time to process a 
revocation request) and the fact that BR 4.9.4 (Revocation Grace Period) 
is a "no stipulation".  Plus a disagreement over how long the grace 
period, if any, should be in the specific case of issue U1.

That TUVIT is citing ETSI equivalents to those BRs should be of much 
less importance.


Below you are quoting me far out of context again:

>>
>>> It is of cause the purpose of any audit scheme to check for the absence
>>> of irregularities, and to report if any were found.
>>
>>
>> Except that's not the point of the ETSI audit, which is at least why some
>> discussion of the scheme is relevant. The "report if any were found" is
>> functionally absent. The 'reporting' is done based on whether or not the
>> certificate was issued, and the certificate is issued provided that any
>> irregularities were resolved, to the auditor's satisfaction, within 90
>> days. This is something 'unique' to ETSI audits, and not part of the
>> underlying ISO/IEC 17065.
>>
>> Other ETSI-based auditors have recognized this gap, and thus ensured
>> they've reported on irregularities. That shows how they can address the gap
>> from the bare minimum required of ETSI and instead meet the expectations of
>> the browsers consuming the reports.
>>

OK, that is a new issue; let's number it as follows:

S11. Disagreement with TUVIT about how much the public audit summary 
   documents should say about detected incidents, rather than flatly 
   stating (directly or indirectly) that all such incidents were 
   satisfactorily resolved in less than 90 days.  Minimum ETSI 
   requirements may be too weak in this area.


>>
>>> But it is quite
>>> rare for the audit to essentially redo every piece of administrative
>>> work done by the audited company.
>>>
>>
>> It's unclear your intended remark. During the sampling process, it is
>> indeed a cross-check of all the administrative work - ensuring that
>> sufficient evidence of all of the controls that exist to ensure the correct
>> functioning of that administrative work were followed, with detailed
>> analysis.

My point is that a cross-check is not a 100% check.  Some of your 
past arguments seem to presume that anything slipping past the auditor 
means the audit wasn't proper.

>>
>>> The debate in bug #1391074 about the template used for ECDSA
>>> certificates is a good example.  According to the bug, the ECDSA
>>> certificate profile/template was correct, but some piece of software
>>> mishandled approved ECDSA certificate requests and used the RSA
>>> certificate profile/template, for at least some of the issued
>>> certificates.  An incorrect ECDSA profile/template saying to set the
>>> KeyEncryption bit should have been spotted by a configuration audit and
>>> review (by TUVIT).  But code bugs are notoriously harder to spot.
>>>
>>
>> I've got no idea where you got that summary from, but that's certainly not
>> consistent with 3.3 of
>> https://bug1391074.bmoattachments.org/attachment.cgi?id=8915934
>>

I was basing my summary on what was on the Bugzilla page itself; I don't 
have the capacity to recurse infinitely into every link and attachment, 
given your refusal to provide your own list of issues.

>> An EC profile/template was not configured. All certificates, regardless of
>> key type, were configured to use the same profile.
>>
>> It is expected, of all CAs, that profiles are distinct per key type, least
>> of all due to the necessity to ensure both the input keys are appropriate
>> (e.g. correct strength) and the output certificate is correct from RFC 5280.
>>

The text on the Bugzilla page itself says that the issue wasn't a missing 
template, but that a bug caused the PKI system not to use the right 
template for at least 3 ECDSA certificates.  This was in comment #23 and 
covered in more detail in comment #28 (as issue C) at 
  https://bugzilla.mozilla.org/show_bug.cgi?id=1391074#c23
  https://bugzilla.mozilla.org/show_bug.cgi?id=1391074#c28
Both of those bug comments were posted after the report in attachment 
8915934 and were specifically clarifying that report.

I have not found a statement as to what the bug was or if it affected 
all ECDSA certificates in a logical part of the system.  There are 
multiple possible guesses.
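To illustrate the kind of check involved (the names below are hypothetical and not tied to any actual CA software or to what TUVIT does): an ECDSA certificate asserting the keyEncipherment bit, which is only meaningful for RSA key transport, is a telltale sign that the RSA profile/template was applied.

```python
# Hypothetical lint: flag key-usage bits that are inappropriate for the
# certificate's key algorithm, per RFC 5280 semantics.  keyEncipherment and
# dataEncipherment only make sense for keys capable of encryption (RSA),
# so their presence on an EC-keyed certificate suggests the wrong
# profile/template was applied.

RSA_ONLY_USAGES = {"keyEncipherment", "dataEncipherment"}

def profile_mismatch(key_algorithm: str, key_usage: set) -> set:
    """Return the key-usage bits that are inappropriate for the key type."""
    if key_algorithm.lower() in ("ecdsa", "ec"):
        return key_usage & RSA_ONLY_USAGES
    return set()
```

Such a check on either the profile configuration or the issued output would catch the misapplied-template case regardless of where the bug in the issuance pipeline sat.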

>>
>>>> In the context of ETSI, each of these configuration changes -
>>> particularly
>>>> once qualified - undergoes some review; whether after the fact
>>>> (pre-qualification) or prior to such change.
>>>
>>> This is why it is interesting to look at each issue to determine if it
>>> was subject to such review by or notification to TUVIT.
>>>
>>
>> I'm not sure what you're saying. The model of certification requires this.
>> Is your framing of the question whether or not T-Systems notified TUVIT of
>> these issues? If they didn't, they were contractually negligent under the
>> ETSI model, and TUVIT should, as part of their explanation, indicate how
>> they addressed those failures. I'm taking Occam's Razor here, and presuming
>> that TUVIT was notified, as they were required to be, and that the failure
>> rests with TUVIT.
>>

As I said in previous messages, I find it unlikely that the ETSI scheme 
requires auditor involvement for every possible administrative action.  
Thus each technical incident at T-Systems (the ones with U numbers) needs 
to be individually classified to determine what related actions, if any, 
should have been notified to TUVIT, and if so, why this did not result in 
audit actions by TUVIT.

For example, which of the actions or inactions that led to issue U5 
(double dots in DNS names) involved a required review by TUVIT?  The 
answer affects how much blame for U5 can be laid on TUVIT, but does not 
affect whether TUVIT can be blamed for its post hoc auditing of the U5 
incident.

Same questions for each U number.
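As an aside, the U5 defect itself is trivially machine-checkable before issuance.  A sketch (a hypothetical helper, not any actual linter; real tools such as zlint apply the much stricter RFC 5280 preferred-name-syntax checks):

```python
# Hypothetical pre-issuance lint for issue U5: reject dNSName values
# containing empty labels (consecutive dots, or a leading dot).  A single
# trailing dot denotes the DNS root label and is tolerated here.

def has_empty_label(dns_name: str) -> bool:
    """Return True if the DNS name contains an empty label."""
    name = dns_name[:-1] if dns_name.endswith(".") else dns_name
    return name == "" or any(label == "" for label in name.split("."))
```

Whether such a lint was required by the certification scheme, and whether its absence was something TUVIT should have flagged, is exactly the classification question above.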

>>
>>>> Similarly, misissuance
>>>> involves a degree of notification to the CAB.
>>>
>>> Only once known (e.g. around the time of the bug reports).  Because I
>>> don't think you expect a CA to notify the auditor that it is about to
>>> misissue, and then proceed to actually misissue instead of stopping
>>> itself.
>>>
>>
>> Yes, for all configuration changes post-misissuance, the CAB would have
>> been notified. One would thus reasonably expect that, as these
>> notifications increase, the CAB would take a more careful look at both
>> patterns of problems and specific root causes to ensure that future issues
>> are preemptively identified, rather than retroactively remediated.

Thanks for clarifying your past statement, which was a bit confusing.

>> This is
>> why 100% misissuance is particularly concerning *given* past issues.
>>
>> Yet the substance of the discussion - the "current" issue if you will -
>> can be discussed without the lengthy debate and dissection being had in
>> your message here. That's because TUVIT can respond with regards to their
>> methodology and approach used. If that methodology *didn't* consider the
>> past incidents, then we can go through that retroactive dissection. If they
>> did, however, then we can allow TUVIT to respond as to what they did and
>> how it impacted.

Unfortunately, most of the thread has been meaningless grandstanding 
rather than specific questions about methodology and approach.  It is 
small wonder that the responses have been of the fire-extinguisher-and-
asbestos-suit kind.

>>
>> As such, I can see limited value in the present conversation for
>> continuing this belabored dissection - we should instead focus on the
>> methodology and approach used for the most recent issue, and based on that,
>> analyze retroactively, to evaluate whether or not adequate assurance is
>> being provided.
>>

You often see limited value in any conversation that disagrees with you.

>>
>>>
>>>> As such, it is entirely
>>>> reasonable to expect a degree of supervision, as that is the value of
>>> the
>>>> certification scheme. All of this information would have been available
>>> at
>>>> the time of configuring qualified certificates, including the pattern of
>>>> issues existing when configuring profiles and templates.
>>>
>>> Should have been available doesn't mean it was available.  There is
>>> always a limit to the depth of audits, and thus we need to assess if
>>> TUVIT was being sloppy, complacent, complicit or just unlucky.
>>>
>>
>> As captured above, I disagree with that, not on principle, but in
>> execution. We need more information from TUVIT, rather than attempting to
>> divine through first principles as this thread tries to do. If no further
>> information is provided, then we don't need to bother with that assessment
>> at all - an auditor who is unwilling or unable to provide reasonable
>> information is an auditor that should not be trusted.
>>

I am not suggesting we divine from first principles.  I suggest we 
actually find out, rather than presume the worst possible situation and 
then proceed to demand punishment for such presumed violations.


>>
>>> It is relevant to ask, but it takes a considerable level of certainty
>>> before starting formal proceedings to disqualify an auditor due to the
>>> failings of a single audit subject.
>>>
>>> In comparison, E&Y was involved in auditing multiple bad CAs and RAs by
>>> the time some E&Y branches were disqualified by Mozilla.
>>
>>
>>> In the world of technical review and testing, TUV SUD is a major player,
>>> reviewing the safety of many technical products far outside Germany,
>>> they rank in the same league as UL in the US.
>>>
>>
>> This gets so close to understanding the issue, but then radically misses
>> the point. Multiple E&Y branches were disqualified on the basis of a single
>> audit, but E&Y is not blanket disqualified.

I seem to recall that E&Y Hong Kong was involved with both the WoSign audit 
and the audit of the Korean RA that misissued under a Symantec CA.

>>
>> TUV IT is a specific entity, the equivalent to an E&Y "branch". Mentioning
>> TUV SUD in the context of TUV IT is akin to mentioning KPMG in a discussion
>> about E&Y. TUV IT is part of the TUV NORD group, which is itself distinct
>> from TUV SUD.
>>

Oh, I thought it was TUV SUD.

But still, TUV IT seems to be the name for all of that TUV group's 
PKI-related audits, across all locations, similar to E&Y's global 
WebTrust activities (as opposed to the WebTrust activities of E&Y's 
Hong Kong branch office).

This Mozilla group has had no reason (or occasion) to distrust E&Y 
Hong Kong's auditing of financials etc., even though that is presumably 
the same "entity" as the one doing bad WebTrust audits.

The difference between horizontal and vertical org charts should not 
be the main dispositive issue in deciding if Mozilla should distrust 
a global company or a local branch.


Enjoy

Jakob
-- 
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded 
_______________________________________________
dev-security-policy mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-security-policy