* Could you highlight a bit more your proposal here? My understanding is
that, despite the Handelsregister ("Commercial Register") being available at a
country level, it's further subdivided into lists by county or region - e.g.
the Amtsgericht Herne ("Local Court Herne").
* It sounds like you're still planning to allow for manual/human input,
and simply consistency-check it. Is there a reason not to use an
allowlist-based approach, in which your Registration Agents may only select
from an approved list of County/Region/Locality managed by your Compliance Team?
* That, of course, still allows for human error. Using the excellent
example of the Handelsregister, perhaps you could describe a bit more the flow
a Validation Specialist would go through. Are they expected to examine a faxed
hardcopy? Or do they go to handelsregister.de and
look up via the registration code?
* I ask, because it strikes me that this could be an example where a CA
could further improve automation. For example, it's not difficult to imagine
that a locally-developed extension could know the webpages used for validation
of the information, and extract the salient info, when that information is not
easily encoded in a URL. For those not familiar, Handelsregister encodes the
parameters via form POST, a fairly common approach for these company registers,
and thus makes it difficult to store a canonical resource URL for, say, a
server-to-server retrieval. This would help you quickly and systematically
identify the relevant jurisdiction and court, and in a way that doesn't involve
human error.
I did not know that about Handelsregister. So that’s good info. Right now, the
validation staff selects Handelsregister as the source, the system retrieves
the information, the staff then selects the jurisdiction information and enters
the registration information. Germany is locked in as the country of
verification (because Handelsregister is the source), but the staff enters the
locality/state type information as the system doesn’t know which region is
correct.
The idea is that everywhere we can, the process should automatically fill in
jurisdiction information for the validation staff so no typing is required.
This is being done in three parts:
1. Immediate (aka Stop the Hurt): The first step is to put the GeoCode check
in place to ensure that, no matter what, there will be valid, non-misspelled
information in the certificate. There will still be user-typed information
during this phase, since this phase is due Aug 18 2019. The system will work
exactly as it does now, except that the JOI information will run through the
GeoCode system to verify that yes, this information isn't wrong. If wrong, the
system won't allow the cert to be approved. At this point, no new issues should
occur, but I won't be satisfied as it's way too manual – and the registration
number is still a manual entry. That needs to change.
2. Intermediate (aka Neuter the Staff): During this process we plan to
eliminate typing of sources. Instead, the sources will be picklists based on
jurisdiction. This means that if you select Germany and the company type is an
LLC, you get a list of available sources. Foolproof-ish. There's still a
copy/paste or manual entry of the registration number. For those sources that
do provide an API, we can tie into the API, retrieve the documentation, and
populate the information. We want to do that as well, provided it doesn’t
throw off phase 3. Since the intermediate solution is also a stop-gap on the
way to the final solution, we want it to be a substantial improvement, but one
that doesn't impede our final destination.
3. The refactor (aka Documents r Us): This is still very much being specc’ed
but we’re currently thinking we want to evolve the system to a document system.
Right now the system works on checklists. For JOI, you enter the JOI part,
select a document (or two) that you'll use to verify JOI, and then transfer
information to the system from the document. The revamp moves it to where you
have the document and specify on the document which parts of the document apply
to the organization. For example, you specify on the document that a number is
a registration number or that a name is an org name, highlighting the info.
With auto-detection of the fields (just based on key words), you end up with a
pretty dang automated system. The validation staff is there to review for
accuracy and highlight things that might be missed. Hence, no typing or
specifying any information. It’s all directly from the source.
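Taken together, the three phases might be sketched roughly like this (every table entry, field name, and pattern below is a hypothetical stand-in for illustration, not the real system):

```python
import re

# Phase 1: a GeoCode gate -- typed JOI fields must match canonical geo data
# before a cert can be approved. Toy table; a real system queries a geo DB.
CANONICAL_GEO = {"DE": {"Nordrhein-Westfalen": {"Herne", "Bochum"}}}

def joi_is_valid(country, state, locality):
    """True only if the locality exists in the state, in the country."""
    return locality in CANONICAL_GEO.get(country, {}).get(state, set())

# Phase 2: sources become a picklist keyed by jurisdiction and company type,
# so validation staff select rather than type.
APPROVED_SOURCES = {("DE", "GmbH"): ["Handelsregister"]}

def sources_for(country, company_type):
    """Return the compliance-approved sources for this jurisdiction, or []."""
    return APPROVED_SOURCES.get((country, company_type), [])

# Phase 3: keyword-based auto-detection proposes field values from the source
# document for a human to confirm, instead of manual transcription.
FIELD_PATTERNS = {
    "registration_number": re.compile(r"HRB\s*([0-9]+)"),
    "org_name": re.compile(r"Firma:\s*(.+)"),
}

def detect_fields(text):
    """Propose field -> value pairs a validation specialist then reviews."""
    return {name: m.group(1).strip()
            for name, pat in FIELD_PATTERNS.items()
            if (m := pat.search(text))}
```

The point of the sketch is the shape of the data flow: each phase removes one class of free-form typing, and the human's role shifts from entry to review.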
Naming conventions also not approved yet. Since the engineers watch this forum,
they’ll probably throw things at me when they see the code names.
* I'm curious how well that approach generalizes, and/or what challenges
may exist. I totally understand that for registries which solely use hard
copies, this is a far more difficult task than it needs to be, and thus an
element of human review. However, depending on how prevalent the hardcopy vs
online copy is, we might be able to pursue automation for more, and thus
increase the stringency for the exceptions that do involve physical copies.
Right now we get the hard copies and turn them into a PDF to store in the audit
system for review during internal and external audits. During validation, all
documentation must be present and reviewed. By using OCR better, we can always
at least copy and paste information instead of typing it.
The more interesting part (in my opinion) is how to find and address these
certs. Right now, every time we have an issue or whenever a guideline changes
we write a lot of code, pull a lot of certs, and spend a lot of time reviewing.
Instead of doing this every time, we're going to develop a tool that runs
automatically every time we change a validation rule to find everything else
that will fail the updated rules. In essence, we're building unit tests on the
data. What I like about this approach is that it ends up building a system that
lets us see how all the rule changes interplay, since sometimes they may
interact in weird ways. It'll also let us more easily measure the impact of
changes on the system.
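The "unit tests on the data" idea could be sketched like this, with each compliance rule as a small predicate over a cert record that re-runs over the full data set whenever the rule set changes (the rules, field names, and data shape below are assumptions, not the actual schema):

```python
def state_not_placeholder(cert):
    # Catches the "some-state" class of problem.
    return cert.get("joi_state", "").lower() not in {"some", "any", "none", ""}

def state_spelled_out(cert):
    # EV guidelines require the JOI state written out, not abbreviated.
    return len(cert.get("joi_state", "")) > 2

RULES = [state_not_placeholder, state_spelled_out]

def run_rules(certs):
    """Return {rule name: [serials of certs violating it]} for the data set."""
    failures = {}
    for rule in RULES:
        bad = [c["serial"] for c in certs if not rule(c)]
        if bad:
            failures[rule.__name__] = bad
    return failures
```

Adding a new rule after an incident then permanently covers that failure mode: the next full run flags every existing record that violates it, not just the one that was reported.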
Anyway, I like the idea. Thought I'd share it here to get feedback and
suggestions for improvement. Still in spec phase, but I can share more info as
it gets developed.
* This sounds like a great idea, and would love to know more details here.
For example, what's the process now for identifying these
jurisdictionOfIncorporation issues? How would it improve or change with this
system?
The process right now is we write a script based on things we can think of that
might be wrong (abbreviated states, the word “some” in the state field, etc).
We usually pull a sampling of a couple thousand certs and review those to see
if we can find anything wrong that can help identify other patterns. We’re in
the middle of doing that for the JOI issues. What would be WAY better is if we
had rule sets for validation information (similar to cablint) that checked
validation information and how it is stored in our system and made these rule
sets run on the complete data every time we change something in validation.
Right now, we build quick and dirty checks that run one time when we have an
incident. That's not great, as it's a lot of stuff we can't reuse. What we
should do is build something (that, fingers crossed, we can open source and
share) that will be a library of checks on validation information. Sure, it'll
take a lot of configuration to work with how other CAs store data, but one
thing we've seen problems with is that changes in one system lead to
unexpected potential non-compliances in others. Having something that works
cross-functionally throughout the system helps.
A better example is the some-state issue. We scanned for values not listed as
states and cities that have "some", "any", "none", etc. That only finds a
limited set of the problem, and obviously missed the JOI information (not part
of the same data set). Going forward, I want a rule set that says: is this a
state? If so,
then check this source to see if it’s a real state. Then check this to see if
it also exists in the country specified. Then check to see if the locality
specified exists in the state. Then see if there is a red flag from a map that
says the org doesn’t exist. (The map check is coming – not there yet….) Instead
of finding small one off problems people report, find them on a global scale
with a rule we run every time something in the CAB forum, Mozilla policy, or
our own system changes.
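That chain could be sketched as ordered checks that report the first failure, with the map check left as a stub since it isn't built yet (the geo data and names here are toy stand-ins):

```python
# Toy geo data: country -> state -> set of localities.
GEO = {"DE": {"Nordrhein-Westfalen": {"Herne"}}}

def check_joi_chain(country, state, locality):
    """Run the JOI checks in order; return the first failure, or None."""
    all_states = {s for states in GEO.values() for s in states}
    if state not in all_states:
        return "not a real state"
    if state not in GEO.get(country, {}):
        return "state does not exist in this country"
    if locality not in GEO[country][state]:
        return "locality does not exist in this state"
    # A map-based "does the org actually exist here?" red-flag check
    # would slot in at this point once it's available.
    return None
```

Because each step names its failure, a global run over the cert data produces a categorized problem list rather than a single pass/fail bit.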
* You describe it as "validation rule" changes - and I'm not sure if you're
talking about the BRs (i.e. "we validated this org at time X") or something
else. I'm not sure whether you're adding additional data, or formalizing checks
on existing data. More details here could definitely help try and generalize
it, and might be able to formalize it as a best practice. Alternatively, even
if we can't formalize it as a requirement, it may be able to use as the basis
when evaluating potential impact or cost of changes (to policy or the BRs) in
the future. That is, "any CA that has implemented (system you describe) should
be able to provide quantifiable data about the impact of (proposed change X).
If CAs cannot do so (because they did not implement the change), their feedback
and concerns will not be considered."
"Validation rule" meaning our own system, the CAB forum, or Mozilla policy.
Basically, anything that could call into question the integrity of some data
piece within our system. The point is to catch all changes that may happen
proactively, not just when someone pings me with a problem. The requirement I
think we’re trying to meet is “never have the same problem again, even if a
rule changes” because the system will take that one problem, log it as a unit
test, and run that unit test every time we change the internal rule set to
detect all data that violates that rule as modified. Illustrative example:
Assume we decide we want all states abbreviated. Note this would contradict
the rule in the EV guidelines that requires JOI states to be written out. Right
now, this contradiction could pass undetected by a lot of CA systems I think.
However, if you have a rule set that can be enforced globally across the entire
data set, you end up instantly detecting that no valid EV cert could ever
issue. Danger! Anyway, the value of this is pretty huge internally IMO. And for
compliance, it’ll make our job easier. No more 3% audits trying to catch
mistakes.
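The abbreviated-state example can be caught mechanically: if the internal rule and the EV rule can never both hold for any value, the combined rule set is flagged before a single cert issues. A minimal sketch (both rules below are hypothetical stand-ins for the contradiction described above):

```python
def internal_rule(state):
    # Hypothetical internal rule: all states must be two-letter abbreviations.
    return len(state) == 2

def ev_rule(state):
    # EV guidelines: the JOI state must be written out in full.
    return len(state) > 2

def rules_contradict(sample_values):
    """True if no sample value can satisfy both rules at once."""
    return not any(internal_rule(v) and ev_rule(v) for v in sample_values)
```

Run against representative state values, this flips to "contradiction" the moment the abbreviation rule is added, which is exactly the "no valid EV cert could ever issue" alarm.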
_______________________________________________
dev-security-policy mailing list
[email protected]
https://lists.mozilla.org/listinfo/dev-security-policy