[
https://issues.apache.org/jira/browse/ATLAS-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16002261#comment-16002261
]
David Radley commented on ATLAS-1690:
-------------------------------------
Hi [~madhan.neethiraj]
Great feedback - thanks for your thoughtful and open response.
I will change the array to endpoint1 and endpoint2. I think that is clearer.
I am keen that we propagate tags; this is very powerful.
I thought I would explain how we could do classifications and then see how this
option fits in.
The classifications our customers are working with include confidentiality. The
confidentiality scheme might have C1, C2, C3 and C4 levels. C1 might be public
and C3 top secret. Different companies name these Cn levels differently, but in
all these cases there is an order, with C4 being the highest classification
level. Though more complex classification schemes are possible, many (if not
most) cases can work with this kind of ordered-list classification scheme. A
particular glossary term or column might be associated with one of these
classifications.
We were thinking that the classification levels (C1, C2, etc.) would be new
system types (I suggest an EntityDef). A classification use is the relationship
between the classification level and the thing it is classifying. By default
the classification will be the one from the level, but a rule can run and
increase the classification; this would be calculated (and stored?) in the
relationship instance.
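As a rough sketch of the idea above, assuming a four-level scheme: the class
and field names here (Confidentiality, ClassificationUse, raised_to) are
illustrative only and are not existing Atlas types.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Confidentiality(IntEnum):
    """Ordered levels; a higher value is more restrictive (names are illustrative)."""
    C1 = 1
    C2 = 2
    C3 = 3
    C4 = 4


@dataclass
class ClassificationUse:
    """Hypothetical relationship instance between a level and the thing it classifies."""
    classified_guid: str
    level: Confidentiality                       # default classification, from the level
    raised_to: Optional[Confidentiality] = None  # set by a rule, if one has run

    def effective(self) -> Confidentiality:
        # A rule may only increase the classification, never lower it.
        if self.raised_to is None:
            return self.level
        return max(self.level, self.raised_to)


use = ClassificationUse("column-salary", Confidentiality.C2)
use.raised_to = Confidentiality.C3  # e.g. the outcome of a governance rule
print(use.effective().name)
```

Whether the raised value is stored on the relationship or recalculated each
time is the open question noted above; the sketch stores it.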
So to address your proposal:
- I think the propagated classifications would be derived at query time and
could be useful. Do we need an effective classification? I am not convinced
by the proposed mechanism.
- Your example around tables and columns and PII assumes that PII is a binary
flag (or one tag). I am suggesting that this is not the way that
classifications are normally implemented; these should be an ordered list of
levels. I see that in some of your recent demos you use v1 terms to implement
these ordered classification levels. If a table is classified as public and
has a PII column, we would not want the public classification to override the
PII column. When a query brings together 2 public fields, like name and salary,
the combination becomes PII; in this case we need a rule to drive this.
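The two cases in the second point could be sketched roughly as follows. The
level names, the rule table and both function names are assumptions made for
illustration; nothing here is an existing Atlas or Ranger API.

```python
from enum import IntEnum


class Level(IntEnum):
    """Illustrative ordered levels; a higher value is more restrictive."""
    PUBLIC = 1
    INTERNAL = 2
    PII = 3


# Hypothetical rule table: field combinations that warrant a higher level.
COMBINATION_RULES = [
    (frozenset({"name", "salary"}), Level.PII),
]


def effective_column_level(table_level, own_level=None):
    """A column's own classification is never lowered by its table's."""
    if own_level is None:
        return table_level
    return max(table_level, own_level)


def query_level(field_levels):
    """Highest level among the fields, raised by any matching combination rule."""
    result = max(field_levels.values())
    for fields, level in COMBINATION_RULES:
        if fields.issubset(field_levels):
            result = max(result, level)
    return result


# A public table does not override its PII column.
print(effective_column_level(Level.PUBLIC, Level.PII).name)
# Two public fields that together become PII.
print(query_level({"name": Level.PUBLIC, "salary": Level.PUBLIC}).name)
```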
This implementation would encourage bidirectional relationships to be
implemented purely to propagate tags. I suggest many propagations would not
stop at one relationship, but could flow much further, for example to all has-a
terms by following all the has-a links.
I am also concerned that the role that authors the relationship is not the
right role to make classification propagation decisions.
I wonder whether a smarter approach would be to tag the relationship as
"propagate-1-to-2" (hopefully something more meaningful like
"propagate-table-to-column") and Ranger picks up this hint. Ranger could decide
to run a simple rule of propagating all the tags from 1 to 2 or a more complex
rule taking other conditions into account.
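A minimal sketch of the hint idea, with a plain dictionary of rules standing in
for whatever the policy engine would actually do. The Relationship class, the
rule functions and apply_hints are all hypothetical and do not reflect a real
Ranger or Atlas interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class Relationship:
    """Hypothetical relationship instance carrying propagation hints as tags."""
    end1: str
    end2: str
    hints: Set[str] = field(default_factory=set)


def propagate_all(tags: Set[str]) -> Set[str]:
    # Simple rule: every tag on end 1 flows to end 2.
    return set(tags)


def propagate_except_public(tags: Set[str]) -> Set[str]:
    # A more selective rule, shown only to illustrate that rules can differ.
    return {t for t in tags if t != "public"}


def apply_hints(rel: Relationship,
                tags_by_entity: Dict[str, Set[str]],
                rules: Dict[str, Callable[[Set[str]], Set[str]]]) -> Set[str]:
    """Return the tags end 2 carries after the rules matching the hints have run."""
    result = set(tags_by_entity.get(rel.end2, set()))
    source = tags_by_entity.get(rel.end1, set())
    for hint in rel.hints:
        rule = rules.get(hint)
        if rule is not None:  # hints without a registered rule are ignored
            result |= rule(source)
    return result


rel = Relationship("table-employees", "column-salary", {"propagate-table-to-column"})
tags = {"table-employees": {"public", "hr"}, "column-salary": {"pii"}}
print(sorted(apply_hints(rel, tags, {"propagate-table-to-column": propagate_all})))
```

The relationship author only supplies the hint; which rule runs for that hint
is decided separately, which is the separation of roles argued for above.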
I suggest that we explicitly implement these classification levels and uses. I
hope there is a simple case where some classifications should be propagated for
all consumer cases, with rules that can run to override the classifications,
and that we can find a way of doing this using a governance role. Maybe Atlas
could use some supplied Ranger rules and tags out of the box. GDPR rules and
tags would be a good use case here.
> Introduce top level relationships
> ---------------------------------
>
> Key: ATLAS-1690
> URL: https://issues.apache.org/jira/browse/ATLAS-1690
> Project: Atlas
> Issue Type: Improvement
> Reporter: David Radley
> Assignee: David Radley
> Labels: VirtualDataConnector
> Attachments: Atlas_RelationDef_Json_Structure_v1.pdf, Atlas
> Relationships proposal v1.0.pdf, Atlas Relationships proposal v1.1.pdf, Atlas
> Relationships proposal v1.2.pdf, Atlas Relationships proposal v1.3.pdf, Atlas
> Relationships proposal v1.4.pdf, Atlas Relationships proposal v1.5.pdf, Atlas
> Relationships proposal v1.6.pdf, Atlas Relationships proposal v1.7.pdf
>
>
> Introduce top level relationships including support for
> - many to many relationships
> - relationship names including the name for both ends and the relationship.