[
https://issues.apache.org/jira/browse/ATLAS-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026177#comment-16026177
]
Nigel Jones commented on ATLAS-1821:
------------------------------------
Srikanth,
I have some questions about how tag propagation might work in the following
scenario:
For governance purposes I have a security classification "confidentiality".
This can take one of four values: public, internal, confidential, topsecret.
I would apply it to a database column, i.e. "location".
An asset cannot be classified with two different confidentialities.
I see a few approaches to this:
1a: preventing such a relationship from being created
1b: defining precedence when the data is retrieved (consumer-centric
interfaces might return only the "winner" whilst repository-level generic
interfaces might return a full list of relationships). The precedence could
be based on closeness to the entity and/or characteristics of the
classification values (i.e. order)
1c: allowing it and not specifying the meaning - this concerns me as different
consumers may infer different things
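
To make option 1b concrete, here is a minimal Python sketch of one possible
precedence rule. It is an illustration only, not anything Atlas provides: the
names Assignment, resolve_winner and all_assignments are hypothetical, as is
the choice of "closest wins, ties go to the most restrictive value".

```python
from dataclasses import dataclass

# Assumed ordering of the four example values, least to most restrictive.
CONFIDENTIALITY_ORDER = ["public", "internal", "confidential", "topsecret"]


@dataclass(frozen=True)
class Assignment:
    """One confidentiality value reaching an entity (hypothetical model)."""
    value: str      # one of CONFIDENTIALITY_ORDER
    distance: int   # hops from the entity; 0 means applied directly


def resolve_winner(assignments):
    """Consumer-centric view: return a single assignment, or None if empty.

    The closest assignment wins; ties are broken by the most restrictive value.
    """
    if not assignments:
        return None
    return min(
        assignments,
        key=lambda a: (a.distance, -CONFIDENTIALITY_ORDER.index(a.value)),
    )


def all_assignments(assignments):
    """Repository-level view: return every assignment, unresolved."""
    return list(assignments)


# The "location" column receives two values, both one hop away.
column_location = [
    Assignment("public", distance=1),
    Assignment("topsecret", distance=1),
]
print(resolve_winner(column_location).value)  # topsecret
print(len(all_assignments(column_location)))  # 2
```

A different rule (for example, least restrictive wins, or directly applied
values always win) would only change the sort key; the point is that the rule
has to be written down somewhere.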
Also how are we representing this characteristic of the classification?
Such a relationship could be created through the repository APIs. Recently
a JIRA has also been opened to discuss collections. This could also
easily lead to the scenario above if tag propagation is allowed and IF
collections are a first-class object & can have classifications (as opposed to
just being used to support set-based operations whereby the actual columns
would be updated, which I'm OK with):
- add entity to a collection with confidentiality=public
- add entity to a different collection with confidentiality=topsecret
The same situation exists where classifications are associated with terms, if
multiple terms are associated with the column. Even if that isn't "likely"
from a business perspective, we need defined behaviour.
I see in the proposal above that:
* A union of all classifications is always presented. This does seem the
simpler approach, but could lead to a dual classification of topsecret and
public in the example above. If so, we need to be aware of this and agree what
it means. Is it purely down to the application (or a higher API in the stack)
to resolve?
* Conflict resolution is defined as a manual process. At the API level, would
this mean APIs would fail until a conflict is resolved? For example, would the
term association causing the conflicting propagation fail? Would adding to a
collection fail?
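
The two behaviours in question can be sketched as follows. This is a rough
Python illustration under my own assumptions: PropagationConflict,
propagated_union, find_conflicts and add_to_collection are hypothetical names,
not Atlas APIs, and classifications are modelled simply as (type, value)
pairs.

```python
class PropagationConflict(Exception):
    """Raised when a propagation would give a unique type two values."""


def propagated_union(sources):
    """Union of (type, value) classification pairs from all sources."""
    result = set()
    for classifications in sources:
        result |= set(classifications)
    return result


def find_conflicts(classifications, unique_types):
    """Map each unique type that has more than one value to those values."""
    by_type = {}
    for ctype, value in classifications:
        by_type.setdefault(ctype, set()).add(value)
    return {
        ctype: values
        for ctype, values in by_type.items()
        if ctype in unique_types and len(values) > 1
    }


def add_to_collection(entity_tags, collection_tags, unique_types, strict):
    """Propagate collection tags to an entity.

    strict=False: always return the union and leave conflicts to the caller.
    strict=True: fail the call if the union would contain a conflict.
    """
    merged = propagated_union([entity_tags, collection_tags])
    conflicts = find_conflicts(merged, unique_types)
    if conflicts and strict:
        raise PropagationConflict(conflicts)
    return merged


entity = {("confidentiality", "public")}      # after joining the first collection
second = {("confidentiality", "topsecret")}   # tags of the second collection

# Union behaviour: the entity ends up with both public and topsecret.
merged = add_to_collection(entity, second, {"confidentiality"}, strict=False)

# Fail-until-resolved behaviour: the second add is rejected.
try:
    add_to_collection(entity, second, {"confidentiality"}, strict=True)
except PropagationConflict as exc:
    print("rejected:", exc)
```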
David,
In your example, can I check I understand your scenario:
* There is a "national insurance number" glossary term
* this is classified as "confidential" (presumably using a confidentiality
classification which can be one of the values I listed above)
* the same "national insurance number" is mapped to two columns.
* One of these columns is a clear-text representation of the national
insurance number
* One of these columns (so persisted in the database rather than computed at
access time) is a masked form - perhaps just the first two characters
* you assert we need rules
I'm not so sure. Surely in this case those two columns have different
business meanings? I would either
a) Create a different glossary term called "redacted national insurance
number" & classify this as non-confidential
b) Not even store the redacted form, and allow policies (i.e. in Ranger) to do
the masking at runtime
Even this scenario brings up tag propagation questions though. In a) above,
should "redacted national insurance number" be related to "national insurance
number"? Yes, I think it should. But should it inherit the classification? No.
It could by default, of course, but then be overridden to public. This brings us
back to the original question of needing to control propagation. I wonder if
all inbound and outbound links need the ability to
* Allow outbound propagation
* Allow inbound propagation
Both parties in the relationship need to have some say in this.
Going back to the original point, some classifications may need to be
unique; this is an additional attribute of the classification. Further, it would
automatically prevent inbound propagation of other instances of classifications
of the same type.
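
As a rough sketch of how link-level controls and a "unique" attribute might
interact, here is a small Python model. Everything in it is an assumption for
illustration: Entity, Link, propagate and the allow_outbound / allow_inbound
flags are hypothetical names, not part of the design doc or the Atlas API.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A term or asset; tags map a classification type name to its values."""
    name: str
    tags: dict = field(default_factory=dict)


@dataclass
class Link:
    """A relationship on which each side can veto propagation."""
    source: Entity
    target: Entity
    allow_outbound: bool = True   # the source's say
    allow_inbound: bool = True    # the target's say


def propagate(link, unique_types):
    """Propagate tags along a link; return the type names actually applied."""
    if not (link.allow_outbound and link.allow_inbound):
        return []
    applied = []
    for type_name, values in link.source.tags.items():
        existing = link.target.tags.get(type_name, set())
        if type_name in unique_types and existing:
            # A unique classification already present on the target
            # blocks inbound instances of the same type.
            continue
        link.target.tags[type_name] = existing | values
        applied.append(type_name)
    return applied


nino = Entity("national insurance number",
              {"confidentiality": {"confidential"}})
redacted = Entity("redacted national insurance number",
                  {"confidentiality": {"public"}})  # overridden to public

# The terms stay related, but the override on the redacted term is kept.
applied = propagate(Link(nino, redacted), unique_types={"confidentiality"})
print(applied)        # []
print(redacted.tags)  # {'confidentiality': {'public'}}
```

With this shape, a term that has no confidentiality of its own would still
inherit "confidential" by default, which matches the "inherit as a default,
then override" behaviour described above.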
Apologies if I've not used the right terms as per the design
doc/implementation, but hopefully you get the idea :-)
> Classification propagation from entity to a derivative or child entity
> ----------------------------------------------------------------------
>
> Key: ATLAS-1821
> URL: https://issues.apache.org/jira/browse/ATLAS-1821
> Project: Atlas
> Issue Type: Improvement
> Components: atlas-core, atlas-webui
> Reporter: Srikanth Venkat
> Fix For: 0.9-incubating
>
>
> User Story:
> As a data steward, I need a scalable way to quickly and efficiently propagate
> classification across the information supply chain to support efficient
> searches and classification based security for compliance and audit purposes.
> This requires:
> 1. Classifications for derivative entities should be inherited from the
> originator and to child entities from parent.
> For example, if a Hive column is classified "Confidential" then resulting
> column created from a CTAS operation should also be tagged "Confidential" to
> maintain the classification of the original entity. In the case where 2 or
> more entities are composed, the derivative entity should have the union of
> all classifications of each source entity.
> 2. Business Terms:
> a. Child business terms should inherit the classifications associated with
> the parent term.
> b. The option to propagate classification to child business terms in a
> hierarchy should be provided
> c. Ability to update the propagated tags manually via UI or through the API
> d. Tagging a term should propagate to data assets that are already attached
> to that business term as well
> 3. Data assets
> a. For all supported data asset types in Atlas, if a derivative asset is
> created it should inherit the tags and attributes from the original asset.
> b. the option to propagate tags to child entities should be provided (e.g. if
> you tag a folder in HDFS optionally tag all the files within it)
> c. Ability to update the propagated tags manually via UI or through the API
> d. Tagging a parent object should be inherited after child creation
> dynamically (unless a flag is set not to do this)
> e. Derived data assets should have the tags of the original data asset.
> Conflict resolution - if there are different values for attributes on tags
> (classifications) on upstream or parent entities used to derive a data asset
> then user needs to be prompted for action to resolve the conflict. Once
> resolved, the resolved value should be carried forth to derived assets.