[ 
https://issues.apache.org/jira/browse/ATLAS-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026177#comment-16026177
 ] 

Nigel Jones commented on ATLAS-1821:
------------------------------------

Srikanth,

I have some questions as to how tag propagation might work in the following 
scenario.

For governance purposes I have a security classification "confidentiality".
This can be one of four values - public, internal, confidential, topsecret.

I would apply it to a database column, e.g. "location".


An asset cannot be classified with two different confidentialities

I see a few approaches to this:
 1a: preventing such a relationship being created
 1b: defining precedence when the data is retrieved (consumer-centric 
interfaces might return only the "winner" whilst repository-level generic 
interfaces might return a full list of relationships). The precedence could 
be based on closeness to the entity and/or characteristics of the 
classification values (i.e. order)
 1c: allowing it and not specifying the meaning - this concerns me as different 
consumers may infer different things
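
To make 1b concrete, here is a minimal sketch of the two precedence rules mentioned (order of the values, and closeness to the entity). Everything in it is an assumption for illustration: the `LEVELS` ordering, the `Applied` record and the `winner` function are hypothetical names, not part of Atlas.

```python
from dataclasses import dataclass

# Assumed ordering of the four values, least to most restrictive.
LEVELS = ["public", "internal", "confidential", "topsecret"]


@dataclass(frozen=True)
class Applied:
    """A confidentiality value that reached an entity (hypothetical model)."""
    value: str     # one of LEVELS
    distance: int  # hops from the entity; 0 means applied directly


def winner(applied, by="order"):
    """Pick one classification out of possibly conflicting ones."""
    if not applied:
        return None
    if by == "order":
        # Precedence by value: the most restrictive value wins.
        return max(applied, key=lambda a: LEVELS.index(a.value))
    # Precedence by closeness: nearest wins, ties go to the more restrictive.
    return min(applied, key=lambda a: (a.distance, -LEVELS.index(a.value)))


# A column that picked up two conflicting values through propagation.
column = [Applied("public", 1), Applied("topsecret", 2)]
print(winner(column).value)                  # by order
print(winner(column, by="closeness").value)  # by closeness
```

The two rules give different answers for the same column, which is why the rule would need to be agreed rather than left to each consumer.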

Also how are we representing this characteristic of the classification?

Such a relationship could be created through the repository APIs. Recently 
there has also been a jira opened to discuss collections. This could also 
easily lead to the scenario above if tag propagation is allowed and IF 
collections are a first class object & can have classifications (as opposed to 
just being used to support set based operations whereby the actual columns 
would be updated... which I'm ok with):

 - add entity to a collection with confidentiality=public
 - add entity to a different collection with confidentiality=topsecret

The same situation exists where classifications are associated with terms, if 
multiple terms are associated with the column. Even if that isn't "likely" 
from a business perspective we need defined behaviour.

I see in the proposal above that
 * A union of all classifications is always presented. This does seem the 
simpler approach, but could lead to a dual classification of topsecret and 
public in the example above. If so we need to be aware of this and agree what 
it means. Purely down to the application (or higher api in the stack) to 
resolve?
 * conflict resolution is defined as a manual process. At the API level, would 
this mean APIs would fail until a conflict is resolved? For example, would the 
term association causing the conflicting propagation fail? Would adding to a 
collection fail?
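
The two behaviours in question can be sketched with the collection example from earlier. This is only an illustration under stated assumptions: `Repo`, `ConflictError` and the `strict` switch are hypothetical and do not correspond to any existing Atlas API.

```python
class ConflictError(Exception):
    """Raised when an operation would create conflicting classifications."""


class Repo:
    """Toy repository where collections carry a confidentiality value."""

    def __init__(self, strict=False):
        self.strict = strict
        self.collections = {}  # name -> (confidentiality, set of members)

    def add_collection(self, name, confidentiality):
        self.collections[name] = (confidentiality, set())

    def classifications(self, entity):
        # Union of everything propagated to the entity from its collections.
        return {conf for conf, members in self.collections.values()
                if entity in members}

    def add_to_collection(self, entity, name):
        conf, members = self.collections[name]
        conflicting = self.classifications(entity) - {conf}
        if self.strict and conflicting:
            # "APIs fail until the conflict is resolved" behaviour.
            raise ConflictError(f"{entity}: {conf} conflicts with {sorted(conflicting)}")
        members.add(entity)


repo = Repo(strict=False)
repo.add_collection("open_data", "public")
repo.add_collection("restricted", "topsecret")
repo.add_to_collection("location", "open_data")
repo.add_to_collection("location", "restricted")
print(sorted(repo.classifications("location")))
```

With `strict=False` the column ends up with both public and topsecret and the consumer has to resolve it. With `strict=True` the second `add_to_collection` call raises instead.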


David,
 In your example can I check I understand your scenario:
  * There is a "national insurance number" glossary term
  * this is classified as "confidential" (presumably using a confidentiality 
    classification which can be one of the values I listed above)
  * the same "national insurance number" is mapped to two columns
    * One of these columns is a clear-text representation of the national 
      insurance number
    * One of these columns (so persisted in the database rather than computed 
      at access time) is a masked form - perhaps just the first two characters
  * you assert we need rules

I'm not so sure... Surely in this case those two columns have different 
business meanings? I would either
 a) Create a different glossary term called "redacted national insurance 
    number" & classify this as non-confidential, or
 b) Not even store the redacted form, and allow policies (i.e. in Ranger) to 
    do the masking at runtime

Even this scenario brings up tag propagation questions though... In a) above, 
should "redacted national insurance number" be related to "national insurance 
number"? Yes, I think it should. But should it inherit the classification? No. 
It could as a default of course, but then be overridden to public. This brings 
us back to the original question of needing to control propagation. I wonder 
if all inbound and outbound links need the ability to
 * Allow outbound propagation
 * Allow inbound propagation

Since both parties in the relationship need to have some say in this
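
As a rough sketch of that idea, assuming each end of a relationship carries its own pair of flags (the names `Node`, `allow_outbound`, `allow_inbound` and `effective` are made up for illustration and are not taken from the design doc):

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A term or entity taking part in relationships (hypothetical model)."""
    name: str
    own: set = field(default_factory=set)  # classifications applied directly
    allow_outbound: bool = True            # may its classifications flow out?
    allow_inbound: bool = True             # may it receive propagated ones?


def propagates(src, dst):
    # Both parties get a say: the source must offer, the target must accept.
    return src.allow_outbound and dst.allow_inbound


def effective(dst, sources):
    """Classifications on dst: its own plus whatever is allowed to flow in."""
    result = set(dst.own)
    for src in sources:
        if propagates(src, dst):
            result |= src.own
    return result


ni_number = Node("national insurance number", own={"confidential"})
redacted = Node("redacted national insurance number", own={"public"},
                allow_inbound=False)
column = Node("ni_number_col")

print(effective(redacted, [ni_number]))  # related, but does not inherit
print(effective(column, [ni_number]))    # inherits from the term
```

Here the redacted term stays related to the original term but opts out of inheriting its classification, while an ordinary column still inherits it.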

Going back to the original point, some classifications may be required to be 
unique; this is an additional attribute of the classification. Further, it 
would automatically prevent inbound propagation of other instances of 
classifications of the same type. 
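
A minimal sketch of how such a uniqueness attribute could block inbound propagation, assuming classification types are flagged as unique in some registry (`UNIQUE_TYPES`, `Classification` and `propagate` are hypothetical names used only for this example):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    type_name: str
    value: str


# Assumed registry of classification types that must be unique per asset.
UNIQUE_TYPES = {"confidentiality"}


def propagate(own, inbound):
    """Merge inbound classifications into an asset's own ones.

    An inbound instance of a unique type is dropped when the asset already
    holds an instance of that type; everything else is unioned in.
    """
    result = list(own)
    for c in inbound:
        already_has_type = any(r.type_name == c.type_name for r in result)
        if c.type_name in UNIQUE_TYPES and already_has_type:
            continue  # uniqueness blocks this inbound propagation
        if c not in result:
            result.append(c)
    return result


own = [Classification("confidentiality", "public")]
inbound = [Classification("confidentiality", "confidential"),
           Classification("pii", "true")]
for c in propagate(own, inbound):
    print(c.type_name, c.value)
```

Note that when the asset has no value of its own, this sketch simply keeps the first inbound instance of a unique type, which is an arbitrary choice; a precedence rule would still be needed for that case.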

Apologies if I've not used the right terms as per the design 
doc/implementation, but hopefully you get the idea :-)


> Classification propagation from entity to a derivative or child entity
> ----------------------------------------------------------------------
>
>                 Key: ATLAS-1821
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1821
>             Project: Atlas
>          Issue Type: Improvement
>          Components:  atlas-core, atlas-webui
>            Reporter: Srikanth Venkat
>             Fix For: 0.9-incubating
>
>
> User Story:
> As a data steward, I need a scalable way to quickly and efficiently propagate 
> classification across the information supply chain to support efficient 
> searches and classification based security for compliance and audit purposes. 
> This requires:
> 1. Classifications for derivative entities should be inherited from the 
> originator and to child entities from parent. 
> For example, if a Hive column is classified "Confidential" then resulting 
> column created from a CTAS operation should also be tagged "Confidential" to 
> maintain the classification of the original entity. In the case where 2 or 
> more entities are composed, the derivative entity should have the union of 
> all classifications of each source entity.
> 2. Business Terms:
> a. Child business terms should inherit the classifications associated with 
> the parent term.
> b. The option to propagate classification to child business terms in a 
> hierarchy should be provided
> c. Ability to update the propagated tags manually via UI or through the API
> d. Tagging a term should propagate to data assets that are already attached 
> to that business term as well
> 3. Data assets
> a. For all supported data asset types in Atlas, if a derivative asset is 
> created it should inherit the tags and attributes from the original asset.
> b. the option to propagate tags to child entities should be provided (e.g. if 
> you tag a folder in HDFS optionally tag all the files within it)
> c. Ability to update the propagated tags manually via UI or through the API
> d. Tagging a parent object should be inherited after child creation 
> dynamically (unless a flag is set not to do this)
> e. Derived data assets should have the tags of the original data asset.
> Conflict resolution - if there are different values for attributes on tags 
> (classifications) on upstream or parent entities used to derive a data asset 
> then user needs to be prompted for action to resolve the conflict. Once 
> resolved, the resolved value should be carried forth to derived assets.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
