[ 
https://issues.apache.org/jira/browse/ATLAS-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037040#comment-16037040
 ] 

Graham Wallis commented on ATLAS-1821:
--------------------------------------

Can I suggest that there are two parts to what's needed: a flexible 
classification model and a system for controlled derivation. 
The first part - the flexible model - should be a multi-dimensional 
classification scheme, with an arbitrary number of dimensions, although most 
users would probably not need more than about half a dozen dimensions. 
Dimensions are orthogonal. Each dimension contains values, which can be either 
categorical or continuous (ordered). An example of a categorical dimension 
might be: 'region' with values 'north-america', 'south-america', etc. The 
values are mutually exclusive and there is no implied order or precedence of 
values. A categorical dimension can be used to support access control decisions 
based on rules or policy. An example of a continuous dimension might be a 
'document-classification' dimension with values 'public', 'internal-use', 
'confidential', 'secret', etc. On a continuous dimension there is an implied 
order, which would be used to support precedence rules. Each data item can be 
thought of as occupying some point in the multi-dimensional space.
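
As a rough illustration only (this is not an existing Atlas API), the model 
described above could be sketched as follows. The names Dimension, rank and 
place are hypothetical, and treating a continuous dimension as a tuple of 
values listed from lowest to highest is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Dimension:
    """One orthogonal classification dimension (hypothetical type)."""
    name: str
    values: Tuple[str, ...]
    ordered: bool = False  # True for a continuous (ordered) dimension

    def rank(self, value: str) -> int:
        """Position of a value in the implied order (ordered dimensions only)."""
        if not self.ordered:
            raise ValueError(f"dimension '{self.name}' has no implied order")
        return self.values.index(value)


def place(dimensions: Dict[str, Dimension], point: Dict[str, str]) -> Dict[str, str]:
    """Check that a data item occupies a valid point in the dimension space."""
    for name, value in point.items():
        if name not in dimensions:
            raise KeyError(f"unknown dimension '{name}'")
        if value not in dimensions[name].values:
            raise ValueError(f"'{value}' is not a value of '{name}'")
    return dict(point)


# Example dimensions taken from the text above.
REGION = Dimension("region", ("north-america", "south-america"))
DOC_CLASS = Dimension(
    "document-classification",
    ("public", "internal-use", "confidential", "secret"),
    ordered=True,
)
DIMENSIONS = {d.name: d for d in (REGION, DOC_CLASS)}

item = place(DIMENSIONS, {"region": "north-america",
                          "document-classification": "confidential"})
```

Precedence rules would compare rank() values on an ordered dimension, while a 
categorical dimension such as 'region' deliberately refuses to be ranked.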
The second part - the derivation system - needs to be tied to the relationships 
between data items and provide a process model for how the dimensions of a 
source item are transformed into, or affect, the dimensions of a target item. A 
pair of data items could be related in one of a number of ways: one item may be 
derived from the other as a literal copy (duplicate), or it could be that one 
item is a transformed (e.g. redacted) copy of the other. It is also possible 
that one item is an aggregation of other items, or a filtered selection from 
a collection of other items. Depending on the relationship, it would be 
possible to either propagate the same classification dimensions and values from 
source to target, or to 'downgrade' or 'escalate' a continuous dimension due to 
the redaction or aggregation of information between source and target. A 
dimension could be omitted or a dimension and value could be introduced, as a 
result of a transformation relationship. The modification of classification 
dimensions and values would need to be tightly controlled as part of the 
definition of the relationship. For example, a data owner would need to inspect 
a redacting transformation and specify/certify the process by which 
classification settings are derived for the redacted item.  
I think it's essential that the second part (propagations and transformations) 
is closely (auditably) tied to the first part.
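
To make the link between the two parts more concrete, here is a minimal 
sketch of relationship-driven derivation with an audit trail. Everything in it 
is an assumption for illustration: the relationship names, the RULES registry, 
the 'certified_by' field and the one-step downgrade for a redacted copy are 
hypothetical, not a proposed Atlas design.

```python
from typing import Callable, Dict, List, Tuple

# Ordered values of the continuous dimension, lowest first (from the text above).
LEVELS = ["public", "internal-use", "confidential", "secret"]
DOC_CLASS = "document-classification"

Point = Dict[str, str]
Rule = Callable[[Point], Point]


def propagate(source: Point) -> Point:
    """Literal copy: the target gets the same dimensions and values."""
    return dict(source)


def downgrade(dimension: str, steps: int = 1) -> Rule:
    """Build a rule that lowers an ordered dimension by a number of steps."""
    def rule(source: Point) -> Point:
        target = dict(source)
        index = max(LEVELS.index(source[dimension]) - steps, 0)
        target[dimension] = LEVELS[index]
        return target
    return rule


# Hypothetical registry: relationship type -> (rule, who certified the rule).
RULES: Dict[str, Tuple[Rule, str]] = {
    "duplicate": (propagate, "not-required"),
    "redacted-copy": (downgrade(DOC_CLASS), "data-owner"),
}


def derive(relationship: str, source: Point, audit_log: List[dict]) -> Point:
    """Apply the rule defined for the relationship and record what happened."""
    rule, certified_by = RULES[relationship]
    target = rule(source)
    audit_log.append({
        "relationship": relationship,
        "certified_by": certified_by,
        "source": dict(source),
        "target": dict(target),
    })
    return target


log: List[dict] = []
original = {"region": "north-america", DOC_CLASS: "confidential"}
copy = derive("duplicate", original, log)
redacted = derive("redacted-copy", original, log)
```

The point of the sketch is that a target's classification is only ever 
produced by a rule attached to the relationship definition, and every 
application of a rule leaves an audit record.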

> Classification propagation from entity to a derivative or child entity
> ----------------------------------------------------------------------
>
>                 Key: ATLAS-1821
>                 URL: https://issues.apache.org/jira/browse/ATLAS-1821
>             Project: Atlas
>          Issue Type: Improvement
>          Components:  atlas-core, atlas-webui
>            Reporter: Srikanth Venkat
>             Fix For: 0.9-incubating
>
>
> User Story:
> As a data steward, I need a scalable way to quickly and efficiently propagate 
> classification across the information supply chain to support efficient 
> searches and classification based security for compliance and audit purposes. 
> This requires:
> 1. Classifications for derivative entities should be inherited from the 
> originator and to child entities from parent. 
> For example, if a Hive column is classified "Confidential" then resulting 
> column created from a CTAS operation should also be tagged "Confidential" to 
> maintain the classification of the original entity. In the case where 2 or 
> more entities are composed, the derivative entity should have the union of 
> all classifications of each source entity.
> 2. Business Terms:
> a. Child business terms should inherit the classifications associated with 
> the parent term.
> b. The option to propagate classification to child business terms in a 
> hierarchy should be provided
> c. Ability to update the propagated tags manually via UI or through the API
> d. Tagging a term should propagate to data assets that are already attached 
> to that business term as well
> 3. Data assets
> a. For all supported data asset types in Atlas, if a derivative asset is 
> created it should inherit the tags and attributes from the original asset.
> b. the option to propagate tags to child entities should be provided (e.g. if 
> you tag a folder in HDFS optionally tag all the files within it)
> c. Ability to update the propagated tags manually via UI or through the API
> d. Tagging a parent object should be inherited after child creation 
> dynamically (unless a flag is set not to do this)
> e. Derived data assets should have the tags of the original data asset.
> Conflict resolution - if there are different values for attributes on tags 
> (classifications) on upstream or parent entities used to derive a data asset 
> then user needs to be prompted for action to resolve the conflict. Once 
> resolved, the resolved value should be carried forth to derived assets.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
