[
https://issues.apache.org/jira/browse/ATLAS-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16038474#comment-16038474
]
Nigel Jones commented on ATLAS-1821:
------------------------------------
Graham,
first part -- makes sense. In addition we need a cardinality, so that, for
instance, your example of document-classification can't be assigned to the same
entity more than once (at least directly). In fact I don't think any
classification should be assignable more than once (region, for example?),
though there are probably cases where it would make sense: what about
"applicable" regions, where we have a list of areas in which something (like a
regulation, or usage) applies? Perhaps that should be handled by allowing a
classification to be single- or multi-valued (a list). That makes no sense for
a continuous dimension, but could for a category?
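A minimal Python sketch of the cardinality idea above. Everything in it is hypothetical (ClassificationDef, Entity, assign and the multi_valued flag are illustrative names, not the Atlas typedef API). A classification type can be directly assigned to an entity at most once, and a category-style classification such as applicable regions can be declared multi-valued so that its single assignment holds a list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClassificationDef:
    """Hypothetical classification type definition (not the Atlas typedef API)."""
    name: str
    # True for a category dimension whose one assignment holds a list of values
    # (e.g. applicable regions); False for a single value (e.g. a level).
    multi_valued: bool = False


@dataclass
class Entity:
    """Hypothetical entity holding its directly assigned classifications."""
    guid: str
    # Keyed by classification type name, so each type appears at most once.
    classifications: dict = field(default_factory=dict)


def assign(entity: Entity, cdef: ClassificationDef, value) -> None:
    """Directly assign a classification, allowing at most one per type per entity."""
    if cdef.name in entity.classifications:
        raise ValueError(f"{cdef.name} is already assigned to {entity.guid}")
    if cdef.multi_valued:
        if isinstance(value, str):
            raise TypeError(f"{cdef.name} expects a list of values")
        # Store the list de-duplicated and in a stable order.
        value = tuple(sorted(set(value)))
    elif isinstance(value, (list, tuple, set, frozenset)):
        raise TypeError(f"{cdef.name} is single-valued")
    entity.classifications[cdef.name] = value
```

With this model, assigning document-classification a second time to the same entity raises ValueError, while an applicable-regions classification declared with multi_valued=True accepts a list such as ["EU", "US"] in its one assignment.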
second part - are you referring to data processing here? I think we do need to
model that so that we can understand the classification of derived data. In
some cases we may be able to work out how to promote/demote the classification
automatically; in others we won't, though we may be able to provide hints of a
valid range which could then be confirmed through a dev or stewardship process.
We also, though, have derivation that applies between terms and entities, which
I think was the original discussion around propagation (such as
salary (asset:column) -> salary [term] -> spi [classification] as a simple
case) ... but it also applies to structural relationships, like
salary (asset:column) -> salaryinfo (asset:database) -> spi [classification],
where knowing that there's a containment relationship between the column and
the database is what we use to apply the right classification to the column. In
that example, if there was ALSO a classification on the column itself, that
could take precedence, even if weaker (some tools could determine all possible
derived classifications and offer a stewardship process to help a customer
verify/check anomalies).
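One way to express the precedence rule above, as a hedged Python sketch. The path names ("direct", "term", "container"), the name-to-value shape of a classification and the resolve function are assumptions for illustration, not Atlas behaviour. A classification set directly on the column wins over one derived via its business term, which in turn wins over one inherited through containment. Disagreements from lower-precedence paths are collected as anomalies for a stewardship review instead of being silently dropped.

```python
from typing import Dict, List, Tuple

# Hypothetical precedence order, highest first: set directly on the asset,
# derived via the asset's business term, inherited through containment
# (e.g. column -> database).
PRECEDENCE = ("direct", "term", "container")


def resolve(sources: Dict[str, Dict[str, str]]) -> Tuple[Dict[str, str], List[str]]:
    """Return the effective classifications and anomalies for stewardship review.

    `sources` maps a derivation path name to the classifications
    (classification name -> value) reachable through that path.
    """
    effective: Dict[str, str] = {}
    origin: Dict[str, str] = {}
    anomalies: List[str] = []
    for path in PRECEDENCE:
        for name, value in sources.get(path, {}).items():
            if name not in effective:
                # First (highest-precedence) path to supply this classification wins,
                # even if its value is "weaker" than a derived one.
                effective[name] = value
                origin[name] = path
            elif effective[name] != value:
                # A lower-precedence path disagrees: keep the winner, flag the conflict.
                anomalies.append(
                    f"{name}: {path} says {value!r}, {origin[name]} says {effective[name]!r}"
                )
    return effective, anomalies
```

For the salary column, with hypothetical values, resolve({"direct": {"spi": "low"}, "term": {"spi": "high"}, "container": {"spi": "high"}}) keeps "low" from the column itself and reports the term and container paths as two anomalies. With no direct classification, the term and container paths agree and no anomaly is raised.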
Going back to your second statement, was your intent that the description
applies to this case too? Actually I think it was ;-) I was thinking of process
as "data movement" or similar, but I think your general concept still applies.
> Classification propagation from entity to a derivative or child entity
> ----------------------------------------------------------------------
>
> Key: ATLAS-1821
> URL: https://issues.apache.org/jira/browse/ATLAS-1821
> Project: Atlas
> Issue Type: Improvement
> Components: atlas-core, atlas-webui
> Reporter: Srikanth Venkat
> Fix For: 0.9-incubating
>
>
> User Story:
> As a data steward, I need a scalable way to quickly and efficiently propagate
> classifications across the information supply chain to support efficient
> searches and classification-based security for compliance and audit purposes.
> This requires:
> 1. Derivative entities should inherit classifications from the originating
> entity, and child entities from their parent.
> For example, if a Hive column is classified "Confidential", then the resulting
> column created from a CTAS operation should also be tagged "Confidential" to
> maintain the classification of the original entity. Where 2 or more entities
> are composed, the derivative entity should have the union of all
> classifications of the source entities.
> 2. Business Terms:
> a. Child business terms should inherit the classifications associated with
> the parent term.
> b. The option to propagate classification to child business terms in a
> hierarchy should be provided
> c. Ability to update the propagated tags manually via UI or through the API
> d. Tagging a term should propagate to data assets that are already attached
> to that business term as well
> 3. Data assets
> a. For all supported data asset types in Atlas, if a derivative asset is
> created it should inherit the tags and attributes from the original asset.
> b. The option to propagate tags to child entities should be provided (e.g. if
> you tag a folder in HDFS, optionally tag all the files within it)
> c. Ability to update the propagated tags manually via UI or through the API
> d. Tags on a parent object should be inherited dynamically by child objects
> created later (unless a flag is set to disable this)
> e. Derived data assets should have the tags of the original data asset.
> Conflict resolution - if tags (classifications) on the upstream or parent
> entities used to derive a data asset have different attribute values, then the
> user needs to be prompted to resolve the conflict. Once resolved, the resolved
> value should be carried forward to derived assets.
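A minimal Python sketch of items 1 and 3e together with the conflict-resolution rule, assuming a hypothetical shape in which a classification is a type name plus a dict of attribute values (propagate and its arguments are illustrative, not an Atlas API). The derived entity gets the union of all source classification types. An attribute whose values differ across sources is left unset and reported as a conflict for the user to resolve; once a resolution is supplied, it is carried forward to the derived entity.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumed shape: classification type name -> {attribute name: value}.
Classifications = Dict[str, Dict[str, str]]
# Identifies one attribute of one classification, e.g. ("Confidential", "level").
AttrKey = Tuple[str, str]


def propagate(
    sources: List[Classifications],
    resolutions: Optional[Dict[AttrKey, str]] = None,
) -> Tuple[Classifications, Dict[AttrKey, Set[str]]]:
    """Union the classifications of all source entities for a derived entity.

    Returns the derived classifications and any unresolved conflicts, i.e.
    attributes whose values differ between sources and have no resolution yet.
    """
    resolutions = resolutions or {}
    seen: Dict[AttrKey, Set[str]] = {}
    derived: Classifications = {}
    for source in sources:
        for name, attrs in source.items():
            derived.setdefault(name, {})  # union of classification types
            for attr, value in attrs.items():
                seen.setdefault((name, attr), set()).add(value)

    conflicts: Dict[AttrKey, Set[str]] = {}
    for (name, attr), values in seen.items():
        if (name, attr) in resolutions:
            # A user already resolved this attribute: carry the value forward.
            derived[name][attr] = resolutions[(name, attr)]
        elif len(values) == 1:
            # All sources agree, so the value propagates unchanged.
            derived[name][attr] = next(iter(values))
        else:
            # Sources disagree: leave it unset and ask the user.
            conflicts[(name, attr)] = values
    return derived, conflicts
```

For a CTAS that combines a column tagged Confidential with level "high" and one tagged Confidential with level "medium" (hypothetical attribute values), the result carries the Confidential classification but reports ("Confidential", "level") as a conflict until a resolution for that key is passed in.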