[ 
https://issues.apache.org/jira/browse/HADOOP-14876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205748#comment-16205748
 ] 

Steve Loughran commented on HADOOP-14876:
-----------------------------------------

h4. Privacy scope

* add: sometimes things are marked as private when they end up being essential 
(example: UserGroupInformation). In such situations, raise issues with the team to 
see if we can't add some form of @public tag \cite{HADOOP-10776}. 

Now, what about the fact that the distributed shell example uses (or at least 
did when I last looked) private code, e.g. org.apache.hadoop.io.DataOutputBuffer, 
the timeline plugin, NMClientAsyncImpl, ...? You can look at the imports and 
probably 20% of the class imports (not interfaces, yarn records) are tagged as 
private/limited private. We are not in a position to tell people not to use 
@Private, given that we consider doing so essential even for basic example YARN 
apps.

* What does it mean if something is tagged as LimitedPrivate for one app (esp 
HBase & Hive, which aren't within our own codebase)? To me, that says "we know 
these things get used downstream, or we've added them as a special secret 
back-door". But who gets to choose which apps can actually use it? Limited 
private+outside our codebase == public, which is something we should 
acknowledge when scoping things. And LimitedPrivate(Mapreduce) often means 
"every YARN app needs these".

* What does it mean if a release removes/changes something you depended on 
which was tagged private/limited private? Complain. It may get ignored, but the 
change may have been made without awareness of how widely the code is used.

h4. Semantics

I take this bit very seriously, having been deeply involved in the original 
paragraphs, and being an aficionado of all D.L. Parnas's writings on the notion 
of "interface". As far as I'm concerned, the de facto semantics are defined in 
our unit tests ("what we expect") and in those of widely used applications 
("what HBase and Hive expect"). We know that if we break the latter then people 
complain and, while we may do so, it's not something we want to.

L113. yeah right. It's usually the first port of call, & if you think 
otherwise, you're not writing enough downstream code. 

The original Compatibility.md calls out that some bits of the system have 
non-normative specifications, e.g. the filesystem. I would consider that 
significantly more normative than the javadocs, most of which are vague 
aspirations of functionality. Usually the javadocs don't have any mention of 
concurrency, which matters a lot; for that you do end up delving into the 
source and/or using it in a way which appears to work (HDFS's use of input 
streams), when in fact callers are just relying on accidental bits of the 
semantics which we are now expected to maintain.


+maybe mention StreamCapabilities.hasCapability as a way of determining if FS 
streams offer a feature, say it's more to support variants in back ends rather 
than a way for us to remove things. But do mention: good practice to check for 
new things rather than assume that if HDFS implements it, it works everywhere.
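
A minimal sketch of the "check for new things" practice, for illustration only. To stay self-contained it declares a local stand-in interface with the same shape as org.apache.hadoop.fs.StreamCapabilities (a single hasCapability(String) method); the class names CapabilityProbe and HflushOnlyStream and the helper supports() are hypothetical, and the capability names "hflush" and "hsync" are assumed to match the ones Hadoop uses.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;

public class CapabilityProbe {

    /** Local stand-in with the same shape as Hadoop's StreamCapabilities (assumption). */
    interface StreamCapabilities {
        boolean hasCapability(String capability);
    }

    /** Hypothetical back-end stream that declares hflush support but not hsync. */
    static class HflushOnlyStream extends ByteArrayOutputStream
            implements StreamCapabilities {
        @Override
        public boolean hasCapability(String capability) {
            return "hflush".equals(capability);
        }
    }

    /**
     * Probe a stream for a capability instead of assuming that what HDFS
     * offers is offered everywhere. A stream that does not implement the
     * interface at all is treated as not supporting the feature.
     */
    static boolean supports(OutputStream stream, String capability) {
        return stream instanceof StreamCapabilities
                && ((StreamCapabilities) stream).hasCapability(capability);
    }

    public static void main(String[] args) {
        OutputStream probing = new HflushOnlyStream();
        OutputStream plain = new ByteArrayOutputStream();

        System.out.println("hflush on probing stream: " + supports(probing, "hflush"));
        System.out.println("hsync on probing stream:  " + supports(probing, "hsync"));
        System.out.println("hflush on plain stream:   " + supports(plain, "hflush"));
    }
}
```

The point of the design is that a downstream caller falls back to a safe path when the probe returns false, rather than calling the feature and discovering at runtime that a particular back end never implemented it.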

L160. "The audit log format may not change incompatibly between major releases." 
?? Is that "may change" or "must not"?

L189. Need to explain how to differentiate log chaff from "real" output. 
Indeed, I'm curious myself.


L208. We don't require log4j though; other back ends may be supportable.

L229. Nothing is called "s3.*" any more.

L298 "No new (exposed) dependency will be added to Hadoop between major 
releases."

Can't make that guarantee. Qualify it with "via the shaded clients".


h4. Things that we've glossed over

* No statement on supported operating systems, filesystems, x86 parts, IPv4 vs 
v6. If I code for Windows, how long will hadoop-client work there? What if I 
target SPARC?

* Concurrency: say "we try not to make things worse"? Degradations are 
considered defects, except when it's just some accidental side effect of 
excessive logging?


> Create downstream developer docs from the compatibility guidelines
> ------------------------------------------------------------------
>
>                 Key: HADOOP-14876
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14876
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 3.0.0-beta1
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>         Attachments: HADOOP-14876.001.patch
>
>



