[ 
https://issues.apache.org/jira/browse/HADOOP-14876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226300#comment-16226300
 ] 

Anu Engineer commented on HADOOP-14876:
---------------------------------------

[~templedf] Thanks for putting in an effort to get this done. I really 
appreciate all the thought that you have put into this document. I have some 
minor suggestions.

* Use cases matrix: We have nine states; it would be nice to have a matrix that
defines what can change in which release.
For example (based on InterfaceClassification.html; I am not suggesting that
these are the definitions, but we should add something that makes sense):
1. Public-Stable - Changes only in a major release.
2. Public-Evolving - Changes possible in major and minor releases.
3. Public-Unstable - Only for Web-UI.
4. Limited-Stable - Changes possible in major and minor releases.
5. Limited-Evolving - Changes possible in major and minor releases.
6. Limited-Unstable - Changes possible in major, minor and maintenance releases.
7. Private-* - Changes possible in major, minor and maintenance releases.
* It would be good to define which kinds of releases are possible: major,
minor, and maintenance.
* Semantic compatibility:
The semantics of the cluster are also defined by config files. The default
values of the settings and some new settings can change the semantics. We
should not break compatibility in maintenance releases.
Currently, I am assuming that all configs are public, but there are many that
do not have definitions in the default XML. We should mandate that these values
are not modified in maintenance releases.
Perhaps we should add a clause that states:
"No new configuration shall be added which can change the behavior of an
existing cluster. For any new settings that are defined, care should be taken
to ensure that they do not change the behavior of existing clusters."
* "The list of client artifacts is as follows:" -- may I suggest that we add
the word "current", since someone could add a new JAR without breaking compat.
IMHO, the guarantee should be that we will not break existing code; if we
want to add a new JAR, it should be possible.
* Hadoop Env Vars: "that are meaningful to Hadoop" -- This is a very loose
definition. We should list what will not change. Otherwise, all Hadoop
variables are fair game. If that is the intention, I suggest that we state that
explicitly.
* Native Dependencies: As a non-native English speaker, I wonder if this
statement is ambiguous:
"Changes to the minimum required versions SHOULD NOT increase between minor
releases within a major version, though updates because of security issues,
license issues, or other reasons may occur."
Could we rewrite this as:
"Hadoop will strive to keep the minimum required versions of external
dependencies stable during the lifetime of a major version. It is possible
that, for reasons such as security, licensing, or the end-of-life of a
component, we may be forced to upgrade."
* Protocol Dependencies: "The components of Apache Hadoop may have
dependencies that include their own protocols, such as Zookeeper, S3, Kerberos,
etc. These protocol dependencies SHALL be treated as internal protocols and
governed by the same policy."
I don't think that we can treat S3 or Kerberos as internal protocols. I suggest
that we rewrite this as: "To the extent possible, we will strive to maintain the
same policies for the external protocols (S3, Kerberos, etc.) that are used by
Hadoop."
* Transports: "Fixed service port numbers MUST be kept consistent to prevent
breaking clients." Did you mean to write "default service ports" instead of
"fixed"?
* "New transport mechanisms MUST only be introduced with minor or major version
changes."
I am not sure why this constraint is placed. I am trying to understand how
introducing a new transport (assuming that the older transports are stable)
affects compatibility.
* Log output: "Log messages are intended for human consumption, though
automation use cases are also supported." Not sure if this is intended, but
"automation use cases are also supported" seems to imply that logs will be
parsable and stable. I am sure that is not what we want to offer. Should we
just remove the automation phrase?
* "All log output SHALL be considered Public and Evolving."
I worry this is not sustainable. Let me provide an example: let us say I
search for a word, say "block", and use it in a script that greps the logs to
identify an event. Someone then adds a log statement containing the same word,
and my parser stops working, even in a maintenance release. So in my mind, we
should tag all log output as Private and Unstable, intended only for human
consumption. If the intent is to guarantee that the log format will not change,
then we should state that it is the log format that is not changing.
* HDFS Metadata: "HDFS data nodes store data in a private directory structure.
The schema of that directory structure must remain stable to retain
compatibility."
If we have an upgrade path, I submit that changing this structure should be
possible. In fact, I think we should simply say that upgrade and rollback of
data stored in data nodes should be possible.
* Command Line Interface -- More of a question: are we sure that the 3.0 release
is entirely compliant with this spec? For example, is the slaves.txt change
covered by this, and if so, is that change fully compatible?
* Hadoop Configuration Files: Please see my comment in the semantics section.
* Directory Structure: "Changing the directory structure of these
user-accessible files can break compatibility, even in cases where the original
path is preserved via symbolic links."
Do you have a case where this has happened? If not, we should allow this
change. "User-accessible" is a broad term. Does it mean all users, including
admins? If it includes admins, all files that we ship with Hadoop will fall
within the scope of this statement. So perhaps we should define what this
means, or say that files accessed via protocols offered by HDFS (RPC and HTTP)
will remain stable.
* Operating Systems: We should have a full list of supported versions documented
somewhere. Is there such a list? If so, can you please add a pointer to it in
this document?

> Create downstream developer docs from the compatibility guidelines
> ------------------------------------------------------------------
>
>                 Key: HADOOP-14876
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14876
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 3.0.0-beta1
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>         Attachments: Compatibility.pdf, DownstreamDev.pdf, 
> HADOOP-14876.001.patch, HADOOP-14876.002.patch, HADOOP-14876.003.patch, 
> HADOOP-14876.004.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
