[ 
https://issues.apache.org/jira/browse/HADOOP-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046860#comment-15046860
 ] 

Steve Loughran commented on HADOOP-12620:
-----------------------------------------

I'm trying to make sense of this: I think there's too much detail on marketing 
stuff, listing of things up the stack. 

JIRAs are for technical issues. We can discuss the engineering aspects, without 
worrying about the business plan merits. 

Similarly, we don't need citations and references to things like BigTable and 
HTTP1.1. You can assume the audience knows how to telnet to port 80 and type in 
a GET request by hand, has had time during test runs to read the many google 
papers, may even have spent time with the authors of some of them —perhaps were 
even former colleagues. And with things like HBase being based off BigTable, 
you really don't need to go there.

From what I do understand:

h3. You are proposing HDFS adds {{write(offset, data)}} (or more specifically 
{{seek(offset); write(data)}}). 

Everyone recognises the merits of this; it's the key feature of a POSIX FS 
which HDFS lacks. It's certainly something we've discussed in the past in a 
wistful "wouldn't it be nice if..." kind of way. Though that can go the other 
way, "wouldn't it be nice if all we offered clients was a blobstore API", as 
that has other benefits.
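To pin down what the proposed semantics would actually mean for callers, here is a toy in-memory model (every name here is hypothetical; nothing like this exists in HDFS today): overwrites within the current length replace bytes in place, and writes past EOF extend the file.

```java
import java.util.Arrays;

// Hypothetical in-memory model of the proposed positional-write contract.
// This is only a sketch of the semantics, not an HDFS implementation.
class PositionalWriteModel {
    private byte[] contents = new byte[0];

    // write(offset, data): overwrite in place; extend (zero-filled) past EOF.
    void write(long offset, byte[] data) {
        if (offset < 0) {
            throw new IllegalArgumentException("negative offset");
        }
        int end = Math.toIntExact(offset + data.length);
        if (end > contents.length) {
            // The gap between old EOF and offset is zero-filled here; a real
            // implementation would want a sparse representation instead.
            contents = Arrays.copyOf(contents, end);
        }
        System.arraycopy(data, 0, contents, Math.toIntExact(offset), data.length);
    }

    long length() { return contents.length; }

    byte byteAt(long offset) { return contents[Math.toIntExact(offset)]; }
}
```

Even this toy version surfaces the two design questions below: in-place replication of overwrites, and what to do with the zero-filled gap.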

h3. You are proposing Multi-Version Concurrency Control (MVCC) as the update mechanism.

MVCC is a way of delivering a view of data to clients which are consistent over 
a sequence of operations. 

Actually, we don't need to worry about that. The consistency model of HDFS 
already says, "there is no guarantee when or whether changes to the contents of 
a file (or its metadata) become visible to current readers" ([filesystem 
specification|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md]).

That means that today, the effects of append, rename, and delete may become 
visible to callers with existing open streams, they may not, or they may only 
become visible at some point as the caller reads or seeks through the file.

POSIX does sort of say "changes should be visible", but even in NFS the cache 
model delayed changes (see _The Design and Implementation of the Sun Network 
Filesystem_). Permitting inconsistency significantly aids the implementation 
and performance of a distributed FS.

So: we don't need to worry about providing a consistent view of data to 
clients. This is good, because if we did, then given a 30GB file and one client 
going {{write(offset=1GB, data=3GB)}}, HDFS would suddenly have to snapshot 3GB 
of data to serve up to callers with the file open. And if another client did 
exactly the same operation, there'd be another snapshot, etc, etc... and before 
long you've implemented a DOS attack against the storage capacity of HDFS.
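The snapshot arithmetic is easy to see in a toy copy-on-write model (a sketch, nothing HDFS-specific): each overwriting client forces the store to retain a copy of the bytes it clobbers, for as long as older readers might still need them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy copy-on-write store: to keep old views readable under MVCC, every
// overwrite must retain a snapshot of the region it replaces.
class CowStore {
    private final byte[] current;
    private final List<byte[]> retainedSnapshots = new ArrayList<>();

    CowStore(int size) { current = new byte[size]; }

    void overwrite(int offset, byte[] data) {
        // Readers may still hold the pre-overwrite view, so copy what we clobber.
        retainedSnapshots.add(Arrays.copyOfRange(current, offset, offset + data.length));
        System.arraycopy(data, 0, current, offset, data.length);
    }

    // Extra storage pinned purely by overwrites.
    long retainedBytes() {
        return retainedSnapshots.stream().mapToLong(s -> s.length).sum();
    }
}
```

With one unit standing in for 1GB: a 30-unit file and two clients each overwriting 3 units leaves 6 units of snapshots pinned; N identical writers pin 3N units, which is the storage DOS described above.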

What does need to be addressed is:

# How to implement offset overwrites without threatening the integrity of data 
stored in HDFS. That is, the existing write pipeline needs to be extended to 
support replicated overwrite operations. The append code shows the beginning of 
what needs to be done there, though the fact that append only ever adds 
entirely new blocks is what made it tractable.

# How to implement post-EOF writes. That is, for a 30GB file, how to handle 
{{write(offset=50GB, data='a')}} by writing a small number of bytes, rather 
than having to save 20GB of zeros. Effectively that means HDFS has to implement 
sparse files: generating them, and having clients work with them efficiently 
for both reading and further updates. We also have to consider whether writing 
to a sparse file counts against quotas by actual or theoretical size. 
Theoretical would be the easiest, and would avoid quirks like a quota suddenly 
being exceeded when you go back to offset=35GB and write 10GB of data.
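For comparison, local POSIX filesystems already handle the post-EOF case with holes: seek past EOF, write one byte, and the logical length becomes offset+1 while the hole consumes essentially no disk blocks. A quick sketch with {{java.io.RandomAccessFile}}, scaled down from the 50GB example:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SparseWriteDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("sparse", ".bin");
        f.deleteOnExit();

        long offset = 1L << 20; // 1 MiB stand-in for the 50GB in the example
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.seek(offset); // seek past EOF: the gap becomes a hole
            raf.write('a');   // a single real byte
        }

        // Logical size is offset + 1; on ext4/XFS the hole occupies no data
        // blocks, which is exactly the "theoretical vs actual" quota choice.
        System.out.println(f.length()); // 1048577
    }
}
```

Whether the underlying filesystem actually stores the hole sparsely depends on the local FS; the logical length is what a theoretical-size quota would charge.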

The impact on the layers above would be tangible, but the foundational feature, 
seek+write, is what everything depends on. Without that, there's no point 
worrying about dependencies in other projects, or even filing the JIRAs.

Please then, come up with your proposal for this. A PDF attached to this JIRA 
would be a start. It should cover the details of how this can be implemented 
within the Hadoop distributed FS as it stands today: with the core write chain, 
plus the new complexities of encrypted storage, erasure coding, multi-tier 
storage with tier-specific quotas. That's more than just theoretical detail; 
you're going to have to look at the code and make concrete suggestions. Ideally, an 
initial proof of concept on your own fork of the codebase, code + basic tests, 
with all the existing regression tests verifying nothing appears to have 
broken. [How to Contribute|https://wiki.apache.org/hadoop/HowToContribute] 
covers the process here.

Is that a lot to ask? Yes, but HDFS is the most critical part of the Hadoop 
stack; data integrity is the one thing the team cares about more than anything 
else. Something at the YARN layer could impact availability or performance, but 
it shouldn't lose or corrupt data. Changes at the HDFS layer can, and every 
time something has gone in there, there have been surprises downstream. 
Certainly it's why the HDFS team doesn't trust me to make changes in their code.

To close then: having a tangible proposal of how to implement this on the 
existing HDFS codebase, text plus an initial PoC, would be the best way to 
start this work.


> Advanced Hadoop Architecture (AHA) - Common
> -------------------------------------------
>
>                 Key: HADOOP-12620
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12620
>             Project: Hadoop Common
>          Issue Type: New Feature
>            Reporter: Dinesh S. Atreya
>            Assignee: Dinesh S. Atreya
>
> Advance Hadoop Architecture (AHA) / Advance Hadoop Adaptabilities (AHA):
> See 
> https://issues.apache.org/jira/browse/HADOOP-12620?focusedCommentId=15046300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15046300
>  for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)