[ 
https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172771#comment-14172771
 ] 

Byron Wong commented on HADOOP-6857:
------------------------------------

*Scenario 2*: we still have the snapshottable directory "/test" with the same 
file "a". We then create a fresh snapshot "ss1" and run {{hadoop fs -rm 
-skipTrash /test/a}}.
{{hadoop fs -du /test}} gives an empty output, as expected.
{{hadoop fs -du -s /test}} outputs:
{code}
41  123  /test
{code}
which makes sense, given that we know about the existence of the snapshot.
However, when we run {{hadoop fs -du -s /test/.snapshot/ss1}}, we get:
{code}
0  0  /test/.snapshot/ss1
{code}
This is inconsistent with the numbers we get when we run {{hadoop fs -du 
/test/.snapshot/ss1}}:
{code}
41  123  /test/.snapshot/ss1/a
{code}
Upon further investigation, we see that running {{hadoop fs -du -s 
/test/.snapshot/anySnapshot}} gives us the information about the current state 
of the real directory. This means that {{hadoop fs -du -s 
/test/.snapshot/anySnapshot}} is equivalent to running {{hadoop fs -du /test/}} 
and adding the numbers up, which is non-intuitive.
For example, let's add a 2-byte file "/test/1" with replication factor 3 
(/test/a is still deleted). Now {{hadoop fs -du -s /test/.snapshot/ss1}} gives us:
{code}
2  6  /test/.snapshot/ss1
{code}
whereas the result of {{hadoop fs -du /test/.snapshot/ss1}} remains the same:
{code}
41  123  /test/.snapshot/ss1/a
{code}
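
As a rough illustration (a sketch only, not part of any patch), the mismatch can be checked by summing the per-entry lines of {{hadoop fs -du}} and comparing the totals with the {{-du -s}} line. The helper name {{sum_du_lines}} is hypothetical, and the input strings are just the outputs quoted above, assumed to have the size and disk-space-consumed values in the first two columns:

```python
def sum_du_lines(lines):
    """Sum the first two numeric columns (size, disk space consumed)
    of `hadoop fs -du` style output lines."""
    total_size = 0
    total_disk = 0
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        total_size += int(parts[0])
        total_disk += int(parts[1])
    return total_size, total_disk


# Outputs quoted in this comment, after adding /test/1 (with /test/a deleted).
per_entry = ["41  123  /test/.snapshot/ss1/a"]  # hadoop fs -du /test/.snapshot/ss1
summary = ["2  6  /test/.snapshot/ss1"]         # hadoop fs -du -s /test/.snapshot/ss1

expected = sum_du_lines(per_entry)  # what -du -s would report if it summed the snapshot
reported = sum_du_lines(summary)    # what -du -s actually reported
print("expected:", expected, "reported:", reported, "consistent:", expected == reported)
```

With the numbers above, the per-entry total is (41, 123) while the summary line reports (2, 6), so the check comes out inconsistent.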

> FsShell should report raw disk usage including replication factor
> -----------------------------------------------------------------
>
>                 Key: HADOOP-6857
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6857
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Alex Kozlov
>            Assignee: Byron Wong
>         Attachments: HADOOP-6857.patch, show-space-consumed.txt
>
>
> Currently FsShell reports HDFS usage with the "hadoop fs -dus <path>" command.  
> Since the replication level is set per file, it would be nice to add raw disk 
> usage including the replication factor (maybe "hadoop fs -dus -raw <path>"?). 
>  This would allow resource usage to be assessed more accurately.  -- Alex K



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
