[
https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172771#comment-14172771
]
Byron Wong commented on HADOOP-6857:
------------------------------------
*Scenario 2*: we still have the snapshottable directory "/test" with the same
file "a". We then create a fresh snapshot "ss1" and run {{hadoop fs -rm
-skipTrash /test/a}}.
{{hadoop fs -du /test}} gives an empty output, as expected.
{{hadoop fs -du -s /test}} outputs:
{code}
41 123 /test
{code}
which makes sense, given that we know about the existence of the snapshot.
However, when we run {{hadoop fs -du -s /test/.snapshot/ss1}}, we get:
{code}
0 0 /test/.snapshot/ss1
{code}
This is inconsistent with the numbers we get when we run {{hadoop fs -du
/test/.snapshot/ss1}}:
{code}
41 123 /test/.snapshot/ss1/a
{code}
Upon further investigation, we see that running {{hadoop fs -du -s
/test/.snapshot/anySnapshot}} reports the current state of the real directory
rather than the state captured in the snapshot. This means that {{hadoop fs
-du -s /test/.snapshot/anySnapshot}} is equivalent to running {{hadoop fs -du
/test/}} and adding the numbers up, which is non-intuitive.
For example, let's add a 2-byte file "/test/1" with replication factor 3
(/test/a is still deleted). Now {{hadoop fs -du -s /test/.snapshot/ss1}} gives
us:
{code}
2 6 /test/.snapshot/ss1
{code}
whereas the output of {{hadoop fs -du /test/.snapshot/ss1}} remains the same:
{code}
41 123 /test/.snapshot/ss1/a
{code}
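The mismatch described above can be checked mechanically. The following is only a minimal sketch, not part of any patch: it assumes each output line has the form "<size> <space consumed> <path>" as in the samples above, and the helper names ({{parse_du}}, {{totals}}, {{is_consistent}}) are hypothetical. It compares the sum of a {{-du}} listing against the corresponding {{-du -s}} summary.

```python
def parse_du(output):
    """Parse lines of the form '<size> <space consumed> <path>' into tuples."""
    rows = []
    for line in output.strip().splitlines():
        size, consumed, path = line.split(None, 2)
        rows.append((int(size), int(consumed), path))
    return rows


def totals(rows):
    """Sum the size and space-consumed columns of the parsed rows."""
    return (sum(r[0] for r in rows), sum(r[1] for r in rows))


def is_consistent(listing_output, summary_output):
    """Return True when the -du listing adds up to the -du -s summary."""
    return totals(parse_du(listing_output)) == totals(parse_du(summary_output))


# Outputs copied from the comment above.
listing_ss1 = "41 123 /test/.snapshot/ss1/a"
summary_after_delete = "0 0 /test/.snapshot/ss1"
summary_after_new_file = "2 6 /test/.snapshot/ss1"

# Both checks fail: the summary follows the live directory, not the snapshot.
print(is_consistent(listing_ss1, summary_after_delete))    # False
print(is_consistent(listing_ss1, summary_after_new_file))  # False
```

With consistent snapshot accounting, both checks would be expected to return True, since the contents of "ss1" do not change after the snapshot is taken.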
> FsShell should report raw disk usage including replication factor
> -----------------------------------------------------------------
>
> Key: HADOOP-6857
> URL: https://issues.apache.org/jira/browse/HADOOP-6857
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs
> Reporter: Alex Kozlov
> Assignee: Byron Wong
> Attachments: HADOOP-6857.patch, show-space-consumed.txt
>
>
> Currently FsShell reports HDFS usage with the "hadoop fs -dus <path>" command.
> Since the replication level is set per file, it would be nice to add raw disk
> usage including the replication factor (maybe "hadoop fs -dus -raw <path>"?).
> This would allow resource usage to be assessed more accurately. -- Alex K
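
The raw usage requested in the description is each file's length multiplied by its own replication factor, summed over all files. The sketch below only illustrates that arithmetic and is not how FsShell computes it; the function name {{raw_usage}} is hypothetical, and the replication factor of 3 for "/test/a" is an assumption inferred from the 41 and 123 figures in the comment.

```python
def raw_usage(files):
    """Return (logical_bytes, raw_bytes) for (length, replication) pairs.

    raw_bytes counts every replica, so files with different replication
    factors contribute differently even when their lengths match.
    """
    logical = sum(length for length, _ in files)
    raw = sum(length * replication for length, replication in files)
    return logical, raw


# /test/a: 41 bytes, replication 3 (assumed); /test/1: 2 bytes, replication 3.
print(raw_usage([(41, 3)]))          # (41, 123)
print(raw_usage([(2, 3)]))           # (2, 6)
print(raw_usage([(41, 3), (2, 3)]))  # (43, 129)
```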
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)