All,

I'm working on a cluster that is running Hadoop 2.7.3. I have one folder in 
particular where the command hdfs dfs -du is giving me inconsistent results. If 
I query the folder with -s to get a summary, it reports 10.6 GB. If I omit -s, 
the sizes of the folders directly underneath add up to less than 1 GB.

The discrepancy persists over time and appears whether I run the command as the 
hdfs user or as any other user. We are on an HDP cluster, so we are using Ranger 
for HDFS security and Kerberos for authentication. We see similar results with 
-count, where both the size and the counts differ. We have not seen this 
behavior in any other folders.

See below for sample output. I've replaced the full path with a fake path to 
protect the data we have on the cluster. Does anyone know what would cause this 
behavior? Thanks!

$ hdfs dfs -du -h /randomFolder
119.9 M  /randomFolder/bug
1.0 M    /randomFolder/commitment
86.8 K   /randomFolder/customfield
31.3 M   /randomFolder/epic
10.3 M   /randomFolder/feature
4.0 M    /randomFolder/insprintbug
372.9 K  /randomFolder/project
15.1 K   /randomFolder/projectstatus
330.9 M  /randomFolder/story
256.3 M  /randomFolder/subtask
74.7 K   /randomFolder/subtemplate
89.6 M   /randomFolder/task
7.4 M    /randomFolder/techdebt
117.7 K  /randomFolder/template
617.9 K  /randomFolder/tempomember
8.2 K    /randomFolder/tempoteam
1.4 M    /randomFolder/tempoworklog

$ hdfs dfs -du -h -s /randomFolder
10.6 G  /randomFolder
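
As a cross-check on the listing above, the per-folder sizes can be totalled with a small shell helper. This is only a sketch: sum_du_h is a hypothetical name, and it assumes the -h output format shown above (value, unit, path) with binary units (1 K = 1024 bytes).

```shell
# Hypothetical helper: totals the sizes in `hdfs dfs -du -h` output read from
# stdin and prints the sum in M. Assumes "value unit path" columns as in the
# Hadoop 2.7.3 listing above; a line with no unit is treated as plain bytes.
sum_du_h() {
  awk '
    NF > 0 {
      value = $1
      unit = $2
      if (unit == "K") value *= 1024
      else if (unit == "M") value *= 1024 * 1024
      else if (unit == "G") value *= 1024 * 1024 * 1024
      else if (unit == "T") value *= 1024 * 1024 * 1024 * 1024
      total += value
    }
    END { printf "%.1f M\n", total / (1024 * 1024) }
  '
}

# Usage (needs a configured HDFS client):
#   hdfs dfs -du -h /randomFolder | sum_du_h
```

Fed the 17 lines from the listing above, it prints 853.4 M, well short of the 10.6 G that -s reports. Because the -h values are already rounded, the total is approximate.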

David McGinnis
