"^A" is used as the delimiter in the file. However, I don't think this is what causes the problem, because other files also use "^A" as the delimiter and have no problem. BTW, the reason "^A" is used as the delimiter is that these files are Hive data.
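Ravi's carriage-return question can be checked directly on a local copy such as AFromGet. Below is a minimal Python sketch that counts the CR, LF, CRLF and ^A (0x01, Hive's default field delimiter) bytes in one or more local files. The script name `count_bytes.py` and the `scan` helper are hypothetical, and the sketch only inspects local files, not HDFS. For reference, the local copies are 15,711,512 bytes larger than the size `hdfs dfs -ls` reports (883715443 vs 868003931). So, assuming the original file used plain LF line endings, a CRLF count close to that difference would point to a CR having been added to each line.

```python
import sys
from collections import Counter


def scan(path, chunk_size=1 << 20):
    """Count ^A, CR, LF and CRLF bytes in a local file, reading it in chunks."""
    counts = Counter()
    prev_ended_with_cr = False  # catches a CRLF pair split across two chunks
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            counts["bytes"] += len(chunk)
            counts["ctrl_a"] += chunk.count(b"\x01")  # Hive's default field delimiter
            counts["cr"] += chunk.count(b"\r")
            counts["lf"] += chunk.count(b"\n")
            counts["crlf"] += chunk.count(b"\r\n")
            if prev_ended_with_cr and chunk.startswith(b"\n"):
                counts["crlf"] += 1
            prev_ended_with_cr = chunk.endswith(b"\r")
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python3 count_bytes.py AFromGet AFromCat
    for path in sys.argv[1:]:
        c = scan(path)
        print(f"{path}: bytes={c['bytes']} LF={c['lf']} CR={c['cr']} "
              f"CRLF={c['crlf']} ^A={c['ctrl_a']}")
```

If CR comes out as 0 for both AFromGet and AFromCat, stray carriage returns can be ruled out as the source of the extra bytes.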
On Sat, Jan 7, 2017 at 12:17 AM, Ravi Prakash <[email protected]> wrote:
> Is there a carriage return / new line / some other whitespace which `cat`
> may be appending?
>
> On Thu, Jan 5, 2017 at 6:09 PM, Mungeol Heo <[email protected]> wrote:
>>
>> Hello,
>>
>> Suppose I call the HDFS file that causes the problem A.
>>
>> hdfs dfs -ls A
>> -rw-r--r-- 3 web_admin hdfs 868003931 2017-01-04 09:05 A
>>
>> hdfs dfs -get A AFromGet
>> hdfs dfs -cat A > AFromCat
>>
>> ls -l
>> -rw-r--r-- 1 hdfs hadoop 883715443 Jan 5 18:32 AFromGet
>> -rw-r--r-- 1 hdfs hadoop 883715443 Jan 5 18:32 AFromCat
>>
>> hdfs dfs -put AFromGet
>>
>> diff <(hdfs dfs -cat A) <(hdfs dfs -cat AFromGet)
>> (No output, which means the contents of the two files are the same, at
>> least after "cat".)
>>
>> hdfs dfs -checksum A
>> A MD5-of-262144MD5-of-512CRC32C
>> 000002000000000000040000e667fb4f0dda78101feb2b689af8260b
>>
>> hdfs dfs -checksum AFromGet
>> AFromGet MD5-of-262144MD5-of-512CRC32C
>> 0000020000000000000400007284759249ff98c7395e6a4bb59343dc
>>
>> Given the results listed above, I wonder why the size of the file
>> changed.
>> Any help will be GREAT!
>>
>> Thank you.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
