BTW, these files are collected using Apache Flume.

On Tue, Jan 10, 2017 at 10:11 AM, Mungeol Heo <[email protected]> wrote:
> Yes, that's why I wonder why this one specific file causes the
> problem while the other data files of the Hive table do not.
>
> On Tue, Jan 10, 2017 at 3:42 AM, Ravi Prakash <[email protected]> wrote:
>> I have not been able to reproduce this:
>>
>> [raviprak@ravi ~]$ hdfs dfs -put HuckleberryFinn.txt /
>> [raviprak@ravi ~]$ cd /tmp
>> [raviprak@ravi tmp]$ hdfs dfs -get /HuckleberryFinn.txt
>> [raviprak@ravi tmp]$ hdfs dfs -cat /HuckleberryFinn.txt > hck
>> [raviprak@ravi tmp]$ md5sum hck
>> 8dc8966178cc1bf4eb95a5b31780269c hck
>> [raviprak@ravi tmp]$ md5sum HuckleberryFinn.txt
>> 8dc8966178cc1bf4eb95a5b31780269c HuckleberryFinn.txt
>> [raviprak@ravi tmp]$ hdfs dfs -put hck /
>> [raviprak@ravi tmp]$ hdfs dfs -checksum /HuckleberryFinn.txt
>> /HuckleberryFinn.txt MD5-of-0MD5-of-512CRC32C
>> 000002000000000000000000c99e8741a1f3d311513df9d9e73b0bc8
>> [raviprak@ravi tmp]$ hdfs dfs -checksum /hck
>> /hck MD5-of-0MD5-of-512CRC32C
>> 000002000000000000000000c99e8741a1f3d311513df9d9e73b0bc8
>>
>> This is on trunk.
>>
>> On Sun, Jan 8, 2017 at 6:52 PM, Mungeol Heo <[email protected]> wrote:
>>>
>>> "^A" is used as the delimiter in the file.
>>> However, I don't think this is what causes the problem, because
>>> other files also use "^A" as the delimiter without any problem.
>>> BTW, "^A" is the delimiter because these files are Hive data.
>>>
>>> On Sat, Jan 7, 2017 at 12:17 AM, Ravi Prakash <[email protected]>
>>> wrote:
>>> > Is there a carriage return / new line / some other whitespace which
>>> > `cat`
>>> > may be appending?
>>> >
>>> > On Thu, Jan 5, 2017 at 6:09 PM, Mungeol Heo <[email protected]>
>>> > wrote:
>>> >>
>>> >> Hello,
>>> >>
>>> >> Suppose I call the HDFS file that causes the problem A.
>>> >>
>>> >> hdfs dfs -ls A
>>> >> -rw-r--r-- 3 web_admin hdfs 868003931 2017-01-04 09:05 A
>>> >>
>>> >> hdfs dfs -get A AFromGet
>>> >> hdfs dfs -cat A > AFromCat
>>> >>
>>> >> ls -l
>>> >> -rw-r--r-- 1 hdfs hadoop 883715443 Jan 5 18:32 AFromGet
>>> >> -rw-r--r-- 1 hdfs hadoop 883715443 Jan 5 18:32 AFromCat
>>> >>
>>> >> hdfs dfs -put AFromGet
>>> >>
>>> >> diff <(hdfs dfs -cat A) <(hdfs dfs -cat AFromGet)
>>> >> (no output, which means the contents of the two files are the
>>> >> same, at least after "cat")
>>> >>
>>> >> hdfs dfs -checksum A
>>> >> A MD5-of-262144MD5-of-512CRC32C
>>> >> 000002000000000000040000e667fb4f0dda78101feb2b689af8260b
>>> >>
>>> >> hdfs dfs -checksum AFromGet
>>> >> AFromGet MD5-of-262144MD5-of-512CRC32C
>>> >> 0000020000000000000400007284759249ff98c7395e6a4bb59343dc
>>> >>
>>> >> As the results above show, the size of the file changed, and I
>>> >> wonder why.
>>> >> Any help would be great!
>>> >>
>>> >> Thank you.
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: [email protected]
>>> >> For additional commands, e-mail: [email protected]
>>> >>
>>> >
>>
>>
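
For anyone following the thread: the diff above compares two `cat` streams, so a byte-level check of the two local copies may also be useful. Below is a minimal sketch, assuming a POSIX shell with `wc` and `cmp` available. `compare_copies` is a hypothetical helper name, not an HDFS command, and the file names in the usage comments are the ones from the original post.

```shell
# compare_copies is a hypothetical helper (not part of Hadoop).
# It prints the byte size of two local files and reports whether
# they are byte-for-byte identical.
compare_copies() {
    # Byte counts of both files; $(( )) strips padding some wc builds add.
    cc_size_a=$(( $(wc -c < "$1") ))
    cc_size_b=$(( $(wc -c < "$2") ))
    echo "size: $1=$cc_size_a $2=$cc_size_b"
    # cmp -s is silent and exits 0 only if the files are byte-identical.
    if cmp -s "$1" "$2"; then
        echo "identical"
        return 0
    fi
    echo "different"
    return 1
}

# Usage with the files from the original post (needs a running cluster):
#   hdfs dfs -get A AFromGet
#   hdfs dfs -cat A > AFromCat
#   compare_copies AFromGet AFromCat
#   hdfs dfs -cat A | wc -c   # compare with the size shown by hdfs dfs -ls A
```

If the streamed byte count differs from the length that `hdfs dfs -ls` reports, the mismatch is on the HDFS side rather than something `cat` or `get` adds locally.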
