Hi,

Using the Java API, I write text files to HDFS. With the Gzip codec this works fine: the compressed data is written to HDFS and decompresses back to the original text. With the LZ4 codec, however, the compressed files on HDFS look unreadable, as if the codec had changed the text encoding. Here is the code I use:

org.apache.hadoop.conf.Configuration conf = new Configuration();
CompressionCodecFactory ccf = new CompressionCodecFactory(conf);
CompressionCodec codec = ccf.getCodecByClassName(Lz4Codec.class.getName());

FileSystem fileSystem = FileSystem.get(conf);
FSDataOutputStream out = fileSystem.create(path);

// Write through the codec's stream, with an explicit charset.
OutputStream compressedOutputStream = codec.createOutputStream(out);
BufferedWriter cout = new BufferedWriter(new OutputStreamWriter(
        compressedOutputStream, StandardCharsets.UTF_8));

cout.write(text_data + "\n");
cout.close(); // flushes the writer and finishes the compressed stream

How can I force the LZ4 codec to compress the data as-is, keeping the same encoding (or UTF-8)?
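For what it's worth, neither codec should change the encoding: the compressed bytes are simply binary, so viewing an .lz4 (or .gz) file directly always looks unreadable until it is decompressed with the same codec. Below is a minimal JDK-only sketch of that round trip, using java.util.zip gzip as a stand-in for the Hadoop codecs (it needs no extra jars; the class name `CompressedTextRoundTrip` is made up for the example). UTF-8 text written through a compressed stream comes back unchanged after decompression:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedTextRoundTrip {

    // Compress UTF-8 text; the returned bytes are binary and look
    // "unreadable" if viewed as text, just like a compressed HDFS file.
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (Writer w = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(bos), StandardCharsets.UTF_8))) {
            w.write(text);
        }
        return bos.toByteArray();
    }

    // Decompress and decode with the same charset: the original text
    // comes back exactly, showing the encoding was never changed.
    static String decompress(byte[] compressed) throws IOException {
        try (Reader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)),
                StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        String original = "some UTF-8 text: caf\u00e9\n";
        byte[] raw = compress(original);
        if (original.equals(decompress(raw))) {
            System.out.println("round trip OK");
        }
    }
}
```

The same pattern applies on HDFS: to read the file back as text, open it through `codec.createInputStream(...)` rather than reading the raw bytes.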
