[ https://issues.apache.org/jira/browse/HADOOP-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15045070#comment-15045070 ]

wangchao commented on HADOOP-12619:
-----------------------------------

Hadoop 2.7.1 changed the implementation of GzipCodec.createOutputStream to

{code}
  @Override
  public CompressionOutputStream createOutputStream(OutputStream out) 
    throws IOException {
    if (!ZlibFactory.isNativeZlibLoaded(conf)) {
      return new GzipOutputStream(out);
    }
    return CompressionCodec.Util.
        createOutputStreamWithCodecPool(this, conf, out);
  }

  @Override
  public CompressionOutputStream createOutputStream(OutputStream out, 
                                                    Compressor compressor) 
  throws IOException {
    return (compressor != null) ?
               new CompressorStream(out, compressor,
                                    conf.getInt("io.file.buffer.size", 
                                                4*1024)) :
               createOutputStream(out);
  }

    static CompressionOutputStream createOutputStreamWithCodecPool(
        CompressionCodec codec, Configuration conf, OutputStream out)
        throws IOException {
      Compressor compressor = CodecPool.getCompressor(codec, conf);
      CompressionOutputStream stream = null;
      try {
        stream = codec.createOutputStream(out, compressor);
      } finally {
        if (stream == null) {
          CodecPool.returnCompressor(compressor);
        } else {
          stream.setTrackedCompressor(compressor);
        }
      }
      return stream;
    }
 
{code}

but CompressorStream overrides the close method and still does not return the 
compressor to the pool.
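The missing bookkeeping can be sketched in plain Java. FakeCompressor, FakePool and TrackingStream below are hypothetical stand-ins, not the Hadoop classes; the sketch only shows the shape of the fix implied above: a stream that remembers its pooled compressor and hands it back exactly once when close() is called.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a pooled Compressor.
class FakeCompressor { }

// Hypothetical stand-in for CodecPool: reuse compressors instead of leaking them.
class FakePool {
    static final Deque<FakeCompressor> pool = new ArrayDeque<>();

    static FakeCompressor get() {
        return pool.isEmpty() ? new FakeCompressor() : pool.pop();
    }

    static void returnCompressor(FakeCompressor c) {
        pool.push(c);
    }
}

// A stream that tracks its pooled compressor, mirroring the
// setTrackedCompressor bookkeeping quoted above.
class TrackingStream {
    private final FakeCompressor trackedCompressor;
    private boolean closed = false;

    TrackingStream(FakeCompressor c) {
        this.trackedCompressor = c;
    }

    // The step the reported CompressorStream.close() is missing:
    // return the tracked compressor to the pool, exactly once.
    void close() {
        if (!closed) {
            closed = true;
            FakePool.returnCompressor(trackedCompressor);
        }
    }
}
```

Calling close() twice must not return the compressor twice, which is why the closed flag guards the return.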



> Native memory leaks in CompressorStream
> ---------------------------------------
>
>                 Key: HADOOP-12619
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12619
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: wangchao
>
> The constructor of org.apache.hadoop.io.compress.CompressorStream requires an 
> org.apache.hadoop.io.compress.Compressor object to compress bytes, but it 
> never invokes the compressor's end method when the close method is called. 
> This may cause a native memory leak if the compressor is only used by this 
> CompressorStream object.
> I found this when setting up a Flume agent with gzip compression: the native 
> memory grew slowly and never fell back.
> {code}
>   @Override
>   public CompressionOutputStream createOutputStream(OutputStream out) 
>     throws IOException {
>     return (ZlibFactory.isNativeZlibLoaded(conf)) ?
>                new CompressorStream(out, createCompressor(),
>                                     conf.getInt("io.file.buffer.size", 
>                                                 4*1024)) :
>                new GzipOutputStream(out);
>   }
>   @Override
>   public Compressor createCompressor() {
>     return (ZlibFactory.isNativeZlibLoaded(conf))
>       ? new GzipZlibCompressor(conf)
>       : null;
>   }
> {code}
> The relevant methods of CompressorStream are
> {code}
>   @Override
>   public void close() throws IOException {
>     if (!closed) {
>       finish();
>       out.close();
>       closed = true;
>     }
>   }
>   @Override
>   public void finish() throws IOException {
>     if (!compressor.finished()) {
>       compressor.finish();
>       while (!compressor.finished()) {
>         compress();
>       }
>     }
>   }
> {code}
> No one ever ends the compressor, so its native zlib memory is never freed.
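The same lifecycle rule can be demonstrated with the JDK's own java.util.zip.Deflater, which wraps native zlib state just as Hadoop's ZlibCompressor does: whoever owns the compressor must call end() (or return it to a pool) in a finally block, otherwise the native memory is only reclaimed by finalization, if ever. This is a minimal sketch of the pattern, not Hadoop code:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

class DeflaterLifecycle {
    // Compress a byte array, guaranteeing the Deflater's native zlib
    // memory is released even if writing fails.
    static byte[] compress(byte[] data) throws IOException {
        Deflater deflater = new Deflater();
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (DeflaterOutputStream out =
                     new DeflaterOutputStream(bos, deflater)) {
                out.write(data);
            }
            return bos.toByteArray();
        } finally {
            // The analogue of Compressor.end() / returning to CodecPool:
            // without this, the native buffers leak until finalization.
            deflater.end();
        }
    }
}
```

Closing the DeflaterOutputStream finishes the compressed data, but only end() releases the native buffers, which is exactly the distinction the bug report hinges on.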



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
