Re: Compaction does not release file descriptors

Robert Samuel Newson Mon, 05 Sep 2016 05:29:52 -0700

Thanks Alex.

I can't see an obvious path in the compaction daemon module where we open but 
fail to close, though we could move to using couch_util:with_db/2 for extra 
safety. Perhaps the leak is elsewhere in the compaction code.


It would be useful to know what kind of processes are keeping the files open. 
We can see those with process_info(Pid, [monitored_by]), where Pid is the pid 
of the couch_file process.

B.

> On 5 Sep 2016, at 13:20, Alexander Shorin <[email protected]> wrote:
> 
> Robert,
> 
> This is an old bug in the compactor that occurs no matter which Erlang
> version is used. I hit it while working on munin-couchdb:
> https://github.com/gws/munin-plugin-couchdb/#open-files
> 
> The graph shows what happens if you compact a lot of databases
> continuously for some time. I recall Davis, Adam and I burrowing
> into the code looking for the place where we don't close the file,
> but it wasn't easy to find the cause.
> 
> --
> ,,,^..^,,,
> 
> 
> On Mon, Sep 5, 2016 at 12:32 PM, Robert Samuel Newson
> <[email protected]> wrote:
>> Hi Nigel,
>> 
>> Thanks for the report. We've seen this issue with the R14 series of Erlang, 
>> where calling file:close/1 doesn't always close the file descriptor, so I 
>> suggest trying something newer (I can vouch for 17.5 from extensive 
>> production experience).
>> 
>> Can you confirm if this is _every_ compaction or only a subset? If the 
>> latter, can you estimate what percentage?
>> 
>> You say "at 50%" so I'm inferring you've enabled the compaction daemon, but 
>> please confirm.
>> 
>> B.
>> 
>> 
>>> On 5 Sep 2016, at 09:53, Nigel Phippen <[email protected]> wrote:
>>> 
>>> Hello all,
>>> 
>>> This is my first post so please bear with me.
>>> 
>>> I am running CouchDB 1.6.1 (with Erlang R14B-04.3.el6) on CentOS 6.7.
>>> 
>>> I have multiple databases on our single server, with each database having 
>>> around a dozen views. Thousands of new documents are added to the databases 
>>> throughout the day but there are no document deletions (unless done for 
>>> administrative purposes). Many documents are regularly updated, possibly 
>>> hundreds of times, leading to documents having multiple versions. Database 
>>> and view compaction is set to occur at 50%.
>>> 
>>> The problem I am seeing is that, over the course of several days, disk 
>>> space is being consumed in the volume housing the CouchDB databases. Upon 
>>> investigation, I can see that CouchDB (or possibly some other process) 
>>> appears to have moved files to a '/usr/local/var/lib/couchdb/.delete' 
>>> folder, ready for deletion, but has not actually fully deleted the files.
>>> 
>>>    -------------------------------------------------------
>>>    # /usr/sbin/lsof +aL1
>>> 
>>>    COMMAND     PID      USER   FD   TYPE DEVICE SIZE/OFF NLINK   NODE NAME
>>>    beam.smp  21784   couchdb   19u   REG  253,1 12747263     0 157740 /usr/local/var/lib/couchdb/.delete/e3a4de3acbf62f6fe6621c0d584adcee (deleted)
>>>    beam.smp  21784   couchdb   41u   REG  253,1 13292013     0 157757 /usr/local/var/lib/couchdb/.delete/7202d47094b51d60d9a4cc39f448f2c8 (deleted)
>>>    beam.smp  21784   couchdb   61u   REG  253,1 12317183     0 158688 /usr/local/var/lib/couchdb/.delete/518f417167c31921925fe66b11ca85d2 (deleted)
>>>    beam.smp  21784   couchdb   64u   REG  253,1  8471022     0 158669 /usr/local/var/lib/couchdb/.delete/bea3b216976a62912ee79034fc374314 (deleted)
>>>    beam.smp  21784   couchdb  162u   REG  253,1  9097692     0 139109 /usr/local/var/lib/couchdb/.delete/48f75f12d680afbd7ec1c0c3c01ccb99 (deleted)
>>>    beam.smp  21784   couchdb  168u   REG  253,1  8901102     0 155061 /usr/local/var/lib/couchdb/.delete/e5692819be8422a83f675daa1267cc3a (deleted)
>>>    beam.smp  21784   couchdb  187u   REG  253,1 13046253     0 157756 /usr/local/var/lib/couchdb/.delete/8f2cb8517ab7659cc04091cc9db735e8 (deleted)
>>>    -------------------------------------------------------
>>> 
>>> Over several days, there can be dozens of these files, consuming GBytes of 
>>> space. Left unchecked, all disk space in the /usr volume will be consumed, 
>>> causing the system to fail. The only way to clear out the files for good is 
>>> to restart the CouchDB service.
>>> 
>>> This appears to be the same problem as reported in 
>>> https://issues.apache.org/jira/browse/COUCHDB-926 over five years ago.
>>> 
>>> I'd appreciate any assistance in resolving this issue. Please let me know 
>>> if additional information is required.
>>> 
>>> Many thanks,
>>> 
>>> Nigel.
>>> 
>>
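
For anyone tracking the same symptom, it may help to total the space held by 
deleted-but-still-open files rather than eyeballing the lsof listing. What 
follows is only a minimal sketch, not part of CouchDB. It assumes lsof's 
default column layout, with SIZE/OFF as the seventh field and one entry per 
line. The function name summarize_deleted is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads lsof-formatted lines on stdin and prints
# "<count> <total_bytes>" for entries whose COMMAND column equals $1.
# Assumes SIZE/OFF is the 7th whitespace-separated field.
summarize_deleted() {
    awk -v cmd="$1" '
        # Skip the header and any line whose 7th field is not a plain number.
        $1 == cmd && $7 ~ /^[0-9]+$/ { n++; total += $7 }
        END { printf "%d %.0f\n", n, total }
    '
}

# Possible use on the affected host (+L1 selects files with link count < 1):
#   lsof +L1 | summarize_deleted beam.smp
```

Logging that pair of numbers periodically would show whether the leak grows 
with every compaction or only with some of them.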
