Thanks Alex. I can't see an obvious path in the compaction daemon module where we open but fail to close, so perhaps it's elsewhere in the compaction code. We could still move to using couch_util:with_db/2 there for extra safety.
It would be useful to know what kind of processes are keeping the files open; we can see those with process_info(Pid of couch_file, [monitored_by]).

B.

> On 5 Sep 2016, at 13:20, Alexander Shorin <[email protected]> wrote:
>
> Robert,
>
> This is an old bug that the compactor has no matter which Erlang version
> is used. I hit it during work on munin-couchdb:
> https://github.com/gws/munin-plugin-couchdb/#open-files
>
> The graph shows what happens if you compact a lot of databases
> continuously for some time. I recall how Davis, Adam and I burrowed
> into the code looking for the place where we don't close the file,
> but it wasn't easy to find the cause.
>
> --
> ,,,^..^,,,
>
>
> On Mon, Sep 5, 2016 at 12:32 PM, Robert Samuel Newson
> <[email protected]> wrote:
>> Hi Nigel,
>>
>> Thanks for the report. We've seen this issue with the R14 series of Erlang,
>> where calling file:close/1 doesn't always close the file descriptor, so I
>> suggest trying something newer (I can vouch for 17.5 from extensive
>> production experience).
>>
>> Can you confirm if this is _every_ compaction or only a subset? If the
>> latter, can you estimate what percentage?
>>
>> You say "at 50%" so I'm inferring you've enabled the compaction daemon, but
>> please confirm.
>>
>> B.
>>
>>
>>> On 5 Sep 2016, at 09:53, Nigel Phippen <[email protected]> wrote:
>>>
>>> Hello all,
>>>
>>> This is my first post so please bear with me.
>>>
>>> I am running CouchDB 1.6.1 (with Erlang R14B-04.3.el6) on CentOS 6.7.
>>>
>>> I have multiple databases on our single server, with each database having
>>> around a dozen views. Thousands of new documents are added to the databases
>>> throughout the day but there are no document deletions (unless done for
>>> administrative purposes). Many documents are regularly updated, possibly
>>> hundreds of times, leading to documents having multiple versions. Database
>>> and view compaction is set to occur at 50%.
>>>
>>> The problem I am seeing is that, over the course of several days, disk
>>> space is being consumed in the volume housing the CouchDB databases. Upon
>>> investigation, I can see that CouchDB (or possibly some other process)
>>> appears to have moved files to a '/usr/local/var/lib/couchdb/.delete'
>>> folder, ready for deletion, but has not actually fully deleted the files.
>>>
>>> -------------------------------------------------------
>>> # /usr/sbin/lsof +aL1
>>>
>>> COMMAND    PID USER     FD  TYPE DEVICE SIZE/OFF NLINK   NODE NAME
>>> beam.smp 21784 couchdb  19u  REG  253,1 12747263     0 157740 /usr/local/var/lib/couchdb/.delete/e3a4de3acbf62f6fe6621c0d584adcee (deleted)
>>> beam.smp 21784 couchdb  41u  REG  253,1 13292013     0 157757 /usr/local/var/lib/couchdb/.delete/7202d47094b51d60d9a4cc39f448f2c8 (deleted)
>>> beam.smp 21784 couchdb  61u  REG  253,1 12317183     0 158688 /usr/local/var/lib/couchdb/.delete/518f417167c31921925fe66b11ca85d2 (deleted)
>>> beam.smp 21784 couchdb  64u  REG  253,1  8471022     0 158669 /usr/local/var/lib/couchdb/.delete/bea3b216976a62912ee79034fc374314 (deleted)
>>> beam.smp 21784 couchdb 162u  REG  253,1  9097692     0 139109 /usr/local/var/lib/couchdb/.delete/48f75f12d680afbd7ec1c0c3c01ccb99 (deleted)
>>> beam.smp 21784 couchdb 168u  REG  253,1  8901102     0 155061 /usr/local/var/lib/couchdb/.delete/e5692819be8422a83f675daa1267cc3a (deleted)
>>> beam.smp 21784 couchdb 187u  REG  253,1 13046253     0 157756 /usr/local/var/lib/couchdb/.delete/8f2cb8517ab7659cc04091cc9db735e8 (deleted)
>>> -------------------------------------------------------
>>>
>>> Over several days, there can be dozens of these files, consuming GBytes of
>>> space. Left unchecked, all disk space in the /usr volume will be consumed,
>>> causing the system to fail. The only way to clear out the files for good is
>>> to restart the CouchDB service.
>>>
>>> This appears to be the same problem as reported in
>>> https://issues.apache.org/jira/browse/COUCHDB-926 over five years ago.
>>>
>>> I'd appreciate any assistance in resolving this issue. Please let me know
>>> if additional information is required.
>>>
>>> Many thanks,
>>>
>>> Nigel.
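
To put a number on the space held by deleted-but-still-open files like those in the lsof listing quoted above, a small shell sketch can total the SIZE/OFF column for rows whose NLINK is 0. The function name sum_deleted is made up for illustration. The field positions assume the column layout shown in the report, with each entry on a single line rather than wrapped as it was in the email.

```shell
# sum_deleted: hypothetical helper, not part of CouchDB or lsof.
# Reads lsof-style output on stdin and totals SIZE/OFF (field 7)
# for rows whose link count NLINK (field 8) is exactly "0".
# The header row is skipped because its field 7 is not numeric.
sum_deleted() {
    awk '$8 == "0" && $7 ~ /^[0-9]+$/ { total += $7; n++ }
         END { printf "%d files, %.0f bytes\n", n, total }'
}
```

It could be fed with something like `/usr/sbin/lsof +L1 | sum_deleted`. That counts unlinked files held by any process, not only beam.smp, so the output may need filtering first on a busier host.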
