Hello,
I have been using the biomaRt package from Bioconductor to fetch some
biological annotations. This week I noticed that getBM() calls leak TCP
connections (probably via RCurl). I have a loop that makes calls such as:
annotations <- getBM(attributes = attributes,
                     filters    = filter.types,
                     values     = filter.value,
                     mart       = mart)
When I run this loop and monitor the open connections with the 'lsof'
program, I can see each request opening a new connection. The whole loop
crashes after 1000 iterations because the number of open connections then
exceeds the allowed limit. Loops with fewer than 1000 iterations complete
with correct results, although the connections are left open.
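A minimal sketch of checking the same thing from inside R without lsof,
assuming a Linux system with /proc mounted. The helper name
count_open_sockets is my own and is not part of biomaRt or RCurl:

```r
# Hypothetical helper (Linux only): count the sockets currently held open
# by an R process by inspecting its file descriptors under /proc.
# This roughly mirrors what 'lsof -p <pid>' reports for TCP connections.
count_open_sockets <- function(pid = Sys.getpid()) {
  fd_dir <- file.path("/proc", pid, "fd")
  # Each entry in /proc/<pid>/fd is a symlink to the open resource
  fds <- list.files(fd_dir, full.names = TRUE)
  # Sockets resolve to "socket:[<inode>]"; unreadable entries give NA,
  # and grepl() treats NA as no match
  targets <- Sys.readlink(fds)
  sum(grepl("^socket:", targets))
}
```

Printing count_open_sockets() once per loop iteration should show whether
the count grows by one with every getBM() call.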
I have also tried the curl parameter, so that I first call:
curlHandle <- getCurlHandle()
and then pass this handle to every getBM() call, but that does not change
anything. Should I make some kind of close call on each iteration?
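For clarity, this is roughly how the shared handle was passed in. It is a
sketch that assumes the biomaRt 2.4.0 getBM() accepts a 'curl' argument;
the function name fetch_annotations and the vector filter.values are
hypothetical:

```r
# Hypothetical wrapper: one getBM() query per filter value, all sharing a
# single RCurl handle created once outside the loop.
# Assumes getBM() accepts a 'curl' argument (biomaRt 2.x era signature).
fetch_annotations <- function(mart, attributes, filter.types, filter.values) {
  curlHandle <- RCurl::getCurlHandle()
  results <- vector("list", length(filter.values))
  for (i in seq_along(filter.values)) {
    results[[i]] <- biomaRt::getBM(attributes = attributes,
                                   filters    = filter.types,
                                   values     = filter.values[[i]],
                                   mart       = mart,
                                   curl       = curlHandle)
  }
  results
}
```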
Package: biomaRt
Version: 2.4.0
Packaged: 2010-04-22 22:52:44 UTC; biocbuild
Built: R 2.11.0; ; 2010-04-27 12:27:46 UTC; unix
Package: RCurl
Version: 1.4-3
Date/Publication: 2010-07-25 12:15:39
Built: R 2.11.1; x86_64-pc-linux-gnu; 2010-09-23 10:54:07 UTC; unix
Example output for the COSMIC BioMart, using these settings:
MART_NAME = "CosmicMart"
MART_HOST = "www.sanger.ac.uk"
MART_PATH = "/genetics/CGP/cosmic/biomart/martservice"
MART_DSET = "COSMIC48"
$ lsof | grep sanger.ac
...
R 19974 myname 259u IPv4 137937 0t0 TCP myhost:48226->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R 19974 myname 260u IPv4 137971 0t0 TCP myhost:48228->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R 19974 myname 261u IPv4 137984 0t0 TCP myhost:48230->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R 19974 myname 262u IPv4 138004 0t0 TCP myhost:48233->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R 19974 myname 263u IPv4 138016 0t0 TCP myhost:48235->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R 19974 myname 264u IPv4 138032 0t0 TCP myhost:48239->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R 19974 myname 265u IPv4 138077 0t0 TCP myhost:45214->ssl-slb11b.sanger.ac.uk:www (CLOSE_WAIT)
R 19974 myname 266u IPv4 138102 0t0 TCP myhost:45228->ssl-slb11b.sanger.ac.uk:www (CLOSE_WAIT)
R 19974 myname 267u IPv4 138116 0t0 TCP myhost:45230->ssl-slb11b.sanger.ac.uk:www (CLOSE_WAIT)
R 19974 myname 268u IPv4 138123 0t0 TCP myhost:48263->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R 19974 myname 269u IPv4 138135 0t0 TCP myhost:48265->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R 19974 myname 270u IPv4 138147 0t0 TCP myhost:48267->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R 19974 myname 271u IPv4 138185 0t0 TCP myhost:48272->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R 19974 myname 272u IPv4 138198 0t0 TCP myhost:48274->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R 19974 myname 273u IPv4 138210 0t0 TCP myhost:48276->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R 19974 myname 274u IPv4 138226 0t0 TCP myhost:48282->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R 19974 myname 275u IPv4 138246 0t0 TCP myhost:48284->ssl-slb11a.sanger.ac.uk:www (CLOSE_WAIT)
R 19974 myname 276u IPv4 138258 0t0 TCP myhost:48286->ssl-slb11a.sanger.ac.uk:www (ESTABLISHED)
R 19974 myname 277u IPv4 138272 0t0 TCP myhost:48288->ssl-slb11a.sanger.ac.uk:www (ESTABLISHED)
R 19974 myname 278u IPv4 138527 0t0 TCP myhost:48290->ssl-slb11a.sanger.ac.uk:www (ESTABLISHED)
R 19974 myname 279u IPv4 138533 0t0 TCP myhost:45259->ssl-slb11b.sanger.ac.uk:www (ESTABLISHED)
R 19974 myname 280u IPv4 138545 0t0 TCP myhost:45261->ssl-slb11b.sanger.ac.uk:www (ESTABLISHED)
R 19974 myname 281u IPv4 138557 0t0 TCP myhost:45263->ssl-slb11b.sanger.ac.uk:www (ESTABLISHED)
The final error message I get after 1000 open connections is:
[STDERR] Error in value[[3L]](cond) :
[STDERR] Request to BioMart web service failed. Verify if you are still connected to the internet. Alternatively the BioMart web service is temporarily down.
[STDERR] Calls: main ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
[STDERR] Error during wrapup: cannot open the connection
[STDERR] Execution halted
My R version is 2.11.1 (2010-05-31).
Best regards,
Marko Laakso