[Rd] download.file does not process gz files correctly (truncates them?)

2018-05-03 Thread Joris Meys
Dear all,

I ran into a problem when trying to download gz files from here:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM907811

At the bottom of that page one can download GSM907811.CEL.gz. If I download
this manually and try

oligo::read.celfiles("GSM907811.CEL.gz")

everything works fine. (oligo is a Bioconductor package.)

However, if I download using

download.file("https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM907811&format=file&file=GSM907811%2ECEL%2Egz",
  destfile = "GSM907811.CEL.gz")

The file is downloaded, but oligo::read.celfiles() returns the following
error:

Error in checkChipTypes(filenames, verbose, "affymetrix", TRUE) :
  End of gz file reached unexpectedly. Perhaps this file is truncated.

Moreover, if I try to delete it after using download.file(), I get a
warning that permission is denied. I can only remove it using Windows file
explorer after I closed the R session, indicating that the connection is
still open. Yet, showConnections() doesn't show any open connections either.

Session info below. Note that I started from a completely fresh R session.
oligo is needed due to the specific file format of these gz files. They're
not standard tarred files.

Cheers
Joris

Session Info
------------

R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252
[2] LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods
[9] base

other attached packages:
 [1] pd.hugene.1.0.st.v1_3.14.1 DBI_0.8                    oligo_1.44.0
 [4] Biobase_2.39.2             oligoClasses_1.42.0        RSQLite_2.1.0
 [7] Biostrings_2.48.0          XVector_0.19.9             IRanges_2.13.28
[10] S4Vectors_0.17.42          BiocGenerics_0.25.3

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16         compiler_3.5.0
 [3] BiocInstaller_1.30.0 GenomeInfoDb_1.15.5
 [5] bitops_1.0-6         iterators_1.0.9
 [7] tools_3.5.0          zlibbioc_1.25.0
 [9] digest_0.6.15        bit_1.1-12
[11] memoise_1.1.0        preprocessCore_1.41.0
[13] lattice_0.20-35      ff_2.2-13
[15] pkgconfig_2.0.1      Matrix_1.2-14
[17] foreach_1.4.4        DelayedArray_0.5.31
[19] yaml_2.1.18          GenomeInfoDbData_1.1.0
[21] affxparser_1.52.0    bit64_0.9-7
[23] grid_3.5.0           BiocParallel_1.13.3
[25] blob_1.1.1           codetools_0.2-15
[27] matrixStats_0.53.1   GenomicRanges_1.31.23
[29] splines_3.5.0        SummarizedExperiment_1.9.17
[31] RCurl_1.95-4.10      affyio_1.49.2


-- 
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)


---
Biowiskundedagen 2017-2018
http://www.biowiskundedagen.ugent.be/

---
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] download.file does not process gz files correctly (truncates them?)

2018-05-03 Thread Henrik Bengtsson
Use mode="wb" when you download the file. See
https://github.com/HenrikBengtsson/Wishlist-for-R/issues/30.
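
A minimal sketch of this advice, for illustration only. The wrapper name fetch_binary is hypothetical (it is not part of R); the one thing that matters is the explicit mode = "wb" passed to download.file(), which asks for the bytes to be written unchanged. The real call against the GEO URL from the report is left commented out because it needs network access.

```r
# Hypothetical wrapper (not part of R): always download in binary mode.
fetch_binary <- function(url, destfile) {
  # mode = "wb" writes the bytes exactly as received; the default "w"
  # is a text-mode write, which can alter binary files on Windows.
  status <- utils::download.file(url, destfile = destfile,
                                 mode = "wb", quiet = TRUE)
  stopifnot(status == 0L)   # download.file() returns 0 on success
  invisible(destfile)
}

# Usage with the URL from the report (needs network access):
# fetch_binary(
#   "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM907811&format=file&file=GSM907811%2ECEL%2Egz",
#   "GSM907811.CEL.gz")
```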

R core, and others, is there a good argument for why we are not making this
the default download mode? It seems like such a simple fix to such a
common "mistake".

Henrik


Re: [Rd] download.file does not process gz files correctly (truncates them?)

2018-05-03 Thread Martin Morgan



On 05/02/2018 03:21 PM, Joris Meys wrote:

> However, if I download using
>
> download.file("https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM907811&format=file&file=GSM907811%2ECEL%2Egz",
>   destfile = "GSM907811.CEL.gz")


On Windows, the 'mode' argument to download.file() needs to be "wb"
(write binary) for binary files.


Martin





Re: [Rd] download.file does not process gz files correctly (truncates them?)

2018-05-03 Thread Joris Meys
Using the correct mode absolutely solves it. Apologies for not trying the
obvious.

Cheers
Joris



Re: [Rd] download.file does not process gz files correctly (truncates them?)

2018-05-03 Thread Joris Meys
Dear all,

I've been digging a bit deeper into this at the request of Tomas Kalibera,
and found the following:

- the lock on the file only appears after trying to read it using oligo, so
that's not an R problem in itself. The download problem is independent of
external packages.

- using Windows' fc utility and Cygwin's cmp utility I found that every so
often the download.file() function inserts an extra byte. There's no
obvious pattern in how these bytes are added, but the file downloaded
using download.file() is actually larger (in this case by about 8 kB). The
file xxx_inR.CEL.gz is downloaded using:

setwd("E:/Temp/genexpr/Compare")
id <- "GSM907854"
flink <- "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM907854&format=file&file=GSM907854%2ECEL%2Egz"
fname <- paste0(id, "_inR.CEL.gz")
download.file(flink,
  destfile = fname)

The file xxx_direct.CEL.gz is downloaded from
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM907854 (download link
at the bottom of the page).

Output of dir in CMD:

05/03/2018  11:02 AM 4,529,547 GSM907854_direct.CEL.gz
05/03/2018  11:17 AM 4,537,668 GSM907854_inR.CEL.gz

or from R :

> diff(file.size(dir())) # contains both CEL files.
[1] 8121

Strangely enough I get the following message from download.file() :

Content type 'application/octet-stream' length 4529547 bytes (4.3 MB)
downloaded 4.3 MB

So the reported length is exactly the same as when I download the file
directly, but the file on disk is larger. So it seems
download.file() is adding bytes when saving the data to disk.  This
behaviour is independent of antivirus and/or firewalls being turned on or off.
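
The comparison described above was done with fc and cmp; a rough equivalent inside R might look like the sketch below. The helper name compare_bytes is hypothetical, and it simply reads both files fully into memory, which is fine for files of a few MB.

```r
# Hypothetical helper: compare two files byte by byte.
compare_bytes <- function(path_a, path_b) {
  a <- readBin(path_a, what = "raw", n = file.size(path_a))
  b <- readBin(path_b, what = "raw", n = file.size(path_b))
  n <- min(length(a), length(b))
  # Positions (1-based) where the common prefix of the two files differs.
  mismatch <- which(a[seq_len(n)] != b[seq_len(n)])
  list(
    size_diff  = length(b) - length(a),
    first_diff = if (length(mismatch)) mismatch[1L] else NA_integer_
  )
}

# Usage with the two files from this message:
# compare_bytes("GSM907854_direct.CEL.gz", "GSM907854_inR.CEL.gz")
```

If the extra bytes are inserted carriage returns, the byte at first_diff in the larger file should be 0x0d.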

Also keep in mind that these are NOT standard gzipped files. These files
are a specific format for Affymetrix Human Gene 1.0 ST Arrays.

If I need to run other tests, please let me know.
Kind regards

Joris


Re: [Rd] download.file does not process gz files correctly (truncates them?)

2018-05-03 Thread Duncan Murdoch

On 03/05/2018 8:42 AM, Henrik Bengtsson wrote:

> Use mode="wb" when you download the file. See
> https://github.com/HenrikBengtsson/Wishlist-for-R/issues/30.
>
> R core, and others, is there a good argument for why we are not making this
> the default download mode? It seems like such a simple fix to such a
> common "mistake".


Many downloads are text files (HTML, CSV, etc.), and if those are 
downloaded in binary, a Windows user might end up with a file that 
Notepad can't handle, because it would have Unix-style line endings.
(It's possible Notepad no longer requires CR LF endings; I haven't used 
it in years.  But there are probably other brain-dead Windows programs 
that do.)


Duncan Murdoch






Re: [Rd] download.file does not process gz files correctly (truncates them?)

2018-05-03 Thread Martin Morgan


On 05/03/2018 05:48 AM, Joris Meys wrote:

> - using Windows' fc utility and cygwin's cmp utility I found out that every
> so often the download.file() function inserts an extra byte. There's no
> real obvious pattern in how these bytes are added, but the file downloaded
> using download.file() is actually larger (in this case by about 8 kB).


I believe the difference between mode = "w" and "wb", and the reason this is
restricted to Windows downloads, is due to the difference in text file
line endings: with mode = "w", download.file (like many other
utilities outside R) writes "foo\n" as "foo\r\n". Obviously this
messes up binary files.


I guess in the CEL.gz file there are about 8k "\n" characters.

Henrik's suggestion (default = "wb") would introduce the complementary 
problem -- text files would have incorrect line endings.
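
To make the arithmetic concrete: the two sizes reported earlier differ by 4,537,668 - 4,529,547 = 8,121 bytes, which would fit a file containing 8,121 LF (0x0a) bytes if each one gained a CR. The sketch below simulates that translation on a raw vector; the function name lf_to_crlf and the sample payload are made up for illustration and do not call any Windows text-mode code.

```r
# Simulate a text-mode write: every LF byte (0x0a) becomes CR LF (0x0d 0x0a).
lf_to_crlf <- function(bytes) {
  is_lf <- bytes == as.raw(0x0a)
  # Each LF expands to two bytes; all other bytes are copied unchanged.
  out <- raw(length(bytes) + sum(is_lf))
  pos <- seq_along(bytes) + cumsum(is_lf)   # final position of each input byte
  out[pos] <- bytes
  out[pos[is_lf] - 1L] <- as.raw(0x0d)      # CR goes just before each LF
  out
}

payload <- as.raw(c(0x1f, 0x8b, 0x0a, 0x00, 0x0a, 0x42))
translated <- lf_to_crlf(payload)
length(translated) - length(payload)   # growth equals the number of 0x0a bytes
```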


Martin





Re: [Rd] download.file does not process gz files correctly (truncates them?)

2018-05-03 Thread Jeroen Ooms
On Thu, May 3, 2018 at 2:42 PM, Henrik Bengtsson
 wrote:
> Use mode="wb" when you download the file. See
> https://github.com/HenrikBengtsson/Wishlist-for-R/issues/30.
>
> R core, and others, is there a good argument for why we are not making this
> the default download mode? It seems like such a simple fix to such a
> common "mistake".

I'd like to second this feature request. This default behaviour is
unexpected and often causes R scripts that were written on Mac/Linux
to produce corrupted files, checksum mismatches, etc. on Windows.

Even for text files, the default should be to download the file as-is.
Trying to "fix" line endings should be opt-in, never the default.
Downloading a file via a browser or FTP client on Windows doesn't
change the file either, so why should R?
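
As an illustration of what an opt-in conversion could look like (a sketch only; the helper name to_crlf is hypothetical and nothing like it is proposed in this thread): the file is left untouched by the download, and a user who really needs Windows-style endings rewrites it explicitly afterwards.

```r
# Hypothetical helper: rewrite a text file with CRLF line endings on request.
to_crlf <- function(path) {
  lines <- readLines(path, warn = FALSE)   # accepts LF, CRLF, or CR endings
  con <- file(path, open = "wb")           # binary mode: no implicit translation
  on.exit(close(con))
  writeLines(lines, con, sep = "\r\n")
  invisible(path)
}
```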


On Thu, May 3, 2018 at 3:02 PM, Duncan Murdoch  wrote:
> Many downloads are text files (HTML, CSV, etc.), and if those are downloaded
> in binary, a Windows user might end up with a file that Notepad can't
> handle, because it would have Unix-style line endings.

True, but I don't think this is relevant. The same holds e.g. for the R
files in source packages, which also have Unix line endings. Most
Windows users will use an actual editor that understands both types of
line endings, or can convert between the two.

Downloading-file should do just that.



Re: [Rd] download.file does not process gz files correctly (truncates them?)

2018-05-03 Thread Joris Meys
Thank you Henrik and Martin for explaining what was going on. Very
insightful!

On Thu, May 3, 2018 at 4:21 PM, Jeroen Ooms  wrote:

> On Thu, May 3, 2018 at 2:42 PM, Henrik Bengtsson
>  wrote:
> > Use mode="wb" when you download the file. See
> > https://github.com/HenrikBengtsson/Wishlist-for-R/issues/30.
> >
> > R core, and others, is there a good argument for why we are not making
> > this the default download mode? It seems like such a simple fix to such a
> > common "mistake".
>
> I'd like to second this feature request. This default behaviour is
> unexpected and often leads to r scripts that were written on
> mac/linux, to produce corrupted files on windows, checksum mismatches,
> etc.
>
> Even for text files, the default should be to download the file as-is.
> Trying to "fix" line-endings should be opt-in, never the default.
> Downloading a file via a browser or ftp client on windows also doesn't
> change the file, why should R?
>

I third the feature request.


>
>
> On Thu, May 3, 2018 at 3:02 PM, Duncan Murdoch 
> wrote:
> > Many downloads are text files (HTML, CSV, etc.), and if those are
> > downloaded in binary, a Windows user might end up with a file that Notepad
> > can't handle, because it would have Unix-style line endings.
>
> True but I don't think this is relevant. The same holds e.g. for the R
> files in source packages, which also have unix line endings. Most
> Windows users will use an actual editor that understands both types of
> line endings, or can convert between the two.
>
> Downloading-file should do just that.
>

Again, I agree. In my (limited) experience the only program that fails to
properly display \n as a line ending is Notepad, and it can still open the
file regardless. If line ending conflicts cause bugs, it's almost always a
Unix-like OS struggling with Windows-style endings. I have yet to meet a
case the other way around.

Cheers
Joris




[Rd] length of `...`

2018-05-03 Thread Dénes Tóth

Hi,


In some cases the number of arguments passed as ... must be determined 
inside a function, without evaluating the arguments themselves. I use 
the following construct:


dotlength <- function(...) length(substitute(expression(...))) - 1L

# Usage (returns 3):
dotlength(1, 4, something = undefined)

How can I define a method for length() which could be called directly on 
`...`? Or is it an intention to extend the base length() function to 
accept ellipses?



Regards,
Denes



Re: [Rd] length of `...`

2018-05-03 Thread Mark van der Loo
This question is better aimed at the r-help mailing list, as it is not about
developing R itself.


Having said that,

I can only guess why you want to do this, but why not do something like this:


f <- function(...){
  L <- list(...)
  len <- length(L)
  # you can still pass the ... as follows:
  do.call(someotherfunction, L)

}
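
One caveat with the list(...) approach, as a minimal sketch: list(...) evaluates every argument, so it can only count arguments that can actually be evaluated. The helper names count_eager and count_lazy are made up for illustration, and the example assumes no object called `undefined` exists.

```r
# list(...) forces all arguments; substitute() only looks at the expressions.
count_eager <- function(...) length(list(...))
count_lazy  <- function(...) length(substitute(expression(...))) - 1L

count_eager(1, 4, something = 3)          # 3
count_lazy(1, 4, something = undefined)   # 3, 'undefined' is never evaluated

# The eager version fails as soon as one argument cannot be evaluated.
res <- tryCatch(count_eager(1, 4, something = undefined),
                error = function(e) "error")
res                                       # "error"
```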


-Mark

Op do 3 mei 2018 om 16:29 schreef Dénes Tóth :

> Hi,
>
>
> In some cases the number of arguments passed as ... must be determined
> inside a function, without evaluating the arguments themselves. I use
> the following construct:
>
> dotlength <- function(...) length(substitute(expression(...))) - 1L
>
> # Usage (returns 3):
> dotlength(1, 4, something = undefined)
>
> How can I define a method for length() which could be called directly on
> `...`? Or is it an intention to extend the base length() function to
> accept ellipses?
>
>
> Regards,
> Denes
>
>



Re: [Rd] Proposed speedup of ifelse

2018-05-03 Thread Radford Neal
> I propose a patch to ifelse that leverages anyNA(test) to achieve an
> improvement in performance. For a test vector of length 10, the change
> nearly halves the time taken and for a test of length 1 million, there
> is a tenfold increase in speed. Even for small vectors, the
> distributions of timings between the old and the proposed ifelse do
> not intersect.

For smaller vectors, your results are significantly affected by your
invoking the old version via base::ifelse.  You could try defining
your new version as new_ifelse, and invoking the old version as just
ifelse.  There might still be some issues with the two versions having
different context w.r.t environments, and hence looking up functions
in different ways.  You could copy the code of the old version and
define it in the global environment just like new_ifelse.

When using ifelse rather than base::ifelse, it seems the new version
is slower for vectors of length 10, but faster for long vectors.

Also, I'd use system.time rather than microbenchmark.  The latter will
mix invocations of the two functions in a way where it is unclear that
garbage collection time will be fairly attributed.  Also, it's a bit
silly to plot the distributions of times, which will mostly reflect
variations in when garbage collections at various levels occur - just
the mean is what is relevant.
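
A rough sketch of that timing setup, under stated assumptions: the proposed patch is not reproduced here, so new_ifelse below is only a hypothetical stand-in with an anyNA() fast path, and resetting the environment of base::ifelse is assumed to be close enough to copying its code into the global environment. mean_time is likewise a made-up helper name.

```r
# Old version, with the global environment as its enclosure so that both
# functions look up other functions in the same way.
old_ifelse <- base::ifelse
environment(old_ifelse) <- globalenv()

# HYPOTHETICAL stand-in for the patched version (not the proposed patch):
# a fast path for attribute-free, NA-free logical tests with scalar yes/no.
new_ifelse <- function(test, yes, no) {
  if (is.logical(test) && is.null(attributes(test)) && !anyNA(test) &&
      length(yes) == 1L && length(no) == 1L) {
    out <- rep(no, length(test))
    out[test] <- yes
    return(out)
  }
  old_ifelse(test, yes, no)
}

# Mean elapsed time per call; each function is timed in its own loop.
mean_time <- function(f, test, reps) {
  gc()  # start each measurement right after a garbage collection
  elapsed <- system.time(for (i in seq_len(reps)) f(test, 1, 0))[["elapsed"]]
  elapsed / reps
}

set.seed(1)
short <- runif(10) > 0.5
long  <- runif(1e5) > 0.5

c(old = mean_time(old_ifelse, short, 1e4), new = mean_time(new_ifelse, short, 1e4))
c(old = mean_time(old_ifelse, long, 20),   new = mean_time(new_ifelse, long, 20))
```

The timings themselves depend on the machine, so none are shown here.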

Regards,

   Radford Neal



Re: [Rd] length of `...`

2018-05-03 Thread William Dunlap via R-devel
In R-3.5.0 you can use ...length():
  > f <- function(..., n) ...length()
  > f(stop("one"), stop("two"), stop("three"), n=7)
  [1] 3

Prior to that substitute() is the way to go
  > g <- function(..., n) length(substitute(...()))
  > g(stop("one"), stop("two"), stop("three"), n=7)
  [1] 3

R-3.5.0 also has the ...elt(n) function, which returns
the evaluated n'th entry in ... , without evaluating the
other ... entries.
  > fn <- function(..., n) ...elt(n)
  > fn(stop("one"), 3*5, stop("three"), n=2)
  [1] 15

Prior to 3.5.0, eval the appropriate component of the output
of substitute() in the appropriate environment:
  > gn <- function(..., n) {
  +   nthExpr <- substitute(...())[[n]]
  +   eval(nthExpr, envir=parent.frame())
  + }
  > gn(stop("one"), environment(), stop("two"), n=2)
  <environment: R_GlobalEnv>




Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, May 3, 2018 at 7:29 AM, Dénes Tóth  wrote:

> Hi,
>
>
> In some cases the number of arguments passed as ... must be determined
> inside a function, without evaluating the arguments themselves. I use the
> following construct:
>
> dotlength <- function(...) length(substitute(expression(...))) - 1L
>
> # Usage (returns 3):
> dotlength(1, 4, something = undefined)
>
> How can I define a method for length() which could be called directly on
> `...`? Or is it an intention to extend the base length() function to accept
> ellipses?
>
>
> Regards,
> Denes
>
>



Re: [Rd] length of `...`

2018-05-03 Thread Gabe Becker
As of 3.5.0 the ...length() function does exactly what you are asking for.
Before that, I don't know of an easy way to get the length without
evaluation via R code. There may be one I'm not thinking of though, I
haven't needed to do this myself.

Hope that helps.

~G

On Thu, May 3, 2018 at 7:52 AM, Mark van der Loo 
wrote:

> This question is better aimed at the r-help mailing list, as it is not about
> developing R itself.
>
>
> Having said that,
>
> I can only guess why you want to do this, but why not do something like
> this:
>
>
> f <- function(...){
>   L <- list(...)
>   len <- length(L)
>   # you can still pass the ... as follows:
>   do.call(someotherfunction, L)
>
> }
>
>
> -Mark
>
> Op do 3 mei 2018 om 16:29 schreef Dénes Tóth :
>
> > Hi,
> >
> >
> > In some cases the number of arguments passed as ... must be determined
> > inside a function, without evaluating the arguments themselves. I use
> > the following construct:
> >
> > dotlength <- function(...) length(substitute(expression(...))) - 1L
> >
> > # Usage (returns 3):
> > dotlength(1, 4, something = undefined)
> >
> > How can I define a method for length() which could be called directly on
> > `...`? Or is it an intention to extend the base length() function to
> > accept ellipses?
> >
> >
> > Regards,
> > Denes
> >
> >
>
>



-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research



Re: [Rd] length of `...`

2018-05-03 Thread Tóth Dénes
Thank you Bill (and the R-Core Team), this is even more than what I 
thought of. I somehow missed this in the NEWS.


BTW, substitute(...()) is beautiful.



--
Dr. Tóth Dénes ügyvezető
Kogentum Kft.
Tel.: 06-30-2583723
Web: www.kogentum.hu



Re: [Rd] length of `...`

2018-05-03 Thread Duncan Murdoch

On 03/05/2018 11:01 AM, William Dunlap via R-devel wrote:

In R-3.5.0 you can use ...length():
   > f <- function(..., n) ...length()
   > f(stop("one"), stop("two"), stop("three"), n=7)
   [1] 3

Prior to that substitute() is the way to go
   > g <- function(..., n) length(substitute(...()))
   > g(stop("one"), stop("two"), stop("three"), n=7)
   [1] 3

R-3.5.0 also has the ...elt(n) function, which returns
the evaluated n'th entry in ... , without evaluating the
other ... entries.
   > fn <- function(..., n) ...elt(n)
   > fn(stop("one"), 3*5, stop("three"), n=2)
   [1] 15

Prior to 3.5.0, eval the appropriate component of the output
of substitute() in the appropriate environment:
   > gn <- function(..., n) {
   +   nthExpr <- substitute(...())[[n]]
   +   eval(nthExpr, envir=parent.frame())
   + }
   > gn(stop("one"), environment(), stop("two"), n=2)
   



Bill, the last of these doesn't quite work, because ... can be passed 
down through a string of callers.  You don't necessarily want to 
evaluate it in the parent.frame().  For example:


x <- "global"
f <- function(...) {
  x <- "f"
  g(...)
}
g <- function(...) {
  firstExpr <- substitute(...())[[1]]
  c(list(...)[[1]], eval(firstExpr, envir = parent.frame()))
}

Calling g(x) correctly prints "global" twice, but calling f(x) 
incorrectly prints


[1] "global" "f"

You can get the first element of ... without evaluating the rest using 
..1, but I don't know a way to do this for general n in pre-3.5.0 base R.


Duncan Murdoch



Re: [Rd] length of `...`

2018-05-03 Thread peter dalgaard


> On 3 May 2018, at 16:52 , Mark van der Loo  wrote:
> 
> This question is better aimed at the r-help mailinglist as it is not about
> developing R itself.

Um, no... People there might well send you back here.

As for the original question, there are also variations over

dddlen <- function(...)length(match.call(expand.dots=FALSE)$...)
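
A short usage sketch of this variant. dddlen2 is a hypothetical name, added only to show what happens when the function also has a named formal.

```r
dddlen <- function(...) length(match.call(expand.dots = FALSE)$...)

dddlen()                              # 0
dddlen(stop("one"), stop("two"))      # 2, the stop() calls are never evaluated

# With a named formal present, only the arguments that fall into ... are counted.
dddlen2 <- function(x, ...) length(match.call(expand.dots = FALSE)$...)
dddlen2(1, stop("one"), stop("two"))  # 2
```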



-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] length of `...`

2018-05-03 Thread Martin Morgan

nargs() provides the number of arguments without evaluating them

> f = function(x, ..., y) nargs()
> f()
[1] 0
> f(a=1, b=2)
[1] 2
> f(1, a=1, b=2)
[1] 3
> f(x=1, a=1, b=2)
[1] 3
> f(stop())
[1] 1




Re: [Rd] length of `...`

2018-05-03 Thread Hadley Wickham
On Thu, May 3, 2018 at 8:00 AM, Gabe Becker  wrote:
> As of 3.5.0 the ...length() function does exactly what you are asking for.
> Before that, I don't know of an easy way to get the length without
> evaluation via R code. There may be one I'm not thinking of though, I
> haven't needed to do this myself.

dotlength <- function(...) length(nargs())

?

Hadley

-- 
http://hadley.nz



Re: [Rd] length of `...`

2018-05-03 Thread Hadley Wickham
On Thu, May 3, 2018 at 8:18 AM, Duncan Murdoch  wrote:
> On 03/05/2018 11:01 AM, William Dunlap via R-devel wrote:
>>
>> In R-3.5.0 you can use ...length():
>>> f <- function(..., n) ...length()
>>> f(stop("one"), stop("two"), stop("three"), n=7)
>>[1] 3
>>
>> Prior to that substitute() is the way to go
>>> g <- function(..., n) length(substitute(...()))
>>> g(stop("one"), stop("two"), stop("three"), n=7)
>>[1] 3
>>
>> R-3.5.0 also has the ...elt(n) function, which returns
>> the evaluated n'th entry in ... , without evaluating the
>> other ... entries.
>>> fn <- function(..., n) ...elt(n)
>>> fn(stop("one"), 3*5, stop("three"), n=2)
>>[1] 15
>>
>> Prior to 3.5.0, eval the appropriate component of the output
>> of substitute() in the appropriate environment:
>>> gn <- function(..., n) {
>>+   nthExpr <- substitute(...())[[n]]
>>+   eval(nthExpr, envir=parent.frame())
>>+ }
>>> gn(stop("one"), environment(), stop("two"), n=2)
>>
>>
>
> Bill, the last of these doesn't quite work, because ... can be passed down
> through a string of callers.  You don't necessarily want to evaluate it in
> the parent.frame().  For example:
>
> x <- "global"
> f <- function(...) {
>   x <- "f"
>   g(...)
> }
> g <- function(...) {
>   firstExpr <- substitute(...())[[1]]
>   c(list(...)[[1]], eval(firstExpr, envir = parent.frame()))
> }
>
> Calling g(x) correctly prints "global" twice, but calling f(x) incorrectly
> prints
>
> [1] "global" "f"
>
> You can get the first element of ... without evaluating the rest using ..1,
> but I don't know a way to do this for general n in pre-3.5.0 base R.

If you don't mind using a package:

# works with R 3.1 and up
library(rlang)

x <- "global"
f <- function(...) {
  x <- "f"
  g(...)
}
g <- function(...) {
  dots <- enquos(...)
  eval_tidy(dots[[1]])
}

f(x, stop("!"))
#> [1] "global"
g(x, stop("!"))
#> [1] "global"

Hadley

-- 
http://hadley.nz



Re: [Rd] length of `...`

2018-05-03 Thread Hadley Wickham
On Thu, May 3, 2018 at 8:28 AM, Hadley Wickham  wrote:
> On Thu, May 3, 2018 at 8:00 AM, Gabe Becker  wrote:
>> As of 3.5.0 the ...length() function does exactly what you are asking for.
>> Before that, I don't know of an easy way to get the length without
>> evaluation via R code. There may be one I'm not thinking of though, I
>> haven't needed to do this myself.
>
> dotlength <- function(...) length(nargs())
>
> ?

Oops, I got a bit overzealous there: I mean

dotlength <- function(...) nargs()

(This is subtly different from calling nargs() directly as it will
only count the elements in ...)
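
To make that difference concrete, a minimal sketch. The function g is hypothetical: it has a named formal x in addition to ..., so nargs() called directly inside it counts x as well.

```r
dotlength <- function(...) nargs()

# nargs() inside g counts everything supplied to g, whereas passing ... on
# to dotlength() counts only the arguments that ended up in the dots.
g <- function(x, ...) c(all = nargs(), dots = dotlength(...))

g(1, stop("a"), stop("b"))   # all = 3, dots = 2; nothing in ... is evaluated
```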

Hadley

-- 
http://hadley.nz



Re: [Rd] length of `...`

2018-05-03 Thread Hervé Pagès

Hi,

It would be great if one of the experts could comment on the
difference between Hadley's dotlength and ...length? The fact
that someone bothered to implement a new primitive for that
when there seems to be a very simple and straightforward R-only
solution suggests that there might be some gotchas/pitfalls with
the R-only solution.

Thanks,
H.



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Rd] length of `...`

2018-05-03 Thread Duncan Murdoch

On 03/05/2018 11:18 AM, Duncan Murdoch wrote:

On 03/05/2018 11:01 AM, William Dunlap via R-devel wrote:

In R-3.5.0 you can use ...length():
> f <- function(..., n) ...length()
> f(stop("one"), stop("two"), stop("three"), n=7)
[1] 3

Prior to that substitute() is the way to go
> g <- function(..., n) length(substitute(...()))
> g(stop("one"), stop("two"), stop("three"), n=7)
[1] 3

R-3.5.0 also has the ...elt(n) function, which returns
the evaluated n'th entry in ... , without evaluating the
other ... entries.
> fn <- function(..., n) ...elt(n)
> fn(stop("one"), 3*5, stop("three"), n=2)
[1] 15

Prior to 3.5.0, eval the appropriate component of the output
of substitute() in the appropriate environment:
> gn <- function(..., n) {
+   nthExpr <- substitute(...())[[n]]
+   eval(nthExpr, envir=parent.frame())
+ }
> gn(stop("one"), environment(), stop("two"), n=2)




Bill, the last of these doesn't quite work, because ... can be passed
down through a string of callers.  You don't necessarily want to
evaluate it in the parent.frame().  For example:

x <- "global"
f <- function(...) {
x <- "f"
g(...)
}
g <- function(...) {
firstExpr <- substitute(...())[[1]]
c(list(...)[[1]], eval(firstExpr, envir = parent.frame()))
}

Calling g(x) correctly prints "global" twice, but calling f(x)
incorrectly prints

[1] "global" "f"

You can get the first element of ... without evaluating the rest using
..1, but I don't know a way to do this for general n in pre-3.5.0 base R.


Here's a way to do that:

eval(as.name(paste0("..", n)))

I was surprised this worked for n > 9, but it does.  Looking at the 
source, I think the largest legal value for n is huge; you'd hit other 
limits long before n was too big.
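
Wrapped in a function, this might look as follows. This is only a sketch: nth_dot is a made-up name, and n is placed first so that everything after it lands in `...`.

```r
nth_dot <- function(n, ...) eval(as.name(paste0("..", n)))

nth_dot(2, stop("one"), 3 * 5, stop("three"))   # 15, the stop() calls never run

# The promise is evaluated in the environment it came from, so passing ...
# down through another function still gives the right answer.
x <- "global"
f <- function(...) {
  x <- "f"
  nth_dot(1, ...)
}
f(x)                                            # "global"
```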


Duncan Murdoch



Re: [Rd] length of `...`

2018-05-03 Thread Hadley Wickham
On Thu, May 3, 2018 at 9:50 AM, Duncan Murdoch  wrote:
> On 03/05/2018 11:18 AM, Duncan Murdoch wrote:
>>
>> On 03/05/2018 11:01 AM, William Dunlap via R-devel wrote:
>>>
>>> In R-3.5.0 you can use ...length():
>>> > f <- function(..., n) ...length()
>>> > f(stop("one"), stop("two"), stop("three"), n=7)
>>> [1] 3
>>>
>>> Prior to that substitute() is the way to go
>>> > g <- function(..., n) length(substitute(...()))
>>> > g(stop("one"), stop("two"), stop("three"), n=7)
>>> [1] 3
>>>
>>> R-3.5.0 also has the ...elt(n) function, which returns
>>> the evaluated n'th entry in ... , without evaluating the
>>> other ... entries.
>>> > fn <- function(..., n) ...elt(n)
>>> > fn(stop("one"), 3*5, stop("three"), n=2)
>>> [1] 15
>>>
>>> Prior to 3.5.0, eval the appropriate component of the output
>>> of substitute() in the appropriate environment:
>>> > gn <- function(..., n) {
>>> +   nthExpr <- substitute(...())[[n]]
>>> +   eval(nthExpr, envir=parent.frame())
>>> + }
>>> > gn(stop("one"), environment(), stop("two"), n=2)
>>> 
>>>
>>
>> Bill, the last of these doesn't quite work, because ... can be passed
>> down through a string of callers.  You don't necessarily want to
>> evaluate it in the parent.frame().  For example:
>>
>> x <- "global"
>> f <- function(...) {
>> x <- "f"
>> g(...)
>> }
>> g <- function(...) {
>> firstExpr <- substitute(...())[[1]]
>> c(list(...)[[1]], eval(firstExpr, envir = parent.frame()))
>> }
>>
>> Calling g(x) correctly prints "global" twice, but calling f(x)
>> incorrectly prints
>>
>> [1] "global" "f"
>>
>> You can get the first element of ... without evaluating the rest using
>> ..1, but I don't know a way to do this for general n in pre-3.5.0 base R.
>
>
> Here's a way to do that:
>
> eval(as.name(paste0("..", n)))
>
> I was surprised this worked for n > 9, but it does.  Looking at the source,
> I think the largest legal value for n is huge; you'd hit other limits long
> before n was too big.

Maybe just get(paste0("..", n)) ?

Hadley

-- 
http://hadley.nz



Re: [Rd] length of `...`

2018-05-03 Thread peter dalgaard


> On 3 May 2018, at 19:23 , Hadley Wickham  wrote:
> 
> Maybe just get(paste0("..", n)) ?
> 
> Hadley

Maybe not. These things are slippery.

> f <- function(...) get("..1")
> f(2)
Error in get("..1") : object '..1' not found

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] download.file does not process gz files correctly (truncates them?)

2018-05-03 Thread Henrik Bengtsson
Also, as mentioned in my
https://stat.ethz.ch/pipermail/r-devel/2012-August/064739.html, when
not specifying the mode argument, the default on Windows is mode = "w"
*except* for certain, case-sensitive, filename extensions:

if(missing(mode) && length(grep("\\.(gz|bz2|xz|tgz|zip|rda|RData)$", url)))
    mode <- "wb"

Just like the need for mode = "wb" on Windows, the above
special-file-extension-hack is only happening on Windows, and is only
documented in ?download.file if you're on Windows; so someone who's on
Linux/macOS trying to help someone on Windows may not be aware of
this. This adds even more confusion, e.g. "works for me".
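
As a small illustration of the extension test, here is the same pattern applied to three URLs. The two example.org URLs are made up; the NCBI one is the URL from Joris's original report, which ends in "%2Egz" rather than ".gz" and so would not appear to trigger the special case.

```r
# The pattern used by the Windows-only special case quoted above.
binary_ext <- "\\.(gz|bz2|xz|tgz|zip|rda|RData)$"

urls <- c(
  plain   = "https://example.org/data/GSM907811.CEL.gz",   # hypothetical URL
  encoded = "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM907811&format=file&file=GSM907811%2ECEL%2Egz",
  upper   = "https://example.org/data/archive.ZIP"         # hypothetical URL
)

is_binary <- vapply(urls, grepl, logical(1), pattern = binary_ext)
is_binary   # plain TRUE, encoded FALSE, upper FALSE

# Being explicit avoids depending on the heuristic at all (not run here):
# download.file(urls[["encoded"]], destfile = "GSM907811.CEL.gz", mode = "wb")
```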

/Henrik

On Thu, May 3, 2018 at 7:27 AM, Joris Meys  wrote:
> Thank you Henrik and Martin for explaining what was going on. Very
> insightful!
>
> On Thu, May 3, 2018 at 4:21 PM, Jeroen Ooms  wrote:
>>
>> On Thu, May 3, 2018 at 2:42 PM, Henrik Bengtsson
>>  wrote:
>> > Use mode="wb" when you download the file. See
>> > https://github.com/HenrikBengtsson/Wishlist-for-R/issues/30.
>> >
>> > R core, and others, is there a good argument for why we are not making
>> > this
>> > the default download mode? It seems like a such a simple fix to such a
>> > common "mistake".
>>
>> I'd like to second this feature request. This default behaviour is
>> unexpected and often leads to r scripts that were written on
>> mac/linux, to produce corrupted files on windows, checksum mismatches,
>> etc.
>>
>> Even for text files, the default should be to download the file as-is.
>> Trying to "fix" line-endings should be opt-in, never the default.
>> Downloading a file via a browser or ftp client on windows also doesn't
>> change the file, why should R?
>
>
> I third the feature request.
>
>>
>>
>>
>> On Thu, May 3, 2018 at 3:02 PM, Duncan Murdoch 
>> wrote:
>> > Many downloads are text files (HTML, CSV, etc.), and if those are
>> > downloaded
>> > in binary, a Windows user might end up with a file that Notepad can't
>> > handle, because it would have Unix-style line endings.
>>
>> True but I don't think this is relevant. The same holds e.g. for the R
>> files in source packages, which also have unix line endings. Most
>> Windows users will use an actual editor that understands both types of
>> line endings, or can convert between the two.
>>
>> Downloading-file should do just that.
>
>
> Again, I agree. In my (limited) experience the only program that fails to
> properly display \n as a line ending, is Notepad. But it can still open the
> file regardless. If line ending conflicts cause bugs, it's almost always a
> unix-like OS struggling with Windows-style endings. I have yet to meet the
> first one the other way around.
>
> Cheers
> Joris
>
>
> --
> Joris Meys
> Statistical consultant
>
> Department of Data Analysis and Mathematical Modelling
> Ghent University
> Coupure Links 653, B-9000 Gent (Belgium)
>
> ---
> Biowiskundedagen 2017-2018
> http://www.biowiskundedagen.ugent.be/
>
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php



Re: [Rd] length of `...`

2018-05-03 Thread Michel Lang
FWIW, there is also a backport of `...length()` for R versions >3.0.0
in my package backports (shameless self promotion):
.


2018-05-03 19:41 GMT+02:00 peter dalgaard :
>
>
>> On 3 May 2018, at 19:23 , Hadley Wickham  wrote:
>>
>> Maybe just get(paste0("..", n)) ?
>>
>> Hadley
>
> Maybe not. These things are slippery.
>
>> f <- function(...) get("..1")
>> f(2)
> Error in get("..1") : object '..1' not found
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd@cbs.dk  Priv: pda...@gmail.com
>



Re: [Rd] Proposed speedup of ifelse

2018-05-03 Thread Hugh Parsonage
Thanks Radford. I concur with all your points. I've attempted to address
the issues you raised through the github.io post.  The new method appears
to be slower for test lengths < 100 and possibly longer lengths (not just <
10). Of course length(test) < 100 is very quick, so I simply added this to
the conditions that cause the old ifelse method to be invoked. I'll leave
it to R-core to decide whether or not the benefits for longer vectors are
worth it.






On Fri, 4 May 2018 at 01:01 Radford Neal  wrote:

> > I propose a patch to ifelse that leverages anyNA(test) to achieve an
> > improvement in performance. For a test vector of length 10, the change
> > nearly halves the time taken and for a test of length 1 million, there
> > is a tenfold increase in speed. Even for small vectors, the
> > distributions of timings between the old and the proposed ifelse do
> > not intersect.
>
> For smaller vectors, your results are significantly affected by your
> invoking the old version via base::ifelse.  You could try defining
> your new version as new_ifelse, and invoking the old version as just
> ifelse.  There might still be some issues with the two versions having
> different context w.r.t environments, and hence looking up functions
> in different ways.  You could copy the code of the old version and
> define it in the global environment just like new_ifelse.
>
> When using ifelse rather than base::ifelse, it seems the new version
> is slower for vectors of length 10, but faster for long vectors.
>
> Also, I'd use system.time rather than microbenchmark.  The latter will
> mix invocations of the two functions in a way where it is unclear that
> garbage collection time will be fairly attributed.  Also, it's a bit
> silly to plot the distributions of times, which will mostly reflect
> variations in when garbage collections at various levels occur - just
> the mean is what is relevant.
>
> Regards,
>
>Radford Neal
>



Re: [Rd] download.file does not process gz files correctly (truncates them?)

2018-05-03 Thread Tomas Kalibera

On 05/03/2018 11:14 PM, Henrik Bengtsson wrote:

> Also, as mentioned in my
> https://stat.ethz.ch/pipermail/r-devel/2012-August/064739.html, when
> not specifying the mode argument, the default on Windows is mode = "w"
> *except* for certain, case-sensitive, filename extensions:
>
>  if(missing(mode) && length(grep("\\.(gz|bz2|xz|tgz|zip|rda|RData)$", url)))
>      mode <- "wb"
>
> Just like the need for mode = "wb" on Windows, the above
> special-file-extension-hack is only happening on Windows, and is only
> documented in ?download.file if you're on Windows; so someone who's on
> Linux/macOS trying to help someone on Windows may not be aware of
> this. This adds even more confusion, e.g. "works for me".

If we were designing the API today, it would probably make more sense 
not to convert any line endings by default. Today's editors _usually_ 
can cope with different line endings and it is probably easier to detect 
that a text file has incorrect line endings rather than detecting that a 
binary file has been corrupted by an attempt to convert line endings. 
But whether to change existing, documented behavior is a different 
question. In order to help users and programmers who do not read the 
documentation carefully we would create problems for users and 
programmers who do. The current heuristic/hack is in line with the 
compatibility approach: it detects files that are obviously binary, so 
it changes the default behavior only for cases when it would obviously 
cause damage.


Tomas



