Re: [Rd] unlist errors on a nested list of empty lists

2018-05-10 Thread Martin Maechler
> Steven Nydick 
> on Wed, 9 May 2018 13:25:11 + writes:

> I do not have access to the bug reporting system. If somebody can get me
> access, I can create a formal bug report.

> The latter issues seem like duplicates of:
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=12572 (with slightly
> different output), but as that bug was reported nearly 10 years ago, it
> might be worth creating an update under R version 3. I could not find the
> first issue when searching the bug reports (which I ran into when trying to
> parse JSON files), which is why I posted on r-devel.

Indeed, thanks a lot Steven (and Duncan!),  I've found the
following:

1. The first issue is a new bug, in R "only" since R version
  3.4.0, i.e. working up to R 3.3.3.
  Duncan's patch basically fixes it.
  I've found that the C code there can be simplified and
  deconvoluted, and after that, I will commit basically the bug
  fix of Duncan Murdoch.

2. The second set of issues is indeed an entirely different bug, and I
   would say actually points to a "design problem" of the whole thing.
   The C code in islistfactor() talks about arbitrary trees with
   all leaves factors,  whereas the R code -- in the case where
   islistfactor() is TRUE -- actually only correctly deals with
   simple trees, namely of depth exactly 1. Those are the ones you typically
   get from e.g., lapply(), and so this old design-bug triggers
   relatively rarely.

Last but not least: I have created an account for you, Steven,
on the bugzilla site.

Given we have holidays till the weekend and private duties of
mine, I won't get to more for now.

Best
Martin Maechler

> On Tue, May 8, 2018 at 7:51 PM Duncan Murdoch wrote:

>> On 08/05/2018 4:50 PM, Steven Nydick wrote:
>> > It also does the same thing if the factor is not on the first level of
>> > the list, which seems to be due to the fact that the islistfactor is
>> > recursive, but if a list is a list-factor, the first level lists are
>> > coerced into character strings.
>> >
>> >  > x <- list(list(factor(LETTERS[1])))
>> >  > unlist(x)
>> > Error in as.character.factor(x) : malformed factor
>> >
>> > However, if one of the factors is at the top level, and one is nested,
>> > then the result is:
>> >
>> >  > x <- list(list(factor(LETTERS[1])), factor(LETTERS[2]))
>> >  > unlist(x)
>> >
>> > [1]  B
>> > Levels: B
>> >
>> > ... which does not seem to me to be desired behavior.
>> 
>> The patch I suggested doesn't help with either of these.  I'd suggest
>> collecting examples, and posting a bug report to bugs.r-project.org.
>> 
>> Duncan Murdoch
>> 
>> 
>> >
>> >
>> > On Tue, May 8, 2018 at 2:22 PM Duncan Murdoch wrote:
>> >
>> > On 08/05/2018 2:58 PM, Duncan Murdoch wrote:
>> >  > On 08/05/2018 1:48 PM, Steven Nydick wrote:
>> >  >> Reproducible example:
>> >  >>
>> >  >> x <- list(list(list(), list()))
>> >  >> unlist(x)
>> >  >>
>> >  >> *> Error in as.character.factor(x) : malformed factor*
>> >  >
>> >  > The error comes from the line
>> >  >
>> >  > structure(res, levels = lv, names = nm, class = "factor")
>> >  >
>> >  > which is called because unlist() thinks that some entry is a factor,
>> >  > with NULL levels and NULL names.  It's not legal for a factor to have
>> >  > NULL levels.  Probably it should never get here; the earlier test
>> >  >
>> >  > if (.Internal(islistfactor(x, recursive))) {
>> >  >
>> >  > should have been false, and then the result would have been
>> >  >
>> >  > .Internal(unlist(x, recursive, use.names))
>> >  >
>> >  > (with both recursive and use.names being TRUE), which returns NULL.
>> >
>> > And the problem is in the islistfactor function in src/main/apply.c,
>> > which looks like this:
>> >
>> > static Rboolean islistfactor(SEXP X)
>> > {
>> >   int i, n = length(X);
>> >
>> >   switch(TYPEOF(X)) {
>> >   case VECSXP:
>> >   case EXPRSXP:
>> >     if(n == 0) return NA_LOGICAL;
>> >     for(i = 0; i < LENGTH(X); i++)
>> >       if(!islistfactor(VECTOR_ELT(X, i))) return FALSE;
>> >     return TRUE;
>> >     break;
>> >   }
>> >   return isFactor(X);
>> > }
>> >
>> > One of those deeply nested lists is length 0, so at the lowest level it
>> > returns NA_LOGICAL.  But then it does C-style logical testing on the
>> > results.  I think to C NA_LOGICAL counts as true, so at the next level
>> > up we get the wrong answ
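
The point about NA_LOGICAL counting as true in C can be illustrated with a small standalone sketch. It does not use R's headers: NA_LGL stands in for NA_LOGICAL (which R defines as INT_MIN, a nonzero int), and the Node type and all function names are hypothetical stand-ins for SEXP and the real code. The second function is only one possible NA-aware variant, not necessarily the fix that was committed.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Stand-in for R's NA_LOGICAL, which is INT_MIN and therefore nonzero. */
#define NA_LGL INT_MIN

/* Toy tree node (hypothetical; R itself uses SEXP). */
typedef struct Node {
    int is_list;                      /* 1 for a list node, 0 for a leaf */
    int is_factor;                    /* only meaningful for leaves */
    size_t n;                         /* number of children of a list node */
    const struct Node *const *kids;   /* children, or NULL when n == 0 */
} Node;

/* Same control flow as the quoted function: the recursive result is
 * tested with `!`, so an NA_LGL coming back from a child counts as true. */
int islistfactor_quoted(const Node *x)
{
    if (x->is_list) {
        if (x->n == 0) return NA_LGL;
        for (size_t i = 0; i < x->n; i++)
            if (!islistfactor_quoted(x->kids[i])) return 0;
        return 1;
    }
    return x->is_factor;
}

/* NA-aware variant: any FALSE child gives FALSE, at least one TRUE child
 * (and no FALSE) gives TRUE, and only-NA or no children gives NA. */
int islistfactor_na_aware(const Node *x)
{
    if (x->is_list) {
        int ans = NA_LGL;
        for (size_t i = 0; i < x->n; i++) {
            int r = islistfactor_na_aware(x->kids[i]);
            if (r == 0) return 0;
            if (r == 1) ans = 1;
        }
        return ans;
    }
    return x->is_factor;
}

/* Builds the shape of list(list(list(), list())) and checks both variants. */
void check_nested_empty_lists(void)
{
    const Node e1 = { 1, 0, 0, NULL };
    const Node e2 = { 1, 0, 0, NULL };
    const Node *const inner_kids[] = { &e1, &e2 };
    const Node inner = { 1, 0, 2, inner_kids };
    const Node *const outer_kids[] = { &inner };
    const Node outer = { 1, 0, 1, outer_kids };

    assert(islistfactor_quoted(&outer) == 1);         /* wrongly TRUE */
    assert(islistfactor_na_aware(&outer) == NA_LGL);  /* stays NA */
}
```

Calling check_nested_empty_lists() from a main() exercises both variants. With the NA-aware variant the caller has to treat only a result equal to 1 as TRUE.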

[Rd] grDevices::grey could provide clearer error message when length(alpha) != length(level)

2018-05-10 Thread Hugh Parsonage
e.g.

grDevices::grey(level = 0.1, alpha = c(0, 1))
#> Error in grey(level = 0.1, alpha = c(0, 1)) :
#>  attempt to set index 1/1 in SET_STRING_ELT

Perhaps
#> Error in grey(level = 0.1, alpha = c(0, 1)) :
#>  lengths of 'level' and 'alpha' differ

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] grDevices::grey could provide clearer error message when length(alpha) != length(level)

2018-05-10 Thread Duncan Murdoch

On 10/05/2018 9:17 AM, Hugh Parsonage wrote:

> e.g.
>
> grDevices::grey(level = 0.1, alpha = c(0, 1))
> #> Error in grey(level = 0.1, alpha = c(0, 1)) :
> #>  attempt to set index 1/1 in SET_STRING_ELT
>
> Perhaps
> #> Error in grey(level = 0.1, alpha = c(0, 1)) :
> #>  lengths of 'level' and 'alpha' differ



Or it could return a vector of length 2.  This is not how it is 
documented to operate, but it is how many other R functions handle 
vectors of mixed lengths.


Duncan Murdoch
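
The two options discussed in this thread (reject mismatched lengths with a clear error, or recycle the shorter argument) can be sketched in C as follows. This is not the grDevices source; every function name here is hypothetical, and the recycling rule shown is the usual R convention of cycling the shorter vector up to the length of the longer one.

```c
#include <assert.h>
#include <stddef.h>

/* Strict policy: 0 when the lengths agree, -1 otherwise, so that the
 * caller can raise "lengths of 'level' and 'alpha' differ". */
int check_equal_lengths(size_t n_level, size_t n_alpha)
{
    return n_level == n_alpha ? 0 : -1;
}

/* Recycling policy: the result is as long as the longer input,
 * or empty if either input is empty. */
size_t recycled_length(size_t n_level, size_t n_alpha)
{
    if (n_level == 0 || n_alpha == 0) return 0;
    return n_level > n_alpha ? n_level : n_alpha;
}

/* Pairs result element i with level[i % n_level] and alpha[i % n_alpha].
 * out_level and out_alpha must each hold recycled_length() elements. */
void recycle_pairs(const double *level, size_t n_level,
                   const double *alpha, size_t n_alpha,
                   double *out_level, double *out_alpha)
{
    size_t n = recycled_length(n_level, n_alpha);
    for (size_t i = 0; i < n; i++) {
        out_level[i] = level[i % n_level];
        out_alpha[i] = alpha[i % n_alpha];
    }
}
```

For level = 0.1 and alpha = c(0, 1), the strict policy reports a mismatch, while the recycling policy yields two pairs, (0.1, 0) and (0.1, 1), so the result vector has to be allocated with length 2 rather than length(level).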



[Rd] readLines() behaves differently for gzfile connection

2018-05-10 Thread Ben Heavner
When I read a .gz file with readLines() in 3.4.3, it returns text (and a
warning). In 3.5.0, it gives a warning, but no text. Is this expected
behavior or a bug?

3.4.3:
> source_file = "1k_annotation.gz"
> readfile_con <- gzfile(source_file, "r")
> readLines(readfile_con, n = 5)
[1] "#chr\tpos\tref\talt\t



Warning message:
In readLines(readfile_con, n = 5) :
  seek on a gzfile connection returned an internal error

> close(readfile_con)

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS:
/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK:
/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.3

-

3.5.0:
> source_file = "1k_annotation.gz"
> readfile_con <- gzfile(source_file, "r")
> readLines(readfile_con, n = 5)
[1] "" "" "" "" ""
Warning message:
In readLines(readfile_con, n = 5) :
  seek on a gzfile connection returned an internal error
> close(readfile_con)
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.0


(note: I'm running 3.5.0 via the docker rocker/tidyverse:3.5 container, and
3.4.3 on my mac desktop machine)

Thanks!
Ben Heavner



Re: [Rd] readLines() behaves differently for gzfile connection

2018-05-10 Thread Michael Lawrence
Would it be possible to get that file or a representative subset of it
somewhere so that I can reproduce this?

Thanks,
Michael



Re: [Rd] readLines() behaves differently for gzfile connection

2018-05-10 Thread Ben Heavner
You bet - it's available on github at
https://github.com/UW-GAC/wgsaparsr/blob/master/tests/testthat/1k_annotation.gz

-Ben
