Re: [Rd] [R] R is creating a new level which is empty after importing a SAS file

2018-07-04 Thread peter dalgaard
It is not obvious that this is an error. If your nominal variable in SAS has a 
level which is not present in the data, then R might just be making a faithful 
translation. There is a distinction between (a) having a gender variable with 
two levels, one of which (female) has 0 observations, and (b) pretending that 
male is the only possible gender.

Anyway, droplevels() is your friend. (Notice that it is easier to remove levels 
that you do not want than to reinsert levels that were unwantedly deleted on 
input.) 
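
A minimal sketch of the point above, using made-up data rather than the poster's SAS file (the variable names and the levels "", "F" and "M" are assumptions for illustration only):

```r
## Hypothetical factor standing in for an imported nominal variable:
## the level "" is declared but never occurs in the data.
sex <- factor(c("M", "F", "M"), levels = c("", "F", "M"))
counts <- table(sex)            # the empty level is tabulated with frequency 0

## droplevels() removes levels that do not occur in the data
clean <- droplevels(sex)
levels(clean)

## It also works on a whole data frame, acting on every factor column
df <- data.frame(sex = sex, x = 1:3)
df_clean <- droplevels(df)

## Going the other way requires knowing the full set of levels
restored <- factor(clean, levels = c("", "F", "M"))
```

Dropping the level is a one-liner, whereas restoring it only works if the original set of levels is still known.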

-pd

> On 4 Jul 2018, at 19:16 , Adam Z. Jabir  wrote:
> 
> Hi,
> 
> I have imported some SAS data into R using the sas7bdat package. I have some 
> nominal variables with some missing values.
> 
> R is creating a new level which is empty. When I ask for a tabulation, this new 
> level is presented with 0 as a frequency.
> 
> I want to get rid of this level and have my file imported correctly.
> 
> Do you have some hint to help solve this problem?
> 
> 
> Please use this email address to answer this query.
> 
> 
> Best,
> 
> Adam
> 
> 
> Sent from Outlook
> 
>   [[alternative HTML version deleted]]
> 
> __
> r-h...@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd@cbs.dk  Priv: pda...@gmail.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] unexpected behavior of unzip with list=T and unzip=/usr/bin/unzip

2018-07-04 Thread Paul Schrimpf
Hello,

I encountered some unexpected behavior of unzip when using info-zip's unzip
instead of R's internal program. Specifically, unzip("file.zip", list=TRUE,
unzip="/usr/bin/unzip") produces incorrect output if the zip archive has
filenames with spaces, and results in an error if the zip archive includes
an archive comment or file comments.

Here is some code to reproduce the problem, to be run with the attached files:

## (mostly) expected behavior
res.intern <- unzip("noSpaces.zip",list=TRUE)
res.infozip <- unzip("noSpaces.zip",list=TRUE,unzip="/usr/bin/unzip")

identical(res.intern,res.infozip) ## will be FALSE, but expected from
  ## documentation about dates
identical(res.infozip$Name,res.intern$Name) ## TRUE
res.infozip$Length==res.intern$Length   ## TRUE
identical(res.infozip$Length,res.intern$Length) ## FALSE, because the
  ## former is numeric, the latter integer

## More problematic cases
print(unzip("fileNameWithSpaces.zip",list=TRUE))
print(unzip("fileNameWithSpaces.zip",list=TRUE,unzip="/usr/bin/unzip"))
  ## read.table is used to parse output of unzip -l, and gets
  ## confused by extra spaces

print(unzip("withArchiveComment.zip",list=TRUE))
print(unzip("withArchiveComment.zip",list=TRUE,unzip="/usr/bin/unzip"))
  ## produces an error

print(unzip("entryComments.zip",list=TRUE))
print(unzip("entryComments.zip",list=TRUE,unzip="/usr/bin/unzip"))
  ## produces an error
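
The numeric versus integer mismatch noted in the first block can be reproduced without any zip file. This is only an illustration with made-up lengths (len_int and len_num are hypothetical names):

```r
## Hypothetical file lengths, once as integers and once as doubles
len_int <- c(10L, 20L)
len_num <- c(10, 20)

same_values <- all(len_int == len_num)                 # compares values only
same_object <- identical(len_int, len_num)             # also compares storage type
nearly_same <- isTRUE(all.equal(len_int, len_num))     # tolerant comparison
after_cast  <- identical(as.numeric(len_int), len_num) # types aligned first
```

So a FALSE from identical() on the Length column reflects a type difference, not a difference in the reported sizes.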

Looking at the code for R's unzip, the basic problem is that it makes a
bunch of assumptions about the format of the output of "unzip -l"  that are
not always true and are not verified.

It's unclear to me whether R's unzip should be expected to be compatible
with all sorts of external unzip programs, so perhaps a sufficient solution
is simply to revise the documentation (which already mentions potential
problems  with dates and unzip, list=TRUE, and external programs).

Alternatively, R's unzip function could be changed to work with info-zip
unzip by:
(1) using "-ql" instead of just "-l" when list=TRUE to eliminate the printing
of comments
(2) not using read.table to parse the output of unzip, and instead doing
something like the following (which is an admittedly messy workaround)

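## Meant as a replacement inside the body of unzip(); `unzip`, `zipfile`
## and `WINDOWS` are assumed to be defined there.
res <- if (WINDOWS) {
    system2(unzip, c("-ql", shQuote(zipfile)), stdout = TRUE)
} else {
    system2(unzip, c("-ql", shQuote(zipfile)), stdout = TRUE,
            env = c("TZ=UTC"))
}
## the two separator lines of dashes delimit the file entries
dashes <- grep("^--",res)
s <- dashes[1]+1
l <- dashes[2]-1
## column positions are read off the first separator line
starts <- gregexpr("-+",res[dashes[1]])[[1]]
ends <- gregexpr("[[:space:]]+",res[dashes[1]])[[1]]
z <- data.frame(
    Name=sapply(res[s:l], function(x) {
        ## everything from the Name column onwards, spaces included
        substr(x, starts[4], stop=nchar(x))
    }),
    Length=sapply(res[s:l], function(x) {
        as.numeric(substr(x, starts[1], stop=ends[1]))
    }),
    Date=sapply(res[s:l], function(x) {
        ## ends[] points at the following blank, so trim it off
        trimws(substr(x, starts[2], stop=ends[2]))
    }),
    Time=sapply(res[s:l], function(x) {
        trimws(substr(x, starts[3], stop=ends[3]))
    }),
    stringsAsFactors=FALSE
)
rownames(z) <- NULL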
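
To try the fixed-width idea without an external unzip program, here is a self-contained sketch. The function name parse_unzip_listing and the sample lines are hypothetical; the sample only imitates the column layout that "unzip -ql" appears to use and is not captured output.

```r
## Parse a listing by the column positions of the first dashed separator line.
## Assumes exactly four columns: Length, Date, Time, Name.
parse_unzip_listing <- function(res) {
    dashes <- grep("^-+[[:space:]]", res)        # the two separator lines
    stopifnot(length(dashes) >= 2)
    m <- gregexpr("-+", res[dashes[1]])[[1]]
    starts <- as.integer(m)
    ends <- starts + attr(m, "match.length") - 1L
    stopifnot(length(starts) == 4)
    ## entries are the lines strictly between the two separators
    body <- res[seq_len(dashes[2] - dashes[1] - 1L) + dashes[1]]
    data.frame(
        Name = substring(body, starts[4]),       # keeps embedded spaces
        Length = as.numeric(substring(body, starts[1], ends[1])),
        Date = trimws(substring(body, starts[2], ends[2])),
        Time = trimws(substring(body, starts[3], ends[3])),
        stringsAsFactors = FALSE
    )
}

## Made-up listing built with one format string so the columns line up
fmt <- "%9s  %-10s %-5s   %s"
sample_res <- c(
    sprintf(fmt, "Length", "Date", "Time", "Name"),
    sprintf(fmt, strrep("-", 9), strrep("-", 10), strrep("-", 5), "----"),
    sprintf(fmt, "12", "2018-07-04", "10:15", "plain.txt"),
    sprintf(fmt, "345", "2018-07-04", "10:16", "file name with spaces.txt"),
    sprintf(fmt, strrep("-", 9), "", "", "-------"),
    sprintf(fmt, "357", "", "", "2 files")
)
z_demo <- parse_unzip_listing(sample_res)
```

Because the name is taken as "everything from the fourth column onwards", spaces in file names no longer split the entry into extra fields. The sketch still depends on the separator line being present and on lengths fitting in their column.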

I can submit a patch if this is appropriate. I'm really not sure, though,
because I am new to R-devel. This approach also has the downside of relying
on the output format of info-zip unzip, which might change in future versions
and is unlikely to be the same for other external unzip programs. On the
other hand, the current code relies on the same output format and still fails
in some cases.

Thanks,
Paul

P.S.

My sessionInfo is

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblas_haswellp-r0.3.1.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
     LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] devtools_1.13.5

loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1    withr_2.1.2    memoise_1.1.0
    digest_0.6.15

And unzip -v

UnZip 6.00 of 20 April 2009, by Info-ZIP.  Maintained by C. Spieler.  Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for
details.

Latest sources and executables are at ftp://ftp.info-zip.org/pub/infozip/ ;
see ftp://ftp.info-zip.org/pub/infozip/UnZip.html for other sites.

Compiled with gcc 5.3.0 for Unix (Linux ELF) on Apr 17 2016.

UnZip special compilation options:
ACORN_FTYPE_NFS
COPYRIGHT_CLEAN (PKZIP 0.9x unreducing method