Re: [R] reshape2's dcast() Adds NAs to Data Frame

Jeff Newmiller Wed, 08 Aug 2012 23:06:06 -0700

I took a closer look, and unused factor levels are not the problem... the problem is defining id variables appropriately.

1) "sample" is the name of a built-in function, so it is not advisable to use it as the name of a data object.

I have used "samp" instead of "sample".

2) Your input data is essentially in long form already, so you don't need to melt it.

3) It is almost never a good idea to use a floating point column as an id variable.
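To illustrate point 3, here is a minimal base-R sketch of why floating point values make fragile keys. The vector `x` is invented for illustration and is not taken from the posted data. Two numbers that print identically can still differ in their last bits, so anything that groups on exact equality treats them as separate ids.

```r
# Two values that print identically but are stored differently.
x <- c(0.1 + 0.2, 0.3)

print(x)                       # both display as 0.3 at default precision
x[1] == x[2]                   # FALSE: exact comparison fails
length(unique(x))              # 2: treated as two distinct keys
isTRUE(all.equal(x[1], x[2]))  # TRUE: equal within numerical tolerance
```

Keys built from character, factor, integer, or Date columns do not have this problem, which is why the formula in the dcast() call uses only site, sampdate, and era on the left-hand side.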

Perhaps you were imagining something like:

> samp.cast <- dcast(samp[,1:5], site+sampdate+era~param,value.var="quant" )

  > str(samp.cast)
  'data.frame':  35 obs. of  57 variables:
  $ site    : Factor w/ 5 levels "D-1","D-2","D-3",..: 1 1 1 2 2 2 2 2 2 2 ...
  $ sampdate: Date, format: "2007-12-12" "2008-03-15" "2009-09-02" "2010-06-10" ...
  $ era     : Factor w/ 2 levels "Post","Pre": 1 1 1 1 1 1 1 1 1 1 ...
  $ AgDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ AgTot   : num  0.00013 0.00013 0.00013 0.00013 0.00013 0.00013 0.00013 0.00013 0.00013 0.00013 ...
  $ AlDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ AlTot   : num  0.106 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 0.08 ...
  $ Alk     : num  231 228 208 217 226 214 194 187 179 188 ...
  $ AsDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ AsTot   : num  0.0113 0.0008 0.0008 0.0017 0.0027 0.0007 0.0022 0.0029 0.0023 0.0027 ...
  $ BaDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ BeDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ BeTot   : num  0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 ...
  $ BiDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ CaDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ CaTot   : num  100 88.4 163 200 244 0.04 122 112 98.4 103 ...
  $ CdDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ CdTot   : num  2e-04 2e-04 2e-04 2e-04 2e-04 2e-04 2e-04 2e-04 2e-04 2e-04 ...
  $ ClTot   : num  1.43 1.34 13.7 16.8 19.1 15.1 10.9 9.37 8.49 10.4 ...
  $ CoDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ CrDis   : num  0.006 0.006 0.006 0.006 0.006 0.006 0.006 0.006 0.006 0.006 ...
  $ CrTot   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ CuDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ CuTot   : num  0.0239 0.0137 0.0015 0.00106 0.00106 0.00353 0.00108 0.009 0.00236 0.00144 ...
  $ DO      : num  4.96 9.91 6.98 6.2 6.47 5.73 5.84 5.74 6.12 6.39 ...
  $ FeDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ FeTot   : num  4.11 0.309 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.384 ...
  $ HgDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ HgTot   : num  NA 5.00e-05 5.00e-05 7.22e-07 1.93e-06 6.82e-07 6.56e-07 1.06e-06 1.41e-06 2.58e-05 ...
  $ MgDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ MgTot   : num  9.56 9.15 14.6 22.4 27 0.06 13.7 12.8 11 11.4 ...
  $ MnDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ MnTot   : num  0.0348 0.0474 0.0231 0.004 0.004 0.004 0.004 0.004 0.004 0.0049 ...
  $ MoDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ N       : num  0.293 0.05 15.8 41.2 54.7 34.5 16.7 13.9 10.4 11.9 ...
  $ NH4     : num  0.97 0.82 0.036 0.03 0.06 0.03 0.034 0.045 0.03 0.031 ...
  $ NaDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ NiDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ NiTot   : num  0.01 0.224 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 ...
  $ PbDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ PbTot   : num  0.0253 0.0083 0.00596 0.0003 0.0003 0.0003 0.0003 0.00129 0.0003 0.000599 ...
  $ Pdis    : num  NA NA NA NA NA NA NA NA NA NA ...
  $ SC      : num  630 633 853 1129 1303 ...
  $ SO4     : num  65.8 75.4 159 226 268 167 101 83.3 69.9 61.3 ...
  $ SbDis   : num  0.000825 0.000825 0.000825 0.000825 0.000825 0.000825 0.000825 0.000825 0.000825 0.000825 ...
  $ SbTot   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ SeDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ SeTot   : num  0.00132 0.00122 0.00125 0.00181 0.00131 0.00114 0.00125 0.00125 0.00125 0.00138 ...
  $ SrDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ TDS     : num  320 300 581 822 1020 662 507 418 335 385 ...
  $ TSS     : num  NA NA NA NA NA NA NA NA NA NA ...
  $ TlDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ TlTot   : num  3e-04 3e-04 3e-04 3e-04 3e-04 3e-04 3e-04 3e-04 3e-04 3e-04 ...
  $ Vdis    : num  NA NA NA NA NA NA NA NA NA NA ...
  $ ZnDis   : num  NA NA NA NA NA NA NA NA NA NA ...
  $ ZnTot   : num  11.4 12.4 2.42 0.0406 0.0462 0.0318 0.0179 0.032 0.0178 0.0362 ...
  $ pH      : num  7.8 7.94 6.9 7.18 6.8 7.09 7.24 7.09 7.49 7.46 ...

Clearly, this still includes NA values, but if we look at the input data corresponding to the first row:

  > subset(samp,(site=="D-1")&("2007-12-12"==sampdate)&("Post"==era))
  site   sampdate  era param    quant ceneq1    floor  ceiling
  1   D-1 2007-12-12 Post AgTot 1.30e-04   TRUE 0.00e+00 1.30e-04
  2   D-1 2007-12-12 Post AlTot 1.06e-01  FALSE 1.06e-01 1.06e-01
  3   D-1 2007-12-12 Post   Alk 2.31e+02  FALSE 2.31e+02 2.31e+02
  4   D-1 2007-12-12 Post AsTot 1.13e-02  FALSE 1.13e-02 1.13e-02
  5   D-1 2007-12-12 Post BeTot 5.00e-03   TRUE 0.00e+00 5.00e-03
  6   D-1 2007-12-12 Post CaTot 1.00e+02  FALSE 1.00e+02 1.00e+02
  7   D-1 2007-12-12 Post CdTot 2.00e-04   TRUE 0.00e+00 2.00e-04
  8   D-1 2007-12-12 Post ClTot 1.43e+00  FALSE 1.43e+00 1.43e+00
  9   D-1 2007-12-12 Post CrDis 6.00e-03   TRUE 0.00e+00 6.00e-03
  10  D-1 2007-12-12 Post CuTot 2.39e-02  FALSE 2.39e-02 2.39e-02
  11  D-1 2007-12-12 Post    DO 4.96e+00  FALSE 4.96e+00 4.96e+00
  12  D-1 2007-12-12 Post FeTot 4.11e+00  FALSE 4.11e+00 4.11e+00
  13  D-1 2007-12-12 Post MgTot 9.56e+00  FALSE 9.56e+00 9.56e+00
  14  D-1 2007-12-12 Post MnTot 3.48e-02  FALSE 3.48e-02 3.48e-02
  15  D-1 2007-12-12 Post     N 2.93e-01  FALSE 2.93e-01 2.93e-01
  16  D-1 2007-12-12 Post   NH4 9.70e-01  FALSE 9.70e-01 9.70e-01
  17  D-1 2007-12-12 Post NiTot 1.00e-02   TRUE 0.00e+00 1.00e-02
  18  D-1 2007-12-12 Post PbTot 2.53e-02  FALSE 2.53e-02 2.53e-02
  19  D-1 2007-12-12 Post    SC 6.30e+02  FALSE 6.30e+02 6.30e+02
  20  D-1 2007-12-12 Post   SO4 6.58e+01  FALSE 6.58e+01 6.58e+01
  21  D-1 2007-12-12 Post SbDis 8.25e-04   TRUE 0.00e+00 8.25e-04
  22  D-1 2007-12-12 Post SeTot 1.32e-03  FALSE 1.32e-03 1.32e-03
  23  D-1 2007-12-12 Post   TDS 3.20e+02  FALSE 3.20e+02 3.20e+02
  24  D-1 2007-12-12 Post TlTot 3.00e-04   TRUE 0.00e+00 3.00e-04
  25  D-1 2007-12-12 Post ZnTot 1.14e+01  FALSE 1.14e+01 1.14e+01
  26  D-1 2007-12-12 Post    pH 7.80e+00  FALSE 7.80e+00 7.80e+00

There are only 26 chemicals measured for that row, but there
are a total of 54 different possible chemicals to quantify, each
with its own column.  Thus, there must be NA values inserted to
fill out the data frame.  (The problem gets worse when you try to
keep those other data columns as id columns... they represent
additional distinct combinations, so you end up with more rows and
fewer values in each row.)
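The same effect can be reproduced on a toy data set. The sketch below assumes reshape2 is installed; the data frame `toy` and its values are made up for illustration, with column names borrowed from the posted data. It shows both the NA fill for a parameter that was never measured and the extra rows that appear when a numeric column is added as an id.

```r
library(reshape2)

# Hypothetical long-form data: site D-1 has two parameters, D-2 only one.
toy <- data.frame(
  site     = c("D-1", "D-1", "D-2"),
  sampdate = as.Date(c("2007-12-12", "2007-12-12", "2008-03-15")),
  param    = c("Alk", "pH", "pH"),
  quant    = c(231, 7.8, 7.9),
  floor    = c(231, 7.8, 7.9),
  stringsAsFactors = FALSE
)

# One row per site/date; the missing D-2 "Alk" cell is filled with NA.
wide <- dcast(toy, site + sampdate ~ param, value.var = "quant")

# Adding the numeric "floor" column as an id creates one row per distinct
# value, so each row holds a single measurement and the rest are NA.
wide_ids <- dcast(toy, site + sampdate + floor ~ param, value.var = "quant")

print(wide)
print(wide_ids)
```

With only site and sampdate as ids, the result has two rows and a single NA. With floor added, it has three rows and every row is mostly NA, which is the pattern in the head(sample.cast) output quoted further down.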

I am not familiar with the NADA package, so I cannot suggest what you SHOULD be doing, but it does seem that you should perhaps study some more examples of its use to figure out what form you should have your data in.


On Wed, 8 Aug 2012, arun wrote:

Hi,

I tried converting factors to character, but the result still has NAs.

convert.type1 <- function(obj, types) {
    # Coerce column i of obj to the type named in types[i].
    for (i in seq_along(obj)) {
        FUN <- switch(types[i],
                      character = as.character,
                      numeric   = as.numeric,
                      factor    = as.factor,
                      Date      = as.Date,
                      logical   = as.logical)
        obj[[i]] <- FUN(obj[[i]])
    }
    obj
}

sample.melt1 <- convert.type1(sample.melt,
    c("character", "Date", "character", "character", "logical",
      "numeric", "numeric", "character", "numeric"))
str(sample.melt1)
#'data.frame':    715 obs. of  9 variables:
# $ site    : chr  "D-1" "D-1" "D-1" "D-1" ...
# $ sampdate: Date, format: "2007-12-12" "2007-12-12" ...
# $ era     : chr  "Post" "Post" "Post" "Post" ...
# $ param   : chr  "AgTot" "AlTot" "Alk" "AsTot" ...
# $ ceneq1  : logi  TRUE FALSE FALSE FALSE TRUE FALSE ...
# $ floor   : num  0 0.106 231 0.0113 0 100 0 1.43 0 0.0239 ...
# $ ceiling : num  1.30e-04 1.06e-01 2.31e+02 1.13e-02 5.00e-03 1.00e+02 2.00e-04 1.43 6.00e-03 2.39e-02 ...
# $ variable: chr  "quant" "quant" "quant" "quant" ...
# $ value   : num  1.30e-04 1.06e-01 2.31e+02 1.13e-02 5.00e-03 1.00e+02 2.00e-04 1.43 6.00e-03 2.39e-02 ...

sample.cast <- dcast(sample.melt1, site + sampdate + era + ceneq1 + floor + 
ceiling ~ param)
head(sample.cast)
  #site   sampdate  era ceneq1   floor ceiling AgDis AgTot AlDis Alk AlTot AsDis
#1  D-1 2007-12-12 Post  FALSE 0.00132 0.00132    NA    NA    NA  NA    NA    NA
#2  D-1 2007-12-12 Post  FALSE 0.01130 0.01130    NA    NA    NA  NA    NA    NA
#3  D-1 2007-12-12 Post  FALSE 0.02390 0.02390    NA    NA    NA  NA    NA    NA
#4  D-1 2007-12-12 Post  FALSE 0.02530 0.02530    NA    NA    NA  NA    NA    NA
#5  D-1 2007-12-12 Post  FALSE 0.03480 0.03480    NA    NA    NA  NA    NA    NA
#6  D-1 2007-12-12 Post  FALSE 0.10600 0.10600    NA    NA    NA  NA 0.106    NA
  #---------------------------------------------
  #---------------------------------------------
  #SO4 SrDis TDS TlDis TlTot TSS Vdis ZnDis ZnTot
#1  NA    NA  NA    NA    NA  NA   NA    NA    NA
#2  NA    NA  NA    NA    NA  NA   NA    NA    NA
#3  NA    NA  NA    NA    NA  NA   NA    NA    NA
#4  NA    NA  NA    NA    NA  NA   NA    NA    NA
#5  NA    NA  NA    NA    NA  NA   NA    NA    NA
#6  NA    NA  NA    NA    NA  NA   NA    NA    NA

A.K.





----- Original Message -----
From: Rich Shepard <rshep...@appl-ecosys.com>
To: R help <r-help@r-project.org>
Cc:
Sent: Wednesday, August 8, 2012 10:48 PM
Subject: Re: [R] reshape2's dcast() Adds NAs to Data Frame

On Wed, 8 Aug 2012, Jeff Newmiller wrote:

The explanation is that this is normal and consistent with behavior of
factors in general. If you don't want that, it is common to work with
character data instead of factors, only converting to factor when needed.
In most cases I invoke read.table with the as.is=TRUE argument and delay
converting to factors until I need them. Other people convert from factor
to character and back to factor to get rid of unwanted factor levels on an
as-needed basis.
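The advice above can be sketched in base R as follows. The inline table is hypothetical and stands in for a real file. droplevels() is not mentioned in the message; it is shown only as a base R alternative to the round trip through character.

```r
# Hypothetical input; read.table(text = ...) stands in for a real file.
raw <- read.table(text = "site param
D-1 Alk
D-1 pH
D-2 pH", header = TRUE, as.is = TRUE)

is.character(raw$param)            # TRUE: as.is = TRUE keeps strings as character

f   <- factor(raw$param)           # levels: "Alk", "pH"
sub <- f[raw$site == "D-2"]        # only "pH" remains in the data...
levels(sub)                        # ...but both levels are still attached

levels(factor(as.character(sub)))  # round trip drops the unused level
levels(droplevels(sub))            # base R shortcut with the same effect
```

Note that since R 4.0.0 read.table() no longer converts strings to factors by default, so as.is = TRUE mainly matters on older versions.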


Jeff,

  First thing tomorrow I will research the difference between characters and
data; I assumed they were the same.

Thanks,

Rich

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
