Re: [R] Some days missing using xtabs

arun Tue, 23 Jul 2013 06:11:03 -0700

Hi,

I tried this without changing the class, but there was no warning.


 str(release_freq)
#'data.frame':    62 obs. of  4 variables:
# $ d_release: Factor w/ 31 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ m_release: Factor w/ 2 levels "5","6": 1 1 1 1 1 1 1 1 1 1 ...
# $ y_release: Factor w/ 1 level "2004": 1 1 1 1 1 1 1 1 1 1 ...
# $ Freq     : num  0 0 0 0 1 1 1 0 0 1 ...
 str(temp_h12)
#'data.frame':    31 obs. of  4 variables:
# $ y_temp: int  2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
# $ m_temp: int  5 5 5 5 5 5 5 5 5 5 ...
# $ d_temp: int  1 2 3 4 5 6 7 8 9 10 ...
# $ temp  : num  16.9 18 17.4 19.7 105.7 ...


res<-merge(release_freq, temp_h12, by.x=c("y_release","m_release","d_release"), 
by.y=c("y_temp","m_temp","d_temp"), stringsAsFactors=FALSE)

  head(res)
 # y_release m_release d_release Freq temp
#1      2004         5         1    0 16.9
#2      2004         5        10    1 16.1
#3      2004         5        11    1 15.8
#4      2004         5        12    1 15.1
#5      2004         5        13    0 17.8
#6      2004         5        14    0 17.4

# changing the class
release_freq$d_release <- as.integer(as.character(release_freq$d_release))
release_freq$m_release <- as.integer(as.character(release_freq$m_release))
release_freq$y_release <- as.integer(as.character(release_freq$y_release))
res1<- merge(release_freq, temp_h12, 
by.x=c("y_release","m_release","d_release"), 
by.y=c("y_temp","m_temp","d_temp"), stringsAsFactors=FALSE)

head(res1)
#  y_release m_release d_release Freq temp
#1      2004         5         1    0 16.9
#2      2004         5        10    1 16.1
#3      2004         5        11    1 15.8
#4      2004         5        12    1 15.1
#5      2004         5        13    0 17.8
#6      2004         5        14    0 17.4

The results are not identical.
  identical(res,res1)
#[1] FALSE
str(res)
#'data.frame':    31 obs. of  5 variables:
# $ y_release: Factor w/ 1 level "2004": 1 1 1 1 1 1 1 1 1 1 ...
# $ m_release: Factor w/ 2 levels "5","6": 1 1 1 1 1 1 1 1 1 1 ...
# $ d_release: Factor w/ 31 levels "1","2","3","4",..: 1 10 11 12 13 14 15 16 17 18 ...
# $ Freq     : num  0 1 1 1 0 0 1 1 0 1 ...
# $ temp     : num  16.9 16.1 15.8 15.1 17.8 17.4 16 17.7 17.3 22.3 ...
 str(res1)
#'data.frame':    31 obs. of  5 variables:
# $ y_release: int  2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
# $ m_release: int  5 5 5 5 5 5 5 5 5 5 ...
# $ d_release: int  1 10 11 12 13 14 15 16 17 18 ...
# $ Freq     : num  0 1 1 1 0 0 1 1 0 1 ...
# $ temp     : num  16.9 16.1 15.8 15.1 17.8 17.4 16 17.7 17.3 22.3 ...
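
Note that in both results the rows come back with day 10 right after day 1, i.e. the keys are ordered as text rather than as numbers. If numeric order is wanted, one option is to reorder after the merge. A minimal sketch with made-up toy data (rf and th are hypothetical stand-ins for release_freq and temp_h12, not the real data):

```r
# Toy stand-ins for release_freq and temp_h12 (all values are made up)
rf <- data.frame(y_release = 2004L, m_release = 5L, d_release = 1:12,
                 Freq = 0)
th <- data.frame(y_temp = 2004L, m_temp = 5L, d_temp = 1:12,
                 temp = seq(15, 20.5, by = 0.5))

m <- merge(rf, th,
           by.x = c("y_release", "m_release", "d_release"),
           by.y = c("y_temp", "m_temp", "d_temp"))

# Reorder on the integer keys so that day 2 follows day 1,
# whatever order merge() returned the rows in
m <- m[order(m$y_release, m$m_release, m$d_release), ]
rownames(m) <- NULL
m$d_release
```

This only works as intended once the key columns are integer; with factor keys, order() would sort on the internal level codes instead.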


sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stringr_0.6.2  reshape2_1.2.2

loaded via a namespace (and not attached):
[1] plyr_1.8

A.K.

----- Original Message -----
From: Rui Barradas <ruipbarra...@sapo.pt>
To: Stefano Sofia <stefano.so...@regione.marche.it>
Cc: "r-help@r-project.org" <r-help@r-project.org>
Sent: Tuesday, July 23, 2013 6:50 AM
Subject: Re: [R] Some days missing using xtabs

Hello,

As for your second question, before merge(), try the following.

release_freq$d_release <- as.integer(as.character(release_freq$d_release))
release_freq$m_release <- as.integer(as.character(release_freq$m_release))
release_freq$y_release <- as.integer(as.character(release_freq$y_release))


And the warning is gone.
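
In case it is not obvious why the conversion goes through as.character(): as.integer() applied directly to a factor returns the internal level codes, not the labels. A minimal sketch with a made-up factor f (hypothetical, not taken from the data above):

```r
# Hypothetical factor, for illustration only
f <- factor(c("10", "5", "31"))

levels(f)                    # levels are sorted as text: "10" "31" "5"
as.integer(f)                # internal codes: 1 3 2
as.integer(as.character(f))  # labels converted to numbers: 10 5 31
```

For the day, month and year columns it is the labels that are wanted, hence the two-step conversion.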

Hope this helps,

Rui Barradas

On 23-07-2013 10:33, Stefano Sofia wrote:
> Dear R-users,
> given the following data frame called hospital_2004
>
> gender d_birth m_birth y_birth address d_admittance m_admittance y_admittance yard_admittance d_release m_release y_release yard_release diaprinc diasec1 diasec2 diasec3 diasec4 diasec5
> 2 13 12 1929 42002 30 3 2004 3003 6 5 2004 4902 430 4299 51881 4275 78001 0
> 1 1 8 1935 42002 7 4 2004 2401 18 5 2004 1801 20500 V581 0388 5849 0 0
> 1 23 12 1956 42018 26 4 2004 2402 31 5 2004 2402 1552 5715 7895 25000 4148 5722
> 1 9 8 1919 42002 05 5 2004 2602 22 5 2004 4902 51881 4254 4275 0 0 0
> 2 11 1 1925 52014 30 4 2004 2603 13 6 2004 4902 51881 49121 2732 4275 4299 5849
> 2 1 3 1963 44060 1 5 2004 5101 16 5 2004 2401 3201 1519 1976 1983 4019 0
> 1 6 3 1937 45010 6 5 2004 3003 12 5 2004 4901 431 3314 41189 25001 4019 V594
> 1 3 9 1931 42034 3 5 2004 5101 5 5 2004 5101 78559 4829 5119 1619 4241 585
> 2 13 9 1912 41007 5 5 2004 4901 7 5 2004 4901 85225 4019 42731 49121 0 0
> 1 21 10 1936 15146 7 5 2004 4901 10 5 2004 4901 431 430 V594 V595 0 0
> 2 8 5 1933 43044 8 5 2004 5802 8 6 2004 5802 5712 45620 2851 5119 5184 0
> 1 25 1 1926 41057 8 5 2004 4901 15 5 2004 4901 431 78001 49121 0 0 0
> 1 6 1 1923 42002 10 5 2004 1401 11 5 2004 4901 4440 412 4413 0 0 0
> 1 19 3 1934 42022 9 5 2004 1401 21 6 2004 4901 4413 5609 99811 4019 412 0
> 1 6 6 1921 43052 15 5 2004 4302 4 6 2004 4302 1890 20280 436 49121 9986 V1005
>
> when I try to evaluate the frequency of daily releases through
>
> release_freq <- as.data.frame(xtabs( ~ d_release + m_release + y_release, 
> data=hospital_2004))
>
> I get the following result:
>
> d_release m_release y_release Freq
> 4         5      2004    0
> 5         5      2004    1
> 6         5      2004    1
> 7         5      2004    1
> 8         5      2004    0
> 10         5      2004    1
> 11         5      2004    1
> 12         5      2004    1
> 13         5      2004    0
> 15         5      2004    1
> 16         5      2004    1
> 18         5      2004    1
> 21         5      2004    0
> 22         5      2004    1
> 31         5      2004    1
> 4         6      2004    1
> 5         6      2004    0
> 6         6      2004    0
> 7         6      2004    0
> 8         6      2004    1
> 10         6      2004    0
> 11         6      2004    0
> 12         6      2004    0
> 13         6      2004    1
> 15         6      2004    0
> 16         6      2004    0
> 18         6      2004    0
> 21         6      2004    1
> 22         6      2004    0
> 31         6      2004    0
>
> Why are the 1st, 2nd, 3rd, 9th, 14th, 17th, 19th, 20th, and 23rd to 30th of
> both May and June missing? (And why is there a 31st of June?)
>
> And a final question: why given another data frame called temp_h12
>
> y_temp m_temp d_temp temp
> 2004 5 1 16.90
> 2004 5 2 18.00
> 2004 5 3 17.40
> 2004 5 4 19.70
> 2004 5 5 105.70
> 2004 5 6 17.30
> 2004 5 7 17.00
> 2004 5 8 16.20
> 2004 5 9 16.10
> 2004 5 10 16.10
> 2004 5 11 15.80
> 2004 5 12 15.10
> 2004 5 13 17.80
> 2004 5 14 17.40
> 2004 5 15 16.00
> 2004 5 16 17.70
> 2004 5 17 17.30
> 2004 5 18 22.30
> 2004 5 19 23.30
> 2004 5 20 24.30
> 2004 5 21 19.90
> 2004 5 22 15.70
> 2004 5 23 15.80
> 2004 5 24 17.10
> 2004 5 25 18.30
> 2004 5 26 21.00
> 2004 5 27 18.20
> 2004 5 28 17.90
> 2004 5 29 19.40
> 2004 5 30 22.10
> 2004 5 31 17.40
>
> merge(release_freq, temp_h12, by.x=c("y_release","m_release","d_release"), 
> by.y=c("y_temp","m_temp","d_temp"), stringsAsFactors=FALSE)
>
> gives the following warning
>
> Warning message:
> In `[<-.factor`(`*tmp*`, ri, value = 1:31) :
>    invalid factor level, NAs generated
> ?
>
>
>
> Thank you for your help
> Stefano Sofia
>
>
>
>
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

