[R] time zone issue - beginners question

2011-07-06 Thread B Laura
Hello all!

As a beginner, I have been struggling for a while with a time zone issue and
can't find a suitable solution.
I would be grateful for any help.

A dataset imported from Excel has a variable, transplant.date, which was
recorded in the CET time zone.

> subDataset$transplant.date
 [1] "2000-01-01 CET" "2000-01-01 CET" "2000-01-02 CET" "2000-01-02 CET"
"2000-01-02 CET" "2000-01-02 CET" "2000-01-04 CET" "2000-01-04 CET"
"2000-01-04 CET" "2000-01-04 CET" "2000-01-04 CET" "2000-01-05 CET"
"2000-01-05 CET"
[14] "2000-01-05 CET" "2000-01-05 CET"


However, the system time is shown in CEST:
> Sys.time()
[1] "2011-07-06 15:22:44 CEST"

I need to calculate the time difference in days, but I keep getting wrong
results. Most likely this is a time zone issue.


> as.numeric(as.Date("2000-1-1")-as.Date(subDataset$transplant.date))
[1] 1 1 0 0 0 0 -2 -2 -2 -2 -2 -3 -3 -3 -3


Truncation doesn't help either:

> trunc(as.Date("2000-1-1"),"days")-trunc(as.Date(subDataset$transplant.date),"days")
Time differences in days
 [1]  1  1  0  0  0  0 -2 -2 -2 -2 -2 -3 -3 -3 -3


Are there any useful tips to cope with this?
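The one-day shift is consistent with how as.Date() converts date-times: for a POSIXct value it takes the calendar date in UTC unless a tz argument is given (that was the default in R 2.13), and midnight CET is 23:00 UTC on the previous day. The sketch below assumes the values are midnight times in the zone "Europe/Berlin", used here only as a stand-in for CET/CEST; the four dates are a hypothetical subset of transplant.date.

```r
# Hypothetical stand-in for subDataset$transplant.date: midnight local time.
# "Europe/Berlin" is an assumed zone name for CET/CEST.
transplant.date <- as.POSIXct(c("2000-01-01", "2000-01-02", "2000-01-04", "2000-01-05"),
                              tz = "Europe/Berlin")

# Taking the date in UTC moves each midnight CET back to 23:00 on the previous day.
wrong <- as.numeric(as.Date("2000-01-01") - as.Date(transplant.date, tz = "UTC"))
print(wrong)   # 1 0 -2 -3

# Taking the calendar date in the local zone gives the expected differences.
right <- as.numeric(as.Date("2000-01-01") - as.Date(transplant.date, tz = "Europe/Berlin"))
print(right)   # 0 -1 -3 -4
```

Passing tz explicitly makes the result independent of whatever default a given R version uses.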

Thank you very much!
Laura

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] time zone - any practical solution?

2011-07-12 Thread B Laura
Hello all,

Could someone help me with time zones in an understandable and practical way?
I am completely stuck on this.
I have googled for a while and read the manuals, but found no solution.

---

When data are imported from Excel 2007 into R (2.13), all date-time variables
get a time zone suffix I never asked for: CEST for summer dates or CET for
winter dates.

> Dataset
        Start       End1       End2 days2End1.from.Excel days2End2.from.Excel days2End1.in.R days2End2.in.R
1  2010-01-01 2011-01-01 2012-01-01                  365                  730       365 days  730.0000 days
2  2010-02-01 2011-02-01 2012-01-01                  365                  699       365 days  699.0000 days
3  2010-03-01 2011-03-01 2012-01-01                  365                  671       365 days  671.0000 days
4  2010-04-01 2011-04-01 2012-01-01                  365                  640       365 days  640.0417 days
5  2010-05-01 2011-05-01 2012-01-01                  365                  610       365 days  610.0417 days
6  2010-06-01 2011-06-01 2012-01-01                  365                  579       365 days  579.0417 days
7  2010-07-01 2011-07-01 2012-01-01                  365                  549       365 days  549.0417 days
8  2010-08-01 2011-08-01 2012-01-01                  365                  518       365 days  518.0417 days
9  2010-09-01 2011-09-01 2012-01-01                  365                  487       365 days  487.0417 days
10 2010-10-01 2011-10-01 2012-01-01                  365                  457       365 days  457.0417 days
11 2010-11-01 2011-11-01 2012-01-01                  365                  426       365 days  426.0000 days
12 2010-12-01 2011-12-01 2012-01-01                  365                  396       365 days  396.0000 days

> Dataset$Start
 [1] "2010-01-01 CET"  "2010-02-01 CET"  "2010-03-01 CET"  "2010-04-01 CEST"
"2010-05-01 CEST" "2010-06-01 CEST" "2010-07-01 CEST" "2010-08-01 CEST"
"2010-09-01 CEST" "2010-10-01 CEST" "2010-11-01 CET"  "2010-12-01 CET"

> Dataset$End1
 [1] "2011-01-01 CET"  "2011-02-01 CET"  "2011-03-01 CET"  "2011-04-01 CEST"
"2011-05-01 CEST" "2011-06-01 CEST" "2011-07-01 CEST" "2011-08-01 CEST"
"2011-09-01 CEST" "2011-10-01 CEST" "2011-11-01 CET"  "2011-12-01 CET"

> Dataset$End2
 [1] "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET"
"2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET"
"2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET" "2012-01-01 CET"

The variables 'days2End1.from.Excel' and 'days2End2.from.Excel' are calculated
in Excel.

I would like to be able to perform the same calculation, with the same outcome,
in R.

The variables 'days2End1.in.R' and 'days2End2.in.R' are calculated in R.

> Dataset$days2End1.from.Excel
 [1] 365 365 365 365 365 365 365 365 365 365 365 365

> Dataset$days2End1.in.R <- with(Dataset, End1 - Start)
> Dataset$days2End1.in.R
Time differences in days
 [1] 365 365 365 365 365 365 365 365 365 365 365 365
attr(,"tzone")
[1] ""

> Dataset$days2End2.from.Excel
 [1] 730 699 671 640 610 579 549 518 487 457 426 396

> Dataset$days2End2.in.R <- with(Dataset, End2 - Start)
> Dataset$days2End2.in.R
Time differences in days
 [1] 730.0000 699.0000 671.0000 640.0417 610.0417 579.0417 549.0417 518.0417
487.0417 457.0417 426.0000 396.0000
attr(,"tzone")
[1] ""

Question 1:

As you can see, 'Dataset$days2End2.in.R' gives wrong day counts for the
period April to October, when CEST (summer) times are recorded:

 640.0417 610.0417 579.0417 549.0417 518.0417 487.0417 457.0417

These values have decimals where whole days are expected (640 610 579 549 518
487 457).

Can someone explain to me how to deal with this in R?
What is the best way to calculate the number of days correctly in R?
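The fraction 0.0417 is 1/24 of a day, i.e. one hour. For rows 4 to 10, Start is midnight CEST (UTC+2) while End2 is midnight CET (UTC+1), so subtracting the two POSIXct values measures elapsed clock time, which includes the daylight saving hour. A minimal sketch, assuming the Europe/Berlin zone as a stand-in for CET/CEST and a hypothetical three-row subset of the data, reproduces the effect and removes it by converting to Date in the local zone before subtracting:

```r
# Hypothetical subset of the Start/End2 columns; "Europe/Berlin" is an assumed
# stand-in for the CET/CEST zone the data were imported in.
start <- as.POSIXct(c("2010-03-01", "2010-04-01", "2010-11-01"), tz = "Europe/Berlin")
end2  <- as.POSIXct(rep("2012-01-01", 3), tz = "Europe/Berlin")

# POSIXct subtraction gives elapsed time: the April start is midnight CEST
# (UTC+2) and the end is midnight CET (UTC+1), so one extra hour appears.
elapsed <- as.numeric(difftime(end2, start, units = "days"))
print(elapsed)   # 671.0000 640.0417 426.0000

# Converting to calendar dates in the same local zone drops the clock time,
# so the difference is a whole number of days.
days <- as.numeric(as.Date(end2, tz = "Europe/Berlin") - as.Date(start, tz = "Europe/Berlin"))
print(days)      # 671 640 426
```

round(elapsed) would also give whole days, but converting to Date removes the clock times altogether, which matches the goal of working with dates only.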



Question 2:

As I only need to work with dates, without times or time zones, I would be
happy to remove them if possible.
I already tried the trunc() function, but without success: the result doesn't
change.

> Dataset$days2End2.in.R.TRUNC <- with(Dataset, trunc(End2) - trunc(Start))
> Dataset$days2End2.in.R.TRUNC
Time differences in days
 [1] 730.0000 699.0000 671.0000 640.0417 610.0417 579.0417 549.0417 518.0417
487.0417 457.0417 426.0000 396.0000
attr(,"tzone")
[1] ""

I would be grateful if someone could shed some light on this.

Many thanks in advance!
Laura
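Regarding the trunc() attempt in Question 2: called without a units argument, trunc() on a date-time truncates only to whole seconds (units = "secs" is the default), and even trunc(x, "days") would leave values that already sit at midnight unchanged, so the daylight saving hour survives. Converting the columns to class Date straight after import avoids the problem. A sketch with a hypothetical three-row Dataset; tz = "Europe/Berlin" is an assumption to be replaced by the zone the data were recorded in (tz = "" uses the session's local zone):

```r
# Hypothetical three-row version of the imported Dataset (POSIXct columns).
Dataset <- data.frame(
  Start = as.POSIXct(c("2010-01-01", "2010-04-01", "2010-10-01"), tz = "Europe/Berlin"),
  End1  = as.POSIXct(c("2011-01-01", "2011-04-01", "2011-10-01"), tz = "Europe/Berlin"),
  End2  = as.POSIXct(rep("2012-01-01", 3), tz = "Europe/Berlin")
)

# Replace each date-time column by its local calendar date (class "Date").
# The tz value is an assumption; it should match the zone of the recorded data.
cols <- c("Start", "End1", "End2")
Dataset[cols] <- lapply(Dataset[cols], as.Date, tz = "Europe/Berlin")

# Date arithmetic counts whole calendar days, so no DST fractions remain.
Dataset$days2End1.in.R <- as.numeric(Dataset$End1 - Dataset$Start)
Dataset$days2End2.in.R <- as.numeric(Dataset$End2 - Dataset$Start)
print(Dataset)
```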


Re: [R] time zone - any practical solution?

2011-07-12 Thread B Laura
Hi Jim,

Dropping them gives one day less than it should, for all dates, whether they
carry the CEST or the CET notation.


> start
 [1] "2002-09-04 CEST" "2000-07-27 CEST" "2003-01-04 CET"  "2001-06-29 CEST"
"2005-01-12 CET"  "2000-05-28 CEST" "2002-06-01 CEST" "2000-06-02 CEST"
"2000-02-27 CET"  "2000-09-29 CEST" "2003-10-22 CEST" "2002-06-03 CEST"
[13] "2004-12-30 CET"  "2000-04-07 CEST" "2006-02-03 CET"  "2003-06-12 CEST"
"2004-07-15 CEST" "2000-04-29 CEST" "2000-05-06 CEST" "2004-10-27 CEST"

> start <- format(as.Date(start,"%Y-%m-%d"),"%Y-%m-%d")
> start
 [1] "2002-09-03" "2000-07-26" "2003-01-03" "2001-06-28" "2005-01-11"
"2000-05-27" "2002-05-31" "2000-06-01" "2000-02-26" "2000-09-28"
"2003-10-21" "2002-06-02" "2004-12-29" "2000-04-06" "2006-02-02"
"2003-06-11" "2004-07-14"
[18] "2000-04-28" "2000-05-05" "2004-10-26"
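The shift appears to come from the conversion itself. In as.Date(start, "%Y-%m-%d"), the second positional argument of the POSIXct method is most likely taken as tz rather than as a format, and the result matches taking the date in UTC, where midnight CEST and midnight CET both fall on the previous day (22:00 and 23:00 UTC). A sketch with a hypothetical three-value subset of start, assuming the Europe/Berlin zone:

```r
# Hypothetical subset of 'start' (midnight local time, Europe/Berlin assumed).
start <- as.POSIXct(c("2002-09-04", "2000-07-27", "2003-01-04"), tz = "Europe/Berlin")

# The date in UTC is one day early for every value.
utc_dates <- as.Date(start, tz = "UTC")

# The date in the local zone is the one that was recorded.
local_dates <- as.Date(start, tz = "Europe/Berlin")

# format() on a POSIXct uses the object's own time zone, so no conversion is needed.
as_text <- format(start, "%Y-%m-%d")

print(utc_dates)     # "2002-09-03" "2000-07-26" "2003-01-03"
print(local_dates)   # "2002-09-04" "2000-07-27" "2003-01-04"
print(as_text)       # "2002-09-04" "2000-07-27" "2003-01-04"
```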




2011/7/12 Jim Lemon 

> On 07/12/2011 08:58 PM, B Laura wrote:

Re: [R] time zone - any practical solution?

2011-07-12 Thread B Laura
Dear Gabor,

http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel
doesn't describe handling dates affected by daylight saving time.

The R class Date can remove the time and time zone; however, when calculating
the difference in days between two converted variables, the same problem
appears as when handling them without Date.

R News 4/1 doesn't provide a solution to this either.

I have been reading about and struggling with this for three days.

Is there anyone else who could help with this?

Regards,
Laura


2011/7/12 Gabor Grothendieck 

> On Tue, Jul 12, 2011 at 6:58 AM, B Laura  wrote:
> > Hello all,
> >
> > Could someone help me with the time zones in understandable & practical
> way?
> > I got completely stucked with this.
> >
> > Have googled for a while and read the manuals, but without solutions...
> >
> >
> >
> > ---
> >
> > When data imported  from Excel 2007 into R (2.13)
> >
> > all time variables, depending on date (summer or winter) get (un-asked
> for
> > it!) a time zone addition CEST (for summer dates) or CET (for winter
> dates).
> >
>
> Read
> http://rwiki.sciviews.org/doku.php?id=tips:data-io:ms_windows&s=excel
> which gives many ways of reading Excel into R and read R News 4/1
> which discusses appropriate R classes to use (you would be best to use
> Date, not POSIXct, in which case you could not have time zone problems
> in the first place) and internal representations of R vs. Excel.
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>



[R] max & min values within dataframe

2011-11-14 Thread B Laura
Dear R-team,

I need to find the minimum and maximum score for each patient in a dataset and
keep the output as a data frame with the following columns:
 - Patient nr
 - Region (remains the same per patient)
 - Min score
 - Max score


   Patient Region Score Time
1        1      X    19   28
2        1      X    20  126
3        1      X    22  100
4        1      X    25  191
5        2      Y    12    1
6        2      Y    12    2
7        2      Y    25    4
8        2      Y    26    7
9        3      X     6    1
10       3      X     6    4
11       3      X    21   31
12       3      X    22   68
13       3      X    23   31
14       3      X    24   38
15       3      X    21   15
16       3      X    22   24
17       3      X    23   15
18       3      X    24  243
19       3      X    25   77
20       4      Y     6    5
21       4      Y    22   28
22       4      Y    23   75
23       4      Y    24   19
24       5      Y    23    3
25       5      Y    24    1
26       5      Y    23   33
27       5      Y    24   13
28       5      Y    25   42
29       5      Y    26   21
30       5      Y    27    4
31       6      Y    24    4
32       6      Y    32    8

So far I have managed to find the min and max values for each patient, but the
output is not (yet) what I need:

> Patient.nr = unique(Patient)
> aggregate(Score, list(Patient), max)
  Group.1  x
1   1 25
2   2 26
3   3 25
4   4 24
5   5 27
6   6 32

> aggregate(Score, list(Patient), min)
  Group.1  x
1   1 19
2   2 12
3   3  6
4   4  6
5   5 23
6   6 24

I would like to do the same, but write this new information (min and max
values) to a data frame with the following columns:
 - Patient nr
 - Region (remains the same per patient)
 - Min score
 - Max score

Can anybody help me with this?
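One base-R way to get all four columns in one data frame is to aggregate by both Patient and Region (Region is constant within a patient, so it does not split any group), compute min and max separately with the formula interface of aggregate() (R >= 2.11.0), and merge() the two results. A sketch on a hypothetical subset of the rows in the table above; the column names MinScore and MaxScore are illustrative choices:

```r
# Hypothetical subset of the data (patients 1, 4 and 6).
df <- data.frame(
  Patient = c(1, 1, 1, 1, 4, 4, 4, 4, 6, 6),
  Region  = c("X", "X", "X", "X", "Y", "Y", "Y", "Y", "Y", "Y"),
  Score   = c(19, 20, 22, 25, 6, 22, 23, 24, 24, 32)
)

# One summary per (Patient, Region) group.
mins <- aggregate(Score ~ Patient + Region, data = df, FUN = min)
maxs <- aggregate(Score ~ Patient + Region, data = df, FUN = max)
names(mins)[names(mins) == "Score"] <- "MinScore"
names(maxs)[names(maxs) == "Score"] <- "MaxScore"

# Join the two summaries on the grouping columns and order by patient.
result <- merge(mins, maxs, by = c("Patient", "Region"))
result <- result[order(result$Patient), ]
rownames(result) <- NULL
print(result)
```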

Thanks
Laura



Re: [R] max & min values within dataframe

2011-11-14 Thread B Laura
Thanks for these various tips.

Sarah, this is not homework, but a simplified dataset made specifically for
this question.

Laura

2011/11/14 Dennis Murphy 

> Groupwise data summarization is a very common task, and it is worth
> learning the various ways to do it in R. Josh showed you one way to
> use aggregate() from the base package and Michael showed you one way
> of using the plyr package to do the same; another way would be
>
> ddply(df, .(Patient, Region), summarise, max = max(Score), min = min(Score))
>
> to save on writing an explicit function. Similarly, if you have a
> version of R >= 2.11.0, the aggregate() function now has a nice
> formula interface, so Josh's code could also be written as
>
> aggregate(Score ~ Patient + Region, data = df, FUN = range)
>
> with a subsequent renaming of the variables as shown.
>
> Other packages that could perform this task with ease include the doBy
> package, the data.table package, the remix package, the Hmisc package
> and, if you are comfortable with SQL, the sqldf package. For relative
> novices, the doBy package is a very nice place to start because it
> comes with a well written vignette and the function names correspond
> well with the tasks they perform (e.g., summaryBy(), transformBy()).
> The plyr and data.table packages are more general and more powerful in
> terms of the types of tasks to which each is suited. Unlike
> aggregate() and doBy:::summaryBy(), these packages can process
> multivariable functions. As noted above, if you have an SQL
> background, sqldf operates on R data objects as though they were SQL
> tables, which is advantageous in complex data extraction tasks.
> Package remix is useful if you want to organize results into a tabular
> form that is reminiscent of SAS.
>
> HTH,
> Dennis
>
> On Mon, Nov 14, 2011 at 8:10 AM, B Laura  wrote:
>
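With FUN = range, as in Dennis's formula example, aggregate() stores the result as a two-column matrix inside the Score column, which is why a renaming step is needed. One way to do it, sketched on hypothetical data with illustrative column names MinScore and MaxScore:

```r
# Hypothetical subset of the data (patients 1, 4 and 6).
df <- data.frame(
  Patient = c(1, 1, 1, 1, 4, 4, 4, 4, 6, 6),
  Region  = c("X", "X", "X", "X", "Y", "Y", "Y", "Y", "Y", "Y"),
  Score   = c(19, 20, 22, 25, 6, 22, 23, 24, 24, 32)
)

agg <- aggregate(Score ~ Patient + Region, data = df, FUN = range)

# agg$Score is a matrix: column 1 holds the minima, column 2 the maxima.
out <- data.frame(agg[c("Patient", "Region")],
                  MinScore = agg$Score[, 1],
                  MaxScore = agg$Score[, 2])
print(out)
```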
