Re: [Rd] [datatable-help] speeding up perception

2011-07-12 Thread Matthew Dowle
> Matthew,
>
> I was hoping I misunderstood your first proposal, but I suspect I did not
> ;).
>
> Personally, I find DT[1,V1 <- 3] highly disturbing - I would expect it to
> evaluate to
> { V1 <- 3; DT[1, V1] }
> thus returning the first element of the third column.

Please see FAQ 1.1, since further below it seems to be an expectation
issue about 'with' syntax, too.

>
> That said, I don't think it works, either. Taking your example and
> data.table from r-forge:
[ snip ]
> as you can see, DT is not modified.

Works for me on R 2.13.0. I'll try latest R later. If I can't reproduce
the non-working state I'll need some more environment information please.

> Also I suspect there is something quite amiss because even trivial things
> don't work:
>
>> DF[1:4,1:4]
>   V1 V2 V3 V4
> 1  3  1  1  1
> 2  1  1  1  1
> 3  1  1  1  1
> 4  1  1  1  1
>> DT[1:4,1:4]
> [1] 1 2 3 4

That's correct and fundamental to data.table. See FAQs 1.1, 1.7, 1.8, 1.9
and 1.10.

>
> When I first saw your proposal, I thought you rather had something like
> within(DT, V1[1] <- 3)
> in mind which looks innocent enough but performs terribly (note that I had
> to scale down the loop by a factor of 100!!!):
>
>> system.time(for (i in 1:10) within(DT, V1[1] <- 3))
>    user  system elapsed
>   2.701   4.437   7.138

No, since 'with' is already built into data.table, I was thinking of
building 'within' in, too. I'll take a look at within(). Might as well
provide as many options as possible to the user to use as they wish.

> With the for loop inside, something like within(DF, for (i in 1:1000) V1[i] <- 3)
> performs reasonably:
>
>> system.time(within(DT, for (i in 1:1000) V1[i] <- 3))
>    user  system elapsed
>   0.392   0.613   1.003
>
> (Note: system.time() can be misleading when within() is involved, because
> the expression is evaluated in a different environment so within() won't
> actually change the object in the global environment - it also interacts
> with the possible duplication)

Noted, thanks. That's pretty fast. Does within() on data.frame fix the
original issue Ivo raised, then?  If so, job done.

>
> Cheers,
> Simon

[Rd] [linux] connection never times out

2011-07-12 Thread jeroen00ms
According to the download.file manual the timeout of a connection can be set
using options(timeout=10). This seems to work as expected on Windows, but on
Linux the connection does not time out. I reproduced the problem both on
R-2.13 on Ubuntu and on R-2.12.1 on CentOS, but not on Windows.

> options(timeout=5)
> download.file("http://123.123.123.123/bla", dest=tempfile())

I am running Ubuntu 11.04 with the R binaries from CRAN:

> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
> 

--
View this message in context: 
http://r.789695.n4.nabble.com/linux-connection-never-times-out-tp3662088p3662088.html
Sent from the R devel mailing list archive at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [datatable-help] speeding up perception

2011-07-12 Thread Matthew Dowle
Thanks for the replies and info. An attempt at fast
assign is now committed to data.table v1.6.3 on
R-Forge. From NEWS :

o   Fast update is now implemented, FR#200.
DT[i,j]<-value is now handled by data.table in C rather
than falling through to data.frame methods.

Thanks to Ivo Welch for raising speed issues on r-devel,
to Simon Urbanek for the suggestion, and Luke Tierney and
Simon for information on R internals.

[<- syntax still incurs one working copy of the whole
table (as of R 2.13.0) due to R's [<- dispatch mechanism
copying to `*tmp*`, so, for ultimate speed and brevity,
'within' syntax is now available as follows.

o   A new 'within' argument has been added to [.data.table,
by default TRUE. It is very similar to the within()
function in base R. If an assignment appears in j, it
assigns to the column of DT, by reference; e.g.,
 
DT[i,colname<-value]

This syntax makes no copies of any part of memory at all.

> m = matrix(1,nrow=10,ncol=100)
> DF = as.data.frame(m)
> DT = as.data.table(m)
> system.time(for (i in 1:1000) DF[1,1] <- 3)
   user  system elapsed 
287.730 323.196 613.453 
> system.time(for (i in 1:1000) DT[1,V1 <- 3])
   user  system elapsed 
  1.152   0.004   1.161 # 528 times faster

Please note :

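*******************************************************
**  Within syntax is presently highly experimental.  **
*******************************************************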

http://datatable.r-forge.r-project.org/


On Wed, 2011-07-06 at 09:08 -0500, luke-tier...@uiowa.edu wrote:
> On Wed, 6 Jul 2011, Simon Urbanek wrote:
> 
> > Interesting, and I stand corrected:
> >
> >> x = data.frame(a=1:n,b=1:n)
> >> .Internal(inspect(x))
> > @103511c00 19 VECSXP g0c2 [OBJ,NAM(2),ATT] (len=2, tl=0)
> >  @102c7b000 13 INTSXP g0c7 [] (len=10, tl=0) 1,2,3,4,5,...
> >  @102af3000 13 INTSXP g0c7 [] (len=10, tl=0) 1,2,3,4,5,...
> >
> >> x[1,1]=42L
> >> .Internal(inspect(x))
> > @10349c720 19 VECSXP g0c2 [OBJ,NAM(2),ATT] (len=2, tl=0)
> >  @102c19000 13 INTSXP g0c7 [] (len=10, tl=0) 42,2,3,4,5,...
> >  @102b55000 13 INTSXP g0c7 [] (len=10, tl=0) 1,2,3,4,5,...
> >
> >> x[[1]][1]=42L
> >> .Internal(inspect(x))
> > @103511a78 19 VECSXP g1c2 [OBJ,MARK,NAM(2),ATT] (len=2, tl=0)
> >  @102e65000 13 INTSXP g0c7 [] (len=10, tl=0) 42,2,3,4,5,...
> >  @101f14000 13 INTSXP g1c7 [MARK] (len=10, tl=0) 1,2,3,4,5,...
> >
> >> x[[1]][1]=42L
> >> .Internal(inspect(x))
> > @10349c800 19 VECSXP g0c2 [OBJ,NAM(2),ATT] (len=2, tl=0)
> >  @102a2f000 13 INTSXP g0c7 [] (len=10, tl=0) 42,2,3,4,5,...
> >  @102ec7000 13 INTSXP g0c7 [] (len=10, tl=0) 1,2,3,4,5,...
> >
> >
> > I have R to release ;) so I won't be looking into this right now, but it's 
> > something worth investigating ... Since all the inner contents have NAMED=0 
> > I would not expect any duplication to be needed, but apparently it becomes 
> > so at some point ...
> 
> 
> The internals assume in various places that deep copies are made (one
> of the reasons NAMED settings are not propagated to sub-structure).
> The main issues are avoiding cycles and that there is no easy way to
> check for sharing.  There may be some circumstances in which a shallow
> copy would be OK but making sure it would be in all cases is probably
> more trouble than it is worth at this point. (I've tried this in the
> past in a few cases and always had to back off.)
> 
> 
> Best,
> 
> luke
> 
> >
> > Cheers,
> > Simon
> >
> >
> > On Jul 6, 2011, at 4:36 AM, Matthew Dowle wrote:
> >
> >>
> >> On Tue, 2011-07-05 at 21:11 -0400, Simon Urbanek wrote:
> >>> No subassignment function satisfies that condition, because you can 
> >>> always call them directly. However, that doesn't stop the default method 
> >>> from making that assumption, so I'm not sure it's an issue.
> >>>
> >>> David, Just to clarify - the data frame content is not copied, we are 
> >>> talking about the vector holding columns.
> >>
> >> If it is just the vector holding the columns that is copied (and not the
> >> columns themselves), why does n make a difference in this test (on R
> >> 2.13.0)?
> >>
> >>> n = 1000
> >>> x = data.frame(a=1:n,b=1:n)
> >>> system.time(for (i in 1:1000) x[1,1] <- 42L)
> >>   user  system elapsed
> >>  0.628   0.000   0.628
> >>> n = 10
> >>> x = data.frame(a=1:n,b=1:n)  # still 2 columns, but longer columns
> >>> system.time(for (i in 1:1000) x[1,1] <- 42L)
> >>   user  system elapsed
> >> 20.145   1.232  21.455
> >>>
> >>
> >> With $<- :
> >>
> >>> n = 1000
> >>> x = data.frame(a=1:n,b=1:n)
> >>> system.time(for (i in 1:1000) x$a[1] <- 42L)
> >>   user  system elapsed
> >>  0.304   0.000   0.307
> >>> n = 10
> >>> x = data.frame(a=1:n,b=1:n)
> >>> system.time(for (i in 1:1000) x$a[1] <- 42L)
> >>   user  system elapsed
> >> 37.586   0.388  38.161
> >>>
> >>
> >> If it's because the 1st column needs to be c

[Rd] save.image compression_level argument

2011-07-12 Thread andreas
Hi,

in "save.image", it would be nice if there was a "compression_level"
argument that is passed along to "save".

Or is there a reason for disabling the "compression_level" option for
saving workspaces, but enabling it for manually saving individual
objects?
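
Until such an argument exists, one possible workaround is to call save()
directly on everything in the workspace. The sketch below is only that: the
helper name save_image_level is made up for illustration, it uses gzip
compression by assumption, and it skips the extra safety steps save.image()
performs.

```r
# Hypothetical helper, not part of base R: save every object in `envir`
# with an explicit compression level by calling save() directly.
save_image_level <- function(file = ".RData", compression_level = 9,
                             envir = .GlobalEnv) {
  save(list = ls(envir = envir, all.names = TRUE), file = file,
       envir = envir, compress = "gzip",
       compression_level = compression_level)
}

# Usage on a scratch environment so the real workspace is left alone.
ws <- new.env()
assign("x", 1:10, envir = ws)
f <- tempfile(fileext = ".RData")
save_image_level(f, compression_level = 9, envir = ws)

restored <- new.env()
load(f, envir = restored)
```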

Thanks,
Andreas



Re: [Rd] [datatable-help] speeding up perception

2011-07-12 Thread Matthew Dowle

Simon,
If you didn't install.packages() with type="source" from R-Forge, that
would explain (some of) it. R-Forge builds binaries once each night. This
commit was long after the cutoff.
Matthew


Re: [Rd] [datatable-help] speeding up perception

2011-07-12 Thread Simon Urbanek
On Jul 12, 2011, at 6:24 AM, Matthew Dowle wrote:

>> Matthew,
>> 
>> I was hoping I misunderstood your first proposal, but I suspect I did not
>> ;).
>> 
>> Personally, I find DT[1,V1 <- 3] highly disturbing - I would expect it to
>> evaluate to
>> { V1 <- 3; DT[1, V1] }
>> thus returning the first element of the third column.
> 
> Please see FAQ 1.1, since further below it seems to be an expectation
> issue about 'with' syntax, too.
> 

Just to clarify - the NEWS has led me to believe that the destructive DT[i, x 
<- y] syntax is new. That is what my objection is about. I'm fine with 
subsetting operators working on expressions but I'm not happy with subsetting 
operators modifying the object they are subsetting - since it's subsetting, 
not subassignment - that's what I was referring to.
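
For reference, the standard-evaluation behaviour described in the quoted
text can be seen with a plain data.frame. This is a minimal base-R sketch
with a made-up three-column DF; it says nothing about what data.table does
with the same expression.

```r
# Base R only; DF and its column names are made up for illustration.
DF <- data.frame(a = 1:2, b = 3:4, c = 5:6)

# The assignment is an ordinary argument: it is evaluated in the calling
# frame, so V1 becomes 3 there, and its value (3) is then used as j.
res <- DF[1, V1 <- 3]

res   # first element of the third column
V1    # now defined in the calling frame
DF    # unchanged: `[` only subsets here, it does not modify the object
```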



>> That said, I don't think it works, either. Taking your example and
>> data.table from r-forge:
> [ snip ]
>> as you can see, DT is not modified.
> 
> Works for me on R 2.13.0. I'll try latest R later. If I can't reproduce
> the non-working state I'll need some more environment information please.
> 

The issue persists on several machines I tested - including R 2.13.0:

> sessionInfo()
R version 2.13.0 Patched (2011-05-15 r55914)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.6.3


> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-unknown-linux-gnu/amd64 (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.6.3
> DT = as.data.table(m)
> for (i in 1:1000) DT[1,V1 <- 3]
> DT[1,]
 V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21
[1,]  1  1  1  1  1  1  1  1  1   1   1   1   1   1   1   1   1   1   1   1   1



>> Also I suspect there is something quite amiss because even trivial things
>> don't work:
>> 
>>> DF[1:4,1:4]
>>  V1 V2 V3 V4
>> 1  3  1  1  1
>> 2  1  1  1  1
>> 3  1  1  1  1
>> 4  1  1  1  1
>>> DT[1:4,1:4]
>> [1] 1 2 3 4
> 
> That's correct and fundamental to data.table. See FAQs 1.1, 1.7, 1.8, 1.9
> and 1.10.
> 

Fair enough, I expected data.table to be a drop-in replacement for data.frame - 
I just wanted to check the values. Apparently it's not, by design, hence my 
assumption was wrong.


>> 
>> When I first saw your proposal, I thought you rather had something like
>> within(DT, V1[1] <- 3)
>> in mind which looks innocent enough but performs terribly (note that I had
>> to scale down the loop by a factor of 100!!!):
>> 
>>> system.time(for (i in 1:10) within(DT, V1[1] <- 3))
>>   user  system elapsed
>>  2.701   4.437   7.138
> 
> No, since 'with' is already built into data.table, I was thinking of
> building 'within' in, too. I'll take a look at within(). Might as well
> provide as many options as possible to the user to use as they wish.
> 
>> With the for loop inside, something like within(DF, for (i in 1:1000) V1[i] <- 3)
>> performs reasonably:
>> 
>>> system.time(within(DT, for (i in 1:1000) V1[i] <- 3))
>>   user  system elapsed
>>  0.392   0.613   1.003
>> 
>> (Note: system.time() can be misleading when within() is involved, because
>> the expression is evaluated in a different environment so within() won't
>> actually change the object in the  global environment - it also interacts
>> with the possible duplication)
> 
> Noted, thanks. That's pretty fast. Does within() on data.frame fix the
> original issue Ivo raised, then?  If so, job done.
> 

I don't think so - at least not in the strict sense of no copies (more digging 
may be needed, though, since it does so in system.time, possibly due to the 
NAMED value of the forced promise but I did not check). However, it allows 
expressing the modification inside the expression, which will save the global 
copy and thus be faster than the outside loop.
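
To make the two points concrete, here is a small base-R sketch on a made-up
5-row data.frame: within() hands back a modified copy rather than changing
its argument, and the loop can sit either outside or inside the call. The
rm(i) is there because within() would otherwise keep the loop variable as a
new column.

```r
DF <- data.frame(V1 = rep(1, 5), V2 = rep(1, 5))

# within() evaluates the expression in a temporary environment built from DF
# and returns a modified copy; DF itself is untouched unless reassigned.
res <- within(DF, V1[1] <- 3)
DF$V1[1]    # still 1
res$V1[1]   # 3

# Loop outside: every iteration calls within() and reassigns the result.
out <- DF
for (i in 1:5) out <- within(out, V1[i] <- 3)

# Loop inside: one call to within(), one assignment of the result.
# rm(i) stops the loop variable from being added as a column.
inn <- within(DF, { for (i in 1:5) V1[i] <- 3; rm(i) })

identical(out$V1, inn$V1)
```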

Cheers,
Simon




Re: [Rd] [linux] connection never times out

2011-07-12 Thread Uwe Ligges

?connections tells us:

"Note that this is a timeout for no response, not for the whole operation."

And indeed, it will take roughly 20 seconds rather than 60 - at least on 
the Linux machine I tried it on with R-2.13.1.
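
When timing this sort of test it helps to set the option only for the call
being measured and to restore it afterwards. A minimal sketch, assuming a
helper named with_timeout (made up for illustration, not part of base R):

```r
# Hypothetical helper: run expr with options(timeout=) set, then restore it.
with_timeout <- function(seconds, expr) {
  old <- options(timeout = seconds)
  on.exit(options(old), add = TRUE)
  force(expr)   # expr is a promise, so it is evaluated only now
}

# Intended use (needs network access, so it is left commented out):
# system.time(try(with_timeout(5,
#   download.file("http://123.123.123.123/bla", destfile = tempfile()))))

# Offline check that the option is set inside the call and restored after.
before <- getOption("timeout")
inside <- with_timeout(5, getOption("timeout"))
after  <- getOption("timeout")
```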


Best,
Uwe Ligges






Re: [Rd] [linux] connection never times out

2011-07-12 Thread Jeroen Ooms
> Can you please verify the behaviour is still the same in a recent R-devel or
> at least R-2.13.1? And that there was no other already answered request on
> R-help or R-devel re. timeouts?

The code below is R 2.13.1. It shows that the timeout time is more
than 3 minutes, although it was set to 5 seconds.

> options(timeout=5)
> system.time(download.file("http://123.123.123.123", dest=tempfile()))
trying URL 'http://123.123.123.123'
Error in download.file("http://123.123.123.123", dest = tempfile()) :
  cannot open URL 'http://123.123.123.123'
In addition: Warning message:
In download.file("http://123.123.123.123", dest = tempfile()) :
  unable to connect to '123.123.123.123' on port 80.
Timing stopped at: 0 0 189.375
> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
>



Re: [Rd] [linux] connection never times out

2011-07-12 Thread Simon Urbanek

On Jul 12, 2011, at 4:22 PM, Jeroen Ooms wrote:

>> Can you please verify the behaviour is still the same in a recent R-devel or
>> at least R-2.13.1? And that there was no other already answered request on
>> R-help or R-devel re. timeouts?
> 
> The code below is R 2.13.1. It shows that the timeout time is more
> than 3 minutes, although it was set to 5 seconds.
> 

Please set
options(internet.info=0)
and re-run your test.
Are you running this from a command-line R or do you have any graphics or GUIs 
running? (I'm asking because any fast handler activity will cancel timeouts)

Thanks,
Simon




Re: [Rd] [linux] connection never times out

2011-07-12 Thread Simon Urbanek
Never mind, I found the issue - contrary to the documentation Linux does modify 
tv in the call to select() so our measure of elapsed time doesn't increase. 
Work-around now present in R-devel.
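
The failure mode can be modelled in a few lines of R. This is a toy model
only: fake_select() and count_waits() are made-up names, the loop is not R's
actual C code, and the use_saved branch is just one plausible way around the
problem, not necessarily the work-around committed to R-devel.

```r
# Toy model only: fake_select() stands in for select(); not R's C code.
fake_select <- function(tv, modifies_tv) {
  # After a full timeout, Linux leaves the remaining time (zero) in tv.
  if (modifies_tv) 0 else tv
}

# Count how many waits happen before the accumulated time reaches `timeout`.
count_waits <- function(timeout, step, modifies_tv, use_saved,
                        max_iter = 100L) {
  used <- 0
  iter <- 0L
  while (used < timeout && iter < max_iter) {
    requested <- step
    tv <- fake_select(requested, modifies_tv)
    # Reading tv after the call breaks when the call has overwritten it.
    add <- if (use_saved) requested else tv
    used <- used + add
    iter <- iter + 1L
  }
  iter
}

unmodified <- count_waits(5, 1, modifies_tv = FALSE, use_saved = FALSE)
stuck      <- count_waits(5, 1, modifies_tv = TRUE,  use_saved = FALSE)
saved      <- count_waits(5, 1, modifies_tv = TRUE,  use_saved = TRUE)
```

In this model the first and third calls stop after 5 waits, while the second
runs until max_iter because the elapsed-time counter never moves.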

Cheers,
Simon

