[Rd] hash access to data.frame uses prefix? (PR#10474)

2007-11-28 Thread pkensche
Full_Name: Philip Kensche
Version: R version 2.5.0 (2007-04-23)
OS: Linux
Submission from: (NULL) (131.174.146.31)


I want to access a row of a data.frame() by using the row names as hash keys.
This works fine for most keys.

Now consider the following data.frame().

> x <- data.frame(v=c(V40="a", V411="b"))

> x
     v
V40  a
V411 b

If I query for "V41", which does not exist in the data.frame(), the call does not
return NA as I would expect, but the row "V411".

> x[ "V41", ]
[1] b
Levels: a b

If the prefix is not unique, the query does not return a result, i.e.

> x <- data.frame(v=c(V412="a", V411="b"))

> x
     v
V412 a
V411 b

> x[ "V41", ]
[1] <NA>
Levels: a b


sessionInfo() output:
> sessionInfo()
R version 2.5.0 (2007-04-23)
i686-pc-linux-gnu

locale:
LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=en_US;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C

attached base packages:
[1] "stats" "graphics"  "grDevices" "utils" "datasets"  "methods"
[7] "base"

other attached packages:
 lattice
"0.15-4"

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] hash access to data.frame uses prefix? (PR#10474)

2007-11-28 Thread ripley
Exact matching has preference over partial matching: see ?pmatch.

Your version of R is three versions obsolete: the latest version explains 
this in detail under ?`[.data.frame` and ?`[` (and maybe 2.5.0 does too).

Please do your homework before sending non-bugs to R-bugs.
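The rule cited here can be seen directly with pmatch(). This is a minimal sketch of the matching rule only, not of how `[.data.frame` is implemented internally; the vectors keys1, keys2 and keys3 are hypothetical, chosen to mirror the two data frames in the report plus one case that has an exact match.

```r
# pmatch() tries exact matches first, then a unique partial (prefix) match;
# a prefix shared by several candidates is ambiguous and gives NA.
keys1 <- c("V40", "V411")   # as in the first data frame of the report
keys2 <- c("V412", "V411")  # as in the second data frame of the report
keys3 <- c("V41", "V411")   # hypothetical: an exact match is present

pmatch("V41", keys1)  # 2: "V41" is a unique prefix of "V411"
pmatch("V41", keys2)  # NA: "V41" is a prefix of both names
pmatch("V41", keys3)  # 1: the exact match is preferred over the partial one
```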


On Wed, 28 Nov 2007, [EMAIL PROTECTED] wrote:

> I want to access a row of a data.frame() by using the row names as hash keys.

Hmm, you mean you use character vector indices.  No hashing is involved.
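For comparison, genuinely hashed, exact-key lookup is available in R through environments. This is a sketch under the assumption that an environment is an acceptable substitute for row-name indexing; tbl and lookup() are hypothetical names, and returning NA for a missing key mirrors the reporter's expectation rather than any base R convention.

```r
# new.env(hash = TRUE) creates a hashed environment; list2env() fills it.
tbl <- list2env(list(V40 = "a", V411 = "b"), envir = new.env(hash = TRUE))

# Exact lookup: variable names in an environment are never partially matched.
lookup <- function(key, env) {
  if (exists(key, envir = env, inherits = FALSE))
    get(key, envir = env, inherits = FALSE)
  else
    NA
}

lookup("V411", tbl)  # "b"
lookup("V41", tbl)   # NA, not the value stored under "V411"
```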

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Re: [Rd] hash access to data.frame uses prefix? (PR#10474)

2007-11-28 Thread ripley
If you do want only exact matching, use match().  E.g. in your example

x[match("V41", row.names(x)), ]

will give NA as you expected for a construction documented to do something 
else.
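The same idea wrapped as a helper, as a sketch: row_exact() is a hypothetical name, and drop = FALSE is an added assumption so that the result stays a data frame even when it has a single column.

```r
x <- data.frame(v = c("a", "b"), row.names = c("V40", "V411"),
                stringsAsFactors = FALSE)

# match() compares exactly; an absent key gives NA, which selects one all-NA row.
row_exact <- function(df, key) df[match(key, row.names(df)), , drop = FALSE]

row_exact(x, "V411")  # the row holding "b"
row_exact(x, "V41")   # one row, with v equal to NA
```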

Re: [Rd] colnames slow (PR#10470)

2007-11-28 Thread maechler
> "UweL" == Uwe Ligges <[EMAIL PROTECTED]>
> on Mon, 26 Nov 2007 22:14:07 +0100 writes:

UweL> [EMAIL PROTECTED] wrote:
>> Full_Name: Tomas Larsson
>> Version: 2.6.0
>> OS: Windows XP
>> Submission from: (NULL) (198.208.251.24)
>> 
>> 
>> This is not a bug, it is a performance issue but I think it should have an easy
>> fix.
>> 
>> I have a large matrix (about 2,000,000 by 20), when I type colnames(x) it takes
>> a long time to get the result.  However, if I select just the first couple of
>> rows of the matrix I don't have to wait for the result. See below for example.

>>> system.time(colnames(x))
>>    user  system elapsed 
>>    9.98    0.00   10.00 

>>> system.time(colnames(x[1:2,]))
>>    user  system elapsed 
>>    0.01    0.00    0.02 

UweL> Documentation in the released version of R (2.6.1) tells us:

UweL> For a data frame, 'rownames'
UweL> and 'colnames' are calls to 'row.names' and 'names' respectively,
UweL> but the latter are preferred.

aaah, so we do have something close to a bug,
since the above is only correct if "are" is interpreted in quite
a wide sense:

Both colnames() and rownames() call dimnames(),
which is a (.Primitive) generic, and the data.frame method simply
is
function (x) list(row.names(x), names(x))
So, in fact, both colnames() and rownames() 
each call *both* row.names() and names() even though only one of
them is needed.
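A quick check of the method definition quoted above, as a sketch on a small hypothetical data frame: dimnames() on a data frame returns both components, so anything built on it also computes every row name.

```r
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

# dimnames() on a data frame is list(row.names(x), names(x)):
# the row names are built even when only the column names are needed.
dn <- dimnames(df)
stopifnot(identical(dn, list(row.names(df), names(df))))
stopifnot(identical(dn[[2]], names(df)))
```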

This is indeed suboptimal in the case where colnames() is of
length 20 and rownames() is of length 2'000'000  ... not really
an atypical case.

And that (last paragraph) is also true when 'x' is a matrix and
not a data.frame. However, there's a bit more to it...

UweL> and on my machine I get:

UweL> system.time(names(x))
UweL> user  system elapsed
UweL> 0   0   0

yes.  But what if his 'x'  really *was* a matrix (and not a data frame)?

The speed of colnames(x) in such a case depends quite a bit on whether the
matrix has non-NULL rownames.  Ideally I think it should not,
and hence I partly agree with Tomas.

HOWEVER, there's more to it.
If 'x' *was* a matrix --- and this proves to me that it was not
  in Tomas' case ---
even though colnames() seems like a waste (of memory, copying),
it is in fact still very fast in newer versions of R ... most probably
because 'character' vectors are hashed now and much less memory
allocation is happening than in earlier versions of R. 

The only case that is slow is for a *data frame* with
"empty" i.e. automatic rownames.  Watch this :

m <- matrix(pi, 2e6, 20,
dimnames=list(LETTERS[sample(26,2e6,replace=TRUE)], letters[1:20]))
system.time(for(i in 1:100) cc <- colnames(m))
## 0.001  -- very fast
## ditto for this:
system.time(for(i in 1:100) dd <- dimnames(m))

## HOWEVER:
system.time(dm <- as.data.frame(m)) ## takes more than a second
##user  system elapsed
##   2.462   1.379   3.842

## Quite a bit slower (x 1000 !) than for the matrix above, but still ok:
system.time(for(i in 1:100) c2 <- colnames(dm))
##user  system elapsed
##   1.202   0.638   1.842
stopifnot(identical(c2, cc))

## ditto
system.time(for(i in 1:100) d2 <- dimnames(dm))
##user  system elapsed
##   1.143   0.626   1.769
stopifnot(identical(d2, dd))

### BUT  now: What happens if we have "empty" rownames  ???

## m0 :=  {m  with empty rownames} :
m0 <- m
dimnames(m0) <- list(NULL, colnames(m0))

## and ditto for the data frames:
## dm0 :=  {dm  with empty rownames, i.e. "internal/automatic 1:N rownames}:
system.time(dm0 <- as.data.frame(m0))
##user  system elapsed
##   1.677   1.241   2.922

system.time(c3 <- colnames(dm0))
##user  system elapsed
##   5.208   0.047   5.261

###---> OOOPS!  One single call to colnames(.)
###  needs more than  100  calls in the non-empty rownames case

## repeated calls become faster ... and
system.time(c3 <- colnames(dm0))
##user  system elapsed
##   3.109   0.000   3.110
## ... faster ... and even much faster
system.time(c3 <- colnames(dm0))
##user  system elapsed
##   0.913   0.007   0.922

## Note: repeated calls to dimnames(.) here become faster :
system.time(d3 <- dimnames(dm0))

## Note indeed, that  names()  is lightning fast in comparison:
system.time(for(i in 1:100) c4 <- names(dm0)) ## is 'immediate' (0 sec)
##user  system elapsed
##   0.001   0.000   0.000   --- 100 x ~1000  times faster


--

All things considered,  I'd currently propose to add 

if(is.data.frame(x) && do.NULL)
return(names(x))

to the beginning of 'colnames'.  
We already have such a clause at the beginning of
'colnames<-'. All of which would suggest making these generic, but we
have been there before and consciously decided against doing so,
on the ground  that 'dimnames' is already generic and 
colnames(.) and rownames(.) should really be equivalent to
dimnames(.)[[j]]  for j=1 o
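The proposed shortcut can be tried out without modifying base R by wrapping colnames(). This is a sketch: colnames_fast() is a hypothetical name, the data frame size is an arbitrary assumption, and no timings are claimed here.

```r
# Hypothetical wrapper implementing the proposed clause:
# for a data frame, return names(x) without building dimnames(x) first.
colnames_fast <- function(x, do.NULL = TRUE, prefix = "col") {
  if (is.data.frame(x) && do.NULL)
    return(names(x))
  colnames(x, do.NULL = do.NULL, prefix = prefix)
}

# Data frame with automatic 1:N row names, the slow case described above.
dm0 <- as.data.frame(matrix(pi, 1e5, 20))
stopifnot(identical(colnames_fast(dm0), colnames(dm0)))

system.time(for (i in 1:100) colnames_fast(dm0))
system.time(for (i in 1:100) colnames(dm0))
```

For a matrix the wrapper simply defers to colnames(), so only the data frame path changes.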

[Rd] help("R_LIBS") brings up the wrong help file (PR#10475)

2007-11-28 Thread timh
Doing
help("R_LIBS")
brings up a help file (the same one as help(library)),
but the help file doesn't mention R_LIBS.
It does have a link to .libPaths, which does document R_LIBS.

The quickest fix would be for help("R_LIBS") to bring up the .libPaths
help file.


--please do not edit the information below--

Version:
 platform = i386-pc-mingw32
 arch = i386
 os = mingw32
 system = i386, mingw32
 status = 
 major = 2
 minor = 5.0
 year = 2007
 month = 04
 day = 23
 svn rev = 41293
 language = R
 version.string = R version 2.5.0 (2007-04-23)

Windows XP (build 2600) Service Pack 2.0

Locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

Search Path:
 .GlobalEnv, package:stats, package:graphics, package:grDevices, package:utils, 
package:datasets, package:methods, Autoloads, package:base

Re: [Rd] help("R_LIBS") brings up the wrong help file (PR#10475)

2007-11-28 Thread Katharine Mullen
help(R_LIBS) is the same as help(.libPaths) for me.

It appears you're running 2.5.0 - you could try updating, and then see if
the issue still exists.

---
platform       x86_64-unknown-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          2
minor          6.1
year           2007
month          11
day            26
svn rev        43537
language       R
version.string R version 2.6.1 (2007-11-26)


Re: [Rd] help("R_LIBS") brings up the wrong help file (PR#10475)

2007-11-28 Thread ripley
It brings up ?.libPaths in R 2.5.1 and later (that is, the last three 
releases of R).
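For reference, the library search path that ?.libPaths documents can be inspected directly. A minimal sketch; it only reads values and assumes nothing beyond base R.

```r
# R_LIBS as seen by this session ("" if the variable is unset).
Sys.getenv("R_LIBS")

# Library trees R searches, in order; directories listed in R_LIBS
# that exist at startup are included here.
.libPaths()

# Interactively, ?.libPaths opens the page that documents R_LIBS.
```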
