Hi,
I'm trying to write a general-purpose "lexicon" class, with associated methods,
for storing and accessing information about large numbers of specific words
(e.g., their frequencies in different genres). For such a class to be
practically useful, hashing has to work correctly so that information about
specific words can be accessed quickly. But I have never understood very well
how hashing works, so I'm having trouble.
Here is an example of what I've done so far:
***
setClass("Lexicon",representation(e="environment"))
setMethod("initialize","Lexicon",function(.Object,wfreqs) {
.Object@e <- new.env(hash=TRUE,parent=emptyenv())
assign("wfreqs",wfreqs,envir=.Object@e)
return(.Object)
})
## function to access word frequencies
wfreq <- function(lexicon,word) {
return(get("wfreqs",envir=lexicon@e)[word])
}
## example of use
my.lexicon <- new("Lexicon",wfreqs=c("the"=2,"person"=1))
wfreq(my.lexicon,"the")
***
However, testing indicates that the way I have set this up does not achieve the
intended benefits of having the environment hashed:
***
sample.wfreqs <- trunc(runif(1e5,max=100))
names(sample.wfreqs) <- as.character(1:length(sample.wfreqs))
lex <- new("Lexicon",wfreqs=sample.wfreqs)
words.to.lookup <- trunc(runif(100,min=1,max=1e5))
## look up the words directly from the sample.wfreqs vector
system.time({
for(i in words.to.lookup)
sample.wfreqs[as.character(i)]
},gcFirst=TRUE)
## look up the words through the wfreq() function; the time is approximately the same
system.time({
for(i in words.to.lookup)
wfreq(lex,as.character(i))
},gcFirst=TRUE)
***
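For comparison, here is a minimal sketch of an alternative layout. It rests on the assumption that the hash table of an environment created with new.env(hash=TRUE) indexes the names of the objects bound in that environment, not the names of a vector stored inside one of those objects. Under that assumption, each word would need its own binding. The names Lexicon2 and wfreq2 are hypothetical and used only for illustration.

```r
## Hypothetical variant of the class: one binding per word, so that the
## environment's hash table is keyed by the words themselves.
setClass("Lexicon2", representation(e = "environment"))
setMethod("initialize", "Lexicon2", function(.Object, wfreqs) {
  .Object@e <- new.env(hash = TRUE, parent = emptyenv())
  ## list2env() copies each named element of the list into the environment
  list2env(as.list(wfreqs), envir = .Object@e)
  .Object
})

## Look up a single word by name; get() signals an error for unknown words.
wfreq2 <- function(lexicon, word) {
  get(word, envir = lexicon@e, inherits = FALSE)
}

## example of use
my.lexicon2 <- new("Lexicon2", wfreqs = c("the" = 2, "person" = 1))
wfreq2(my.lexicon2, "the")
```

Unlike wfreq(), this returns the bare value without the word attached as a name. Whether it is actually faster on the 1e5-word example would still need to be timed.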
I'm guessing that the problem is that the indexing of the wfreqs vector in my
wfreq() function is not happening inside the actual lexicon's environment.
However, I have not been able to figure out the proper call to get the lookup
to happen inside the lexicon's environment. I've tried
***
wfreq1 <- function(lexicon,word) {
return(eval(wfreqs[word],envir=lexicon@e))
}
***
which I'd thought should work, but this gives me an error:
> wfreq1(my.lexicon,'the')
Error in eval(wfreqs[word], envir = lexicon@e) :
object 'wfreqs' not found
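A possible reading of this error, sketched below: eval() receives its first argument as an ordinary argument, so wfreqs[word] is evaluated in the calling function's frame before envir is ever consulted, and wfreqs does not exist there. Passing an unevaluated call avoids that. The sketch uses a hypothetical demo environment e whose parent is baseenv() rather than emptyenv(), because evaluating the call also requires finding the `[` function. The name wfreq1b is likewise hypothetical.

```r
## Demo environment (hypothetical name `e`); its parent is baseenv()
## so that the `[` function can be found while the call is evaluated.
e <- new.env(hash = TRUE, parent = baseenv())
assign("wfreqs", c("the" = 2, "person" = 1), envir = e)

wfreq1b <- function(env, word) {
  ## bquote() builds the unevaluated call wfreqs["<word>"], splicing in
  ## the value of `word`; eval() then evaluates that call inside `env`.
  eval(bquote(wfreqs[.(word)]), envir = env)
}

wfreq1b(e, "the")
```

This only addresses the "object not found" error. The lookup still indexes the wfreqs vector by name, so by itself it would not be expected to change the timings.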
Any advice would be much appreciated!
Best & many thanks in advance,
Roger
--
Roger Levy Email: [email protected]
Assistant Professor Phone: 858-534-7219
Department of Linguistics Fax: 858-534-4789
UC San Diego Web: http://ling.ucsd.edu/~rlevy