Hi,

I'm trying to write a general-purpose "lexicon" class and associated methods 
for storing and accessing information about large numbers of specific words 
(e.g., their frequencies in different genres).  Crucial to making such a class 
practically useful is to get hashing working correctly so that information 
about specific words can be accessed quickly.  But I have never really 
understood how hashing works, so I'm having trouble.

Here is an example of what I've done so far:

***

setClass("Lexicon",representation(e="environment"))
setMethod("initialize","Lexicon",function(.Object,wfreqs) {
        .Object@e <- new.env(hash=TRUE,parent=emptyenv())
        assign("wfreqs",wfreqs,envir=.Object@e)
        return(.Object)
        })

## function to access word frequencies
wfreq <- function(lexicon,word) {
        return(get("wfreqs",envir=lexicon@e)[word])
}

## example of use
my.lexicon <- new("Lexicon",wfreqs=c("the"=2,"person"=1))
wfreq(my.lexicon,"the")

***

However, testing indicates that the way I have set this up does not achieve the 
intended benefits of having the environment hashed:

***

sample.wfreqs <- trunc(runif(1e5,max=100))
names(sample.wfreqs) <- as.character(1:length(sample.wfreqs))
lex <- new("Lexicon",wfreqs=sample.wfreqs)
words.to.lookup <- trunc(runif(100,min=1,max=1e5))
## look up the words directly from the sample.wfreqs vector
system.time({
        for(i in words.to.lookup)
                sample.wfreqs[as.character(i)]
        },gcFirst=TRUE)
## look up the words through the wfreq() function; time approx the same
system.time({
        for(i in words.to.lookup)
                wfreq(lex,as.character(i))
        },gcFirst=TRUE)

***
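
One possible explanation for the similar timings, stated as an assumption 
rather than a confirmed diagnosis: a hashed environment hashes the names of the 
variables bound in it, and here it holds a single variable, wfreqs.  The 
per-word lookup is then still ordinary name-indexing of a vector.  The sketch 
below stores each word as its own variable instead; make_lexicon_env() and 
wfreq_env() are hypothetical helper names, not part of the code above.

```r
## Hypothetical helpers: one environment binding per word, so that the
## environment's hash table is keyed by the words themselves.
make_lexicon_env <- function(wfreqs) {
        e <- new.env(hash=TRUE,parent=emptyenv())
        words <- names(wfreqs)
        for(i in seq_along(wfreqs))
                assign(words[i],wfreqs[[i]],envir=e)
        e
}

## Look up a single word; inherits=FALSE keeps the search inside e.
## Unknown words give NA instead of an error.
wfreq_env <- function(e,word) {
        if(exists(word,envir=e,inherits=FALSE))
                get(word,envir=e,inherits=FALSE)
        else
                NA_real_
}

## example of use
lex.env <- make_lexicon_env(c("the"=2,"person"=1))
wfreq_env(lex.env,"the")
```

Whether this is actually faster would need to be checked with the same 
system.time() comparison as above.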

I'm guessing that the problem is that the indexing of the wfreqs vector in my 
wfreq() function is not happening inside the actual lexicon's environment.  
However, I have not been able to figure out the proper call to get the lookup 
to happen inside the lexicon's environment.  I've tried

wfreq1 <- function(lexicon,word) {
        return(eval(wfreqs[word],envir=lexicon@e))
}

which I'd thought should work, but this gives me an error:

> wfreq1(my.lexicon,'the')
Error in eval(wfreqs[word], envir = lexicon@e) : 
  object 'wfreqs' not found
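
A sketch of what seems to be going on with this error, under the assumption 
that eval() receives its first argument already evaluated: wfreqs[word] is 
computed in the calling function's frame, where no wfreqs exists, before envir 
is ever consulted.  Quoting the expression defers evaluation, but with 
parent=emptyenv() the environment cannot see the "[" function or word.  The 
function names and the second environment below are illustrative only.

```r
e <- new.env(hash=TRUE,parent=emptyenv())
assign("wfreqs",c("the"=2,"person"=1),envir=e)

## Unquoted: wfreqs[word] is evaluated in the caller's frame, so this fails
lookup.unquoted <- function(env,word) eval(wfreqs[word],envir=env)
res.unquoted <- tryCatch(lookup.unquoted(e,"the"),error=function(err) err)

## Quoted: evaluated inside env, but "[" is not visible from emptyenv()
lookup.quoted <- function(env,word) eval(quote(wfreqs[word]),envir=env)
res.quoted <- tryCatch(lookup.quoted(e,"the"),error=function(err) err)

## Illustrative variant: parent=baseenv() makes "[" visible, and bquote()
## splices the value of word into the expression before evaluation
e2 <- new.env(hash=TRUE,parent=baseenv())
assign("wfreqs",c("the"=2,"person"=1),envir=e2)
lookup.bquote <- function(env,word) eval(bquote(wfreqs[.(word)]),envir=env)
res.ok <- lookup.bquote(e2,"the")
```

Even when the last variant runs, it still indexes the whole wfreqs vector by 
name, so it would not be expected to change the timings by itself.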

Any advice would be much appreciated!

Best & many thanks in advance,

Roger

--

Roger Levy                      Email: [email protected]
Assistant Professor             Phone: 858-534-7219
Department of Linguistics       Fax:   858-534-4789
UC San Diego                    Web:   http://ling.ucsd.edu/~rlevy
