On 3-Jul-08, at 5:13 PM, Chris Harris wrote:
That's pretty much impossible (way too small). Double check those numbers.
I don't know where I got the above numbers. Sorry. Here are the real  
numbers:
.tis file: 730MB
.frq files: 10.1 GB
.prx file: 43.2 GB

Now keeping all *that* in RAM, that sounds like a challenge.
It doesn't have to be *all* in RAM... the OS will figure out what  
parts are needed.
One alternative you might consider is using a flash (solid-state) drive.
Another is to index bigrams as terms, and do phrase queries using the  
conjunction of the bigrams of a phrase.  This should make phrase  
queries only a few times slower than term queries, and probably  
inflate your .frq to "only" 25GB (.prx could be ignored).
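For illustration, here is a minimal, self-contained Java sketch of the bigram idea, assuming a toy in-memory index rather than Lucene itself. The class and method names (`BigramPhraseSketch`, `addDocument`, `phraseQuery`) are hypothetical and are not Lucene's API. Each pair of adjacent words is stored as one term with doc IDs only and no positions, which is why position data (the .prx analogue) is not needed. A phrase query then ANDs the postings of the phrase's bigrams together.

```java
import java.util.*;

/** Hypothetical sketch: an in-memory index whose terms are word bigrams. */
public class BigramPhraseSketch {
    // Maps each bigram term (e.g. "new york") to the sorted set of doc IDs containing it.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    /** Lowercases the text and splits it into words on non-letter/non-digit runs. */
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) words.add(w);
        }
        return words;
    }

    /** Adjacent word pairs joined with a space: [a, b, c] -> ["a b", "b c"]. */
    static List<String> bigrams(List<String> words) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < words.size(); i++) {
            out.add(words.get(i) + " " + words.get(i + 1));
        }
        return out;
    }

    /** Indexes every bigram of the document as a single term; no positions are kept. */
    public void addDocument(int docId, String text) {
        for (String bg : bigrams(tokenize(text))) {
            postings.computeIfAbsent(bg, k -> new TreeSet<>()).add(docId);
        }
    }

    /**
     * Phrase query as the conjunction (AND) of the phrase's bigram terms.
     * Single-word phrases yield no bigrams, so this sketch returns an empty set for them.
     */
    public Set<Integer> phraseQuery(String phrase) {
        List<String> terms = bigrams(tokenize(phrase));
        if (terms.isEmpty()) return Collections.emptySet();
        Set<Integer> result = null;
        for (String t : terms) {
            Set<Integer> docs = postings.getOrDefault(t, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);   // start from the first bigram's postings
            } else {
                result.retainAll(docs);         // intersect with each further bigram
            }
            if (result.isEmpty()) break;        // no document can match any more
        }
        return result;
    }

    public static void main(String[] args) {
        BigramPhraseSketch idx = new BigramPhraseSketch();
        idx.addDocument(1, "the new york times");
        idx.addDocument(2, "new shoes in york");
        idx.addDocument(3, "york times in new york");
        System.out.println(idx.phraseQuery("new york times")); // prints [1, 3]
    }
}
```

Running `main` prints `[1, 3]`. Document 3 shows the trade-off: it contains both "new york" and "york times" but not the phrase itself. Because the conjunction ignores positions, it is exact for two-word phrases but can produce false positives for longer ones.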
Some other tricks, like stop word removal, also speed up phrase queries.

-Mike
