Yes. They are decoded from the deltas in the .tii file into absolute pointers in memory, on load.
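
For illustration only, here is a minimal Python sketch of that load-time decoding. The tuple layout and the name `load_index` are assumptions made up for this example; they are not Lucene's actual classes or on-disk encoding (real .tii entries are variable-length encoded and carry more fields).

```python
from itertools import accumulate


def load_index(entries):
    """Turn delta-coded (term, freq_delta, prox_delta) entries into absolutes.

    `entries` is a hypothetical in-memory stand-in for what is read from
    the .tii file; each delta is relative to the previous entry.
    """
    terms = [term for term, _, _ in entries]
    # A running sum over the deltas yields the absolute file pointers.
    freq_ptrs = list(accumulate(fd for _, fd, _ in entries))
    prox_ptrs = list(accumulate(pd for _, _, pd in entries))
    return list(zip(terms, freq_ptrs, prox_ptrs))


# Made-up sample data: three indexed terms with delta-coded pointers.
index = load_index([("apple", 0, 0), ("mango", 120, 340), ("zebra", 75, 90)])
```

After this one pass, each in-memory entry carries its own absolute .frq/.prx pointer, so no summing is needed when searching the index itself.
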

Note that trunk (w/ flex indexing) has changed this substantially: we
store only the offset into the terms dict file, as an absolute in a
packed int array (no object per indexed term). Then, at the seek
points in the terms index we store absolute frq/prx pointers, so that
on seek we can rebase the decoding.

Mike

On Fri, Sep 17, 2010 at 10:02 AM, Giovanni Fernandez-Kincade
<gfernandez-kinc...@capitaliq.com> wrote:
>> The terms index (once loaded into RAM) has absolute longs, too.
>
> So in the TermInfo Index(.tii), the FreqDelta, ProxDelta, And SkipDelta
> stored with each TermInfo are actually absolute?
>
> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Friday, September 17, 2010 5:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Understanding Lucene's File Format
>
> The entry for each term in the terms dict stores a long file offset pointer,
> into the .frq file, and another long for the .prx file.
>
> But, these longs are delta-coded, so as you scan you have to sum up these
> deltas to get the absolute file pointers.
>
> The terms index (once loaded into RAM) has absolute longs, too.
>
> So when looking up a term, we first bin search to the nearest indexed term
> less than what you seek, then seek to that spot in the terms dict, then scan,
> summing the deltas.
>
> Mike
>
> On Thu, Sep 16, 2010 at 3:53 PM, Giovanni Fernandez-Kincade
> <gfernandez-kinc...@capitaliq.com> wrote:
>> Hi,
>> I've been trying to understand Lucene's file format and I keep getting hung
>> up on one detail - how can Lucene quickly find the frequency data (or
>> proximity data) for a particular term? According to the file formats page on
>> the Lucene
>> website<http://lucene.apache.org/java/2_2_0/fileformats.html#Term%20Dictionary>,
>> the FreqDelta field in the Term Info file (.tis) is relative to the
>> previous term. How is this helpful? The few references I've found on the web
>> for this subject make it sound like the Term Dictionary has direct pointers
>> to the frequency data for a given term, but that isn't consistent with the
>> aforementioned reference.
>>
>> Thanks for your help,
>> Gio.
>>
>
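
The lookup described in the quoted reply (bin search the in-memory terms index, seek into the terms dict, then scan while summing deltas) can be sketched roughly as follows. This is a hedged Python illustration, not Lucene's code: the names `lookup`, `TERMS_DICT`, `INDEX_TERMS` and `INDEX_ENTRIES`, the list-based layout, and the sample numbers are all assumptions made up for the example. The sketch also lets the bin search land on an indexed term equal to the target, which is a simplification.

```python
from bisect import bisect_right

# Hypothetical terms dict: sorted (term, freq_delta, prox_delta) entries,
# each delta relative to the previous entry.
TERMS_DICT = [
    ("apple", 0, 0), ("berry", 10, 20), ("cherry", 5, 7),
    ("mango", 30, 40), ("peach", 8, 9), ("zebra", 2, 3),
]

# Hypothetical terms index, here every 3rd term, already decoded to absolutes:
# (position in TERMS_DICT, absolute freq pointer, absolute prox pointer).
INDEX_TERMS = ["apple", "mango"]
INDEX_ENTRIES = [(0, 0, 0), (3, 45, 67)]


def lookup(term, index_terms, index_entries, terms_dict):
    """Return (freq_ptr, prox_ptr) for `term`, or None if it is absent."""
    # 1. Bin search for the last indexed term that is <= the target.
    i = bisect_right(index_terms, term) - 1
    if i < 0:
        return None  # target sorts before the first indexed term
    # 2. "Seek" to that spot in the terms dict, starting from its absolutes.
    pos, freq_ptr, prox_ptr = index_entries[i]
    # 3. Scan forward, summing deltas to keep the pointers absolute.
    while pos < len(terms_dict):
        current = terms_dict[pos][0]
        if current == term:
            return freq_ptr, prox_ptr
        if current > term:
            return None  # scanned past where the term would be
        pos += 1
        if pos < len(terms_dict):
            _, freq_delta, prox_delta = terms_dict[pos]
            freq_ptr += freq_delta
            prox_ptr += prox_delta
    return None


peach = lookup("peach", INDEX_TERMS, INDEX_ENTRIES, TERMS_DICT)
```

In this sketch the scan never covers more than one index interval, which is why delta-coded pointers in the terms dict do not make lookups slow.
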