Yes.

They are decoded from the deltas in the .tii file into absolute values in
memory when the index is loaded.
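
A minimal Java sketch of that load step keeps a running sum so each in-memory entry holds an absolute pointer. It assumes each index entry's frq pointer arrives as a plain long delta from the previous indexed term. The class and method names are hypothetical, not Lucene's API. Real .tii entries also carry the term text, docFreq, and the prx and skip data, and are VLong-encoded on disk.

```java
import java.util.Arrays;

// Hypothetical sketch (not Lucene's TermInfosReader): turn the delta-coded
// pointers read from the terms index into absolute pointers held in RAM.
final class TermsIndexLoader {

    // Converts a sequence of delta-coded file pointers into absolute ones.
    static long[] deltasToAbsolute(long[] deltas) {
        long[] absolute = new long[deltas.length];
        long running = 0L;
        for (int i = 0; i < deltas.length; i++) {
            running += deltas[i];   // add the delta from the previous indexed term
            absolute[i] = running;  // keep the absolute file pointer in memory
        }
        return absolute;
    }

    public static void main(String[] args) {
        long[] freqDeltas = {0L, 120L, 45L, 300L};
        System.out.println(Arrays.toString(deltasToAbsolute(freqDeltas)));
        // prints [0, 120, 165, 465]
    }
}
```

The decoding cost is paid once at load time. After that, a lookup can jump straight to any indexed term's absolute pointer without summing from the start.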

Note that trunk (w/ flex indexing) has changed this substantially: we
store only the offset into the terms dict file, as an absolute in a
packed int array (no object per indexed term).  Then, at the seek
points in the terms index we store absolute frq/prx pointers, so that
on seek we can rebase the decoding.
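
The trunk approach can be sketched in Java as two pieces. The first is a packed array of absolute terms-dict offsets: fixed-width values packed into a long[] with no object per indexed term. The second is a scan that starts from the absolute pointer stored at a seek point instead of from the start of the file. All names here (PackedLongs, FlexTermsIndexSketch, bitsRequired, rebasedPointer) are hypothetical. This is not Lucene's PackedInts or terms dict code, and the real on-disk layout in trunk may differ.

```java
// Hypothetical sketch of the flex idea: one packed array of absolute offsets,
// plus decoding that is rebased on an absolute pointer at each seek point.
final class PackedLongs {
    private final long[] blocks;      // values packed back to back, bitsPerValue bits each
    private final int bitsPerValue;
    private final long mask;

    PackedLongs(long[] values, int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 63) {
            throw new IllegalArgumentException("bitsPerValue must be in [1, 63]");
        }
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        this.blocks = new long[(int) (((long) values.length * bitsPerValue + 63) / 64)];
        for (int i = 0; i < values.length; i++) {
            if (values[i] < 0 || values[i] > mask) {
                throw new IllegalArgumentException("value does not fit: " + values[i]);
            }
            set(i, values[i]);
        }
    }

    private void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        blocks[block] |= value << shift;                            // low bits in this block
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            blocks[block + 1] |= value >>> (bitsPerValue - spill);  // high bits spill over
        }
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return value & mask;
    }
}

final class FlexTermsIndexSketch {
    // Smallest bit width that can hold every value in [0, maxValue].
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    // Start from the absolute pointer stored at the seek point, then add the
    // deltas of the terms scanned after it; no sum from the file start is needed.
    static long rebasedPointer(long absoluteAtSeekPoint, long[] deltas, int termsScanned) {
        long pointer = absoluteAtSeekPoint;
        for (int i = 0; i < termsScanned; i++) {
            pointer += deltas[i];
        }
        return pointer;
    }

    public static void main(String[] args) {
        long[] termsDictOffsets = {0L, 1000L, 2100L, 3550L};
        PackedLongs index = new PackedLongs(termsDictOffsets, bitsRequired(3550L));
        System.out.println(index.get(2));                                        // 2100
        System.out.println(rebasedPointer(5000L, new long[] {40L, 12L, 7L}, 2)); // 5052
    }
}
```

Packing each offset into just enough bits (12 bits for offsets up to 3550 here) is what removes the per-term object overhead. The absolute pointer at each seek point means a scan never needs more than the deltas since that point.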

Mike

On Fri, Sep 17, 2010 at 10:02 AM, Giovanni Fernandez-Kincade
<gfernandez-kinc...@capitaliq.com> wrote:
>> The terms index (once loaded into RAM) has absolute longs, too.
>
> So in the TermInfo Index (.tii), the FreqDelta, ProxDelta, and SkipDelta
> stored with each TermInfo are actually absolute?
>
> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Friday, September 17, 2010 5:24 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Understanding Lucene's File Format
>
> The entry for each term in the terms dict stores a long file offset pointer
> into the .frq file, and another long for the .prx file.
>
> But, these longs are delta-coded, so as you scan you have to sum up these 
> deltas to get the absolute file pointers.
>
> The terms index (once loaded into RAM) has absolute longs, too.
>
> So when looking up a term, we first binary search for the nearest indexed term
> less than or equal to the term you seek, then seek to that spot in the terms dict,
> then scan, summing the deltas.
>
> Mike
>
> On Thu, Sep 16, 2010 at 3:53 PM, Giovanni Fernandez-Kincade 
> <gfernandez-kinc...@capitaliq.com> wrote:
>> Hi,
>> I've been trying to understand Lucene's file format and I keep getting hung
>> up on one detail: how can Lucene quickly find the frequency data (or
>> proximity data) for a particular term? According to the file formats page on
>> the Lucene website <http://lucene.apache.org/java/2_2_0/fileformats.html#Term%20Dictionary>,
>> the FreqDelta field in the Term Info file (.tis) is relative to the
>> previous term. How is this helpful? The few references I've found on the web
>> for this subject make it sound like the Term Dictionary has direct pointers
>> to the frequency data for a given term, but that isn't consistent with the
>> aforementioned reference.
>>
>> Thanks for your help,
>> Gio.
>>
>
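
The lookup described in the quoted reply has three steps: binary search the in-RAM index, seek into the terms dict, then scan while summing deltas. It can be modeled in a small Java sketch. The sketch assumes arrays stand in for the .tis (every term with its frq delta) and the .tii (every interval-th term with its absolute frq pointer), and it tracks only the frq pointer. TermLookupSketch, freqPointer, and interval are hypothetical names, not Lucene's TermInfosReader API.

```java
import java.util.Arrays;

// Hypothetical in-memory model of the pre-flex term lookup. Arrays stand in
// for the terms dict (.tis) and terms index (.tii); only frq pointers are tracked.
final class TermLookupSketch {
    private final String[] terms;       // every term, sorted, as in the terms dict
    private final long[] freqDeltas;    // frq pointer delta from the previous term
    private final int interval;         // every interval-th term is indexed
    private final String[] indexTerms;  // the in-RAM terms index
    private final long[] indexFreqAbs;  // absolute frq pointer of each indexed term

    TermLookupSketch(String[] terms, long[] freqDeltas, int interval) {
        this.terms = terms;
        this.freqDeltas = freqDeltas;
        this.interval = interval;
        int n = (terms.length + interval - 1) / interval;
        this.indexTerms = new String[n];
        this.indexFreqAbs = new long[n];
        long running = 0L;
        for (int i = 0; i < terms.length; i++) {
            running += freqDeltas[i];             // absolute pointer of term i
            if (i % interval == 0) {              // build the index with absolutes
                indexTerms[i / interval] = terms[i];
                indexFreqAbs[i / interval] = running;
            }
        }
    }

    // Returns the absolute frq pointer for the term, or -1 if it is absent.
    long freqPointer(String term) {
        // 1. Binary search the index for the greatest indexed term <= term.
        int k = Arrays.binarySearch(indexTerms, term);
        if (k < 0) k = -k - 2;                    // insertion point minus one
        if (k < 0) return -1L;                    // sorts before every indexed term
        // 2. "Seek" to that term's spot in the dict; its absolute pointer is the base.
        int pos = k * interval;
        long pointer = indexFreqAbs[k];
        // 3. Scan forward, summing deltas, until we reach or pass the target.
        while (pos < terms.length) {
            int cmp = terms[pos].compareTo(term);
            if (cmp == 0) return pointer;
            if (cmp > 0) return -1L;
            pos++;
            if (pos < terms.length) pointer += freqDeltas[pos];
        }
        return -1L;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry", "date", "elder", "fig", "grape"};
        long[] deltas = {0L, 100L, 250L, 80L, 40L, 500L, 60L};
        TermLookupSketch dict = new TermLookupSketch(terms, deltas, 3);
        System.out.println(dict.freqPointer("fig")); // 970
    }
}
```

Looking up "fig" lands on indexed term "date" (absolute pointer 430), then adds the deltas for "elder" (40) and "fig" (500). The scan never sums more than interval - 1 deltas, which is why relative FreqDelta values in the .tis still allow fast lookups.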
