gical units. I could possibly just limit my indexing to the first X MB of
any file, though. I hadn't thought of the implications for relevance or
post-processing that you bring up above.
Thanks,
Dave
On 2/23/08, Jon Lehto <[EMAIL PROTECTED]> wrote:
>
> Dave
>
> You may wa
Dave
You may want to break large docs into chunks, say by chapter or other
logical segment.
This will help in
- relevance ranking - the high term frequencies in large docs will cause
uneven weighting unless the relevance calculation applies log normalization
- finer granularity of retrieval - for exa
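
As a rough illustration of the log normalization point above, here is a
minimal Python sketch. The function names and the 1 + ln(count) formula are
assumptions chosen for illustration (it is one common form of sublinear
term-frequency scaling), not the scoring used by any particular search engine.

```python
import math

def raw_tf(term, tokens):
    """Raw term frequency: how many times the term occurs in the document."""
    return tokens.count(term)

def log_tf(term, tokens):
    """Log-normalized term frequency: 1 + ln(count), or 0 if the term is absent."""
    count = tokens.count(term)
    return 1.0 + math.log(count) if count > 0 else 0.0

# Two hypothetical documents with the same proportion of the term,
# one 100 times longer than the other.
short_doc = ["lucene"] * 2 + ["other"] * 8
long_doc = ["lucene"] * 200 + ["other"] * 800

# With raw counts the long doc outweighs the short one by a factor of 100;
# with log normalization the gap shrinks to well under 4x.
raw_ratio = raw_tf("lucene", long_doc) / raw_tf("lucene", short_doc)
log_ratio = log_tf("lucene", long_doc) / log_tf("lucene", short_doc)
print(raw_ratio, log_ratio)
```

Breaking the large doc into chapter-sized chunks attacks the same problem from
the other side, since each chunk then has counts closer to those of an
ordinary document.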
Hi Ravish,
You may want to think about the synonym dictionary as being a tool on the side,
rather than each indexed document having a copy of the synonyms. At indexing
time, one might normalize synonyms to a single value, and at query time do the
same to get the match.
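
A minimal Python sketch of that idea, assuming a small hand-built synonym
table and a toy in-memory inverted index. The names SYNONYMS, normalize,
build_index and search are hypothetical and only for illustration; a real
setup would plug the same normalization into the analyzer used at both index
and query time.

```python
# Hypothetical synonym table kept "on the side": each variant maps to one
# canonical value, so documents never carry their own copy of the synonyms.
SYNONYMS = {
    "automobile": "car",
    "auto": "car",
    "laptop": "notebook",
}

def normalize(text):
    """Lowercase, split on whitespace, and map each token to its canonical form."""
    return [SYNONYMS.get(tok, tok) for tok in text.lower().split()]

def build_index(docs):
    """Build a toy inverted index (term -> set of doc ids) from normalized tokens."""
    index = {}
    for doc_id, text in docs.items():
        for tok in normalize(text):
            index.setdefault(tok, set()).add(doc_id)
    return index

def search(index, query):
    """Normalize the query the same way, then AND the posting sets together."""
    terms = normalize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {1: "red automobile", 2: "fast car", 3: "new laptop"}
index = build_index(docs)
print(search(index, "auto"))      # docs 1 and 2 both match via "car"
print(search(index, "notebook"))  # doc 3 matches via "laptop"
```

Because both sides are reduced to the same canonical value, changing the
synonym table means reindexing, which is the main trade-off of this approach.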
Alternately, use the syn