RE: Indexing very large files.

2008-02-24 Thread Jon Lehto
gical units. I could possibly just limit my indexing to the first X MB of any file, though. I hadn't thought of the implications for relevance or post-processing that you bring up above. Thanks, Dave On 2/23/08, Jon Lehto <[EMAIL PROTECTED]> wrote: > > Dave > > You may wa

RE: Indexing very large files.

2008-02-23 Thread Jon Lehto
Dave, you may want to break large docs into chunks, say by chapter or other logical segment. This will help with - relevance ranking: the term frequency of large docs will cause uneven weighting unless the relevance calculation does log normalization - finer granularity of retrieval: for exa
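
For illustration only, here is a rough Python sketch of the two ideas in this message: splitting a large document into smaller units, and damping raw term counts with a logarithm. The function names, the fixed word-count chunking (a stand-in for real chapter boundaries), and the 1 + ln(count) weighting are assumptions made for this example. They are not taken from the thread or from any particular search engine.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_words=200):
    """Split text into chunks of at most max_words words.

    Fixed-size chunks are a simple stand-in for chapters or other
    logical segments (hypothetical helper).
    """
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def log_tf(term, text):
    """Log-normalized term frequency: 1 + ln(count), or 0.0 if absent.

    This is one common damping scheme, chosen here as an assumption.
    """
    counts = Counter(re.findall(r"\w+", text.lower()))
    count = counts[term.lower()]
    return 1.0 + math.log(count) if count > 0 else 0.0
```

With this weighting, a term that occurs 1000 times in one huge document scores about 7.9 rather than 1000, so very long documents no longer dominate purely through raw counts.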

Re: Is it possible to add synonyms run time?

2008-01-25 Thread Jon Lehto
Hi Ravish, You may want to think about the synonym dictionary as being a tool on the side, rather than each indexed document having a copy of the synonyms. At indexing time, one might normalize synonyms to a single value, and at query time do the same to get the match. Alternatively, use the syn
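
A minimal sketch of the "normalize at index time and at query time" idea, written in plain Python rather than against any real search library. The `SYNONYMS` table, the whitespace tokenizer, and the `build_index` and `search` helpers are all hypothetical names invented for this example.

```python
# Hypothetical side dictionary: every synonym maps to one canonical value.
SYNONYMS = {"car": "automobile", "auto": "automobile"}


def normalize(text, synonyms=SYNONYMS):
    """Lowercase, split on whitespace, and map each token to its canonical form."""
    return [synonyms.get(tok, tok) for tok in text.lower().split()]


def build_index(docs, synonyms=SYNONYMS):
    """Build a toy inverted index (token -> set of doc ids) from normalized tokens."""
    index = {}
    for doc_id, text in docs.items():
        for tok in normalize(text, synonyms):
            index.setdefault(tok, set()).add(doc_id)
    return index


def search(index, query, synonyms=SYNONYMS):
    """Return ids of docs containing every normalized query token."""
    tokens = normalize(query, synonyms)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for tok in tokens[1:]:
        results &= index.get(tok, set())
    return results
```

Because the documents store only the canonical value, the dictionary stays a tool on the side. One tradeoff of this design is that a synonym added later only matches documents indexed after the change, unless older documents are reindexed.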