Hi Otis,
I am recalling the "pagination" feature; it is still unresolved (with the default scoring implementation): even with small documents, retrieving search results 1 to 10 can take 0 milliseconds, but retrieving results 100,000 to 100,010 can take a few minutes (I saw it with the trunk version 6 months ago, with very small documents, 100 million docs in total). It is advisable to restrict search results to the top 1,000 in any case (as Google does)...

I believe things can go wrong; yes, most plain text retrieved from books should be about 2 KB per page, 500 pages => 1,000,000 bytes (or double that for UTF-8). Theoretically, it doesn't make any sense to index a BIG document containing every term from the dictionary without any "term frequency" calculations, but even with them... I can't imagine we should index thousands of docs where each is just a (different) version of the whole of Wikipedia; that would be wrong design...

OK, use case: index a single HUGE document. What would we do? Create an index with _the_only_ document? Then every search would return the same result (or nothing). Paginate it; split it into pages (a rough sketch follows below, after the quoted message).

I am pragmatic...

Fuad


On 11-06-07 8:04 PM, "Otis Gospodnetic" <otis_gospodne...@yahoo.com> wrote:

>Hi,
>
>
>> I think the question is strange... Maybe you are wondering about
>> possible OOM exceptions?
>
>No, that's an easier one. I was more wondering whether with 400 MB Fields
>(indexed, not stored) it becomes incredibly slow to:
>* analyze
>* commit / write to disk
>* search
>
>> I think we can pass to Lucene a single document containing a
>> comma-separated list of "term, term, ..." (a few billion times)... Except
>> for "stored" and "TermVectorComponent"...
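
P.S. To make the "split it into pages" idea concrete, here is a rough, untested sketch against the Lucene 3.x API. The class name, field names, the 2 KB page size and the top-1,000 cap are placeholders I picked for illustration, not anything fixed; the exact Version constant depends on your release.

    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class PaginatedIndexing {

        static final int PAGE_CHARS = 2048; // roughly "2 KB of plain text per page"
        static final int MAX_HITS   = 1000; // cap results, like the top-1,000 suggestion

        public static void main(String[] args) throws IOException, ParseException {
            String hugeText = args.length > 0 ? args[0] : ""; // the single HUGE document
            StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_33);
            Directory dir = new RAMDirectory();

            // Split the big text into page-sized documents instead of one giant Field.
            IndexWriter writer =
                new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_33, analyzer));
            int pageNo = 0;
            for (int off = 0; off < hugeText.length(); off += PAGE_CHARS) {
                String page =
                    hugeText.substring(off, Math.min(off + PAGE_CHARS, hugeText.length()));
                Document doc = new Document();
                doc.add(new Field("page", Integer.toString(pageNo++),
                                  Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.add(new Field("body", page,
                                  Field.Store.NO, Field.Index.ANALYZED));
                writer.addDocument(doc);
            }
            writer.close();

            // Search, but never ask for more than the top MAX_HITS results.
            IndexSearcher searcher = new IndexSearcher(IndexReader.open(dir));
            QueryParser parser = new QueryParser(Version.LUCENE_33, "body", analyzer);
            TopDocs hits = searcher.search(parser.parse("someterm"), MAX_HITS);
            System.out.println("total hits: " + hits.totalHits
                               + ", returned: " + hits.scoreDocs.length);
            searcher.close();
        }
    }

The point is only that an index of many page-sized documents lets the searcher return a meaningful ranked list, while a single 400 MB field either matches everything or nothing; whether the "page" boundary is a byte count or a real page break from the source is an application decision.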