Hi,
> I think the question is strange... Maybe you are wondering about possible
> OOM exceptions?

No, that's an easier one. I was more wondering whether, with 400 MB fields (indexed, not stored), it becomes incredibly slow to:

* analyze
* commit / write to disk
* search

> I think we can pass to Lucene a single document containing a
> comma-separated list of "term, term, ..." (a few billion times)... Except
> "stored" and "TermVectorComponent"...

Oh, I know it can be done, but I'm wondering how bad things (like the ones above) get.

> I believe thousands of companies have already indexed millions of documents with an
> average size of a few hundred Mbytes... There should not be any limits (except

Which ones are you thinking about? What sort of documents?

> 100,000 _unique_ terms vs. a single document containing 100,000,000,000,000
> non-unique terms (and trying to store offsets)
>
> Personally, I indexed only small (up to 1000 bytes) document fields, but
> I believe 500Mb is a very common use case with PDFs (which vendors use

Nah, PDF files may be big, but I think the text in them is often not *that* big, unless those are PDFs of very big books.

Thanks,
Otis

> On 11-06-07 7:02 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:
>
> >From older (2.4) Lucene days, I once indexed the 23-volume "Encyclopedia
> >of Michigan Civil War Volunteers" in a single document/field, so it's
> >probably within the realm of possibility at least <G>...
> >
> >Erick
> >
> >On Tue, Jun 7, 2011 at 6:59 PM, Otis Gospodnetic
> ><otis_gospodne...@yahoo.com> wrote:
> >> Hello,
> >>
> >> What are the biggest document fields that you've ever indexed in Solr or that
> >> you've heard of? Ah, it must be Tom's HathiTrust. :)
> >>
> >> I'm asking because I just heard of a case of an index where some documents
> >> have a field that can be around 400 MB in size! I'm curious if anyone has any
> >> experience with such monster fields?
> >> Crazy? Yes, sure.
> >> Doable?
> >>
> >> Otis
> >> ----
> >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >> Lucene ecosystem search :: http://search-lucene.com/