Hi,

> I think the question is strange... Maybe you are wondering about possible
> OOM exceptions?

No, that's the easier one. I was wondering more whether, with 400 MB fields
(indexed, not stored), it becomes incredibly slow to:
* analyze
* commit / write to disk
* search
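
To put a rough number on the "analyze" part, something like the sketch below
could be a starting point. It is plain Python, not Lucene: analyze() is only a
stand-in for a real analyzer (regex tokenize, lowercase, count term
frequencies), and make_text(), analyze() and time_analysis() are made-up names
for illustration. It says nothing about commit or search cost; it only shows
how tokenization time grows with field size on synthetic text.

```python
import re
import time

TOKEN = re.compile(r"\w+")


def make_text(n_bytes, vocab_size=1000):
    """Build roughly n_bytes of synthetic text from a small vocabulary."""
    words = ["term%d" % i for i in range(vocab_size)]
    out = []
    size = 0
    i = 0
    while size < n_bytes:
        w = words[i % vocab_size]
        out.append(w)
        size += len(w) + 1  # +1 for the separating space
        i += 1
    return " ".join(out)


def analyze(text):
    """Stand-in for an analyzer: tokenize, lowercase, count term frequencies."""
    freqs = {}
    for m in TOKEN.finditer(text):
        t = m.group(0).lower()
        freqs[t] = freqs.get(t, 0) + 1
    return freqs


def time_analysis(sizes):
    """Return (field size in bytes, unique terms, seconds) for each size."""
    results = []
    for n in sizes:
        text = make_text(n)
        start = time.perf_counter()
        freqs = analyze(text)
        elapsed = time.perf_counter() - start
        results.append((n, len(freqs), elapsed))
    return results


if __name__ == "__main__":
    # Small sizes by default; raise them toward 400 MB only if RAM allows.
    for n, unique, secs in time_analysis([1_000_000, 4_000_000]):
        print("%d bytes, %d unique terms, %.3f s" % (n, unique, secs))
```

The vocabulary is deliberately small, since the case in question is mostly
non-unique terms; vocab_size can be raised to see the effect of more unique
terms.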

> I think we can pass Lucene a single document containing a
> comma-separated list of "term, term, ..." (a few billion times)... Except
> "stored" and "TermVectorComponent"...

Oh, I know it can be done, but I'm wondering how bad things (like the ones 
above) get.

> I believe thousands of companies have already indexed millions of documents
> with an average size of a few hundred MB... There should not be any limits (except

Which ones are you thinking about?  What sort of documents?

> 100,000 _unique_ terms vs. a single document containing 100,000,000,000,000
> non-unique terms (and trying to store offsets)
> 
> Personally, I have indexed only small (up to 1000 bytes) documents/fields, but
> I believe 500 MB is a very common use case with PDFs (which vendors use

Nah, PDF files may be big, but I think the text in them is often not *that*
big, unless those are PDFs of very big books.

Thanks,
Otis


> On 11-06-07 7:02 PM, "Erick Erickson" <erickerick...@gmail.com> wrote:
> 
> >From older (2.4) Lucene days, I once indexed the 23 volume "Encyclopedia
> >of Michigan Civil War Volunteers" in a single document/field, so it's
> >probably within the realm of possibility at least <G>...
> >
> >Erick
> >
> >On Tue, Jun 7, 2011 at 6:59 PM, Otis Gospodnetic
> ><otis_gospodne...@yahoo.com> wrote:
> >> Hello,
> >>
> >> What are the biggest document fields that you've ever indexed in Solr
> >> or that you've heard of?  Ah, it must be Tom's Hathi trust. :)
> >>
> >> I'm asking because I just heard of a case of an index where some
> >> documents have a field that can be around 400 MB in size!  I'm curious
> >> if anyone has any experience with such monster fields?
> >> Crazy?  Yes, sure.
> >> Doable?
> >>
> >> Otis
> >> ----
> >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> >> Lucene ecosystem search :: http://search-lucene.com/
> >>
> >>
> 
> 
> > 
