Otis,
Not sure about Solr, but with Lucene it was certainly doable. I
saw fields way bigger than 400 MB indexed, sometimes having a large set
of unique terms as well (think something like a log file with lots of
alphanumeric tokens, a couple of gigs in size). While indexing and
querying of such thi
Hi Otis,
Our OCR fields average around 800 KB. My guess is that the largest docs we
index (in a single OCR field) are somewhere between 2 and 10 MB. We have had
issues where the in-memory representation of the document (the in-memory index
structures being built) is several times the size of t
The Salesforce book is 2800 pages of PDF, last I looked.
What can you do with a field that big? Can you get all of the snippets?
On Tue, Jun 7, 2011 at 5:33 PM, Fuad Efendi wrote:
> Hi Otis,
>
>
> I recall the "pagination" feature: it is still unresolved (with the default
> scoring implementation)
Hi Otis,
I recall the "pagination" feature: it is still unresolved (with the default
scoring implementation): even with small documents, searching and retrieving
documents 1 to 10 can take 0 milliseconds, but from 100,000 to 100,010 can
take a few minutes (I saw it with the trunk version 6 months ago, and wi
Hi,
> I think the question is strange... Maybe you are wondering about possible
> OOM exceptions?
No, that's an easier one. I was more wondering whether with 400 MB Fields
(indexed, not stored) it becomes incredibly slow to:
* analyze
* commit / write to disk
* search
> I think we can pass
I think the question is strange... Maybe you are wondering about possible
OOM exceptions? I think we can pass Lucene a single document containing a
comma-separated list of "term, term, ..." (a few billion times)... Except
"stored" and "TermVectorComponent"...
I believe thousands of companies already in
From the older (2.4) Lucene days, I once indexed the 23-volume "Encyclopedia
of Michigan Civil War Volunteers" in a single document/field, so it's probably
within the realm of possibility at least ...
Erick
On Tue, Jun 7, 2011 at 6:59 PM, Otis Gospodnetic
wrote:
> Hello,
>
> What are the biggest do