.
To: solr-user@lucene.apache.org
Subject: Re: Multiple cores versus a "source" field.
One more opinion on source field vs separate collections for multiple corpora.
Index statistics don’t really settle down until at least 100k documents. Below
that, idf is pretty noisy. With Ultraseek, we
December 2017 4:11 p.m.
> To: solr-user
> Subject: Re: Multiple cores versus a "source" field.
>
> That's the unpleasant part of semi-structued documents (PDF, Word, whatever).
> You never know the relationship between raw size and indexable text.
>
> Basically a
with that now.
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, 5 December 2017 4:11 p.m.
To: solr-user
Subject: Re: Multiple cores versus a "source" field.
That's the unpleasant part of semi-structued documents (PDF, Word, whate
That's the unpleasant part of semi-structued documents (PDF, Word,
whatever). You never know the relationship between raw size and
indexable text.
Basically anything that you don't care to contribute to _scoring_ is
often better in an fq clause. You can also use {!cache=false} to
bypass actually u
>You'll have a few economies of scale I think with a single core, but frankly I
>don't know if they'd be enough to measure. You say the docs are "quite large"
>though, >are you talking books? Magazine articles? is 20K large or are the 20M?
Technical reports. Sometimes up to 200MB pdfs, but that
At that scale, whatever you find administratively most convenient.
You'll have a few economies of scale I think with a single core, but
frankly I don't know if they'd be enough to measure. You say the docs
are "quite large" though, are you talking books? Magazine articles? is
20K large or are the 2