RE: Multiple cores versus a "source" field.

2017-12-05 Thread Phil Scadden
To: solr-user@lucene.apache.org Subject: Re: Multiple cores versus a "source" field. One more opinion on source field vs separate collections for multiple corpora. Index statistics don’t really settle down until at least 100k documents. Below that, idf is pretty noisy. With Ultraseek, we
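
A rough sketch of why idf is noisy on a small corpus, under stated assumptions: it uses the BM25-style idf formula from Lucene's BM25Similarity, and it models a term's document frequency as binomial with a fixed hypothetical rate of 1 document in 1,000. The function names are illustrative and nothing here comes from the thread itself.

```python
import math


def bm25_idf(doc_freq, num_docs):
    """BM25-style idf: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def idf_spread(num_docs, rate):
    """Width of the idf range when doc_freq moves one standard deviation
    either side of its expected value (binomial model, an assumption)."""
    mean_df = num_docs * rate
    sd = math.sqrt(num_docs * rate * (1 - rate))
    low = bm25_idf(mean_df + sd, num_docs)             # more matches, lower idf
    high = bm25_idf(max(mean_df - sd, 0.0), num_docs)  # fewer matches, higher idf
    return high - low


# Hypothetical term that appears in roughly 1 document per 1,000.
for n in (1_000, 10_000, 100_000):
    print(n, round(idf_spread(n, 0.001), 3))
```

Under this model the spread shrinks steadily as the corpus grows, which matches the point that statistics from a small per-source collection are less stable than those from one combined index.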

Re: Multiple cores versus a "source" field.

2017-12-04 Thread Walter Underwood
December 2017 4:11 p.m. > To: solr-user > Subject: Re: Multiple cores versus a "source" field. > > That's the unpleasant part of semi-structured documents (PDF, Word, whatever). > You never know the relationship between raw size and indexable text. > > Basically a

RE: Multiple cores versus a "source" field.

2017-12-04 Thread Phil Scadden
with that now. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, 5 December 2017 4:11 p.m. To: solr-user Subject: Re: Multiple cores versus a "source" field. That's the unpleasant part of semi-structured documents (PDF, Word, whate

Re: Multiple cores versus a "source" field.

2017-12-04 Thread Erick Erickson
That's the unpleasant part of semi-structured documents (PDF, Word, whatever). You never know the relationship between raw size and indexable text. Basically anything that you don't care to contribute to _scoring_ is often better in an fq clause. You can also use {!cache=false} to bypass actually u
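
A minimal sketch of the fq suggestion, assuming a single core with a string field named "source" (the field name, the example values, and both helper functions are hypothetical). It only builds request parameters for Solr's select handler and does not contact a server.

```python
from urllib.parse import urlencode


def source_filter(source, cache=True):
    """Build an fq clause restricting results to one source.

    Prefixing {!cache=false} asks Solr not to store this filter
    in the filterCache.
    """
    escaped = source.replace("\\", "\\\\").replace('"', '\\"')
    clause = f'source:"{escaped}"'
    return clause if cache else "{!cache=false}" + clause


def select_params(user_query, source, cache=True):
    """The user's text goes in q (scored); the source goes in fq (not scored)."""
    return [("q", user_query), ("fq", source_filter(source, cache))]


params = select_params("groundwater contamination", "technical_reports", cache=False)
print(urlencode(params))
```

Keeping the source restriction in fq rather than in q means it narrows the result set without contributing to the relevance score.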

RE: Multiple cores versus a "source" field.

2017-12-04 Thread Phil Scadden
>You'll have a few economies of scale I think with a single core, but frankly I >don't know if they'd be enough to measure. You say the docs are "quite large" >though, >are you talking books? Magazine articles? is 20K large or are they 20M? Technical reports. Sometimes up to 200MB PDFs, but that

Re: Multiple cores versus a "source" field.

2017-12-04 Thread Erick Erickson
At that scale, whatever you find administratively most convenient. You'll have a few economies of scale I think with a single core, but frankly I don't know if they'd be enough to measure. You say the docs are "quite large" though, are you talking books? Magazine articles? is 20K large or are the 2