Re: Newbie Question: Master Index or 100s Small Index

Erick Erickson Wed, 19 Mar 2014 07:15:13 -0700

Oh My. 2(something) is ancient, I second your move
to scrap the current situation and start over. I'm
really curious what the _reason_ for such a complex
setup are/were.

I second Toke's comments. This is actually
quite small by modern Solr/Lucene standards.

Personally I would index them all to a single index,
include something like a 'source' field that allowed
one to restrict the returned documents by a filter
query (fq) clause.

Toke makes the point that you will get subtly different
search results because the tf/idf calculations are
slightly different across your entire corpus than
within various sub-sections, but I suspect that you
won't notice it. Test and see, you can change later.

One thing to look at is the new hard/soft commit
distinction, see:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

The short form is you want to define your hard
autocommit to be fairly short (maybe 1 minute?)
with openSearcher=false for durability and your
soft commit whatever latency you need for being
able to search the newly-added docs.

I don't know how you're feeding docs to Solr, but
if you're using the ExtractingRequestHandler,
you are
1> transmitting the entire document over the wire,
only to throw most of it away. I'm guessing your 1.5K
of data is just a few percent of the total file size.
2> you're putting the extraction work on the same
box running Solr.

If that machine is overloaded, consider moving the Tika
processing over to one or more clients and only
sending the data you actually want to index over to Solr,
See:
http://searchhub.org/2012/02/14/indexing-with-solrj/

Best,
Erick

On Wed, Mar 19, 2014 at 7:02 AM, Colin R <colin.russ...@dasmail.co.uk> wrote:
> Hi Toke
>
> Our current configuration Lucene 2.(something) with RAILO/CFML app server.
>
> 10K drives, Quad Core, 16GB, Two servers. But the indexing and searching are
> starting to fail and our developer is no longer with us so it is quicker to
> rebuild than fix all the code.
>
> Our existing config is lots of indexes with merges into the larger ones.
>
> They are still running very fast but indexing is causing us issues.
>
> Thanks
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Newbie-Question-Master-Index-or-100s-Small-Index-tp4125407p4125447.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Re: Newbie Question: Master Index or 100s Small Index

Reply via email to