I am not a Solr expert, but would it be possible to build your indexes
based on search statistics?
What I mean is that you could have a monitoring service that gathers
statistics on search queries and the documents they return, and assigns a
weight to each query based on its occurrence, its impact on the index, its
response time, and the number of documents returned.
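
Just to make the idea concrete, here is a minimal sketch of the weighting
part in Java. The class and field names are mine, and the weight formula is
only illustrative (frequent, slow, result-heavy queries score higher); it is
not any existing Solr API:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class QueryStats {

        /** Accumulated statistics for one distinct query string. */
        static class Stats {
            long occurrences;
            long totalResponseMillis;
            long totalDocsReturned;

            /** Illustrative weight: frequent, slow, result-heavy queries rank higher. */
            double weight() {
                double avgMillis = (double) totalResponseMillis / occurrences;
                double avgDocs = (double) totalDocsReturned / occurrences;
                return occurrences * (avgMillis + avgDocs);
            }
        }

        private final Map<String, Stats> byQuery = new ConcurrentHashMap<>();

        /** Called by the monitoring hook after each search completes. */
        public void record(String query, long responseMillis, long docsReturned) {
            Stats s = byQuery.computeIfAbsent(query, q -> new Stats());
            synchronized (s) {
                s.occurrences++;
                s.totalResponseMillis += responseMillis;
                s.totalDocsReturned += docsReturned;
            }
        }
    }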

Then another batch service could use these statistics to reorganize and
repartition your N indexes daily, based on the query load. That way, all of
your N indexes would receive a similar number of hits, or a similar load.
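
The rebalancing pass itself could be a simple greedy bin packing: place the
heaviest partitions first, always onto the currently lightest index. A rough
sketch under those assumptions (Partition and the weights are hypothetical,
nothing from Solr):

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.PriorityQueue;

    public class Rebalancer {

        /** A unit of reassignable data, e.g. one merchant's documents. */
        record Partition(String id, double queryWeight) {}

        /** Returns, for each of the n indexes, the partitions assigned to it. */
        static List<List<Partition>> assign(List<Partition> partitions, int n) {
            List<List<Partition>> buckets = new ArrayList<>();
            for (int i = 0; i < n; i++) buckets.add(new ArrayList<>());

            double[] load = new double[n];
            // The index with the smallest accumulated weight comes out first.
            PriorityQueue<Integer> lightest =
                    new PriorityQueue<>(Comparator.comparingDouble(i -> load[i]));
            for (int i = 0; i < n; i++) lightest.add(i);

            // Placing the heaviest partitions first gives a tighter balance.
            List<Partition> byWeight = new ArrayList<>(partitions);
            byWeight.sort(Comparator.comparingDouble(Partition::queryWeight).reversed());

            for (Partition p : byWeight) {
                int i = lightest.poll();       // lightest index right now
                buckets.get(i).add(p);
                load[i] += p.queryWeight();
                lightest.add(i);               // re-queue with its new load
            }
            return buckets;
        }
    }

Of course this only produces an assignment plan; actually moving documents
between indexes would still mean reindexing them on the target master.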

Sorry if I am not helping.

2008/5/15 William Pierce <[EMAIL PROTECTED]>:

> Folks:
>
> We are building a search capability into our web site and plan to use Solr.
>  While we have the initial prototype version up and running on Solr 1.2,  we
> are now turning our attention to sizing/scalability.
>
> Our app in brief: We get merchant SKU files (in either XML or CSV) which
> we process, index, and make available to our site visitors to search.  Our
> current plan calls for us to support approx 10,000 merchants, each with an
> average of 50,000 SKUs.  This makes a total of approx 500 million SKUs.
> In addition, we assume that on a daily basis approx 5-10% of the SKUs will
> need to be updated (added/deleted/modified).  (Assume each SKU will be
> approx 4 KB.)
>
> Here are a few questions that we are thinking about and would value any
> insights you all may have:
>
> a) Should we have just one giant master index (containing all the SKUs)
> and then have multiple slaves to handle the search queries?    In this case,
> the master index will be approx 2 TB in size.  Not being an expert in
> Solr/Lucene, I suspect it may be a bad idea to let one index grow so
> large.   What size limit should we assume for each index?
>
> b) Or, should we partition the 10,000 merchants into N buckets and have a
> master index for each of the N buckets?   We could partition the merchants
> by their type or some other simple scheme.   Then, we could have slaves
> set up for each of the N masters.  The trick here will be to partition the
> merchants carefully.  Ideally we would like a search for any product type
> to hit only one index, but this may not always be possible.
> For example, a search for "Harry Potter" may result in hits in "books",
> "dvds", "memorabilia", etc.
>
> With N masters we will have to plan for a distributed search across the N
> indices (and then some mechanism for weighting and merging the results
> that come back).   Any recommendations for a distributed search
> solution?   I saw some references to Katta.  Is this viable?
>
> In the extreme case, we could have one master for each merchant (with
> 10,000 merchants there would be 10,000 master indices).   The advantage
> here is that an index has to be updated only when its merchant submits a
> new data file.  The others remain unchanged.
>
> c) By the way, for those of you who have deployed Solr in a production
> environment, can you give me your hardware configuration and the rough
> number of search queries per second that a single Solr instance can handle
> -- assuming a dedicated box?
>
> d) Our plan is to release a beta version in Spring 2009.  Should we plan
> on using Solr 1.2 or move to Solr 1.3 now?
>
> Any insights/thoughts/whitepapers will be greatly appreciated!
>
> Cheers,
>
> Bill
>
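
On the distributed search question in (b): if I am not mistaken, the 1.3
trunk already has built-in distributed search, where one instance receives
the query, fans it out over a shards parameter listing the other instances,
and merges the results by score, so you may not need Katta. A hedged sketch
of issuing such a request from Java (the shards parameter is the real Solr
1.3 mechanism; the host names are made up):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLEncoder;

    public class ShardedQuery {
        public static void main(String[] args) throws Exception {
            String q = URLEncoder.encode("harry potter", "UTF-8");
            // Any instance can coordinate; it queries every shard in the
            // list and merges the per-shard results by score.
            String shards = "search1:8983/solr,search2:8983/solr,search3:8983/solr";
            URL url = new URL("http://search1:8983/solr/select?q=" + q
                    + "&shards=" + shards + "&start=0&rows=10");

            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), "UTF-8"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);   // raw XML of the merged result
                }
            }
        }
    }

The response comes back already merged, as if it had been a single index,
which would also answer your question (d) in favor of moving to 1.3.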


-- 
Alexander Ramos Jardim
