Thanks for the reply.
Sorry, I should clarify our current statistics. First of all, I meant 183k documents (not 183, whoops). Around 100k of those are full-fledged HTML articles (not web pages, but articles in our CMS with HTML content inside them); the rest of the data are more like key/value records with a lot of attached metadata for searching.

Also, what I meant by "search without a search term" is that probably 80% of our searches (hard to confirm due to the lack of stats given by the GSA) are done on pure metadata clauses without any searching through the content itself. For example: "give me documents that have a content type of video, that are marked for client X, have a category of Y or Z, and were published to platform A, ordered by date published". The searches that do use a search term look like the same query as in that example, but additionally ask for all documents that have the string "My Video" in their title and description (see the Solr sketch at the bottom of this mail).

From the way the GSA provides us statistics (which are pretty bare), it appears that "no search term" searches are not counted as part of those statistics (the GSA is not really built for searching without search terms either, and we've had various issues using it this way because of it).

The reason we are using the GSA for this, and not our MSSql database, is that some of this data requires multiple, and expensive, joins, and we do need full-text search for when users want to use that option. Also for faceting.

On Wed, Feb 13, 2013 at 11:24 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:

> Matthew Shapiro [m...@mshapiro.net] wrote:
> > [Hardware for Solr]
>
> > What type of hardware (at a high level) should I be looking for. Are the
> > main constraints disk I/O, memory size, processing power, etc...?
>
> That depends on what you are trying to achieve. Broadly speaking, "simple"
> search and retrieval is mainly I/O bound. The easy way to handle that is to
> use SSDs as storage. However, a lot of people like the old school solution
> and compensate for the slow seeks of spinning drives by adding RAM and
> doing warmup of the searcher or index files. So either SSD or RAM on the
> I/O side. If the corpus is non-trivial in size, that is, which brings us
> to...
>
> > Right now we have about 183 documents stored in the GSA (which will go up a
> > lot once we are on Solr since the GSA is limiting). The search systems are
> > used to display core information on several of our homepages, so our search
> > traffic is pretty significant (the GSA reports 5,683 searches in the last
> > month, however I am 99% sure this is not correct and is not counting search
> > requests without any search terms, which consists of most of our search
> > traffic).
>
> If the main amount of searches are the exact same (e.g. the empty search),
> the result will be cached. If 5,683 searches/month is the real count, this
> sounds like a very low amount of searches in a very limited corpus. Just
> about any machine should be fine. I guess I am missing something here.
> Could you elaborate a bit? How large is a document, how many do you expect
> to handle, what do you expect a query to look like, how should the result
> be presented?
>
> Regards,
> Toke Eskildsen
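
P.S. To make the example concrete, here is roughly what I picture that first "no search term" query looking like in Solr. The core name (articles) and field names (content_type, client, category, platform, date_published, title, description) are just placeholders for whatever our schema ends up being:

  /solr/articles/select
      ?q=*:*                       (no search term: match everything)
      &fq=content_type:video       (each metadata clause as a filter query,
      &fq=client:clientX            which Solr caches independently)
      &fq=category:(Y OR Z)
      &fq=platform:platformA
      &sort=date_published desc
      &facet=true                  (facet counts for the UI)
      &facet.field=category
      &facet.field=content_type

For the searches that do use a search term, the q parameter would carry it instead, e.g. q=title:"My Video" OR description:"My Video", while the fq clauses stay the same.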