Matthew,

With an index that small, you should be able to build a proof of
concept on your own hardware and discover how it performs using
something like SolrMeter.


Michael Della Bitta

------------------------------------------------
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Wed, Feb 13, 2013 at 12:21 PM, Matthew Shapiro <m...@mshapiro.net> wrote:
> Thanks for the reply.
>
> If most of the searches are exactly the same (e.g. the empty search),
>> the result will be cached. If 5,683 searches/month is the real count, that
>> sounds like a very low number of searches on a very limited corpus. Just
>> about any machine should be fine. I guess I am missing something here.
>> Could you elaborate a bit? How large is a document, how many do you expect
>> to handle, what do you expect a query to look like, and how should the
>> results be presented?
>
>
> Sorry, I should clarify our current statistics.  First of all, I meant 183k
> documents (not 183, whoops).  Around 100k of those are full-fledged HTML
> articles (not web pages, but articles in our CMS with HTML content inside
> them); the rest of the data are more like key/value records with a lot of
> attached metadata for searching.
>
> Also, what I meant by a search without a search term is that probably 80%
> (hard to confirm due to the lack of stats from the GSA) of our searches
> are done on pure metadata clauses, without any searching through the
> content itself -- for example, "give me documents that have a content type
> of video, that are marked for client X, have a category of Y or Z, and were
> published to platform A, ordered by date published".  The searches that use
> a search term are more like the same query from the example before, but
> "find me all the documents that have the string 'My Video' in their title
> or description".  From the way the GSA provides us statistics (which are
> pretty bare), it appears they do not count "no search term" searches as
> part of those statistics (the GSA is not really built for queries without
> search terms either, and we've had various issues using it this way because
> of it).
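In Solr terms, a metadata-only search like the one described above maps naturally onto filter queries (`fq`) with no free-text term in `q`. A minimal sketch of the request parameters -- field names such as `contentType`, `client`, `category`, and `platform` are hypothetical stand-ins, not the actual schema:

```python
from urllib.parse import urlencode

# Hypothetical field names standing in for the real schema.
params = [
    ("q", "*:*"),                    # no search term: match everything
    ("fq", "contentType:video"),     # metadata clauses go in filter queries
    ("fq", "client:clientX"),
    ("fq", "category:(Y OR Z)"),
    ("fq", "platform:A"),
    ("sort", "datePublished desc"),  # order by publish date
]
query_string = urlencode(params)
print(query_string)
```

Each `fq` clause is cached independently in Solr's filterCache, so repeated metadata combinations stay cheap even without an explicit search term.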
>
> The reason we are using the GSA for this and not our MS SQL database is
> that some of this data requires multiple, and expensive, joins, and we do
> need full-text search for when users want that option.  We also need
> faceting.
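For the faceting side, the same filtered request can ask Solr to count documents per metadata value. A sketch, again with illustrative field names rather than an actual schema:

```python
from urllib.parse import urlencode

# Facet counts per metadata field, alongside a filter
# (field names are illustrative, not a real schema).
params = [
    ("q", "*:*"),
    ("fq", "client:clientX"),
    ("facet", "true"),
    ("facet.field", "contentType"),
    ("facet.field", "category"),
    ("rows", "0"),  # counts only, no documents returned
]
facet_query = urlencode(params)
print(facet_query)
```

Setting `rows=0` is a common trick when only the facet counts are needed, since Solr then skips retrieving stored documents entirely.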
>
>
> On Wed, Feb 13, 2013 at 11:24 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
> wrote:
>
>> Matthew Shapiro [m...@mshapiro.net] wrote:
>>
>> [Hardware for Solr]
>>
>> > What type of hardware (at a high level) should I be looking for?  Are
>> > the main constraints disk I/O, memory size, processing power, etc.?
>>
>> That depends on what you are trying to achieve. Broadly speaking, "simple"
>> search and retrieval is mainly I/O bound. The easy way to handle that is to
>> use SSDs as storage. However, a lot of people like the old-school solution
>> and compensate for the slow seeks of spinning drives by adding RAM and
>> warming up the searcher or index files. So either SSD or RAM on the
>> I/O side -- if the corpus is non-trivial in size, that is, which brings us
>> to...
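The warmup Toke mentions is typically configured with a `firstSearcher`/`newSearcher` event listener in `solrconfig.xml`. A minimal sketch -- the query and sort field shown are illustrations only, not taken from any actual configuration:

```xml
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- warm the caches with a representative query;
         "datePublished" is a hypothetical field name -->
    <lst>
      <str name="q">*:*</str>
      <str name="sort">datePublished desc</str>
    </lst>
  </arr>
</listener>
```

Warming queries like this pre-populate the caches and pull index files into the OS page cache before real traffic hits a newly opened searcher.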
>>
>> > Right now we have about 183 documents stored in the GSA (which will go
>> > up a lot once we are on Solr, since the GSA is limiting).  The search
>> > systems are used to display core information on several of our
>> > homepages, so our search traffic is pretty significant (the GSA reports
>> > 5,683 searches in the last month; however, I am 99% sure this is not
>> > correct and is not counting search requests without any search terms,
>> > which make up most of our search traffic).
>>
>> If most of the searches are exactly the same (e.g. the empty search),
>> the result will be cached. If 5,683 searches/month is the real count, that
>> sounds like a very low number of searches on a very limited corpus. Just
>> about any machine should be fine. I guess I am missing something here.
>> Could you elaborate a bit? How large is a document, how many do you expect
>> to handle, what do you expect a query to look like, and how should the
>> results be presented?
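The caching Toke describes is handled by Solr's queryResultCache, configured in `solrconfig.xml`. The sizes below are illustrative values, not a tuning recommendation:

```xml
<!-- caches full result pages for repeated identical queries,
     e.g. the empty search hit from every homepage load -->
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>
```

With traffic dominated by a handful of identical metadata queries, hit rates on this cache should be very high, which is why almost any machine can serve that load.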
>>
>> Regards,
>> Toke Eskildsen
