On Feb 5, 2007, at 11:15 AM, Yonik Seeley wrote:
> On 2/5/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>> This week I'm going to be incrementally loading up to 3.7M records
>> into Solr, in 50k chunks.
>> I'd like to capture some performance numbers after each chunk to see
>> how she holds up.
>> What numbers are folks capturing? What techniques are you using to
>> capture numbers? I'm not looking for anything elaborate, as the goal
>> is really to see how faceting fares as more data is loaded. We've
>> got some ugly data in our initial experiment, so the faceting
>> concerns me.
>
> Gulp... me too. That sounds like a lot of data, and the faceting code
> is still young (it will get better with age :-)
> The big performance factor for faceting will relate to the number of
> unique values in a field.
> So what are you trying to facet on?
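For capturing per-chunk numbers, a small harness along these lines might do: it times a facet request per field after each load and shows how latency grows with index size. This is only a sketch under assumptions from the thread -- the actual query function is passed in as a callable (e.g. an HTTP GET against a local Solr's /select handler with facet=true), and the field names and chunk count are illustrative, not anything the posters specified.

```python
import time

def benchmark_facets(run_facet_query, facet_fields, repeats=3):
    """Time one facet query per field; return (field, best_seconds) pairs.

    run_facet_query is any callable that executes a facet request for a
    given field name -- for example, an HTTP GET against Solr's /select
    handler with facet=true&facet.field=<field>. Best-of-N is reported
    to smooth out cache-warmup noise.
    """
    results = []
    for field in facet_fields:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_facet_query(field)
            timings.append(time.perf_counter() - start)
        results.append((field, min(timings)))
    return results

# Usage sketch: after each 50k chunk is committed, record the numbers.
# A stub stands in for the real Solr call here.
def fake_query(field):
    time.sleep(0.001)  # pretend to hit Solr

for chunk in range(1, 4):  # would be ~74 chunks for 3.7M docs at 50k each
    for field, secs in benchmark_facets(fake_query, ["genre", "subject", "format"]):
        print(f"chunk {chunk:3d}  {field:10s}  {secs * 1000:.1f} ms")
```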
The facets are bibliographic metadata about library holdings, such as
genre, subject, format, published date (year), and others. Basically
an open source thing like this:
<http://www2.lib.ncsu.edu/catalog/?N=201015&Ns=Call+Number+sort%7c0&sort=5>
(if that link didn't work, hit the main page at
<http://www.lib.ncsu.edu/catalog/browse.html> and drill in a little)
The data is really ugly, and there are typically several values per
field, so all facets are currently set as multiValued.
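A minimal sketch of how such fields might be declared in Solr's schema.xml, assuming the field names mentioned above (the names and types here are illustrative, not the actual schema): multiValued="true" lets a record carry several genres or subjects, and an untokenized string type keeps each facet value intact as a single term.

```xml
<!-- hypothetical facet fields; names assumed from the thread -->
<field name="genre"    type="string" indexed="true" stored="true" multiValued="true"/>
<field name="subject"  type="string" indexed="true" stored="true" multiValued="true"/>
<field name="format"   type="string" indexed="true" stored="true" multiValued="true"/>
<field name="pub_year" type="string" indexed="true" stored="true" multiValued="true"/>
```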
We shall see!
Erik