We just switched over to storing our data directly in Solr as
compressed JSON fields at http://frugalmechanic.com.  So far it's
working out great.  Our detail pages (e.g.
http://frugalmechanic.com/auto-part/817453-33-2084-kn-high-performance-air-filter)
now make a single Solr request to grab the part data, pricing data,
and fitment data.  Before, we'd make a call to Solr and then typically
3-4 DB calls to load the data.
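
A rough sketch of what that single-request pattern can look like.  The
field names (`detail_json`) and helper functions here are hypothetical,
not Frugal Mechanic's actual schema; the idea is just that all the
detail-page data is serialized into one stored field at index time:

```python
import json

def build_solr_doc(part, pricing, fitments):
    """Assemble one Solr document with all detail-page data embedded.

    The searchable fields stay as normal indexed fields; everything the
    detail page needs is packed into a single stored JSON blob, so one
    Solr fetch replaces 1 Solr call + 3-4 DB calls.
    """
    return {
        "id": part["id"],
        "name": part["name"],          # indexed for search
        "detail_json": json.dumps({    # stored blob for the detail page
            "part": part,
            "pricing": pricing,
            "fitments": fitments,
        }),
    }

def render_detail_page(solr_doc):
    """Decode everything the page needs from the single stored field."""
    return json.loads(solr_doc["detail_json"])

# Illustrative data only
doc = build_solr_doc(
    {"id": "817453", "name": "K&N High Performance Air Filter"},
    [{"retailer": "ExampleRetailer", "price": 54.99}],
    [{"make": "Honda", "model": "Civic", "years": "2006-2010"}],
)
page = render_detail_page(doc)
```

The tradeoff discussed below follows directly from this layout: since
the pricing lives inside the one stored blob, a price change means
rebuilding and re-indexing the whole document.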

As Lance pointed out, the downside is that whenever any of our part
data changes we have to re-index the entire document, so updating
pricing for some of our larger retailers means re-indexing a large
portion of our dataset.  But that's the tradeoff we were willing to
make going into the change, and so far a daily re-index that takes
30-60 minutes isn't a big deal.  Later on we may split out the data
that changes frequently from the data that rarely changes.

We're working with about 2 million documents and our optimized index
files are currently at 3.2 GB.  Using compression on the large text
fields really helps keep the size down.
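
For anyone curious why compression helps so much here: repetitive JSON
(repeated keys, similar fitment rows) deflates very well.  In Solr 1.x
this happens transparently when a stored field is marked compressed in
schema.xml; the snippet below just illustrates the same effect in
application code with zlib, using made-up fitment data:

```python
import json
import zlib

# Hypothetical fitment blob: many near-identical rows, like real
# part-fitment data tends to be.
fitment_data = {"fitments": [
    {"make": "Honda", "model": "Civic", "year": y} for y in range(1990, 2010)
]}

raw = json.dumps(fitment_data).encode("utf-8")
packed = zlib.compress(raw, level=9)   # what the index effectively stores

# Repetitive JSON compresses well, which keeps the index size down.
ratio = len(packed) / len(raw)
restored = json.loads(zlib.decompress(packed))
```

The compressed bytes round-trip back to the original structure, so
nothing is lost; you trade a little CPU at read time for a much smaller
on-disk index.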

-Tim

On Wed, Feb 3, 2010 at 9:26 PM, Tommy Chheng <tommy.chh...@gmail.com> wrote:
> Hey AJ,
> For simplicity's sake, I am using Solr as both storage and search for
> http://researchwatch.net.
> The dataset is 110K NSF grants from 1999 to 2009. The faceting is all
> dynamic fields, and I use a catch-all to copy all fields to a default
> text field. All fields are also stored and used for the individual
> grant view. The performance seems fine for my purposes; I haven't done
> any extensive benchmarking with it. The site was built using a light
> RoR/rsolr layer on a small EC2 instance.
>
> Feel free to bang against the site with jmeter if you want to stress test a
> sample server to failure.  :)
>
> --
> Tommy Chheng
> Developer & UC Irvine Graduate Student
> http://tommy.chheng.com
>
> On 2/3/10 5:41 PM, AJ Asver wrote:
>>
>> Hi all,
>>
>> I work on search at Scoopler.com, a real-time search engine which uses
>> Solr.
>>  We currently use Solr for indexing but then fetch data from our CouchDB
>> cluster using the IDs Solr returns.  We are now considering storing a
>> larger portion of the data in Solr's index itself so we don't have to
>> hit the DB too.
>>  Assuming that we are still storing data in the DB (for backend and
>> backup purposes), are there any significant disadvantages to using
>> Solr as a data store too?
>>
>> We currently run a master-slave setup on EC2, using x-large slave
>> instances to allow the disk cache to use as much memory as possible.
>> I imagine we would definitely have to add more slave instances to
>> accommodate the extra data we're storing (and make sure it stays in
>> memory).
>>
>> Any tips would be really helpful.
>> --
>> AJ Asver
>> Co-founder, Scoopler.com
>>
>> +44 (0) 7834 609830 / +1 (415) 670 9152
>> a...@scoopler.com
>>
>>
>> Follow me on Twitter: http://www.twitter.com/_aj
>> Add me on Linkedin: http://www.linkedin.com/in/ajasver
>> or YouNoodle: http://younoodle.com/people/ajmal_asver
>>
>> My Blog: http://ajasver.com
>>
>>
>
