We just switched over to storing our data directly in Solr as compressed JSON fields at http://frugalmechanic.com. So far it's working out great. Our detail pages (e.g.: http://frugalmechanic.com/auto-part/817453-33-2084-kn-high-performance-air-filter) now make a single Solr request to grab the part data, pricing data, and fitment data. Before, we'd make a call to Solr and then another 3-4 DB calls to load the data.
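To illustrate the idea (this is a sketch, not our actual code, and the field names and sample data are made up): serialize the record to JSON, gzip it, and base64-encode it so it can live in an ordinary stored Solr string field; on the read side, a single Solr fetch plus a decompress gets everything back.

```python
import base64
import gzip
import json

# Hypothetical part record (made-up fields, not the real schema)
part = {
    "id": "817453",
    "name": "K&N High Performance Air Filter",
    "pricing": [{"retailer": "ExampleParts", "price": 42.99}],
    "fitment": ["2005 Honda Civic", "2006 Honda Civic"],
}

def compress_field(obj):
    # Write side: JSON -> gzip -> base64, suitable for a Solr string field.
    raw = json.dumps(obj).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")

def decompress_field(blob):
    # Read side: reverse the encoding after one Solr request returns the field.
    raw = gzip.decompress(base64.b64decode(blob))
    return json.loads(raw.decode("utf-8"))

stored = compress_field(part)
assert decompress_field(stored) == part
```

The base64 step is only there because the compressed bytes need to survive being stored and transported as text; if your Solr field can hold raw binary, you can skip it.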
As Lance pointed out, the downside is that whenever any of our part data changes we have to re-index the entire document. So updating pricing for some of our larger retailers means re-indexing a large portion of our dataset. But that's the tradeoff we were willing to make going into the change, and so far a daily re-index of the data that takes 30-60 mins isn't a big deal. Later on we may split out the data that changes frequently from the data that doesn't change often. We're working with about 2 million documents, and our optimized index files are currently at 3.2 GB. Using compression on the large text fields really helps keep the size down.

-Tim

On Wed, Feb 3, 2010 at 9:26 PM, Tommy Chheng <tommy.chh...@gmail.com> wrote:
> Hey AJ,
> For simplicity's sake, I am using Solr as both storage and search for
> http://researchwatch.net.
> The dataset is 110K NSF grants from 1999 to 2009. The faceting is all
> dynamic fields, and I use a catch-all to copy all fields to a default text
> field. All fields are also stored and used for the individual grant view.
> The performance seems fine for my purposes. I haven't done any extensive
> benchmarking with it. The site was built using a light RoR/rsolr layer on a
> small EC2 instance.
>
> Feel free to bang against the site with jmeter if you want to stress test a
> sample server to failure. :)
>
> --
> Tommy Chheng
> Developer & UC Irvine Graduate Student
> http://tommy.chheng.com
>
> On 2/3/10 5:41 PM, AJ Asver wrote:
>>
>> Hi all,
>>
>> I work on search at Scoopler.com, a real-time search engine which uses
>> Solr.
>> We currently use Solr for indexing but then fetch data from our CouchDB
>> cluster using the IDs Solr returns. We are now considering storing a
>> larger portion of data in Solr's index itself so we don't have to hit
>> the DB too. Assuming that we are still storing data in the DB (for
>> backend and backup purposes), are there any significant disadvantages
>> to using Solr as a data store too?
>>
>> We currently run a master-slave setup on EC2, using x-large slave
>> instances to allow the disk cache to use as much memory as possible. I
>> imagine we would definitely have to add more slave instances to
>> accommodate the extra data we're storing (and make sure it stays in
>> memory).
>>
>> Any tips would be really helpful.
>>
>> --
>> AJ Asver
>> Co-founder, Scoopler.com
>>
>> +44 (0) 7834 609830 / +1 (415) 670 9152
>> a...@scoopler.com
>>
>> Follow me on Twitter: http://www.twitter.com/_aj
>> Add me on Linkedin: http://www.linkedin.com/in/ajasver
>> or YouNoodle: http://younoodle.com/people/ajmal_asver
>>
>> My Blog: http://ajasver.com
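The two read paths being compared in this thread can be sketched as a toy, with in-memory dicts standing in for Solr and CouchDB (none of this is anyone's actual code; all names and data are hypothetical):

```python
# Toy stand-ins for a Solr index and a CouchDB cluster (data is made up).
solr_index = {
    "doc1": {"id": "doc1", "title": "Grant A", "body": "full stored text A"},
    "doc2": {"id": "doc2", "title": "Grant B", "body": "full stored text B"},
}
couchdb = {
    "doc1": {"id": "doc1", "body": "full stored text A"},
    "doc2": {"id": "doc2", "body": "full stored text B"},
}

def search_ids(query):
    # Pretend Solr matched every document for this query.
    return sorted(solr_index)

def fetch_via_db(query):
    # Path 1: Solr returns only IDs; a second round trip hits the DB.
    ids = search_ids(query)
    return [couchdb[i] for i in ids]

def fetch_via_stored_fields(query):
    # Path 2: Solr stores the full document; no second round trip.
    return [solr_index[i] for i in search_ids(query)]

# Both paths yield the same bodies; the stored-field path skips the DB hit,
# at the cost of a larger index and full-document re-indexing on any change.
assert [d["body"] for d in fetch_via_db("q")] == \
       [d["body"] for d in fetch_via_stored_fields("q")]
```

The trailing comment restates the tradeoff from the thread: fewer round trips per page view versus a bigger index and re-indexing whole documents whenever any field changes.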