Chris,

Cool stuff. Congratulations on launching.
I have a few scaling questions I hope you can answer. I'm keen to understand how Solr performs under production loads with significant real-time update traffic. Specifically:

1. How many searches per second are you currently handling?
2. How big is the Solr fleet?
3. What is your update rate?
4. What is the propagation delay from master to slave, i.e. how often do you propagate, and how long does it take per box?
5. What is your optimization schedule, and how does it affect the overall performance of the system?

If anyone else out there has similar data from large-scale experience with Solr, I'd love to hear it too.

Thanks,
-D

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Saturday, May 20, 2006 3:18 PM
To: solr-user@lucene.apache.org
Subject: Two Solr Announcements: CNET Product Search and DisMax

I've got two related announcements to make, which I think are pretty cool...

The first is that the search result pages for CNET Shopper.com are now powered by Solr. You may be thinking "Didn't he announce that last year?" ... not quite. CNET's faceted product listing pages for browsing products by category have been powered by Solr for about a year now, but up until a few weeks ago, searching for products by keywords was still powered by a legacy system. I was working hard to come up with a good mechanism for building Lucene queries based on user input, one that would allow us to leverage our "domain expertise" about consumer technology products to ensure that users got the best matches.

Which brings me to my second announcement: I've just committed a new SolrQueryHandler called the "DisMaxQueryHandler" into the Solr subversion repository. This query handler supports a simplified version of the Lucene QueryParser syntax. Quotes can be used to group phrases, and +/- can be used to denote mandatory and optional clauses ... but all other Lucene query parser special characters are escaped to simplify the user experience.
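As a rough illustration of the escaping idea described above, here is a minimal Python sketch. It is not the handler's actual code: the function name `partial_escape` and the exact set of characters treated as special are assumptions made for this example. It leaves quotes and the +/- operators untouched and backslash-escapes the other characters.

```python
# Hypothetical sketch only; not the DisMax handler's actual implementation.
# SPECIAL_CHARS is an assumed set of query-parser special characters.
SPECIAL_CHARS = set('\\!():^[]{}~*?')


def partial_escape(user_input: str) -> str:
    """Backslash-escape special characters, leaving quotes, '+' and '-' alone."""
    return "".join(
        "\\" + ch if ch in SPECIAL_CHARS else ch
        for ch in user_input
    )


if __name__ == "__main__":
    # Quotes and +/- survive; parentheses and the wildcard are escaped.
    print(partial_escape('+ipod -"mini dock" (new)*'))
```

With this sketch, a user typing a field-style query such as cat:video would have the colon escaped, so the input is treated as plain text rather than as a fielded query.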
The handler takes responsibility for building a good query from the user's input using BooleanQueries containing DisjunctionMaxQueries across the fields and boosts you specify. It also allows you to provide additional boosting queries, boosting functions, and filtering queries to artificially affect the outcome of all searches. These options can all be specified as init parameters for the handler in your solrconfig.xml or overridden in the Solr query URL.

The code in this plugin is what is now powering CNET product search. I've updated the "example" solrconfig.xml to take advantage of it, so you can take it for a spin right now if you build from scratch using subversion; otherwise you'll have to wait for the solr-2006-05-21.zip nightly release due out in a few hours.

Once you've got it, the javadocs for DisMaxRequestHandler contain the details about all of the options it supports, and here are a few URLs you can try out using the product data in the exampledocs directory...

Normal results for the word "video" using the StandardRequestHandler with the default search field...

http://localhost:8983/solr/select/?q=video&fl=name+score&qt=standard

The "dismax" handler is configured to search across the text, features, name, sku, id, manu, and cat fields, all with varying boosts designed to ensure that "better" matches appear first. Specifically, documents which match on the name and cat fields get higher scores...

http://localhost:8983/solr/select/?q=video&qt=dismax

...note that this instance is also configured with a default field list, which can be overridden in the URL...

http://localhost:8983/solr/select/?q=video&qt=dismax&fl=*,score

You can also override which fields are searched on, and how much boost each field gets...

http://localhost:8983/solr/select/?q=video&qt=dismax&qf=features^20.0+text^0.3

Another instance of the handler is registered using the qt "instock" and has slightly different configuration options, notably a filter for (you guessed it) inStock:true...
http://localhost:8983/solr/select/?q=video&qt=dismax&fl=name,score,inStock
http://localhost:8983/solr/select/?q=video&qt=instock&fl=name,score,inStock

One of the other really cool features in this handler is robust support for specifying the "BooleanQuery.minimumNumberShouldMatch" you want to be used based on how many terms are in your user's query. This allows flexibility for typos and partial matches. For the dismax handler, 1 and 2 word queries require that all of the optional clauses match, but for 3-5 word queries one missing word is allowed...

http://localhost:8983/solr/select/?q=belkin+ipod&qt=dismax
http://localhost:8983/solr/select/?q=belkin+ipod+gibberish&qt=dismax
http://localhost:8983/solr/select/?q=belkin+ipod+apple&qt=dismax

Just like the StandardRequestHandler, it supports the debugQuery option for viewing the parsed query and the score explanations for each doc...

http://localhost:8983/solr/select/?q=belkin+ipod+gibberish&qt=dismax&debugQuery=1
http://localhost:8983/solr/select/?q=video+card&qt=dismax&debugQuery=1

...That's the overall gist of it. I hope other people find it useful out of the box -- and even if it doesn't meet your needs, hopefully it gives you some good ideas of the types of things that can be done in a SolrRequestHandler that aren't supported natively with the Lucene QueryParser.

If you do decide to write your own handler, make sure to take a look at the new SolrPluginUtils class as well -- it provides some nice reusable methods that came in handy when writing the DisMaxRequestHandler.

-Hoss
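
The minimum-match rule described above for the example "dismax" configuration (1 and 2 word queries need every optional clause, 3-5 word queries tolerate one miss) can be sketched in Python as follows. This is only an illustration of the rule as stated in the message, not the handler's code: the function name is hypothetical, and because the message does not say what happens beyond five words, the sketch refuses to guess.

```python
def required_optional_matches(num_optional_clauses: int) -> int:
    """Return how many optional clauses must match, per the rule in the message.

    Hypothetical helper for illustration; not part of Solr.
    """
    if num_optional_clauses < 1:
        raise ValueError("a query needs at least one optional clause")
    if num_optional_clauses <= 2:
        # 1 and 2 word queries: every optional clause must match.
        return num_optional_clauses
    if num_optional_clauses <= 5:
        # 3-5 word queries: one missing word is allowed.
        return num_optional_clauses - 1
    # The message does not describe the behaviour for longer queries.
    raise ValueError("behaviour for more than 5 clauses is not described")


if __name__ == "__main__":
    for words in ("belkin ipod", "belkin ipod gibberish"):
        n = len(words.split())
        print(words, "->", required_optional_matches(n), "of", n, "must match")
```

Under this reading, "belkin ipod gibberish" has three optional clauses and needs two of them, so a document matching only "belkin" and "ipod" can still be returned.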