On Apr 20, 2007, at 2:30 PM, solruser wrote:
For pure Ruby access to Solr without a database, use solr-ruby.  The
0.01 gem is available via "gem install solr-ruby", but if you can, I'd
recommend tinkering with the trunk codebase too.
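
To give a flavor, here's a minimal usage sketch (from memory of the 0.0.x API; the URL and field names are placeholders for your own schema):

require 'solr'

# Connect to a running Solr instance (URL is an example).
conn = Solr::Connection.new('http://localhost:8983/solr')

# Add a document and commit; field names must match your Solr schema.
conn.add(:id => 1, :title_text => 'Lucene in Action')
conn.commit

# Query and print each hit's id.
response = conn.query('title_text:lucene')
response.hits.each do |hit|
  puts hit['id']
end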


Well, I'm considering the use of Solr with a Rails application. What's the
ideal approach?

"rails application" is a pretty broad category of applications at this point. If we're talking about a database-backed application being searchable by Solr, I'd go for the RubyForge acts_as_solr first. However, I suspect that it needs work in terms of facilitating access to facets, highlighting, and other types of custom query handlers.

If your application is backed by other datastores (in my case: a bunch of MARC records in binary format, a flat delimited file, a ZIP file full of RDF/XML files, or, even more interestingly, another Solr instance that we wanted to repurpose in another Solr-based application), then go with solr-ruby.

It's my intention to bridge this gap in the near future somehow; I just haven't formulated an exact plan. acts_as_solr fits nicely, and very easily, on top of solr-ruby. I envision acts_as_solr simply being part of solr-ruby: it would only hook in if you have ActiveRecord installed, and otherwise it'd be transparent, taking up only a few tens of lines of code in an un-required .rb file.
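
To make that last idea concrete, here's a hypothetical sketch of the transparent hook-in (the integration file name is made up; this is just the general conditional-require pattern):

# Hypothetical sketch: load the ActiveRecord integration only when
# ActiveRecord itself is available; otherwise solr-ruby stays standalone.
begin
  require 'active_record'
  require 'solr/acts_as_solr'   # made-up path for the integration file
rescue LoadError
  # No ActiveRecord installed: nothing to hook in; solr-ruby works as-is.
end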

The first step could be to patch the RubyForge acts_as_solr to use solr-ruby, to kick-start collaboration. As for where my effort fits into a calendar: within the next few weeks I'll be delving into this deeply and can speak more definitively then.



Since there are many flavors floating around, which is the most sought after and supported? And I agree that a definitive version will help the RoR community
accept Solr with a much greater level of confidence.
 And since RoR applications are addressing
Web 2.0, the need to search and collaborate on information is much higher. So I
personally believe addressing this will definitely go a long way.

That's the plan! No question about it. I personally am firing on all cylinders, and will make progress on these technologies as my real-world needs require them; those needs are increasing all the time. All savvy SolRubyists are invited to jump in!

I've not documented this stuff on the wiki to the standards set by the Solr engine itself, but there is some pretty amazing power in solr-ruby right now. For example, the data mapping / indexer framework makes it this easy to import a dataset into Solr using Ruby:

require 'solr'

# DataSource is a contrived stand-in: any object with an #each method
# that yields record-like objects will do.
source = DataSource.new

# Map Solr field names to value mappings: a Symbol looks up that field
# on each record, a String is a literal value, and a Proc computes a
# value from the record.
mapping = {
  :id => :isbn,
  :name => :author,
  :source => "BOOKS",
  :year => Proc.new { |record| record.date[0,4] },
}

Solr::Indexer.index(source, mapping) do |orig_data, solr_document|
  # One last chance to tweak each document before it is indexed.
  solr_document[:timestamp] = Time.now
end

This showcases the simplistic data source facility (*quack* - anything that has an #each method will do; DataSource above is a contrived, bogus class) and the mapping capabilities. The mapping is a hash of Solr field names to value mappings. A value mapping can be a String ("BOOKS"), or a Symbol (:isbn, :author), which looks up that field on (uh, #)each of the objects yielded by the data source. This lookup simply means, again *quack*, that the data object needs a [] method defined. The Proc example is a bit more advanced Ruby voodoo for embedding a bit of code in the mapping to be executed later, with the actual record passed into it; in the example it extracts the first four characters of the record's date property (the year).

And one more bit of Ruby coolness is the do ... end block passed to the indexer method. The indexer takes a data source and a mapping, melds them together as described, and allows you one final chance to affect the solr_document before it gets indexed; you're also given the original data object, of course.
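
As a concrete illustration of the duck typing, here's a minimal sketch of a homemade data source (the class name and file format are invented for the example); yielding Hashes satisfies the [] lookup that Symbol mappings rely on:

# Minimal duck-typed data source: all the indexer needs is #each.
# This invented class reads a tab-delimited file and yields a Hash
# per line, so Symbol mappings like :isbn resolve via record[:isbn].
class TabFileSource
  def initialize(filename, field_names)
    @filename = filename
    @field_names = field_names
  end

  def each
    File.open(@filename) do |file|
      file.each_line do |line|
        values = line.chomp.split("\t")
        record = {}
        @field_names.each_with_index { |name, i| record[name] = values[i] }
        yield record
      end
    end
  end
end

source = TabFileSource.new("books.txt", [:isbn, :author, :date])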

We already have a simple mapper, an XPath mapper, and an Hpricot mapper available. We also have some handy data sources, including a tab-delimited file source (obsoleted in my playbook by the CSV importer now built into Solr). I'm also using a simple custom MARC binary data source and a mapper specific to ruby-marc objects, and I just put together a SolrSource that takes a query (and filters) for one Solr instance and, with configurable paging, successively feeds out the documents returned from that query. Apply a mapper to that data source and you can pipe data from one Solr to another like this:

# Pull everything matching the filters from one Solr instance...
solr_source = Solr::Importer::SolrSource.new("http://localhost:8420/solr",
                "*:*", ["year:[1776 TO 1918]", "author:smith"])

# ...and index it into another; 'mapper' is a mapping like the one shown earlier.
count = 0
Solr::Indexer.index(solr_source, mapper,
    {:debug => false, :timeout => 120,
     :solr_url => "http://localhost:8983/solr"}) do |orig_data, solr_document|
  count += 1
  puts count if count % 100 == 0
end

The count junk is just to show progress on the console as records are indexed.

So I'm working the Ruby/Solr thing as much as possible right now. There is something to what we've got there, but it's not packaged as nicely as needed for a community to flourish, and for that I apologize. But there is also enough goodness there now to lure folks into wanting to get involved.

Right now in RoR with the Flare plugin installed, you can have a controller that looks like this:

   class SearchController < ApplicationController
     flare
   end

And with some copy/pasting of templates (which we can surely build in as defaults somehow), you have a faceted browsing, Ajax-tricked-out (well, in-place editor and Ajax suggest) experience with how many lines of code? The devil is in the details, though, and that is why I don't yet recommend Flare to folks who just want it to work and also be configurable. Flare cuts a lot of corners by hard-coding some things that need to be made configurable, and so on. Typical prototyping approach: tinker, tinker, tinker, distill. I'm still in the first tinker phase with Flare right now.

But folks who are interested in rolling up their sleeves and don't mind getting a little grubby with code are more than invited to delve into Flare now, with the forewarning that the Flare you see today will not be anywhere near the Flare that spawns from the ashes. Pioneering spirit required.

3. Is a performance benchmark available for the acts_as_solr plugin?

What kind of numbers are you after?  acts_as_solr searches Solr and
then fetches the records from the database to bring back model
objects, so you have to account for the database access in the
picture as well as Solr.
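
For context, that two-step flow looks roughly like this in an acts_as_solr-enabled model (from memory of the plugin's README; column names are placeholders, and the exact return type has varied across versions of the plugin):

class Book < ActiveRecord::Base
  # Tell the plugin which columns to index into Solr
  # (column names here are placeholders).
  acts_as_solr :fields => [:title, :author]
end

# Step 1: the plugin queries Solr for matching document ids.
# Step 2: it fetches those rows from the database as Book models,
# so both Solr latency and database access count toward the total.
books = Book.find_by_solr("ruby")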


Well, to be specific, I am keen to know about the creation and updating of indexes
when you run into a large number of documents. Since the database is used to
populate the models, the total cost will definitely be the cumulative effect of retrieving documents from Solr/Lucene, network issues (since it's a web
service), and the local database (depending on configuration).

Again, we need to be clear about "large". I've got nearly 4M documents indexed under my belt now, but many others have gone to 10M+. Lucene and Solr both scale very well into the tens of millions, and, I've heard, even further up into the hundreds of millions.

Certainly those other latencies you mention are valid questions, but in my experience they've not been show-stopping concerns; performance with Solr + Ruby has been more than acceptable... it's been just fine, even with several spots for improvement in all those areas in my applications. First rule of optimization: don't. Second rule of optimization: don't optimize yet.

        Erik

