I've spent a few hours tinkering with a Ruby ActiveRecord plugin to index, delete, and search database-backed models in Solr. The results are:

$ script/console
>> Book.new(:title => "Solr in Action", :author => "Yonik & Hoss").save
=> true
>> Book.new(:title => "Lucene in Action", :author => "Otis & Erik").save
=> true
>> action_books = Book.find_by_solr("action")
=> [#<Book:0x2406db0 @attributes={"title"=>"Solr in Action", "author"=>"Yonik & Hoss", "id"=>"21"}>, #<Book:0x2406d74 @attributes={"title"=>"Lucene in Action", "author"=>"Otis & Erik", "id"=>"22"}>]
>> action_books = Book.find_by_solr("actions")  # to show stemming
=> [#<Book:0x279ebbc @attributes={"title"=>"Solr in Action", "author"=>"Yonik & Hoss", "id"=>"21"}>, #<Book:0x279eb80 @attributes={"title"=>"Lucene in Action", "author"=>"Otis & Erik", "id"=>"22"}>]
>> Book.find_by_solr("yonik OR otis")  # to show QueryParser boolean expressions
=> [#<Book:0x2793adc @attributes={"title"=>"Solr in Action", "author"=>"Yonik & Hoss", "id"=>"21"}>, #<Book:0x2793aa0 @attributes={"title"=>"Lucene in Action", "author"=>"Otis & Erik", "id"=>"22"}>]

My model looks like this:

  class Book < ActiveRecord::Base
    acts_as_solr
  end

(ain't ActiveRecord slick?!)

acts_as_solr adds save and destroy hooks. All model attributes are sent to Solr like this:

>> action_books[0].to_solr_doc.to_s
=> "<doc><field name='id'>Book:21</field><field name='type'>Book</field><field name='pk'>21</field><field name='title_t'>Solr in Action</field><field name='author_t'>Yonik &amp; Hoss</field></doc>"
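
For anyone who wants to poke at the document format without a Rails app or a running Solr, here is a rough standalone sketch of the same mapping. The helper name solr_doc_for and the plain-hash input are my own assumptions for illustration; the plugin itself works off the ActiveRecord instance.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of the plugin): build a Solr <doc> element
# from a model name, a primary key, and a hash of attributes.
def solr_doc_for(model_name, pk, attributes)
  doc = REXML::Element.new('doc')

  # Small closure that appends <field name="...">value</field> to the doc.
  add_field = lambda do |name, value|
    field = doc.add_element('field', 'name' => name)
    field.add_text(value.to_s)
  end

  add_field.call('id', "#{model_name}:#{pk}")   # unique across all models
  add_field.call('type', model_name)            # used to narrow queries
  add_field.call('pk', pk)                      # database primary key

  # Every other attribute becomes a dynamic "_t" text field.
  attributes.each do |key, value|
    next if key.to_s == 'id'
    add_field.call("#{key}_t", value)
  end
  doc
end

doc = solr_doc_for('Book', 21,
                   { 'id' => 21, 'title' => 'Solr in Action', 'author' => 'Yonik & Hoss' })
puts doc.to_s
```

REXML takes care of escaping the ampersand in the author value, which is why the plugin builds elements rather than concatenating strings.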

The Solr id has the form <model_name>:<primary_key>. The type field holds the model name and is AND'd into queries to narrow them to the requesting model. The pk field is the primary key of the database table, and the rest of the attributes are named with an _t suffix to leverage Solr's dynamic field capability. All _t fields are copied into the default search field, "text".
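
To make the type narrowing concrete, this is roughly how the request body for a search gets put together. The helper name solr_select_body is made up for this sketch; the wt=ruby and fl=pk parameters are the ones the prototype sends.

```ruby
require 'erb'

# Hypothetical helper: wrap the user's query in parentheses, AND it with the
# model's type, and URL-encode the result into a form body for /solr/select.
def solr_select_body(user_query, model_name)
  q = "(#{user_query}) AND type:#{model_name}"
  # wt=ruby asks Solr for a Ruby-literal response; fl=pk returns only the
  # primary keys, since the full records are loaded from the database.
  "q=#{ERB::Util.url_encode(q)}&wt=ruby&fl=pk"
end

body = solr_select_body('yonik OR otis', 'Book')
puts body
```

The parentheses matter: without them a query like "yonik OR otis" would bind the type clause to only the last term.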

At this point it is extremely basic, with no configurability, and there are lots of issues to address before it becomes something robust and general-purpose. But as a proof of concept, I'm pleased at how easy it was to write this hook.

I'd like to commit this to the Solr repository. Any objections? Once committed, folks will be able to use "script/plugin install ..." to install the Ruby side of things, and using a binary distribution of Solr's example application and a custom solr/conf directory (just for schema.xml) they'd be up and running quite quickly. If ok to commit, what directory should I put things under? How about just "ruby"?

I currently do not foresee having a lot of time to spend on this, but I do feel quite strongly that having an "acts_as_solr" hook into ActiveRecord will really lure in a lot of Rails developers. I'm sure there will be plenty that will not want a hybrid Ruby/Java environment, and for them there is the ever improving Ferret project. Ferret, however, would still need layers added on top of it to achieve all that Solr provides, so Solr is where I'm at now. Despite my time constraints, I'm volunteering to bring this prototype to a documented and easily usable state, and manage patches submitted by savvy users to make it robust.

Thoughts?

        Erik

p.s. And for the really die-hard bleeding edgers, the complete acts_as_solr code is pasted below. You can put it into a Rails project in vendor/plugins/acts_as_solr.rb, along with a simple one-line require 'acts_as_solr' init.rb in vendor/plugins. Sheepishly, here's the hackery....

--------
require 'active_record'
require 'rexml/document'
require 'net/http'
require 'erb'  # for ERB::Util.url_encode, used in find_by_solr


def post_to_solr(body, mode = :search)
  url = URI.parse("http://localhost:8983")
  post = Net::HTTP::Post.new(mode == :search ? "/solr/select" : "/solr/update")
  post.body = body
  post.content_type = 'application/x-www-form-urlencoded'
  response = Net::HTTP.start(url.host, url.port) do |http|
    http.request(post)
  end
  return response.body
end

module SolrMixin
  module Acts #:nodoc:
    module ARSolr #:nodoc:

      def self.included(base)
        base.extend(ClassMethods)
      end

      module ClassMethods

        def acts_as_solr(options={}, solr_options={})
#          configuration = {}
#          solr_configuration = {}
#          configuration.update(options) if options.is_a?(Hash)
#          solr_configuration.update(solr_options) if solr_options.is_a?(Hash)

          after_save :solr_save
          after_destroy :solr_destroy
          include SolrMixin::Acts::ARSolr::InstanceMethods
        end


        def find_by_solr(q, options = {}, find_options = {})
          q = "(#{q}) AND type:#{self.name}"
          response = post_to_solr("q=#{ERB::Util::url_encode(q)}&wt=ruby&fl=pk")
          data = eval(response)
          docs = data['response']['docs']
          return [] if docs.size == 0

          ids = docs.collect {|doc| doc['pk']}
          conditions = [ "#{self.table_name}.id in (?)", ids ]
          result = self.find(:all,
                             :conditions => conditions)
        end
      end

      module InstanceMethods
        def solr_id
          "#{self.class.name}:#{self.id}"
        end

        def solr_save
          logger.debug "solr_save: #{self.class.name} : #{self.id}"

          xml = REXML::Element.new('add')
          xml.add_element to_solr_doc
          response = post_to_solr(xml.to_s, :update)
          solr_commit
          true
        end

        # remove from index
        def solr_destroy
          logger.debug "solr_destroy: #{self.class.name} : #{self.id}"
          post_to_solr("<delete><id>#{solr_id}</id></delete>", :update)
          solr_commit
          true
        end

        def solr_commit
          post_to_solr('<optimize waitFlush="false" waitSearcher="false"/>', :update)
        end

        # convert instance to Solr document
        def to_solr_doc
          logger.debug "to_doc: creating doc for class: #{self.class.name}, id: #{self.id}"
          doc = REXML::Element.new('doc')

          # Solr id is <classname>:<id> to be unique across all models
          doc.add_element field("id", solr_id)
          doc.add_element field("type", self.class.name)
          doc.add_element field("pk", self.id.to_s)

          # iterate through the fields and add them to the document
          self.attributes.each_pair do |key,value|
            # _t is appended as a dynamic "text" field for Solr
            doc.add_element field("#{key}_t", value.to_s) unless key.to_s == "id"
          end
          return doc
        end

        def field(name, value)
          field = REXML::Element.new("field")
          field.add_attribute("name", name)
          field.add_text(value)

          field
        end

      end
    end
  end
end

# reopen ActiveRecord and include all the above to make
# them available to all our models if they want it
ActiveRecord::Base.class_eval do
  include SolrMixin::Acts::ARSolr
end


