On Apr 20, 2007, at 2:30 PM, solruser wrote:
For pure Ruby access to Solr without a database, use solr-ruby. The
0.01 gem is available via "gem install solr-ruby", but if you can,
I'd recommend you tinker with the trunk codebase too.
Well, say I'm considering using Solr with a Rails application. What's
the ideal approach?
"rails application" is a pretty broad category of applications at
this point. If we're talking about a database-backed application
being searchable by Solr, I'd go for the RubyForge acts_as_solr
first. However, I suspect that it needs work in terms of
facilitating access to facets, highlighting, and other types of
custom query handlers.
If your application is backed by other datastores, as in my case: a
bunch of MARC records in binary format, a flat delimited file, a
ZIP file full of RDF/XML files, or, even more interestingly, another
Solr instance that we wanted to repurpose in another Solr-based
application, then go with solr-ruby.
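(For anyone who hasn't seen solr-ruby's API, a minimal round trip
looks roughly like the sketch below. The URL and the :title_text
field name are assumptions; field names in particular depend
entirely on your Solr schema.)

require 'rubygems'
require 'solr'

# connect to a running Solr instance; :autocommit saves explicit commit calls
conn = Solr::Connection.new('http://localhost:8983/solr', :autocommit => :on)

# index a document (the :title_text field is a made-up example)
conn.add(:id => 1, :title_text => 'Lucene in Action')

# query and walk the hits
response = conn.query('lucene')
response.hits.each { |hit| puts hit['title_text'] }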
It's my intention to bridge this gap in the near future somehow; I
just haven't formulated an exact plan. acts_as_solr fits nicely and
very easily on top of solr-ruby. I envision acts_as_solr simply
being part of solr-ruby: it'd hook in only if you have ActiveRecord
installed, and otherwise it'd be transparent, taking up just a few
tens of lines of code in an un-required .rb file.
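(To sketch what I mean by hooking in only when ActiveRecord is
present; this is hypothetical, hand-waved code, not the actual
acts_as_solr implementation:)

# only wire in the ActiveRecord integration when ActiveRecord is loaded
if defined?(ActiveRecord::Base)
  class ActiveRecord::Base
    def self.acts_as_solr(options = {})
      # index records after save, delete from the index on destroy,
      # add a find_by_solr class method, etc. (details omitted here)
    end
  end
end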
The first step could be to patch the RubyForge acts_as_solr to use
solr-ruby to kick start collaboration. As for where my effort fits
into a calendar, within the next few weeks I'll be delving into it
deeply and can speak more definitively.
Since there are many flavors floating around, which is the most
sought after and supported? And I agree that a definitive version
will help the RoR community accept Solr with a much greater level of
confidence.
And since RoR applications are addressing Web 2.0, the need to
search and collaborate on information is much higher. So I
personally believe addressing this will definitely go a long way.
That's the plan! No question about it. I personally am running on
all cylinders, and will make progress on these technologies as my
real-world needs require them, and those needs are increasing all
the time. All savvy SolRubyists are invited to jump in!
I've not documented this stuff on the wiki to the standards set by
the Solr engine itself, but there is some pretty amazing power going
on with solr-ruby right now. For example, the data mapping / indexer
framework makes it easy to import a dataset into Solr using Ruby:
source = DataSource.new   # any object with an #each method will do

mapping = {
  :id     => :isbn,
  :name   => :author,
  :source => "BOOKS",
  :year   => Proc.new { |record| record.date[0,4] },
}

Solr::Indexer.index(source, mapping) do |orig_data, solr_document|
  solr_document[:timestamp] = Time.now
end
This showcases the simplistic data source facility (*quack* -
anything that has an #each method) [with a contrived, bogus
DataSource class], and the mapping capabilities. The mapping is a
hash of Solr field names to value mappings. A value mapping can be a
String ("BOOKS"); a Symbol (:isbn, :author), which looks up that
field on each of the objects yielded by the data source's #each
(this lookup simply means, again *quack*, that the data object needs
a [] method defined); or a Proc. The Proc example is a bit more
advanced Ruby voodoo for embedding a bit of code into the mapping to
be executed later with the actual record passed into it; in the
example it takes the first four characters of the record's date
property. And one more bit of Ruby coolness is the do ... end block
for the indexer method. The indexer takes a data source and a
mapping, melding them together as described, and allows you one
final chance to affect the solr_document before it gets indexed; of
course you're also provided the original data object.
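(To make the duck typing concrete, here is a minimal, contrived
DataSource along those lines. The records are made up; a Struct
conveniently gives each record both the [] lookup the Symbol
mappings use and the .date accessor the Proc uses.)

Book = Struct.new(:isbn, :author, :date)

class DataSource
  # *quack* - all a data source needs is #each yielding record objects
  def each
    [Book.new("0321293199", "Smith", "1918-06-01"),
     Book.new("1932394885", "Jones", "2004-12-01")].each do |record|
      yield record
    end
  end
end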
We already have a simple mapper, an XPath mapper, and an Hpricot
mapper available. We also have some handy data sources, including a
tab-delimited file source (obsoleted in my playbook by the CSV
importer now built in). I'm also using a simple custom MARC binary
data source and mapper specific to ruby-marc objects, and I just put
together a SolrSource that takes a query (and filters) for one Solr
instance, pages through the results in a configurable way, and
successively feeds out the documents returned from that query.
Apply a mapper to that data source and you can pipe data from one
Solr to another like this:
solr_source = Solr::Importer::SolrSource.new("http://localhost:8420/solr",
                "*:*", ["year:[1776 TO 1918]", 'author:smith'])

count = 0
# mapper here is a mapping like the one shown earlier
Solr::Indexer.index(solr_source, mapper,
                    {:debug => false, :timeout => 120,
                     :solr_url => "http://localhost:8983/solr"}) do |orig_data, solr_document|
  count += 1
  puts count if count % 100 == 0
end
The count junk is just to see console progress on how many records
have been indexed.
So I'm working the Ruby/Solr thing as much as possible right now.
There is something to what we've got there, but it's not packaged as
nicely as needed for a community to flourish, and for that I
apologize. But there is also enough goodness there now to lure folks
in to want to get involved.
Right now in RoR with the Flare plugin installed, you can have a
controller that looks like this:
class SearchController < ApplicationController
  flare
end
And with some copy/pasting of templates (which I'm sure we can build
in as defaults somehow), you have a faceted browsing, Ajax-tricked-out
(well, in-place editor and Ajax suggest) experience with how many
lines of code? (The devil is in the details, though, and that is why
I don't yet recommend Flare to folks who just want it to work and
also be configurable.) Flare cuts a lot of corners by hard-coding
some things that need to be made configurable, etc. Typical
prototyping approach: tinker, tinker, tinker, distill. I'm still in
the first tinker phase with Flare right now. But folks interested in
rolling up their sleeves who don't mind getting a little grubby with
code are more than invited to delve into Flare now, with the
forewarning that the Flare you see today will not be at all near the
Flare that spawns from the ashes. Pioneering spirit required.
3. Is a performance benchmark available for the acts_as_solr plugin?
What kind of numbers are you after? acts_as_solr searches Solr, and
then will fetch the records from the database to bring back model
objects, so you have to account for the database access in the
picture as well as Solr.
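(Conceptually, the two round trips look something like this; a
hedged sketch, not the plugin's actual code, with made-up model and
field names:)

require 'solr'

conn = Solr::Connection.new('http://localhost:8983/solr')

# phase 1: ask Solr which records match
ids = conn.query('title_t:ruby').hits.map { |hit| hit['pk_i'] }

# phase 2: hydrate the ActiveRecord models from the database
books = Book.find(ids)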
Well, to be specific, I am keen to know about the creation and
update of indexes when you run into a large number of documents.
Since the database is used to populate the models, it will
definitely be the cumulative effect of retrieving documents from
Solr/Lucene, network issues (since it's a web service), and the
local database (depending on configuration).
Again, we need to be clear about "large". I've got nearly 4M
documents indexed under my belt now, but many others have gone to
10M+. Lucene and Solr both scale very well into the tens of millions
and, I've heard, even further up into the hundreds of millions.
Certainly those other latencies you mention are valid questions, but
in my experience they've not been show-stopping concerns; performance
with Solr + Ruby has been more than acceptable... it's been just
fine, even with several spots for improvement in all those areas in
my applications. First rule of optimization: Don't. Second rule of
optimization: Don't optimize yet.
Erik