On Jan 16, 2007, at 10:05 PM, Peter McPeterson wrote:
Hi all, I'm trying this Solr Ruby DSL called Flare/solrb, and I don't really understand how faceted search works because I can't add whatever fields I want to the index. This is currently not working:

conn = Solr::Connection.new('http://localhost:8983/solr')
doc = {:id => 1, :cat => 'electronics', :features => 'video, music', :product => 'iPod'}
conn.send(Solr::Request::AddDocument.new(doc))
=> #<Solr::Response::AddDocument:0x554c2c @status_message="ERROR:unknown field 'cat'", @status_code="400", @raw_response="<result status=\"400\">ERROR:unknown field 'cat'</ result>", @doc=<UNDEFINED> ... </>>

If it were working, what I'd like to do is:
(pseudo-code)

request = Solr::Request::Standard.new(
:query => 'ipod',
:facets => {
 :fields => :cat
 }
)

Any help would be appreciated.

I'm copying in Ed Summers, who may not be on solr-user now, but is a key contributor to solrb at the moment also.

Good question Peter. Bear with this, as I want to detail lots here so folks understand what is going on with solrb a bit more clearly than svn commits and brief allusions.

There are a couple of important things to note here about Solr itself. It is driven by a schema (see solr/solr/conf/schema.xml) that defines how fields are handled within Solr/Lucene. Solr needs to know what to do with field text when it gets it from an <add>. The solrb version of Solr's schema, which differs from the schema that ships with the Solr example application, locks field naming down to only three possibilities: id, *_text, and *_facet. I intentionally started it as simple as I could for now, knowing that opening up the schema is inevitable and that we want to do it wisely, with a bit more knowledge of how we want Ruby and Solr to interoperate.

Two relatively quick fix options to get you started:

(A) difficulty: easy. Rename your non-id fields to *_text and *_facet. For example:

doc = {:id => 1, :cat_facet => 'electronics', :features_facet => 'video, music', :product_text => 'iPod'}

(B) difficulty: solr experienced only. You're welcome to tweak the schema.xml and go to town with Request::AddDocument and any field names you want. Be sure you know what you're doing with faceting, tokenization, and sorting though.
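For option (A), the renaming can be done mechanically before the document is sent. The following is only a minimal sketch in plain Ruby: suffix_fields and its facet_fields argument are hypothetical names invented for illustration and are not part of solrb.

```ruby
# Hypothetical helper (not part of solrb): append the suffix that the
# solrb schema expects to each field name. The :id field is left as-is.
def suffix_fields(doc, facet_fields: [])
  doc.each_with_object({}) do |(name, value), renamed|
    if name == :id
      renamed[name] = value
    elsif facet_fields.include?(name)
      renamed[:"#{name}_facet"] = value   # untokenized, usable for faceting
    else
      renamed[:"#{name}_text"] = value    # tokenized, searchable text
    end
  end
end

doc = { :id => 1, :cat => 'electronics', :features => 'video, music', :product => 'iPod' }
renamed = suffix_fields(doc, facet_fields: [:cat, :features])
p renamed
```

The resulting hash has the same shape as the renamed doc in option (A) and could then be passed to Request::AddDocument.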

-- NOTE: If you're familiar with Solr, this will make sense as a difference from the Solr proper example schema --

id: is mandatory and is a unique identifier for a document; it can be any string you like. How searchable this id is depends on what characters it contains. Minimizing special characters makes it easier to search for a specific id without worrying about query parser syntax conflicts.

*_text: is tokenized and copied into the "text" field (so the client doesn't need to, and shouldn't, send a "text" field, only *_text field names). The default search field is "text" and includes text from all *_text fields.

*_facet: is not tokenized, and it is suitable for use with the faceting features that Solr supports
---
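To see why the original document was rejected, the three naming rules in the note can be checked on the client side. This is a rough sketch only: unknown_fields is a hypothetical helper, and the suffix test is an approximation of how the schema matches *_text and *_facet.

```ruby
# Hypothetical helper: list the field names that do not fit the three
# shapes the solrb schema accepts (id, *_text, *_facet).
def unknown_fields(doc)
  doc.keys.reject do |name|
    field = name.to_s
    field == 'id' || field.end_with?('_text', '_facet')
  end
end

original = { :id => 1, :cat => 'electronics', :features => 'video, music', :product => 'iPod' }
renamed  = { :id => 1, :cat_facet => 'electronics', :features_facet => 'video, music', :product_text => 'iPod' }

p unknown_fields(original)  # fields that would trigger "unknown field"
p unknown_fields(renamed)   # nothing left to reject
```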

The faceting feature is only starting to come together through the API, so it's not yet easily exposed. In fact, only earlier today did the response handling refactoring allow facets to be accessed.

*** Sidebar ***
Why does the facet data come back outside the 'response' structure? Here's an example:

{
'responseHeader'=>{
  'status'=>0,
  'QTime'=>3057,
  'params'=>{
        'wt'=>'ruby',
        'facet.limit'=>'2',
        'rows'=>'0',
        'facet.missing'=>'true',
        'start'=>'0',
        'facet'=>'true',
        'facet.field'=>[
         'subject_genre_facet',
         'subject_era_facet',
         'subject_topic_facet'],
        'indent'=>'on',
        'q'=>'[* TO *]',
        'facet.zeros'=>'true'}},
'response'=>{'numFound'=>49999,'start'=>0,'docs'=>[]
},
'facet_counts'=>{
  'facet_queries'=>{},
  'facet_fields'=>{
        'subject_genre_facet'=>{
         'Biography.'=>2605,
         'Congresses.'=>1837,
         ''=>38262},
        'subject_era_facet'=>{
         '20th century.'=>1251,
         '20th century'=>1250,
         ''=>41219},
        'subject_topic_facet'=>{
         'History.'=>2259,
         'History and criticism.'=>1769,
         ''=>15833}}}}

  (yes, I'm refactoring to add Yonik's latest facet changes in now!)
****************
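Because the facet data sits next to 'response' rather than inside it, client code has to reach into 'facet_counts' itself. Here is a small sketch in plain Ruby using a trimmed copy of the data from the sidebar. facet_values is a hypothetical helper, and treating the '' key as the count of documents with no value (the facet.missing bucket) is an assumption based on the parameters in that example.

```ruby
# A trimmed copy of the 'facet_counts' part of the response shown above.
data = {
  'facet_counts' => {
    'facet_queries' => {},
    'facet_fields' => {
      'subject_genre_facet' => { 'Biography.' => 2605, 'Congresses.' => 1837, '' => 38262 },
      'subject_topic_facet' => { 'History.' => 2259, 'History and criticism.' => 1769, '' => 15833 }
    }
  }
}

# Hypothetical helper: return the [value, count] pairs for one facet field,
# highest count first, together with the count stored under the '' key
# (assumed here to be the "missing" bucket).
def facet_values(data, field)
  counts = data.fetch('facet_counts').fetch('facet_fields').fetch(field, {})
  missing = counts['']
  named = counts.reject { |value, _count| value.empty? }
  [named.sort_by { |_value, count| -count }, missing]
end

values, missing = facet_values(data, 'subject_genre_facet')
values.each { |value, count| puts "#{value} (#{count})" }
puts "no value: #{missing}"
```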

Have a look at the latest API, thanks in large part to Ed's ideas on where a Sol.rb DSL should head:

        <http://wiki.apache.org/solr/solrb>

Here's the example pasted below:

  require 'solr'  # load the library
  include Solr    # allow Solr:: to be omitted from class/module references

  # connect to the solr instance
  conn = Connection.new('http://localhost:8983/solr', :autocommit => :on)

  # add a document to the index
  conn.add(:id => 123, :title_text => 'Lucene in Action')

  # update the document
  conn.update(:id => 123, :title_text => 'Solr in Action')

  # print out the first hit in a query for 'action'
  response = conn.query('action')
  print response.hits[0]

  # iterate through all the hits for 'action'
  conn.query('action') do |hit|
    puts hit.inspect
  end

  # delete document by id
  conn.delete(123)

We'll expand this short little example to include a facet or two as well for demo purposes. I'll do that in a day or so, after I upgrade solrb to Yonik's latest trunk changes for faceting.

To get facets from the trunk solrb API, I'm currently doing this in Flare (code not yet checked in, inside a Rails action):

    field = "#{params[:value]}_facet"
    req = Solr::Request::Standard.new(:query => "[* TO *]",
       :facets => {:fields => [field],
                   :limit => -1, :zeros => false, :missing => true
                  },
       :rows => 0
    )

    results = SOLR.send(req)

    @facets = results.data['facet_counts']

In your data, name the field cat_facet and if you've indexed various categories, you'll see a dump of how many of each unique category there are in all of the data. To constrain, add :filter_queries or adjust the main :query parameters to Request::Standard.
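To make the relationship between the request options and what Solr receives a little more concrete, here is a rough sketch of how such options could map onto query parameters. It is not solrb's actual implementation: standard_params is a hypothetical function, the facet parameter names are taken from the 'params' echo in the sidebar response, 'fq' is Solr's filter query parameter, and the filter value is made up for illustration.

```ruby
# Hypothetical sketch, not solrb's implementation: translate options like
# those passed to Request::Standard into Solr query parameters.
def standard_params(query:, facets: nil, filter_queries: [], rows: nil)
  params = { 'q' => query, 'wt' => 'ruby' }
  params['rows'] = rows.to_s unless rows.nil?
  params['fq'] = filter_queries unless filter_queries.empty?
  if facets
    params['facet'] = 'true'
    params['facet.field'] = Array(facets[:fields]).map(&:to_s)
    [:limit, :zeros, :missing].each do |option|
      params["facet.#{option}"] = facets[option].to_s if facets.key?(option)
    end
  end
  params
end

params = standard_params(
  query: '[* TO *]',
  facets: { :fields => ['cat_facet'], :limit => -1, :zeros => false, :missing => true },
  filter_queries: ['cat_facet:electronics'],  # illustrative constraint
  rows: 0
)
params.each { |name, value| puts "#{name} = #{value.inspect}" }
```

Keeping the constraint in a filter query leaves the main query free for the user's search terms, which is one reason to prefer :filter_queries over rewriting :query.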

Disclaimer: all of the API we've seen thus far is being tweaked daily as we feel our way through this. Early adopters who want to tinker are welcome. I don't envision this stabilizing into 1.0 release quality for a couple to a few months, and by then we'll have ironed out a lot about field naming conventions (or schema.xml generation from a Ruby model; perhaps both dynafields and schema generation are worth having). I am aiming for field name mapping magic to occur at some layer above the raw solrb stuff you're using now, so you're working closer to the metal than most RubySolrists will be in the near future. Here's one vision of a possible straw man future: http://wiki.apache.org/solr/solrb/BrainStorming

Welcome, Peter!

        Erik
