Re: Faceted search problem

Erik Hatcher Tue, 16 Jan 2007 22:37:01 -0800


On Jan 16, 2007, at 10:05 PM, Peter McPeterson wrote:

Hi all, I'm trying this Solr Ruby DSL called Flare/solrb, and I don't really know how the faceted search works, because I can't add whatever fields I want to the index. This is currently not working:
conn = Solr::Connection.new('http://localhost:8983/solr')
doc = {:id => 1, :cat => 'electronics', :features => 'video,music', :product => 'iPod'}
conn.send(Solr::Request::AddDocument.new(doc))
=> #<Solr::Response::AddDocument:0x554c2c @status_message="ERROR:unknown field 'cat'", @status_code="400", @raw_response="<result status=\"400\">ERROR:unknown field 'cat'</result>", @doc=<UNDEFINED> ... </>>
If it were working, what I'd like to do is (pseudo-code):

request = Solr::Request::Standard.new(
:query => 'ipod',
:facets => {
 :fields => :cat
 }
)

Any help would be appreciated.

I'm copying in Ed Summers, who may not be on solr-user now, but is a key contributor to solrb at the moment as well.

Good question, Peter. Bear with me, as I want to go into detail here so folks understand what is going on with solrb a bit more clearly than svn commits and brief allusions convey.

There are a couple of important things to note here, specifically about Solr itself. It is driven by a schema (see solr/solr/conf/schema.xml) that defines how fields are handled within Solr/Lucene. Solr needs to know what to do with field text when it gets it from an <add>. The solrb version of Solr's schema, which differs from the schema that ships with the Solr example application, locks things down to only three field naming possibilities: id, *_text, and *_facet. That is why adding a document with a field named 'cat' fails with "unknown field 'cat'". I intentionally started it as simple as I could for now, knowing that opening up the schema is inevitable and that we want to do it wisely, with a bit more knowledge of how we want Ruby and Solr to interoperate.


Two relatively quick fix options to get you started:

(A) Difficulty: easy. Rename your non-id fields to *_text and *_facet. For example:

doc = {:id => 1, :cat_facet => 'electronics', :features_facet => 'video, music', :product_text => 'iPod'}
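If your documents already use plain field names, option (A) can be applied mechanically before sending them. The helper below is a hypothetical sketch, not part of solrb; the name to_solrb_fields and the FACET_FIELDS list (which fields you want to facet on) are assumptions. Keep in mind that *_facet fields are not tokenized, so a value like 'video, music' is counted as one single facet value.

```ruby
# Hypothetical helper (not part of solrb): rename plain field names to the
# id / *_text / *_facet conventions of the solrb schema.
FACET_FIELDS = [:cat, :features].freeze # assumption: the fields you facet on

def to_solrb_fields(doc, facet_fields = FACET_FIELDS)
  doc.each_with_object({}) do |(name, value), renamed|
    new_name =
      if name == :id
        :id                    # the unique id keeps its name
      elsif facet_fields.include?(name)
        :"#{name}_facet"       # untokenized, usable for faceting
      else
        :"#{name}_text"        # tokenized and copied into "text"
      end
    renamed[new_name] = value
  end
end

doc = {:id => 1, :cat => 'electronics', :features => 'video, music', :product => 'iPod'}
p to_solrb_fields(doc)
```

The renamed hash can then be passed wherever the original doc hash was used.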

(B) Difficulty: Solr-experienced only. You're welcome to tweak the schema.xml and go to town with Request::AddDocument and any field names you want. Be sure you know what you're doing with faceting, tokenization, and sorting, though.

-- NOTE: If you're familiar with Solr, these conventions will make sense as differences from the example schema in Solr proper --

id: is mandatory, and is a unique identifier for a document; it can be any string you like. How searchable this id is depends on what characters it contains. Minimizing special characters makes it easier to search for a specific id without worrying about query parser syntax conflicts.

*_text: is tokenized and copied into the "text" field, so the client doesn't need to (and shouldn't) send a "text" field, only *_text field names. The default search field is "text", and it includes text from all *_text fields.

*_facet: is not tokenized, and it is suitable for use with the faceting features that Solr supports.

---
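To illustrate the note on ids: if an id does contain query parser special characters, they have to be backslash-escaped before building an id:... query. This is a hypothetical sketch (escape_query_value and id_query are illustrative names, not solrb API); the character list follows the Lucene query parser's special characters, escaped one character at a time, which also covers && and ||.

```ruby
# Hypothetical helper (not part of solrb): backslash-escape characters that
# the Lucene query parser treats as syntax, so an id can be matched literally.
QUERY_SPECIAL_CHARS = /[\\+\-!():^\[\]"{}~*?|&]/

def escape_query_value(value)
  value.to_s.gsub(QUERY_SPECIAL_CHARS) { |ch| "\\#{ch}" }
end

# Build a query for one document by its id field.
def id_query(id)
  "id:#{escape_query_value(id)}"
end

puts id_query('abc123')    # prints id:abc123
puts id_query('item:42-b') # prints id:item\:42\-b
```

Ids made only of letters and digits pass through unchanged, which is exactly why minimizing special characters keeps id searches simple.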

The faceting feature is only starting to come together through the API, so it's not yet easily exposed. In fact, only earlier today did the response handling refactoring allow facets to be accessed at all.


*** Sidebar ***

Why does the facet data come back outside the 'response' structure? Here's an example:


{
'responseHeader'=>{
  'status'=>0,
  'QTime'=>3057,
  'params'=>{
        'wt'=>'ruby',
        'facet.limit'=>'2',
        'rows'=>'0',
        'facet.missing'=>'true',
        'start'=>'0',
        'facet'=>'true',
        'facet.field'=>[
         'subject_genre_facet',
         'subject_era_facet',
         'subject_topic_facet'],
        'indent'=>'on',
        'q'=>'[* TO *]',
        'facet.zeros'=>'true'}},
'response'=>{'numFound'=>49999,'start'=>0,'docs'=>[]
},
'facet_counts'=>{
  'facet_queries'=>{},
  'facet_fields'=>{
        'subject_genre_facet'=>{
         'Biography.'=>2605,
         'Congresses.'=>1837,
         ''=>38262},
        'subject_era_facet'=>{
         '20th century.'=>1251,
         '20th century'=>1250,
         ''=>41219},
        'subject_topic_facet'=>{
         'History.'=>2259,
         'History and criticism.'=>1769,
         ''=>15833}}}}

  (Yes, I'm refactoring now to add Yonik's latest facet changes!)
****************
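Because the counts live in a top-level 'facet_counts' entry rather than inside 'response', client code reads them separately from the documents. Here is a minimal sketch over the hash above, trimmed to one field; facet_values is a hypothetical helper, and treating the '' key as the facet.missing bucket (documents with no value for the field) is an assumption based on facet.missing being true in this request.

```ruby
# Trimmed copy of the wt=ruby response above.
response = {
  'response' => {'numFound' => 49999, 'start' => 0, 'docs' => []},
  'facet_counts' => {
    'facet_queries' => {},
    'facet_fields' => {
      'subject_genre_facet' => {
        'Biography.' => 2605,
        'Congresses.' => 1837,
        '' => 38262}}}}

# Hypothetical helper: return [[value, count], ...] for one facet field,
# highest count first, plus the presumed facet.missing count ('' key).
def facet_values(response, field)
  counts = response.fetch('facet_counts', {}).fetch('facet_fields', {}).fetch(field, {})
  missing = counts['']
  values = counts.reject { |value, _| value == '' }.sort_by { |_, count| -count }
  [values, missing]
end

values, missing = facet_values(response, 'subject_genre_facet')
values.each { |value, count| puts "#{value}: #{count}" }
puts "(no value): #{missing}"
```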

Have a look at the latest API, thanks in large part to Ed's ideas on where a Sol.rb DSL should head:


        <http://wiki.apache.org/solr/solrb>

Here's the example pasted below:

  require 'solr'  # load the library
  include Solr    # allow Solr:: to be omitted from class/module references

  # connect to the solr instance
  conn = Connection.new('http://localhost:8983/solr', :autocommit => :on)

  # add a document to the index
  conn.add(:id => 123, :title_text => 'Lucene in Action')

  # update the document
  conn.update(:id => 123, :title_text => 'Solr in Action')

  # print out the first hit in a query for 'action'
  response = conn.query('action')
  print response.hits[0]

  # iterate through all the hits for 'action'
  conn.query('action') do |hit|
    puts hit.inspect
  end

  # delete document by id
  conn.delete(123)

We'll expand this short little example to include a facet or two as well, for demo purposes. I'll do that in a day or so, after I upgrade solrb to Yonik's latest trunk changes for faceting.

In order to get facets from the trunk solrb API, I'm currently doing this in Flare (in a Rails action, code not yet checked in):


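    # facet on the field named in the request, e.g. "cat" -> "cat_facet"
    field = "#{params[:value]}_facet"
    req = Solr::Request::Standard.new(:query => "[* TO *]",  # open range, used to match everything
       :facets => {:fields => [field],
                   :limit => -1,     # negative limit: return all values
                   :zeros => false,  # skip values with a zero count
                   :missing => true  # also count docs with no value for the field
                  },
       :rows => 0                    # counts only, no documents returned
    )

    # SOLR: the app's Solr::Connection, presumably set up elsewhere in Flare
    results = SOLR.send(req)

    @facets = results.data['facet_counts']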
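
For comparison with the raw parameters echoed in the responseHeader of the sidebar example (facet, facet.field, facet.limit, facet.zeros, facet.missing, rows, q), here is a hypothetical sketch of how a :facets hash like the one above could be flattened into Solr request parameters. It is not solrb's actual implementation; facet_params is an illustrative name, and the mapping is inferred only from that echoed params hash.

```ruby
require 'uri'

# Hypothetical sketch: flatten a solrb-style :facets option hash into the raw
# Solr request parameters seen in the responseHeader 'params' above.
def facet_params(query, facets, rows)
  params = {'q' => query, 'rows' => rows.to_s, 'facet' => 'true'}
  # facet.field is repeated once per field to facet on
  params['facet.field'] = Array(facets[:fields]).map(&:to_s)
  # the remaining options pass through as facet.<name> string values
  [:limit, :zeros, :missing].each do |option|
    params["facet.#{option}"] = facets[option].to_s if facets.key?(option)
  end
  params
end

params = facet_params('[* TO *]',
                      {:fields => ['cat_facet'], :limit => -1, :zeros => false, :missing => true},
                      0)
# encode_www_form repeats the key for array values (facet.field=...&facet.field=...)
puts URI.encode_www_form(params)
```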

In your data, name the field cat_facet, and if you've indexed various categories, you'll see a dump of how many documents fall into each unique category across all of the data. To constrain, add :filter_queries or adjust the main :query parameter to Request::Standard.
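A constraint on a facet is just a field:value query, for example cat_facet:"electronics". Below is a hedged sketch of building one; facet_filter is a hypothetical helper. Because *_facet fields are untokenized, the value is wrapped in double quotes so values containing spaces (like 'History and criticism.') match as a whole. The commented request at the end is an assumption about how :filter_queries takes its value (an array of strings) and may not match the solrb trunk as it changes.

```ruby
# Hypothetical helper: build a filter query restricting results to one value
# of an untokenized *_facet field. The value is double-quoted so embedded
# spaces are not split by the query parser; backslashes and quotes inside it
# are escaped.
def facet_filter(field, value)
  escaped = value.to_s.gsub(/[\\"]/) { |ch| "\\#{ch}" }
  "#{field}:\"#{escaped}\""
end

filters = [facet_filter('cat_facet', 'electronics')]
puts filters.inspect

# Assumed request shape (unverified against the current solrb trunk):
# req = Solr::Request::Standard.new(:query => 'ipod', :filter_queries => filters,
#                                   :facets => {:fields => ['cat_facet']})
```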

Disclaimer: all of the API we've seen thus far is currently being tweaked daily, as we feel our way through this. Early adopters who want to tinker are welcome. I don't envision this being stabilized to 1.0-release quality for a couple to a few months, and by then we'll have ironed out a lot about field naming conventions (or perhaps schema.xml generation from a Ruby model; maybe both dynamic fields and schema generation are worth having). I am aiming for field name mapping magic to occur at some layer above the raw solrb code you're using now, so you're working closer to the metal than most RubySolrists will be in the near future. Here's one possible straw-man vision of that future: http://wiki.apache.org/solr/solrb/BrainStorming


Welcome, Peter!

        Erik
