Re: Possible bug in copyField

2006-08-28 Thread Yonik Seeley

On 8/28/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:


> : By looking at what is stored.  Has this worked for others?
>
> the "stored" value of a field is always going to be the pre-analyzed text
> -- that's why the stored values in your "text" fields still have upper
> case characters and stop words.


And since the stored values will always be the same, it normally
doesn't make sense to store the targets of copyField if the sources
are also stored.

You can test whether stemming was done by searching for a different tense
of a word in the field.
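
For example, with a schema along these lines (field names here are just
illustrative), only the source field keeps a stored copy:

  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="text" type="text" indexed="true" stored="false"/>
  <copyField source="title" dest="text"/>

If the "text" type stems, indexing "running" and then matching on
text:run is a quick confirmation that the copied field was analyzed.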

-Yonik


Re: Possible bug in copyField

2006-08-28 Thread jason rutherglen
Ok... Looks like it's related to using SpanQueries (I hacked on the XML query
code).  I remember a discussion about this issue.  Not something Solr
specifically supports, so my apologies.  However, if anyone knows about this,
feel free to post something to the Lucene User list.  I will probably manually
analyze the terms of the span query and create a stemmed span query.  Is that a
good idea?
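
Something like this is what I have in mind (a sketch against the
Lucene 2.0-era API; the analyzer and field name are placeholders for
whatever the schema uses):

  import java.io.IOException;
  import java.io.StringReader;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.Token;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.spans.SpanQuery;
  import org.apache.lucene.search.spans.SpanTermQuery;

  public class StemmedSpanSketch {
    // Run one word through the index-time analyzer and wrap the
    // (possibly stemmed) token in a SpanTermQuery.
    public static SpanQuery stemmedSpanTerm(Analyzer analyzer, String field,
                                            String word) throws IOException {
      TokenStream ts = analyzer.tokenStream(field, new StringReader(word));
      Token token = ts.next();        // one word in, at most one token out
      if (token == null) return null; // e.g. the word was a stop word
      return new SpanTermQuery(new Term(field, token.termText()));
    }
  }

The stemmed SpanTermQuerys could then be recombined into a SpanNearQuery.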

- Original Message 
From: Yonik Seeley <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Cc: jason rutherglen <[EMAIL PROTECTED]>
Sent: Monday, August 28, 2006 7:33:48 AM
Subject: Re: Possible bug in copyField

On 8/28/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> : By looking at what is stored.  Has this worked for others?
>
> the "stored" value of a field is allways going to be the pre-analyzed text
> -- that's why the stored values in your "text" fields still have upper
> case characters and stop words.

And since the stored values will always be the same, it normally
doesn't make sense to store the targets of copyField if the sources
are also stored.

You can test whether stemming was done by searching for a different tense
of a word in the field.

-Yonik






Re: Possible bug in copyField

2006-08-28 Thread Erik Hatcher


On Aug 28, 2006, at 1:41 PM, jason rutherglen wrote:
> Ok... Looks like it's related to using SpanQueries (I hacked on the
> XML query code).  I remember a discussion about this issue.  Not
> something Solr specifically supports, so my apologies.  However, if
> anyone knows about this, feel free to post something to the Lucene
> User list.  I will probably manually analyze the terms of the span
> query and create a stemmed span query.  Is that a good idea?


Well, query terms need to match how they were indexed :)  So it's a
good idea in that respect.  Stemming is chock full of fun (or
frustrating) issues like this, and I don't have any easy advice, but
certainly if you're stemming terms during indexing you'll need to
stem them for queries -- unless you index the original terms in a
parallel field or in the same positions as the stemmed ones, so that
you can play with searching with or without stemming on the query side.
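
For the parallel field route, the schema side might look something like
this (a sketch; "text_exact" stands in for a field type with the same
analysis chain minus the stemming filter):

  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="title_exact" type="text_exact" indexed="true" stored="false"/>
  <copyField source="title" dest="title_exact"/>

Queries can then hit title or title_exact depending on whether stemming
should be in play.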


Erik





- Original Message 
From: Yonik Seeley <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Cc: jason rutherglen <[EMAIL PROTECTED]>
Sent: Monday, August 28, 2006 7:33:48 AM
Subject: Re: Possible bug in copyField

On 8/28/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:


> : By looking at what is stored.  Has this worked for others?
>
> the "stored" value of a field is always going to be the pre-analyzed text
> -- that's why the stored values in your "text" fields still have upper
> case characters and stop words.


And since the stored values will always be the same, it normally
doesn't make sense to store the targets of copyField if the sources
are also stored.

You can test whether stemming was done by searching for a different tense
of a word in the field.

-Yonik








Re: Possible bug in copyField

2006-08-28 Thread jason rutherglen
Could someone point me to where in the Solr code the Analyzer is applied to a 
query parser field?  

- Original Message 
From: Erik Hatcher <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, August 28, 2006 11:13:25 AM
Subject: Re: Possible bug in copyField


On Aug 28, 2006, at 1:41 PM, jason rutherglen wrote:
> Ok... Looks like it's related to using SpanQueries (I hacked on the
> XML query code).  I remember a discussion about this issue.  Not
> something Solr specifically supports, so my apologies.  However, if
> anyone knows about this, feel free to post something to the Lucene
> User list.  I will probably manually analyze the terms of the span
> query and create a stemmed span query.  Is that a good idea?

Well, query terms need to match how they were indexed :)  So it's a
good idea in that respect.  Stemming is chock full of fun (or
frustrating) issues like this, and I don't have any easy advice, but
certainly if you're stemming terms during indexing you'll need to
stem them for queries -- unless you index the original terms in a
parallel field or in the same positions as the stemmed ones, so that
you can play with searching with or without stemming on the query side.

Erik



>
> - Original Message 
> From: Yonik Seeley <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Cc: jason rutherglen <[EMAIL PROTECTED]>
> Sent: Monday, August 28, 2006 7:33:48 AM
> Subject: Re: Possible bug in copyField
>
> On 8/28/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>>
>> : By looking at what is stored.  Has this worked for others?
>>
>> the "stored" value of a field is allways going to be the pre- 
>> analyzed text
>> -- that's why the stored values in your "text" fields still have  
>> upper
>> case characters and stop words.
>
> And since the stored values will always be the same, it normally
> doesn't make sense to store the targets of copyField if the sources
> are also stored.
>
> You can test whether stemming was done by searching for a different tense
> of a word in the field.
>
> -Yonik
>
>
>
>







Re: Possible bug in copyField

2006-08-28 Thread Yonik Seeley

On 8/28/06, jason rutherglen <[EMAIL PROTECTED]> wrote:

> Could someone point me to where in the Solr code the Analyzer is
> applied to a query parser field?


The Lucene query parser normally does the analysis.  It also does things
like making phrase queries from fields that return multiple tokens.
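
In rough terms -- a sketch, not the exact Solr code path -- the parser is
constructed with the schema's query-time analyzer, and the analysis
happens inside parse() (getQueryAnalyzer() here is an assumption about
the IndexSchema API):

  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.queryParser.ParseException;
  import org.apache.lucene.queryParser.QueryParser;
  import org.apache.lucene.search.Query;
  import org.apache.solr.schema.IndexSchema;

  public class QueryAnalysisSketch {
    public static Query parse(IndexSchema schema, String userQuery)
        throws ParseException {
      Analyzer analyzer = schema.getQueryAnalyzer(); // assumed accessor
      QueryParser parser = new QueryParser("text", analyzer); // default field
      return parser.parse(userQuery); // per-field analysis happens in here
    }
  }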

-Yonik


Simplest way to load a custom RequestHandler in Jetty?

2006-08-28 Thread Chris Hostetter

For my ApacheCon talk, I've created a few very simple "demo"
RequestHandlers to show off various functionality ... I started
developing these in my solr SVN tree, so it wasn't until today that it
occurred to me I had no idea what was needed in Jetty to add an arbitrary
jar to the solr.war classpath at runtime.

My goal being to provide:
  * the java source
  * a precompiled jar
  * a solrconfig.xml that registers/references these RequestHandlers
  * simple instructions for downloading a nightly build, replacing the
config, copying the jar to someplace magical, and starting up the
server.

(For me giving the talk, I could just compile the code right into the
solr.war, or use Resin, which I'm more familiar with -- but I was hoping to
have an easy way that anyone could recreate my demo using the default
example.)

things I've already tried...

  * java -cp .:my.jar -jar start.jar
  * putting my.jar in the example/lib directory
  * putting my.jar in the example/ext directory (this has a different
problem - the jar is loaded before the webapp so it can't resolve
dependencies like "SolrRequestHandler")
  * modifying the jetty.xml to include something like this...
    <Call name="addWebApplication">
      <Arg>/hoss</Arg>
      <Arg>./webapps/solr.war</Arg>
      <Set name="extractWAR">true</Set>
      <Set name="defaultsDescriptor"><SystemProperty name="jetty.home" default="."/>/etc/webdefault.xml</Set>
      <Set name="classLoaderJava2Compliant">false</Set>
      <Set name="classPath">my.jar</Set>
    </Call>

  ...I got that last idea from here; based on the timeline, this fix
should be in the Jetty 5.1.11 that we're using, but it doesn't seem to work
for me.


Has anyone out there gotten Jetty to load their custom RequestHandlers,
Analyzers, or Similarities? ... even if you haven't, do you have any
suggestions on how to do it cleanly? (ie: without deconstructing the war
and injecting my jar)

-Hoss



acts_as_solr

2006-08-28 Thread Erik Hatcher
I've spent a few hours tinkering with a Ruby ActiveRecord plugin to
index, delete, and search models fronted by a database using Solr.
The results are:


$ script/console
>> Book.new(:title => "Solr in Action", :author => "Yonik & Hoss").save
=> true
>> Book.new(:title => "Lucene in Action", :author => "Otis & Erik").save
=> true
>> action_books = Book.find_by_solr("action")
=> [#<Book:0x... @attributes={"title"=>"Solr in Action", "author"=>"Yonik & Hoss", "id"=>"21"}>, #<Book:0x... @attributes={"title"=>"Lucene in Action", "author"=>"Otis & Erik", "id"=>"22"}>]
>> action_books = Book.find_by_solr("actions")  # to show stemming
=> [#<Book:0x... @attributes={"title"=>"Solr in Action", "author"=>"Yonik & Hoss", "id"=>"21"}>, #<Book:0x... @attributes={"title"=>"Lucene in Action", "author"=>"Otis & Erik", "id"=>"22"}>]
>> Book.find_by_solr("yonik OR otis") # to show QueryParser boolean expressions
=> [#<Book:0x... @attributes={"title"=>"Solr in Action", "author"=>"Yonik & Hoss", "id"=>"21"}>, #<Book:0x... @attributes={"title"=>"Lucene in Action", "author"=>"Otis & Erik", "id"=>"22"}>]


My model looks like this:

  class Book < ActiveRecord::Base
acts_as_solr
  end

(ain't ActiveRecord slick?!)

acts_as_solr adds save and destroy hooks.  All model attributes are  
sent to Solr like this:


>> action_books[0].to_solr_doc.to_s
=> "<doc><field name='id'>Book:21</field><field name='type'>Book</field><field name='pk'>21</field><field name='title_t'>Solr in Action</field><field name='author_t'>Yonik & Hoss</field></doc>"


The Solr id is <model name>:<primary key> formatted, the type field is
the model name and is AND'd to queries to narrow them to the requesting
model, the pk field is the primary key of the database table, and the
rest of the attributes are named with an _t suffix to leverage the
dynamic field capability.  All _t fields are copied into the default
search field of "text".
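
The schema side of that is just dynamic field plus copyField machinery
(a sketch of the relevant schema.xml pieces):

  <dynamicField name="*_t" type="text" indexed="true" stored="true"/>
  <copyField source="*_t" dest="text"/>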


At this point it is extremely basic, with no configurability, and there
are lots of issues to address to flesh this out into something robustly
general-purpose.  But as a proof of concept, I'm pleased at how easy
it was to write this hook.


I'd like to commit this to the Solr repository.  Any objections?   
Once committed, folks will be able to use "script/plugin install ..."  
to install the Ruby side of things, and using a binary distribution  
of Solr's example application and a custom solr/conf directory (just  
for schema.xml) they'd be up and running quite quickly.  If ok to  
commit, what directory should I put things under?  How about just  
"ruby"?


I currently do not foresee having a lot of time to spend on this, but
I do feel quite strongly that having an "acts_as_solr" hook into
ActiveRecord will really lure in a lot of Rails developers.  I'm sure
there will be plenty who will not want a hybrid Ruby/Java
environment, and for them there is the ever-improving Ferret
project.  Ferret, however, would still need layers added on top of it
to achieve all that Solr provides, so Solr is where I'm at now.
Despite my time constraints, I'm volunteering to bring this prototype
to a documented and easily usable state, and to manage patches submitted
by savvy users to make it robust.


Thoughts?

Erik

p.s. And for the really die-hard bleeding edgers, the complete
acts_as_solr code is pasted below, which you can put into a Rails
project as vendor/plugins/acts_as_solr.rb, along with a simple
one-line init.rb (require 'acts_as_solr') in vendor/plugins.
Sheepishly, here's the hackery...



require 'active_record'
require 'rexml/document'
require 'net/http'
require 'erb'  # ERB::Util.url_encode is used in find_by_solr

# POST a form-encoded body to Solr's select or update handler.
def post_to_solr(body, mode = :search)
  url = URI.parse("http://localhost:8983")
  post = Net::HTTP::Post.new(mode == :search ? "/solr/select" : "/solr/update")
  post.body = body
  post.content_type = 'application/x-www-form-urlencoded'
  response = Net::HTTP.start(url.host, url.port) do |http|
    http.request(post)
  end
  return response.body
end

module SolrMixin
  module Acts #:nodoc:
    module ARSolr #:nodoc:

      def self.included(base)
        base.extend(ClassMethods)
      end

      module ClassMethods

        def acts_as_solr(options={}, solr_options={})
          # configuration = {}
          # solr_configuration = {}
          # configuration.update(options) if options.is_a?(Hash)
          # solr_configuration.update(solr_options) if solr_options.is_a?(Hash)

          after_save :solr_save
          after_destroy :solr_destroy
          include SolrMixin::Acts::ARSolr::InstanceMethods
        end

        def find_by_solr(q, options = {}, find_options = {})
          # narrow the query to this model's type, ask Solr for primary keys only
          q = "(#{q}) AND type:#{self.name}"
          response = post_to_solr("q=#{ERB::Util::url_encode(q)}&wt=ruby&fl=pk")

          data = eval(response)
          docs = data['response']['docs']
          return [] if docs.size == 0

          ids = docs.collect {|doc| doc['pk']}
          conditions = [ "#{self.table_name}.id in (?)", ids ]
          result = self.find(:all, :conditions => conditions)
        end
      end

      module InstanceMethods
        def solr_id
          "#{self.class.name}:#{self.id}"
        end

        def solr_save
          logger.debug "solr_save: #{self.class.name} : #{self.id}"

          xml = RE

Re: Simplest way to load a custom RequestHandler in Jetty?

2006-08-28 Thread Erik Hatcher


On Aug 28, 2006, at 9:25 PM, Chris Hostetter wrote:

> things I've already tried...
>
>   * java -cp .:my.jar -jar start.jar
>   * putting my.jar in the example/lib directory
>   * putting my.jar in the example/ext directory (this has a different
>     problem - the jar is loaded before the webapp so it can't resolve
>     dependencies like "SolrRequestHandler")
>   * modifying the jetty.xml to include something like this...
>
>       <Call name="addWebApplication">
>         <Arg>/hoss</Arg>
>         <Arg>./webapps/solr.war</Arg>
>         <Set name="extractWAR">true</Set>
>         <Set name="defaultsDescriptor"><SystemProperty name="jetty.home" default="."/>/etc/webdefault.xml</Set>
>         <Set name="classLoaderJava2Compliant">false</Set>
>         <Set name="classPath">my.jar</Set>
>       </Call>
>
>   ...I got that last idea from here; based on the timeline, this fix
> should be in the Jetty 5.1.11 that we're using, but it doesn't seem
> to work for me.


I don't think Jetty supports hot reload (correct me if I'm wrong).   
You could pull this off with Tomcat though.  I did that kind of stuff  
for lots of Tapestry talks with Tomcat running and setting my IDE to  
compile classes right into WEB-INF/classes, waiting for the context  
to reload and hitting refresh in the browser.


> Has anyone out there gotten Jetty to load their custom RequestHandlers,
> Analyzers, or Similarities? ... even if you haven't, do you have any
> suggestions on how to do it cleanly? (ie: without deconstructing the war
> and injecting my jar)


Why not deconstruct the WAR?   Here's how I have my development  
environment (and production too) set up with a little bit of Ant, and  
solr.war checked into our repository as lib/solr.war.  It only takes  
me a few seconds to go from saving a change to a .java file to being  
up and running in Solr:
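
A minimal sketch of the kind of Ant target involved (names and paths are
illustrative, assuming compiled classes land in build/classes):

  <target name="repack-solr">
    <!-- explode the stock war, drop in freshly compiled classes, re-zip -->
    <unwar src="lib/solr.war" dest="build/solr-exploded"/>
    <copy todir="build/solr-exploded/WEB-INF/classes">
      <fileset dir="build/classes"/>
    </copy>
    <zip destfile="webapps/solr.war" basedir="build/solr-exploded"/>
  </target>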