Re: Possible bug in copyField
On 8/28/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> : By looking at what is stored. Has this worked for others?
>
> the "stored" value of a field is always going to be the pre-analyzed text
> -- that's why the stored values in your "text" fields still have upper
> case characters and stop words.

And since the stored values will always be the same, it normally doesn't
make sense to store the targets of copyField if the sources are also stored.

You can test whether stemming was done by searching for a different tense
of a word in the field.

-Yonik
Re: Possible bug in copyField
Ok... Looks like it's related to using SpanQueries (I hacked on the XML
query code). I remember a discussion about this issue. Not something Solr
specifically supports, so my apologies. However, if anyone knows about
this, feel free to post something to the Lucene User list. I will probably
manually analyze the terms of the span query and create a stemmed span
query. Is that a good idea?

----- Original Message -----
From: Yonik Seeley <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Cc: jason rutherglen <[EMAIL PROTECTED]>
Sent: Monday, August 28, 2006 7:33:48 AM
Subject: Re: Possible bug in copyField
Re: Possible bug in copyField
On Aug 28, 2006, at 1:41 PM, jason rutherglen wrote:
> Ok... Looks like it's related to using SpanQueries (I hacked on the XML
> query code). I remember a discussion about this issue. Not something
> Solr specifically supports, so my apologies. However, if anyone knows
> about this, feel free to post something to the Lucene User list. I will
> probably manually analyze the terms of the span query and create a
> stemmed span query. Is that a good idea?

Well, query terms need to match how they were indexed :) So it's a good
idea in that respect. Stemming is chock full of fun (or frustrating)
issues like this, and I don't have any easy advice, but certainly if
you're stemming terms during indexing you'll need to stem them for
queries -- unless you index the original terms in a parallel field, or in
the same positions as the stemmed ones, where you can play with searching
with or without stemming on the query side.

Erik
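Erik's "same positions" trick can be sketched in plain Java. This is not Lucene or Solr API: the class, the one-rule "stemmer", and the list-of-sets index model are inventions for illustration only, standing in for a real index that records both the original and the stemmed term at the same token position so either form matches:

```java
import java.util.*;

public class ParallelStems {
    // Toy "stemmer": strip a trailing "s". Real analyzers use e.g. Porter stemming.
    static String stem(String t) {
        return t.endsWith("s") ? t.substring(0, t.length() - 1) : t;
    }

    // Toy index model: each position holds the original term plus its stem,
    // mimicking index-time stacking of terms at the same position.
    static List<Set<String>> index(String... terms) {
        List<Set<String>> positions = new ArrayList<>();
        for (String t : terms) {
            positions.add(new HashSet<>(Arrays.asList(t, stem(t))));
        }
        return positions;
    }

    static boolean matchesAt(List<Set<String>> positions, int pos, String queryTerm) {
        return positions.get(pos).contains(queryTerm);
    }

    public static void main(String[] args) {
        List<Set<String>> idx = index("actions", "speak");
        // Both the surface form and the stemmed form match at position 0.
        System.out.println(matchesAt(idx, 0, "actions")); // true
        System.out.println(matchesAt(idx, 0, "action"));  // true
    }
}
```

The payoff of this layout is exactly what Erik describes: the query side can choose per-query whether to stem, without maintaining two separate fields.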
Re: Possible bug in copyField
Could someone point me to where in the Solr code the Analyzer is applied
to a query parser field?

----- Original Message -----
From: Erik Hatcher <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, August 28, 2006 11:13:25 AM
Subject: Re: Possible bug in copyField
Re: Possible bug in copyField
On 8/28/06, jason rutherglen <[EMAIL PROTECTED]> wrote:
> Could someone point me to where in the Solr code the Analyzer is
> applied to a query parser field?

The Lucene query parser normally does the analysis. It also does things
like making phrase queries from fields that return multiple tokens.

-Yonik
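The mismatch behind this thread can be shown with a toy example in plain Java. Nothing here is Lucene/Solr API: the class name and the one-rule "stemmer" are illustrative stand-ins for an analyzer chain. The point is the one Erik and Yonik both make: if terms are stemmed at index time, a query term built without the same analysis (as happens with hand-built SpanQueries) never matches:

```java
import java.util.*;

public class StemMismatch {
    // Toy stand-in for an analyzer's stemming step: strip a trailing "s".
    static String stem(String term) {
        return term.endsWith("s") ? term.substring(0, term.length() - 1) : term;
    }

    public static void main(String[] args) {
        // "Index": terms as a stemming analyzer would have written them.
        Set<String> indexedTerms = new HashSet<>();
        for (String t : "actions speak".split(" ")) {
            indexedTerms.add(stem(t));
        }

        // A query term that skipped analysis misses...
        System.out.println(indexedTerms.contains("actions")); // false
        // ...while running the query term through the same analysis matches.
        System.out.println(indexedTerms.contains(stem("actions"))); // true
    }
}
```

This is why Jason's plan (manually analyzing each span term before building the stemmed SpanQuery) is the right shape: the query side must reproduce the index-time analysis.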
Simplest way to load a custom RequestHandler in Jetty?
For my ApacheCon talk, I've created a few very simple "demo"
RequestHandlers to show off some various functionality. I started
developing these in my Solr SVN tree, so it wasn't until today that it
occurred to me I had no idea what was needed in Jetty to add an arbitrary
jar to the solr.war classpath at runtime. My goal being to provide:

* the Java source
* a precompiled jar
* a solrconfig.xml that registers/references these RequestHandlers
* simple instructions for downloading a nightly build, replacing the
  config, copying the jar to someplace magical, and starting up the server.

(For me giving the talk, I could just compile the code right into the
solr.war, or use Resin, which I'm more familiar with -- but I was hoping
to have an easy way that anyone could recreate my demo using the default
example.)

Things I've already tried...

* java -cp .:my.jar -jar start.jar
* putting my.jar in the example/lib directory
* putting my.jar in the example/ext directory (this has a different
  problem - the jar is loaded before the webapp, so it can't resolve
  dependencies like "SolrRequestHandler")
* modifying the jetty.xml to include something like this...

  /hoss ./webapps/solr.war true /etc/webdefault.xml false my.jar

...I got that last idea from here; based on the timeline, this fix should
be in the Jetty 5.1.11 that we're using, but it doesn't seem to work for
me.

Has anyone out there gotten Jetty to load their custom RequestHandlers,
Analyzers, or Similarities? ... even if you haven't, do you have any
suggestions on how to do it cleanly? (ie: without deconstructing the war
and injecting my jar)

-Hoss
acts_as_solr
I've spent a few hours tinkering with a Ruby ActiveRecord plugin to index,
delete, and search database-backed models with Solr. The results are:

$ script/console
>> Book.new(:title => "Solr in Action", :author => "Yonik & Hoss").save
=> true
>> Book.new(:title => "Lucene in Action", :author => "Otis & Erik").save
=> true
>> action_books = Book.find_by_solr("action")
=> [#<Book @attributes={"title"=>"Solr in Action", "author"=>"Yonik & Hoss", "id"=>"21"}>, #<Book @attributes={"title"=>"Lucene in Action", "author"=>"Otis & Erik", "id"=>"22"}>]
>> action_books = Book.find_by_solr("actions")  # to show stemming
=> [#<Book @attributes={"title"=>"Solr in Action", "author"=>"Yonik & Hoss", "id"=>"21"}>, #<Book @attributes={"title"=>"Lucene in Action", "author"=>"Otis & Erik", "id"=>"22"}>]
>> Book.find_by_solr("yonik OR otis")  # to show QueryParser boolean expressions
=> [#<Book @attributes={"title"=>"Solr in Action", "author"=>"Yonik & Hoss", "id"=>"21"}>, #<Book @attributes={"title"=>"Lucene in Action", "author"=>"Otis & Erik", "id"=>"22"}>]

My model looks like this:

class Book < ActiveRecord::Base
  acts_as_solr
end

(ain't ActiveRecord slick?!)

acts_as_solr adds save and destroy hooks. All model attributes are sent to
Solr like this:

>> action_books[0].to_solr_doc.to_s
=> "<doc><field name=\"id\">Book:21</field><field name=\"type\">Book</field><field name=\"pk\">21</field><field name=\"title_t\">Solr in Action</field><field name=\"author_t\">Yonik & Hoss</field></doc>"

The Solr id is formatted as <model name>:<primary key>, the type field is
the model name and is AND'd to queries to narrow them to the requesting
model, the pk field is the primary key of the database table, and the rest
of the attributes are named with a _t suffix to leverage the dynamic field
capability. All _t fields are copied into the default search field of
"text".

At this point it is extremely basic, with no configurability, and there
are lots of issues to address to flesh this into something robust and
general purpose. But as a proof of concept, I'm pleased at how easy it was
to write this hook.

I'd like to commit this to the Solr repository. Any objections? Once
committed, folks will be able to use "script/plugin install ..." to
install the Ruby side of things, and using a binary distribution of Solr's
example application and a custom solr/conf directory (just for schema.xml)
they'd be up and running quite quickly. If ok to commit, what directory
should I put things under? How about just "ruby"?

I currently do not foresee having a lot of time to spend on this, but I do
feel quite strongly that having an "acts_as_solr" hook into ActiveRecord
will really lure in a lot of Rails developers. I'm sure there will be
plenty who will not want a hybrid Ruby/Java environment, and for them
there is the ever-improving Ferret project. Ferret, however, would still
need layers added on top of it to achieve all that Solr provides, so Solr
is where I'm at now. Despite my time constraints, I'm volunteering to
bring this prototype to a documented and easily usable state, and to
manage patches submitted by savvy users to make it robust.

Thoughts?

Erik

p.s. And for the really die-hard bleeding edgers, the complete
acts_as_solr code is pasted below, which you can put into a Rails project
in vendor/plugins/acts_as_solr.rb, along with a simple one-line
require 'acts_as_solr' init.rb in vendor/plugins. Sheepishly, here's the
hackery:

require 'active_record'
require 'rexml/document'
require 'net/http'

def post_to_solr(body, mode = :search)
  url = URI.parse("http://localhost:8983")
  post = Net::HTTP::Post.new(mode == :search ? "/solr/select" : "/solr/update")
  post.body = body
  post.content_type = 'application/x-www-form-urlencoded'
  response = Net::HTTP.start(url.host, url.port) do |http|
    http.request(post)
  end
  return response.body
end

module SolrMixin
  module Acts #:nodoc:
    module ARSolr #:nodoc:
      def self.included(base)
        base.extend(ClassMethods)
      end

      module ClassMethods
        def acts_as_solr(options = {}, solr_options = {})
          # configuration = {}
          # solr_configuration = {}
          # configuration.update(options) if options.is_a?(Hash)
          # solr_configuration.update(solr_options) if solr_options.is_a?(Hash)
          after_save :solr_save
          after_destroy :solr_destroy
          include SolrMixin::Acts::ARSolr::InstanceMethods
        end

        def find_by_solr(q, options = {}, find_options = {})
          q = "(#{q}) AND type:#{self.name}"
          response = post_to_solr("q=#{ERB::Util::url_encode(q)}&wt=ruby&fl=pk")
          data = eval(response)
          docs = data['response']['docs']
          return [] if docs.size == 0
          ids = docs.collect {|doc| doc['pk']}
          conditions = [ "#{self.table_name}.id in (?)", ids ]
          result = self.find(:all, :conditions => conditions)
        end
      end

      module InstanceMethods
        def solr_id
          "#{self.class.name}:#{self.id}"
        end

        def solr_save
          logger.debug "solr_save: #{self.class.name} : #{self.id}"
          xml = RE
Re: Simplest way to load a custom RequestHandler in Jetty?
On Aug 28, 2006, at 9:25 PM, Chris Hostetter wrote:
> Things I've already tried...
>
> * java -cp .:my.jar -jar start.jar
> * putting my.jar in the example/lib directory
> * putting my.jar in the example/ext directory (this has a different
>   problem - the jar is loaded before the webapp, so it can't resolve
>   dependencies like "SolrRequestHandler")
> * modifying the jetty.xml to include something like this...
>
>   /hoss ./webapps/solr.war true name="jetty.home" default="."/>/etc/webdefault.xml false my.jar
>
> ...I got that last idea from here; based on the timeline, this fix
> should be in the Jetty 5.1.11 that we're using, but it doesn't seem to
> work for me.

I don't think Jetty supports hot reload (correct me if I'm wrong). You
could pull this off with Tomcat, though. I did that kind of stuff for lots
of Tapestry talks with Tomcat running, setting my IDE to compile classes
right into WEB-INF/classes, waiting for the context to reload, and hitting
refresh in the browser.

> Has anyone out there gotten Jetty to load their custom RequestHandlers,
> Analyzers, or Similarities? ... even if you haven't, do you have any
> suggestions on how to do it cleanly? (ie: without deconstructing the war
> and injecting my jar)

Why not deconstruct the WAR? Here's how I have my development environment
(and production too) set up with a little bit of Ant, and solr.war checked
into our repository as lib/solr.war. It only takes me a few seconds to go
from saving a change to a .java file to being up and running in Solr:
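Erik's actual build file does not survive in this archive, but the approach he describes (explode lib/solr.war, compile local sources straight into its WEB-INF/classes, deploy the exploded webapp) might be sketched with an Ant target like the following; the target name, directory layout, and paths are all assumptions for illustration, not his configuration:

```xml
<!-- Hypothetical sketch: explode the stock war and layer custom classes into it. -->
<target name="deploy-solr">
  <!-- unpack lib/solr.war into a working webapp directory -->
  <unwar src="lib/solr.war" dest="build/solr-webapp"/>
  <!-- compile local sources straight into WEB-INF/classes, against the war's jars -->
  <javac srcdir="src" destdir="build/solr-webapp/WEB-INF/classes">
    <classpath>
      <fileset dir="build/solr-webapp/WEB-INF/lib" includes="*.jar"/>
    </classpath>
  </javac>
  <!-- point the servlet container at build/solr-webapp as an exploded webapp -->
</target>
```

With this layout, custom RequestHandlers sit inside the webapp's own classloader, so dependencies like SolrRequestHandler resolve without any container-specific classpath tricks.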