Re: Index & search questions; special cases
: > : Nutch has phrase pre-filtering which helps with this. It indexes the
: > : phrase fragments as separate terms and uses that set of matches to
: > : filter the set of matching documents.

: > That reminds me ... I seem to remember someone saying once that Nutch also
: > builds word-based n-grams out of its stop words, so searches on "the"
: > or "on" won't match anything because those words are never indexed as
: > single tokens, but if a document contains "the dog in the house" it would
: > match a search on "in the" because the Analyzer would treat that as a
: > single token "in_the".

: This looks like exactly what I'm looking for. Is it related to the above
: 'nutch pre-filtering'? This way if I stopword single letters and
: numbers, it would still index 'hepatitis_a' as a single token, and match
: a search on 'hepatitis a' (non-phrase search) without hitting 'a patient
: has hepatitis'? I guess I'd have to apply the filter to the query too,
: so it turns the query into hepatitis_a?

right ... I think we were both talking about the same feature, which Otis says is in Nutch's "CommonGrams" class...

http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/analysis/CommonGrams.java?view=markup

: Any chance at all this kind of filter gets implemented into solr? If
: not, indications on how to do it myself would be appreciated - I can't

CommonGrams itself seems to have some other dependencies on Nutch because of other utilities in the same class, but based on a quick skim, what you really want is the nested "private static class Filter extends TokenFilter", which doesn't really have any external dependencies. If you extract that class into a more specifically named "CommonGramsFilter", all you need after that to use it in Solr is a simple little "FilterFactory" so you can reference it in your schema.xml ... you can use the StopFilterFactory as a template, since you'll need exactly the same initialization (get the name of a word list file from the init params, parse it, and build a word set out of it)...

http://svn.apache.org/viewvc/incubator/solr/trunk/src/java/org/apache/solr/analysis/StopFilterFactory.java?view=markup

...all you really need to change is that the "create" method should return a new "CommonGramsFilter" instead of a StopFilter.

Incidentally: most of the code in CommonGrams.Filter seems to be dealing with the buffering of tokens ... it may be easier to reimplement the logic with Solr's BufferedTokenStream as a base class.

-Hoss
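The token-level behaviour described in this message can be sketched in plain Java. This is only an illustration of the idea as it is described in the thread (common words are never emitted alone, and any adjacent pair containing a common word is emitted as one joined token); it is not the Nutch CommonGrams code and uses no Lucene or Solr classes. The class name `CommonGramsSketch`, the method `grams`, and the sample word list are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified illustration; not the Nutch implementation.
final class CommonGramsSketch {

    // Emits every non-common token as-is, and emits "a_b" for each adjacent
    // pair (a, b) in which at least one of the two words is a common word.
    static List<String> grams(List<String> tokens, Set<String> common) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String cur = tokens.get(i);
            if (!common.contains(cur)) {
                out.add(cur); // ordinary words stay searchable on their own
            }
            if (i + 1 < tokens.size()) {
                String next = tokens.get(i + 1);
                if (common.contains(cur) || common.contains(next)) {
                    out.add(cur + "_" + next); // joined token such as "in_the"
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed word list, for illustration only.
        Set<String> common = new HashSet<>(Arrays.asList("the", "in", "on", "a"));
        System.out.println(grams(Arrays.asList("the", "dog", "in", "the", "house"), common));
        System.out.println(grams(Arrays.asList("a", "patient", "has", "hepatitis"), common));
    }
}
```

With the assumed word list, the first call yields `[the_dog, dog, dog_in, in_the, the_house, house]` and the second yields `[a_patient, patient, has, hepatitis]`, so a query token `hepatitis_a` would not match the second document. A real filter would also need to handle token positions and offsets, which this sketch ignores.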
Re: Index & search questions; special cases
On Nov 14, 2006, at 2:00 PM, Chris Hostetter wrote:
> CommonGrams itself seems to have some other dependencies on nutch because of other utilities in the same class, but based on a quick skim, what you really want is the nested "private static class Filter extends TokenFilter" which doesn't really have any external dependencies. If you extract that class into some more specifically named "CommonGramsFilter",...

Yeah, the Nutch code is highly intertwined with its unique configuration infrastructure, which makes it hard to pull pieces of it out like this.

Erik
New Feature: ${solr.home}/lib/ dir for "plugins"
Hey folks,

I just wanted to let you all know about a new feature just committed yesterday (now available in the solr-2006-11-15 nightly build).

While Solr has always had some really handy hooks for loading your own code to do analysis, request handlers, output writers, field types, cache implementations, and all sorts of other goodies, it wasn't until I started trying to document how to *use* custom code in various Servlet Containers that I realized it was pretty much impossible unless you:

  a) repackaged the Solr WAR with your code inside, or
  b) used Caucho Resin.

(guess which servlet container I've been using)

Which brings us to the new functionality: it's now possible to create a "lib/" directory inside of your Solr Home directory, and place JARs in that lib directory containing custom code you'd like to use -- Solr will look at JARs in this directory when resolving class references you may have in your solrconfig.xml and schema.xml.

This has been tested in a variety of different servlet containers, and seems to work fine -- but I'd like to request that anyone currently "repacking" the solr.war to include any other code try using this new functionality instead, and report any bugs you may encounter (along with info about your OS, the Servlet Container you are using, the relevant configuration files, etc...)

Repacking the solr.war to include your code will always be an option available to Solr users who want a single WAR containing everything they care about, but the goal of this new functionality is to make it easier to use custom plugins without needing to jump through the repacking hoops.

More information about Plugins can be found on the wiki...

http://wiki.apache.org/solr/SolrPlugins

More information about this feature can be found in Jira...

http://issues.apache.org/jira/browse/SOLR-68

-Hoss
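For readers wondering how resolving classes from a "lib/" directory can work in general, here is a minimal sketch using the standard `java.net.URLClassLoader`. It is an assumption-laden illustration of the general technique, not Solr's actual implementation (see SOLR-68 for that); the class name `LibDirLoaderSketch` and its methods are hypothetical.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the general technique; not Solr's code.
final class LibDirLoaderSketch {

    // Collects a URL for every regular file ending in ".jar" in libDir.
    static List<URL> jarUrls(File libDir) throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        File[] files = libDir.listFiles(); // null if libDir is missing
        if (files != null) {
            for (File f : files) {
                if (f.isFile() && f.getName().endsWith(".jar")) {
                    urls.add(f.toURI().toURL());
                }
            }
        }
        return urls;
    }

    // Builds a class loader that asks its parent first, then the JARs
    // found in <home>/lib.
    static ClassLoader loaderFor(File home) throws MalformedURLException {
        List<URL> urls = jarUrls(new File(home, "lib"));
        return new URLClassLoader(urls.toArray(new URL[0]),
                LibDirLoaderSketch.class.getClassLoader());
    }

    public static void main(String[] args) throws Exception {
        File home = new File(args.length > 0 ? args[0] : ".");
        ClassLoader loader = loaderFor(home);
        // A class name taken from a config file would be resolved like this.
        Class<?> c = Class.forName("java.lang.String", true, loader);
        System.out.println(c.getName());
    }
}
```

Because the loader delegates to its parent first, classes already visible to the web application still resolve normally; only names the parent cannot find are looked up in the JARs under `lib/`.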
Re: New Feature: ${solr.home}/lib/ dir for "plugins"
Extremely cool. This is going to be a big help for some things I'm working on, and I'm sure for others. Many thanks!!

On 11/14/06, Chris Hostetter wrote:
> Hey folks, I just wanted to let you all know about a new feature just committed yesterday (now available in the solr-2006-11-15 nightly build). ...