Re: index browsing with solr

2007-02-26 Thread Pierre-Yves LANDRON

Solr has a pluggable request handling framework that lets you easily
write custom logic and takes care of the xml/json/etc writing for you.
Check:

http://wiki.apache.org/solr/SolrPlugins#head-7c0d03515c496017f6c0116ebb096e34a872cb61
http://wiki.apache.org/solr/SolrRequestHandler

Since the exact term browsing mechanism you asked for is not
supported, I suggested writing your own and looking to the
IndexInfoRequestHandler as a simple starting place.

After more thought (and Yonik pointing it out), you are probably best
off if you can use faceted browsing to do what you need.
http://wiki.apache.org/solr/SolrFacetingOverview

ryan


Thanks! Solr's plugin capability makes me even happier to have chosen it, 
but after a bit of reading, it seems that faceted browsing is the way to go 
to avoid too much programming. In fact it seems pretty easy, just perhaps a 
little too time-consuming, as I'm forced to run a query in order to get 
back the indexed terms. But at least it works.
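
For anyone finding this thread later, here is a rough sketch of the kind of request that approach comes down to. It only builds the URL. The host, port, handler path, field name and query below are placeholders I made up for illustration, and the facet parameters (facet, facet.field, facet.limit) are the simple faceting parameters as I understand them from the wiki, so check them against your Solr version.

```python
from urllib.parse import urlencode


def facet_terms_url(base_url, query, field, limit=100):
    """Build a select URL that asks only for facet counts on one field.

    rows=0 suppresses the matching documents, so the response carries
    just the indexed terms of `field` with their counts.
    """
    params = {
        "q": query,            # some query is still required to drive faceting
        "rows": 0,             # we only want the facet counts, not the documents
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,  # cap on the number of terms returned
    }
    return base_url + "?" + urlencode(params)


# Hypothetical values: adjust the URL, query and field to your own setup.
url = facet_terms_url("http://localhost:8983/solr/select", "id:[* TO *]", "cat")
print(url)
```

The query is still needed, which is the cost mentioned above, but with rows=0 the response stays small.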


Thanks again to everybody for your contribution.

Pierre-Yves Landron





Multiple instances, wiki out of date?

2007-02-26 Thread galo

Hi there,

I've been following the instructions from 
http://wiki.apache.org/solr/SolrJetty?highlight=%28Multiple%29%7C%28Solr%29%7C%28Webapps%29solr 

to get a few indexes running under the same instance of jetty 6.1.2. If 
I use the webapp descriptors as specified in the wiki (with correct 
paths, I'm just pasting the example here)..



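<Call name="addWebApplication">
  <Arg>/solr1/*</Arg>
  <Arg>/your/path/to/the/solr.war</Arg>
  <Set name="extractWAR">true</Set>
  <Set name="defaultsDescriptor">org/mortbay/jetty/servlet/webdefault.xml</Set>
  <Call name="addEnvEntry">
    <Arg>solr/home</Arg>
    <Arg type="String">/your/path/to/your/solr/home/dir</Arg>
  </Call>
</Call>

<Call name="addWebApplication">
  <Arg>/solr2/*</Arg>
  <Arg>/your/path/to/the/solr.war</Arg>
  <Set name="extractWAR">true</Set>
  <Set name="defaultsDescriptor">org/mortbay/jetty/servlet/webdefault.xml</Set>
  <Call name="addEnvEntry">
    <Arg>solr/home</Arg>
    <Arg type="String">/your/path/to/your/alternate/solr/home/dir</Arg>
  </Call>
</Call>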
 


Jetty complains that:

2007-02-26 18:36:04.874::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
2007-02-26 18:36:05.066::WARN:  Config error at name="addWebApplication">/solr1/*/your/path/to/the/solr.warname="extractWAR">truename="defaultsDescriptor">org/mortbay/jetty/servlet/webdefault.xmlname="addEnvEntry">solr/hometype="String">/your/path/to/your/solr/home/dir

2007-02-26 18:36:05.066::WARN:  EXCEPTION
java.lang.IllegalStateException: No Method: name="addWebApplication">/solr1/*/your/path/to/the/solr.warname="extractWAR">truename="defaultsDescriptor">org/mortbay/jetty/servlet/webdefault.xmlname="addEnvEntry">solr/hometype="String">/your/path/to/your/solr/home/dir on 
class org.mortbay.jetty.Server

   at org.mortbay.xml.XmlConfiguration.call(XmlConfiguration.java:548)
   at org.mortbay.xml.XmlConfiguration.configure(XmlConfiguration.java:241)
   at org.mortbay.xml.XmlConfiguration.configure(XmlConfiguration.java:203)
   at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:919)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:585)
   at org.mortbay.start.Main.invokeMain(Main.java:183)
   at org.mortbay.start.Main.start(Main.java:497)
   at org.mortbay.start.Main.main(Main.java:115)
2007-02-26 18:36:05.068::INFO:  Shutdown hook executing
2007-02-26 18:36:05.068::INFO:  Shutdown hook complete

I've been looking at the Jetty API and it looks like those methods are 
deprecated in the latest versions of Jetty. Anyway, I can get several 
instances to run together using the descriptor shown below and several 
war files:



 
   
 
 default="."/>/webapps-plus

 false
 true
 false
 default="."/>/etc/webdefault.xml

   
 


This is good enough for me but the problem then is that all point to the 
same data/index folder sharing the same index and I need them to use 
different indexes. The question is, how can you configure solr.home 
differently for each of the solr instances deployed in the webapps-plus 
folder?


It would be equally valid if there is a way of fixing the xml in the 
wiki so individual war files can be specified passing a different 
solr.home to each..


thanks,

galo.



MoreLikeThis and term vectors - documentation suggestion

2007-02-26 Thread Ken Krugler

Hi all,

I was trying out the MoreLikeThis support, and getting some odd results.

I realized that unless the fields being used for similarity 
calculation have a stored term vector, the MoreLikeThis code from 
Lucene will re-analyze the field using the StandardAnalyzer. Which, 
in my case, is quite different from what I'm using in the Solr schema.


So the first note is just for anybody using MoreLikeThis, make sure 
you also specify termVectors=true in the Solr schema for any fields 
being passed to the query as mlt.fl parameters.
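
A quick way to check this before querying is to scan the schema for the fields you plan to pass in mlt.fl. The sketch below is only an illustration: the sample schema and field names are made up, and it looks only at explicit field declarations, so anything that comes from a dynamicField pattern would need extra handling.

```python
import xml.etree.ElementTree as ET

# Made-up schema fragment for illustration; not the shipped example schema.
SAMPLE_SCHEMA = """
<schema name="example" version="1.1">
  <fields>
    <field name="title" type="text" indexed="true" stored="true" termVectors="true"/>
    <field name="body" type="text" indexed="true" stored="true"/>
  </fields>
</schema>
"""


def fields_missing_term_vectors(schema_xml, mlt_fields):
    """Return the names in mlt_fields whose <field> lacks termVectors="true".

    Fields that are not declared at all are reported as missing too.
    """
    root = ET.fromstring(schema_xml)
    has_vectors = {
        field.get("name"): field.get("termVectors", "false").lower() == "true"
        for field in root.iter("field")
    }
    return [name for name in mlt_fields if not has_vectors.get(name, False)]


print(fields_missing_term_vectors(SAMPLE_SCHEMA, ["title", "body"]))
```

Any field name this reports is one where MoreLikeThis would fall back to re-analyzing the stored text.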


The second note is that the Wiki page and the example schema might 
want to include some reference to the termVectors field attribute. 
For example, the sample schema says:



   

Re: Multiple instances, wiki out of date?

2007-02-26 Thread Chris Hostetter

Galo: are you using plain vanilla Jetty, or are you using Jetty Plus?

the examples for "Configuring Solr Home with JNDI" and "Multiple Solr
Webapps" both require Jetty Plus (because the JNDI support only exists in
the extra libraries JettyPlus provides)

That may explain the missing method call when trying to do addEnvEntry

if you can't use Jetty Plus to have JNDI support, or if the latest version
of Jetty doesn't support JNDI anymore (can't imagine that would be the
case) then you might be able to find a way to set the solr.solr.home
system property on a per webapp basis ... how to do that in a Jetty config
may be a better question for the Jetty user community.

(if you do discover that the config syntax for JNDI has changed
significantly in the latest versions of Jetty, by all means please update
the wiki ... we'd probably want separate sections for the different
versions since not everyone will be running the latest, but it's still
good info to have)





-Hoss



Re: Document boost not reflected in fieldNorm

2007-02-26 Thread Chris Hostetter


i just tried this with the example schema:
  1) changed the "cat" field to have omitNorms="false"
  2) edited solr.xml so there was a second <doc> with all the same data
 except a different "id" and a doc boost of 2
  3) restarted port, and reindexed solr.xml

...when i search on cat:search, i definitely see the docboost come into
play...


2.5622776 = (MATCH) fieldWeight(cat:search in 19), product of:
  1.0 = tf(termFreq(cat:search)=1)
  2.049822 = idf(docFreq=6)
  1.25 = fieldNorm(field=cat, doc=19)


1.2811388 = (MATCH) fieldWeight(cat:search in 18), product of:
  1.0 = tf(termFreq(cat:search)=1)
  2.049822 = idf(docFreq=6)
  0.625 = fieldNorm(field=cat, doc=18)


...perhaps your docboosts (while different) are close enough together that
they encode to the same byte encoded value?
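
to illustrate that, here is a rough sketch of a one-byte float encoding with a 3-bit mantissa and a 5-bit exponent, which is my understanding of how Lucene stores norms. the function names are mine, and the norm formula (boost divided by the square root of the number of terms) and the two-term field are assumptions about the default similarity and the example data, not something read off the index.

```python
import math
import struct


def float_to_byte315(value):
    """Squeeze a float into one byte: 3 mantissa bits, 5 exponent bits."""
    bits = struct.unpack(">i", struct.pack(">f", value))[0]
    small = bits >> (24 - 3)           # keep sign, exponent, top 2 mantissa bits
    zero_point = (63 - 15) << 3
    if small <= zero_point:            # zero, negative, or too small to represent
        return 0 if bits <= 0 else 1
    if small >= zero_point + 0x100:    # too large: clamp to the biggest byte
        return 255
    return small - zero_point


def byte315_to_float(encoded):
    """Expand the one-byte form back into a float (lossy round trip)."""
    if encoded == 0:
        return 0.0
    bits = ((encoded & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_norm(doc_boost, num_terms):
    """Norm as it would come back after the one-byte round trip (assumed formula)."""
    raw = doc_boost / math.sqrt(num_terms)
    return byte315_to_float(float_to_byte315(raw))


for boost in (1.0, 1.05, 2.0):
    print(boost, stored_norm(boost, 2))
```

under those assumptions, boosts of 1.0 and 2.0 on a two-term field come back as 0.625 and 1.25, the same values as in the explain output above, while boosts of 1.0 and 1.05 collapse to the same stored norm.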


-Hoss



Re: MoreLikeThis and term vectors - documentation suggestion

2007-02-26 Thread Bertrand Delacretaz

On 2/26/07, Ken Krugler <[EMAIL PROTECTED]> wrote:


...I was trying out the MoreLikeThis support, and getting some odd results...


Thanks for the info, I have added a link to your message at
https://issues.apache.org/jira/browse/SOLR-69

-Bertrand


Re: Overriding Ranking in solr

2007-02-26 Thread Chris Hostetter

Your question is broad, and has a lot of potential answers...

1) Lucene has very configurable scoring that allows a lot of
customization -- much of the scoring formula can be tweaked just by
changing the "Similarity" class used, other more complex things can be
achieved by writing your own Query classes

2) Solr allows for a *lot* of customization using "plugins" where just
about any class you can imagine (including Similarity, custom
RequestHandlers, and new Query classes) can be loaded from a JAR you
provide at runtime...

  http://wiki.apache.org/solr/SolrPlugins

3) Solr has a special type of query called a FunctionQuery which makes
writing special Query Scoring based on numeric Document Fields really easy
... some very complicated things can be done right out of the box using
the Function Parsing supported by the SolrQueryParser...

http://lucene.apache.org/solr/api/org/apache/solr/search/QueryParsing.html#parseFunction(java.lang.String,%20org.apache.solr.schema.IndexSchema)

...but more complicated things (like distance searching) would require you
to write a simple ValueSource defining your equation, and using that
ValueSource in a FunctionQuery you construct in a custom RequestHandler.
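
as a rough idea of the kind of equation such a ValueSource might compute, here is the math on its own, outside of Solr. none of this is Solr or Lucene API: the function names, the haversine formula, and the 1/(1+distance) decay are all just assumptions picked for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against rounding pushing the value slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_score(doc_lat, doc_lon, origin_lat, origin_lon):
    """Hypothetical per-document score: 1.0 at the origin, decaying with distance."""
    return 1.0 / (1.0 + haversine_km(doc_lat, doc_lon, origin_lat, origin_lon))


# A document at the origin outranks one a degree of longitude away.
print(distance_score(0.0, 0.0, 0.0, 0.0), distance_score(0.0, 1.0, 0.0, 0.0))
```

in a real ValueSource the per-document latitude and longitude would come from numeric fields, and the origin from the request.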

using FunctionQuery has been discussed on several Lucene lists in the
past, there have even been some fairly in-depth discussions about using
it for Geo based scoring...

http://www.nabble.com/forum/Search.jtp?query=FunctionQuery+distance&local=y&forum=44


-Hoss


Re: MoreLikeThis and term vectors - documentation suggestion

2007-02-26 Thread Mike Klaas

On 2/26/07, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:

On 2/26/07, Ken Krugler <[EMAIL PROTECTED]> wrote:

> ...I was trying out the MoreLikeThis support, and getting some odd results...

Thanks for the info, I have added a link to your message at
https://issues.apache.org/jira/browse/SOLR-69


Is it possible to modify MoreLikeThis to use the schema.xml-defined
analyzer?  That's the way the highlighting code currently works (it
picks the index-time analyzer).

It would be nice for as many features as possible to work without term
vectors.  I sometimes wonder whether schema.xml exposes the right
level of abstraction (it is currently very lucene-guts-y).  Options
like compressed are nice as we are free to change the implementation.
Something like canPerformMoreLikeThis=true would give us more
flexibility in the future.

Then again, perhaps all that is needed is a nice table... something
like http://wiki.apache.org/solr/FieldOptionsByUseCase?

-Mike


Re: MoreLikeThis and term vectors - documentation suggestion

2007-02-26 Thread Ken Krugler

On 2/26/07, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:

On 2/26/07, Ken Krugler <[EMAIL PROTECTED]> wrote:

 ...I was trying out the MoreLikeThis support, and getting some 
odd results...


Thanks for the info, I have added a link to your message at
https://issues.apache.org/jira/browse/SOLR-69


Is it possible to modify MoreLikeThis to use the schema.xml-defined
analyzer?  That's the way the highlighting code currently works (it
picks the index-time analyzer).


I looked at that briefly (passing the analyzer to use down to 
MoreLikeThis), but for my fields it's a lot more than just what 
analyzer is used, given all of the filters that are also in play.


Also the performance really stunk when I didn't use stored term vectors.


It would be nice for as many features as possible to work without term
vectors.  I sometimes wonder whether schema.xml exposes the right
level of abstraction (it is currently very lucene-guts-y).  Options
like compressed are nice as we are free to change the implementation.
Something like canPerformMoreLikeThis=true would give us more
flexibility in the future.

Then again, perhaps all that is needed is a nice table... something
like http://wiki.apache.org/solr/FieldOptionsByUseCase?


That would be nice, yes.

Thanks,

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"