XML output for Analysis admin functionality

2007-08-28 Thread Stephanie Belton
Hi,

I need to programmatically put search terms through the query analyser and
retrieve the result. I thought the easiest way to do this would be to call
the existing /solr/admin/analysis.jsp, but it would be much nicer if there
were an XML version of it.

I noticed that there is an analysis.xsl file in src/webapp/resources/admin/,
which seems to indicate that something was done in that respect, but I can't
find any documentation on it.

I have found this:

http://issues.apache.org/jira/browse/SOLR-58

and this:

http://mail-archives.apache.org/mod_mbox/lucene-solr-dev/200612.mbox/raw/%3C[EMAIL PROTECTED]/

What is the current status of an XML interface to the admin pages (or at
least to analysis, which is what I need)? If it was done, how do I access
it?

Many thanks,

Stephanie
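Until an XML response exists, one stopgap is to request the JSP and scrape its output. A sketch of building the request URL follows; the parameter names `name` (the field) and `val` (the text to analyse) are guesses taken from the analysis.jsp form and should be checked against your Solr version's page source.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class AnalysisUrl {
    // Builds the URL for Solr's admin analysis page. The "name" and "val"
    // parameter names are assumptions based on the analysis.jsp form fields.
    static String build(String host, String field, String text)
            throws UnsupportedEncodingException {
        return host + "/solr/admin/analysis.jsp"
                + "?name=" + URLEncoder.encode(field, "UTF-8")
                + "&val=" + URLEncoder.encode(text, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build("http://localhost:8983", "title", "running dogs"));
    }
}
```

Fetching that URL returns HTML meant for a browser, so anything beyond a quick hack really does want the XML view asked about above.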



sort problem

2007-08-28 Thread michael ravits
hello solrs,

I have an index with 30M records, weighing ~50GB, on the latest trunk
version, with a 1024MB heap.

Queries work fine until I specify a field to sort results by. Even if the
result set consists of only 2 documents, the CPU jumps high and after about
5 minutes I get the following exception:

Any ideas?
thanks

java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.SegmentTermEnum.termInfo(SegmentTermEnum.java:170)
at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:166)
at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:153)
at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:54)
at org.apache.lucene.index.MultiTermDocs.termDocs(MultiReader.java:429)
at org.apache.lucene.index.MultiTermDocs.next(MultiReader.java:380)
at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:383)
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:350)
at org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:266)
at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:182)
at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:155)
at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:862)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:810)
at org.apache.solr.search.SolrIndexSearcher.getDocList(SolrIndexSearcher.java:703)
at org.apache.solr.handler.StandardRequestHandler.handleRequestBody(StandardRequestHandler.java:125)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:78)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:723)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:193)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:161)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
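The trace shows the error arises while Lucene populates a FieldCache for the sort field (see `FieldCacheImpl.getStringIndex` above). That cache covers every document in the index, not just the matching ones, which is why even a 2-document result set can exhaust a 1024MB heap on a 30M-document index. A back-of-the-envelope sketch of the minimum footprint, assuming only the 4-bytes-per-document ord array (the per-term string storage comes on top of this):

```java
public class SortMemEstimate {
    // Lucene's FieldCache for string sorting materializes one int "ord" per
    // document in the whole index, regardless of result-set size.
    static long ordArrayMb(long numDocs) {
        return numDocs * 4 / (1024 * 1024); // 4 bytes per int entry
    }

    public static void main(String[] args) {
        // 30M documents, as in the index described above
        System.out.println(ordArrayMb(30_000_000L) + " MB just for the ord array");
    }
}
```

With one such cache per sorted field, plus the term strings themselves, a larger `-Xmx` is usually the first thing to try for an index this size.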


   

Re: Embedded about 50% faster for indexing

2007-08-28 Thread Walter Underwood
No need to run a separate web server. I actually do HTTP updates from
an extra servlet configured into the Solr webserver. It might
seem a little odd, but same-system TCP sockets are extremely fast
and low overhead.

The additional flexibility is nice, too. If I find a bug in the
indexing code in production, I can fix it locally and update
from the fixed copy over HTTP while I wait for a push of code
to production.

Modern HTTP and TCP are very fast and very reliable, so don't
count out the HTTP/XML interface before trying it.

wunder
==
Search Guy
Netflix

On 8/27/07 9:18 PM, "climbingrose" <[EMAIL PROTECTED]> wrote:

> Agree. I was actually thinking of developing the embedded version early this
> year for one of my projects. I'm sure it will be needed in cases where
> running another web server is overkill.
> 
> On 8/28/07, Jonathan Woods <[EMAIL PROTECTED]> wrote:
>> 
>> I don't think you should apologise for highlighting embedded usage.  For
>> circumstances in which you're at liberty to run a Solr instance in the
>> same
>> JVM as an app which uses it, I find it very strange that you should have
>> to
>> use anything _other_ than embedded, and jump through all the unnecessary
>> hoops (XML conversion, HTTP transport) that this implies.  It's a bit like
>> suggesting you should throw away Java method invocations altogether, and
>> write everything in XML-RPC.
>> 
>> Bit of a pet issue of mine!  I'll be creating a JIRA issue on the subject
>> soon.
>> 
>> Jon
>> 
>>> -Original Message-
>>> From: Sundling, Paul [mailto:[EMAIL PROTECTED]
>>> Sent: 28 August 2007 03:24
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: Embedded about 50% faster for indexing
>>> 
>>> At this point I think I'm going to recommend against embedded,
>>> regardless of any performance advantage.  The level of
>>> documentation is just too low, while the XML API is clearly
>>> documented.  It's clear that XML is preferred.
>>> 
>>> The embedded example on the wiki is pretty good, but until
>>> multiple core support comes out in the next version, you have
>>> to use multiple SolrCores.  If they are accessed in the same
>>> webapp, then you can't just set JNDI (since you can only have
>>> one value).  So you have to use a Config object as alluded to
>>> in the example.  However, if you look at the code, there is
>>> no javadoc for the constructor.  The constructor args are
>>> (String name, InputStream is, String prefix).  I think name
>>> is a unique name for the solr core, but that is a guess.
>>> InputStream may be a stream to the solr home, but it could be
>>> anything.  Prefix may be a URI prefix.  These are all guesses
>>> without trying to read through the code.
>>> 
>>> When I look at SolrCore, it looks like it's a singleton, so
>>> maybe I can't even access more than one SolrCore using
>>> embedded anyway.  :(  So I apologize for highlighting Embedded.
>>> 
>>> Anyway, it's clear how to do multiple solr cores using XML.
>>> You just have a different post URI for the different cores.
>>> You can easily inject that with Spring and externalize the
>>> config.  Simple and easy.  So I concede XML is the way to go. :)
>>> 
>>> Paul Sundling
>>> 
>>> -Original Message-
>>> From: Mike Klaas [mailto:[EMAIL PROTECTED]
>>> Sent: Monday, August 27, 2007 5:50 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Embedded about 50% faster for indexing
>>> 
>>> 
>>> On 27-Aug-07, at 12:44 PM, Sundling, Paul wrote:
>>> 
 Whether embedded solr should give me a performance boost or not, it
 did.  :)  I'm not surprised, since it skips XML parsing.  Although
 you never know where cycles are used for sure until you profile.
>>> 
>>> It certainly is possible that XML parsing dwarfs indexing, but I'd
>>> expect that only to occur under very light analysis and field storage
>>> workloads.
>>> 
 I tried doing more records per post (200) and it was actually slightly
 slower and seemed to require more memory.  This makes sense because you
 have to take up more memory for the StringBuilder to store the much
 larger XML.  For 10,000 it was much slower.  For that size I would need
 to use XML streaming or something to make it work.
 
 The solr war was on the same machine, so network overhead was only from
 using loopback.
>>> 
>>> The big question is still your connection handling strategy:
>>> are you using persistent HTTP connections?  Are you indexing
>>> from multiple threads?
>>> 
>>> cheers,
>>> -Mike
>>> 
 Paul Sundling
 
 -Original Message-
 From: climbingrose [mailto:[EMAIL PROTECTED]
 Sent: Monday, August 27, 2007 12:22 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Embedded about 50% faster for indexing
 
 
 Haven't tried the embedded server but I think I have to agree with
 Mike.
 We're currently sending 2000 job batches to SOLR server and
>>> the amount
 of time required to transfer documents over h
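The batch-size experiments discussed in this thread all revolve around building one `<add>` payload per HTTP post. A minimal payload builder is sketched below; the field names and the escaping rules shown are illustrative assumptions, not something specified in the thread.

```java
public class BatchAddXml {
    // Escapes the characters that must not appear raw in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds a single <add> payload holding one <doc> per record.
    static String buildAdd(String[] ids, String[] titles) {
        StringBuilder sb = new StringBuilder("<add>");
        for (int i = 0; i < ids.length; i++) {
            sb.append("<doc>")
              .append("<field name=\"id\">").append(escape(ids[i])).append("</field>")
              .append("<field name=\"title\">").append(escape(titles[i])).append("</field>")
              .append("</doc>");
        }
        return sb.append("</add>").toString();
    }

    public static void main(String[] args) {
        System.out.println(buildAdd(new String[]{"1"}, new String[]{"Tom & Jerry"}));
    }
}
```

Posting the resulting string to /solr/update with Content-Type text/xml over a persistent connection is the usual next step; as the thread notes, very large batches make this StringBuilder approach memory-hungry, at which point streaming the XML is preferable.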

Solr and KStem

2007-08-28 Thread Wagner,Harry
There is a version of the KStem stemmer
(http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi) that has been
adapted for Lucene.  What would be the simplest way to implement this in
Solr?  As a plug-in?  Has anyone already done this?

Thanks... harry


Re: Solr and KStem

2007-08-28 Thread Mike Klaas


On 28-Aug-07, at 1:08 PM, Wagner,Harry wrote:


There is a version of the KStem stemmer
(http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi) that has been
adapted for Lucene.  What would be the simplest way to implement this in
Solr?  As a plug-in?  Has anyone already done this?


You should be able to drop it in lib/ and use it via something like
this in schema.xml:

<fieldtype name="text_kstem" class="solr.TextField">
  <analyzer class="KStemAnalyzer"/>
</fieldtype>

If it is a tokenfilter (rather than an Analyzer), you can write a
little wrapper Factory class (see examples in
org/apache/solr/analysis), then use it as follows:

<fieldtype name="text_kstem" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="KStemFilterFactory"/>
  </analyzer>
</fieldtype>

best,
-Mike
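The wrapper-factory pattern Mike describes can be sketched as below. The `TokenStream`/`TokenFilter` stubs here merely stand in for the real Lucene classes so the example is self-contained; in an actual Solr plugin you would extend the factory base class in org.apache.solr.analysis and delegate to the Lucene-adapted KStem filter from the UMass download. The class name `KStemFilterFactory` is an assumption matching the schema.xml snippet above.

```java
// Stand-ins for the real Lucene analysis classes, for illustration only.
class TokenStream { public String next() { return null; } }

class TokenFilter extends TokenStream {
    protected final TokenStream input;
    TokenFilter(TokenStream input) { this.input = input; }
}

// Stand-in for the Lucene-adapted KStem filter from the UMass download.
class KStemFilter extends TokenFilter {
    KStemFilter(TokenStream input) { super(input); }
}

// The factory Solr would instantiate from the <filter class="..."/> line
// in schema.xml: its only job is to wrap the incoming token stream.
public class KStemFilterFactory {
    public TokenStream create(TokenStream input) {
        return new KStemFilter(input); // delegate to the stemming filter
    }

    public static void main(String[] args) {
        TokenStream wrapped = new KStemFilterFactory().create(new TokenStream());
        System.out.println(wrapped.getClass().getSimpleName());
    }
}
```

The real factories in org/apache/solr/analysis are scarcely longer than this; the jar with both the filter and the factory just needs to sit in lib/ where Solr's classloader can find it.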


multiple solr home directories

2007-08-28 Thread Yu-Hui Jin
Hi there,

I have a few basic questions on setting up Solr home directories.

* Can we set up multiple Solr home directories within the same Solr
instance?  (I want to use the same Tomcat Solr instance to support indexing
and searching over multiple independent indexes.)

* If so, say I have some customized Solr plugins, i.e., jar files, do I have
to add them to each Solr home's lib directory?  (It feels a bit redundant
to add them multiple times for the same Solr instance.)


Thanks,

-Hui
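For the first question, one pattern that works with Tomcat is to deploy the same solr.war several times as separate context fragments, each binding its own solr/home via JNDI. A sketch, where every path and context name below is a placeholder:

```xml
<!-- conf/Catalina/localhost/solr-index1.xml (one fragment per index) -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/home-index1" override="true"/>
</Context>
```

Each webapp then gets its own index, schema, and lib/ directory, which is also why plugin jars end up duplicated per home unless you share them through a common container classpath.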