XML output for Analysis admin functionality
Hi,

I need to programmatically put search terms through the query analyser and retrieve the result. I thought the easiest way to do this would be to call the existing /solr/admin/analysis.jsp, but it would be so much nicer if there were an XML version of it. I noticed that there is an analysis.xsl file in src/webapp/resources/admin/, which seems to indicate that something was done in that respect, but I can't find any documentation on it.

I have found this: http://issues.apache.org/jira/browse/SOLR-58 and this: http://mail-archives.apache.org/mod_mbox/lucene-solr-dev/200612.mbox/raw/%3C [EMAIL PROTECTED]/

What is the current status on having an XML interface to the admin (or to analysis, for me at least!)? If it was done, how do I access it?

Many thanks
Stephanie
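Lacking an XML endpoint, one workaround is to call analysis.jsp over plain HTTP and scrape the result. A minimal sketch of building such a request URL, assuming the page accepts the same parameters as the admin form (`name` for the field, `val` for the text — verify these against the analysis.jsp in your Solr version):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class AnalysisUrl {
    // Builds the URL for the admin analysis page. The parameter names
    // ("name" for the field, "val" for the text) are taken from the admin
    // form and should be checked against your version of analysis.jsp.
    static String buildUrl(String solrBase, String field, String text)
            throws UnsupportedEncodingException {
        return solrBase + "/admin/analysis.jsp"
                + "?name=" + URLEncoder.encode(field, "UTF-8")
                + "&val=" + URLEncoder.encode(text, "UTF-8")
                + "&verbose=on";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildUrl("http://localhost:8983/solr", "title", "running dogs"));
    }
}
```

Fetching that URL returns HTML, so the output still has to be scraped; a real XML response handler would be the cleaner long-term answer.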
sort problem
hello solrs,

i have an index with 30M records, weighing ~50GB, on the latest trunk version with a heap size of 1024MB. queries work fine until I specify a field to sort results by. even if the result set consists of only 2 documents, the CPU jumps high and after about 5 minutes I get the following exception. Any idea? thanks

java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.index.SegmentTermEnum.termInfo(SegmentTermEnum.java:170)
    at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:166)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:153)
    at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:54)
    at org.apache.lucene.index.MultiTermDocs.termDocs(MultiReader.java:429)
    at org.apache.lucene.index.MultiTermDocs.next(MultiReader.java:380)
    at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:383)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
    at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:350)
    at org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:266)
    at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:182)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
    at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:155)
    at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:862)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:810)
    at org.apache.solr.search.SolrIndexSearcher.getDocList(SolrIndexSearcher.java:703)
    at org.apache.solr.handler.StandardRequestHandler.handleRequestBody(StandardRequestHandler.java:125)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:78)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:723)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:193)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:161)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
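The trace shows the error arising while Lucene populates its FieldCache (`FieldCacheImpl.getStringIndex`) for the sort field: the cache is sized by the whole index, not the result set, which is why even a 2-document match blows a 1024MB heap. A rough back-of-the-envelope sketch (the unique-term count and per-String overhead below are assumptions for illustration, not measurements):

```java
public class SortMemoryEstimate {
    public static void main(String[] args) {
        long docs = 30_000_000L;          // index size from the post
        long order = docs * 4;            // StringIndex keeps one int per document
        // Assumed: ~1M unique terms of ~10 chars each, with roughly 40 bytes
        // of per-String object overhead (ballpark figures only).
        long uniqueTerms = 1_000_000L;
        long lookup = uniqueTerms * (40 + 10 * 2);
        long totalMb = (order + lookup) / (1024 * 1024);
        System.out.println("Rough FieldCache footprint: ~" + totalMb + " MB per sorted field");
    }
}
```

Even under these mild assumptions one sorted string field costs well over 100 MB, and each additional sort field adds its own cache, so raising the heap well past 1024MB is usually unavoidable at this index size.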
Re: Embedded about 50% faster for indexing
No need to run a separate web server. I actually do HTTP updates from an extra servlet configured into the Solr webserver. It might seem a little odd, but same-system TCP sockets are extremely fast and low overhead. The additional flexibility is nice, too. If I find a bug in the indexing code in production, I can fix it locally and update from the fixed copy over HTTP while I wait for a push of code to production.

Modern HTTP and TCP are very fast and very reliable, so don't count out the HTTP/XML interface before trying it.

wunder
==
Search Guy, Netflix

On 8/27/07 9:18 PM, "climbingrose" <[EMAIL PROTECTED]> wrote:

> Agree. I was actually thinking of developing the embedded version early
> this year for one of my projects. I'm sure it will be needed in cases
> where running another web server is overkill.
>
> On 8/28/07, Jonathan Woods <[EMAIL PROTECTED]> wrote:
>>
>> I don't think you should apologise for highlighting embedded usage. For
>> circumstances in which you're at liberty to run a Solr instance in the
>> same JVM as an app which uses it, I find it very strange that you should
>> have to use anything _other_ than embedded, and jump through all the
>> unnecessary hoops (XML conversion, HTTP transport) that this implies.
>> It's a bit like suggesting you should throw away Java method invocations
>> altogether, and write everything in XML-RPC.
>>
>> Bit of a pet issue of mine! I'll be creating a JIRA issue on the subject
>> soon.
>>
>> Jon
>>
>>> -----Original Message-----
>>> From: Sundling, Paul [mailto:[EMAIL PROTECTED]]
>>> Sent: 28 August 2007 03:24
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: Embedded about 50% faster for indexing
>>>
>>> At this point I think I'm going to recommend against embedded,
>>> regardless of any performance advantage. The level of documentation
>>> is just too low, while the XML API is clearly documented. It's clear
>>> that XML is preferred.
>>>
>>> The embedded example on the wiki is pretty good, but until multiple
>>> core support comes out in the next version, you have to use multiple
>>> SolrCores. If they are accessed in the same webapp, then you can't
>>> just set JNDI (since you can only have one value). So you have to use
>>> a Config object as alluded to in the example. However, if you look at
>>> the code, there is no javadoc for the constructor. The constructor
>>> args are (String name, InputStream is, String prefix). I think name
>>> is a unique name for the Solr core, but that is a guess. InputStream
>>> may be a stream to the Solr home, but it could be anything. Prefix
>>> may be a URI prefix. These are all guesses without trying to read
>>> through the code.
>>>
>>> When I look at SolrCore, it looks like it's a singleton, so maybe I
>>> can't even access more than one SolrCore using embedded anyway. :(
>>> So I apologize for highlighting Embedded.
>>>
>>> Anyway, it's clear how to do multiple Solr cores using XML: you just
>>> have a different post URI for the different cores. You can easily
>>> inject that with Spring and externalize the config. Simple and easy.
>>> So I concede XML is the way to go. :)
>>>
>>> Paul Sundling
>>>
>>> -----Original Message-----
>>> From: Mike Klaas [mailto:[EMAIL PROTECTED]]
>>> Sent: Monday, August 27, 2007 5:50 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Embedded about 50% faster for indexing
>>>
>>> On 27-Aug-07, at 12:44 PM, Sundling, Paul wrote:
>>>
>>>> Whether embedded solr should give me a performance boost or not, it
>>>> did. :) I'm not surprised, since it skips XML parsing.
>>>
>>> Although you never know where cycles are used for sure until you
>>> profile. It certainly is possible that XML parsing dwarfs indexing,
>>> but I'd expect that only to occur under very light analysis and field
>>> storage workloads.
>>>
>>>> I tried doing more records per post (200) and it was actually
>>>> slightly slower and seemed to require more memory. This makes sense
>>>> because you have to take up more memory for the StringBuilder to
>>>> store the much larger XML. For 10,000 it was much slower. For that
>>>> size I would need XML streaming or something to make it work. The
>>>> solr war was on the same machine, so network overhead was only from
>>>> using loopback.
>>>
>>> The big question is still your connection handling strategy: are you
>>> using persistent http connections? Are you threadedly indexing?
>>>
>>> cheers,
>>> -Mike
>>>
>>>> Paul Sundling
>>>>
>>>> -----Original Message-----
>>>> From: climbingrose [mailto:[EMAIL PROTECTED]]
>>>> Sent: Monday, August 27, 2007 12:22 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Embedded about 50% faster for indexing
>>>>
>>>> Haven't tried the embedded server but I think I have to agree with
>>>> Mike. We're currently sending 2000 job batches to SOLR server and
>>>> the amount of time required to transfer documents over h
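Paul mentions that larger batches made the StringBuilder-backed XML balloon in memory. A minimal sketch of what building such an `<add>` batch looks like, so the size trade-off is concrete (field names are hypothetical; real code must also XML-escape field values):

```java
import java.util.List;

public class BatchXml {
    // Builds a Solr <add> message for a batch of (id, title) pairs.
    // The whole batch is buffered in one StringBuilder, which is why
    // very large batches (e.g. 10,000 docs) need streaming instead.
    static String buildAdd(List<String[]> docs) {
        StringBuilder sb = new StringBuilder("<add>");
        for (String[] d : docs) {
            sb.append("<doc>")
              .append("<field name=\"id\">").append(d[0]).append("</field>")
              .append("<field name=\"title\">").append(d[1]).append("</field>")
              .append("</doc>");
        }
        return sb.append("</add>").toString();
    }

    public static void main(String[] args) {
        System.out.println(buildAdd(List.of(
                new String[]{"1", "first"},
                new String[]{"2", "second"})));
    }
}
```

The resulting string is POSTed to the update handler in one request; batch size then directly sets the peak buffer size on the client.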
Solr and KStem
There is a version of the KStem stemmer (http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi) that has been adapted for Lucene. What would be the simplest way to implement this in Solr? As a plug-in? Has anyone already done this? Thanks... harry
Re: Solr and KStem
On 28-Aug-07, at 1:08 PM, Wagner,Harry wrote:

> There is a version of the KStem stemmer
> (http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi) that has been
> adapted for Lucene. What would be the simplest way to implement this in
> Solr? As a plug-in? Has anyone already done this?

You should be able to drop it in lib/ and use it via something like this in schema.xml:

If it is a tokenfilter (rather than an Analyzer), you can write a little wrapper Factory class (see examples in org/apache/solr/analysis), then use it as follows (the schema.xml snippet here, a fieldtype with positionIncrementGap="100", was stripped by the list archive):

best,
-Mike
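Mike's schema.xml snippets did not survive archiving. A reconstruction of the kind of declarations he describes, for orientation only: the class and factory names below are hypothetical placeholders, and only the `positionIncrementGap="100"` attribute appears in the original message.

```xml
<!-- Analyzer jar dropped into lib/ and referenced directly
     (class name is a placeholder): -->
<fieldtype name="text_kstem" class="solr.TextField">
  <analyzer class="org.example.KStemAnalyzer"/>
</fieldtype>

<!-- TokenFilter wrapped in a factory class, modelled on the factories
     in org/apache/solr/analysis (factory name is a placeholder): -->
<fieldtype name="text_kstem" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="org.example.KStemFilterFactory"/>
  </analyzer>
</fieldtype>
```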
multiple solr home directories
Hi there,

I have a few basic questions on setting up Solr home directories.

* Can we set up multiple Solr home directories within the same Solr instance? (I want to use the same Tomcat Solr instance to support indexing and searching over multiple independent indexes.)

* If so, say I have some customized Solr plugins, i.e., jar files: do I have to add them to each Solr home's lib directory? (It feels a bit redundant to add them multiple times for the same Solr instance.)

Thanks,
-Hui
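Before multi-core support, the usual approach for this was one Solr webapp per index in the same Tomcat, each Context pointing its `solr/home` JNDI entry at a different directory. A sketch of one such context file, with placeholder paths (a second file with a different name and `value` would define the second index):

```xml
<!-- e.g. $CATALINA_HOME/conf/Catalina/localhost/solr1.xml (paths are examples) -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/data/solr/core1" override="true"/>
</Context>
```

On the shared-jars question: each webapp has its own classloader, so jars in one home's lib/ are not visible to the others; putting the shared plugin jars on a Tomcat-wide classpath (e.g. shared/lib) is one way to avoid duplicating them, at the cost of coupling all instances to one plugin version.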