Re: SolrCloud Feedback
Hi Mark, hi all,

I just got a customer request to conduct an analysis of the state of SolrCloud. The customer wants to see SolrCloud as part of the next Solr 1.5 release and is willing to sponsor our dev time to close outstanding bugs and open issues that may prevent its inclusion. I need to give him a list of issues and an estimate of how long it will take us to fix them.

I ran https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+SOLR+AND+(summary+~+cloud+OR+description+~+cloud+OR+comment+~+cloud)+AND+resolution+%3D+Unresolved which returns 8 bugs. Do you consider this a comprehensive list of open issues, or are some important ones missing?

I read http://wiki.apache.org/solr/SolrCloud, which talks about a branch of its own; however, when I review https://issues.apache.org/jira/browse/SOLR-1873 I get the impression that the work has already been merged back into trunk, right? So which is best for testing: the branch or trunk?

TIA for any information

salu2
--
Thorsten Scherler
codeBusters S.L. - web based systems
http://www.codebusters.es/
SolrCloud - Example C not working
Hi all,

I followed http://wiki.apache.org/solr/SolrCloud and everything worked fine until I tried "Example C". I start all 4 servers, but all of them keep looping through:

java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
Feb 14, 2011 1:31:16 PM org.apache.log4j.Category info
INFO: Opening socket connection to server localhost/127.0.0.1:9983
Feb 14, 2011 1:31:16 PM org.apache.log4j.Category warn
WARNING: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
[... the same ConnectException repeats for localhost ports 9900, 9983, and 8574, on both the IPv4 and IPv6 addresses ...]

The problem seems to be that the ZooKeeper instances cannot connect to each other, so the ensemble never comes up at all. I am using revision 1070473 for the tests. Does anybody have an idea?

salu2
--
Thorsten Scherler
codeBusters S.L. - web based systems
http://www.codebusters.es/
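For reference, a quick way to check whether each ZooKeeper instance in the ensemble is actually listening before blaming Solr. The ports below are the ones from the wiki's Example C (Solr port + 1000), and "ruok" is ZooKeeper's standard four-letter health command:

    # ask each ZooKeeper server if it is OK; a healthy server answers "imok"
    for port in 9983 8574 9900; do
      echo "port $port:"
      echo ruok | nc localhost $port
    done

If none of them answer, the ZooKeeper servers themselves never started, which matches the "Session 0x0 for server null" lines in the log above.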
Re: SolrCloud - Example C not working
Hmm, nobody has an idea? Is Example C working fine for everybody else?

salu2

On Mon, 2011-02-14 at 14:08 +0100, Thorsten Scherler wrote:
> Hi all,
>
> I followed http://wiki.apache.org/solr/SolrCloud and everything worked
> fine until I tried "Example C". I start all 4 servers, but all of them
> keep looping through:
>
> java.net.ConnectException: Connection refused
> [... full ZooKeeper log snipped, see the previous mail ...]
>
> The problem seems to be that the ZooKeeper instances cannot connect to
> each other, so the ensemble never comes up at all.
>
> I am using revision 1070473 for the tests. Does anybody have an idea?
>
> salu2
--
Thorsten Scherler
codeBusters S.L. - web based systems
http://www.codebusters.es/
[solrCloud] Distributed IDF - scoring in the cloud
Hi all,

Going through the solrCloud examples, one thing I am not clear about is the scoring in a distributed search. I did a small test where I used "Example A: Simple two shard cluster" from wiki:SolrCloud and additionally added:

java -Durl=http://localhost:7574/solr/collection1/update -jar post.jar ipod_other.xml
java -Durl=http://localhost:8983/solr/collection1/update -jar post.jar monitor2.xml

Now requesting
http://localhost:8983/solr/collection1/select?distrib=true&q=electronics&fl=score&shards=localhost:8983/solr,localhost:7574/solr
on either host returns the same result. Here we get the score for each hit based on the shard-specific score and merge them into one result doc.

However, when I add monitor2.xml to 7574 as well (which previously did not contain it), the scoring changes depending on the server I request:
- The score returned by 8983 is always 0.09289607, whether distrib=true or false.
- The score returned by 7574 is always 0.121383816, whether distrib=true or false.

So is it correct to assume that if a document is indexed in both shards, the score that predominates is the one from the host that was requested?

My client plans to distribute the current index into different shards. For example, each "Consejería" (regional ministry) should be hosted in a shard. The critical point for the client is that the scoring in a distributed search stays the same as in the single big index they use right now.

As I understand the current solrCloud implementation, there is no attempt to harmonize the score. In my research I came across http://markmail.org/message/bhhfwymz5y7lvoj7:

"The "IDF" part of the relevancy score is the only place that distributed search scoring won't "match up" with non-distributed scoring, because the document frequency used for the term is local to every core instead of global. If you distribute your documents fairly randomly to the different shards, this won't matter. There is a patch in the works to add global idf, but I think that even when it's committed, it will default to off because of the higher cost associated with it."

The patch is https://issues.apache.org/jira/browse/SOLR-1632. However, the last comment is from 26/Jul/10, reporting that the patch failed, and a comment from Yonik gives the impression that it is not ready to use:

"It looks like the issue is this: rewrite() doesn't work for function queries (there is no propagation mechanism to go through value sources). This is a problem when real queries are embedded in function queries."

Is there general interest in bringing SOLR-1632 to trunk (especially for solrCloud)? Or might it be better to look into something that scales the index into HBase, so the client does not lose the scoring?

TIA for your feedback
--
Thorsten Scherler
codeBusters S.L. - web based systems
http://www.codebusters.es/
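To make the per-shard difference visible, the simplest check is to query each core directly with distrib=false and compare the local scores; a sketch reusing the query and field list from above:

    # local score on shard 1
    curl 'http://localhost:8983/solr/collection1/select?distrib=false&q=electronics&fl=id,score'
    # local score on shard 2
    curl 'http://localhost:7574/solr/collection1/select?distrib=false&q=electronics&fl=id,score'

Because each core computes IDF from its own document frequencies, the same document can legitimately get two different scores here.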
big index vs. lots of small ones
Hi all,

I have to do an analysis of the following use case.

I am working as a consultant in a public company. We are considering offering each public institution its own search server in the future, (probably) based on Apache Solr. However, the users of our portal should be able to search all indexes.

The problematic part for our customer is that a meta search over various indexes, which later merges the responses, will change the scoring.

Imagine you have the two indexes:
- public health department (A)
- press relations department (B)

Now you have 300 documents in A and only one in B about "influenza A". The B server will return the only document in its index with a very high score, since being the only match it gets a very high "base" score, correct? On the other hand, A may have much more important documents, but they will not get the same "base" score. Meaning, on a merge, the document from server B will most likely be top of the list.

To prevent this phenomenon we are looking into merging all the standalone indexes into one big index, but that will lead us into other problems, because it will become pretty big pretty fast.

So here are my questions:
- What are other people doing to solve this problem?
- What is the best way with Solr to solve the problem of the "base" scoring?
- What is the best way to have multiple indexes in Solr?
- Is it possible to get rid of the "base" scoring in Solr?

TIA for any information.

salu2
--
Thorsten Scherler
Open Source Java

Sociedad Andaluza para el Desarrollo de la Sociedad
de la Información, S.A.U. (SADESI)
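The "base" score effect described above is the IDF term. Assuming the stock Lucene DefaultSimilarity (the example numbers below are made up purely for illustration):

    idf(t) = 1 + ln( numDocs / (docFreq + 1) )

    index B: numDocs = 10,000, docFreq = 1    ->  idf = 1 + ln(10000/2)   ~ 9.5
    index A: numDocs = 10,000, docFreq = 300  ->  idf = 1 + ln(10000/301) ~ 4.5

So a term that is rare inside one index scores roughly twice as high there, regardless of how good the documents actually are, which is exactly the merge problem described above.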
Re: big index vs. lots of small ones
On Wed, 2010-01-20 at 08:38 -0800, Marc Sturlese wrote:
> Check out this patch, which solves the distributed IDF problem:
> https://issues.apache.org/jira/browse/SOLR-1632
> I think it fixes what you are explaining. The price you pay is that there
> are 2 requests per shard. If I am not wrong, the first is to get term
> frequencies and needed info, and the second one is the proper search request.
> The patch also includes caching for terms in the first request.

Nice! Thank you very much, Marc. How are things going in Barcelona?

salu2

> Thorsten Scherler-3 wrote:
> >
> > Hi all,
> >
> > I have to do an analysis of the following use case.
> > [... original mail snipped, see above ...]
--
Thorsten Scherler
Open Source Java

Sociedad Andaluza para el Desarrollo de la Sociedad
de la Información, S.A.U. (SADESI)
Re: The mechanism of data replication in Solr?
On Wed, 2007-09-05 at 15:56 +0800, Dong Wang wrote:
> Hello, everybody :-)
> I'm interested in the mechanism of data replication in Solr. In the
> "Introduction to the Solr enterprise search server", replication is listed
> as one of the features of Solr, but I can't find anything about replication
> issues on the web site and documents, including how to split the index, how
> to distribute the chunks of the index, how to place the replicas, eager or
> lazy replication, etc. I think these are different from the problems in HDFS.
> Can anybody help me? Thank you in advance.

http://wiki.apache.org/solr/CollectionDistribution

HTH

> Best Wishes.
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
Re: Indexing very large files.
On Thu, 2007-09-06 at 08:55 +0200, Brian Carmalt wrote:
> Hello again,
>
> I run Solr on Tomcat under Windows and use the Tomcat monitor to start
> the service. I have set the minimum heap size to 512MB and the maximum to
> 1024MB. The system has 2 gigs of RAM. The error that I get after sending
> approximately 300 MB is:
>
> java.lang.OutOfMemoryError: Java heap space
>     at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2947)
>     at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)
>     at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1384)
>     at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
>     at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1058)
>     at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:332)
>     at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
>     at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
>     [... remaining Tomcat servlet/valve frames snipped ...]
>
> After sleeping on the problem I see that it does not directly stem from
> Solr, but from the module org.xmlpull.mxp1.MXParser. Hmmm. I'm open to
> suggestions and ideas.

Which version of Solr do you use?

http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/handler/XmlUpdateRequestHandler.java?view=markup

The trunk version of the XmlUpdateRequestHandler is now based on StAX. You may want to try whether that works better. Please try and report back.

salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
Re: Tagging using SOLR
On Thu, 2007-09-06 at 12:59 +0530, Doss wrote:
> Dear all,
>
> We are running an application built using SOLR. Now we are trying to build
> a tagging system using the existing SOLR indexed field called
> "tag_keywords". This field has different keywords separated by commas. Please
> give suggestions on how we can build a tagging system using this field?

http://wiki.apache.org/solr/ConfiguringSolr
http://wiki.apache.org/solr/SchemaXml

Define a new field named "keyword" and use "text_ws" as its type. Instead of commas, separate the keywords with whitespace. See the sketch below.

HTH

salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
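(The schema snippet from the original mail was stripped by the list archive; a minimal reconstruction of what such a field definition looks like, assuming the stock whitespace-tokenized "text_ws" type from the example schema.xml; the multiValued attribute is an assumption:)

    <field name="keyword" type="text_ws" indexed="true" stored="true" multiValued="true"/>

Indexed like this, a document tagged "cheap cool ipod" matches a query on any single tag.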
Re: Indexing very large files.
On Thu, 2007-09-06 at 11:26 +0200, Brian Carmalt wrote:
> Hello again,
>
> I checked out the Solr source and built the 1.3-dev version, and then I
> tried to index the same file to the new server.
> I do get a different exception trace, but the result is the same.
>
> java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:2882)
>     at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)

It seems that you are reaching the limits because of the StringBuilder. Did you try to raise the memory to the maximum, like:

java -Xms1536m -Xmx1788m -jar start.jar

Anyway, you will have to look into:

SolrInputDocument readDoc(XMLStreamReader parser) throws XMLStreamException {
  ...
  StringBuilder text = new StringBuilder();
  ...
  case XMLStreamConstants.CHARACTERS:
    text.append( parser.getText() );
    break;
  ...

The problem is that the "text" object grows bigger than the heap; maybe invoking garbage collection beforehand will help.

salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
RSS syndication Plugin
Hi all,

I am curious whether somebody has written an RSS plugin for Solr. The idea is to provide an RSS syndication link for the current search. It should be really easy to implement, since it would just be a transformation from Solr XML to RSS, which can easily be done with a simple XSL.

Has somebody already done this?

salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
Re: RSS syndication Plugin
On Thu, 2007-09-06 at 09:07 -0400, Ryan McKinley wrote:
> perhaps:
> https://issues.apache.org/jira/browse/SOLR-208
>
> in http://svn.apache.org/repos/asf/lucene/solr/trunk/example/solr/conf/xslt/
>
> check:
> example_atom.xsl
> example_rss.xsl

Awesome. Thanks very much, Ryan, for pointing me in the right direction, and Brian Whitman for his contribution.

salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
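For the archive: with those stylesheets in place, the XSLT response writer turns any query into a feed directly, along these lines (using the example server's URL):

    http://localhost:8983/solr/select?q=ipod&wt=xslt&tr=example_rss.xsl

The tr parameter names a stylesheet in conf/xslt/, so example_atom.xsl works the same way.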
Re: Strange behavior when searching with accents
On Thu, 2007-09-20 at 10:11 +0200, Thierry Collogne wrote:
> Hello,
>
> We are experiencing some strange behavior while searching for words
> containing accents. We are using two examples, "rené" and "matthé".
>
> When we search for "rené" or for "rene", we get the same results, so that is
> ok. But when we search for "matthé" or for "matthe", we get two totally
> different results.
>
> Can someone tell us why this happens? We would like the results to be the
> same.

That highly depends on your schema. Do you use the solr.ISOLatin1AccentFilterFactory? I am using it early in my analyzer chain and it works like a charm (see the snippet below).

HTH

salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
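(The analyzer config from the original mail was stripped by the list archive; a minimal reconstruction of the kind of chain meant here, with the accent filter directly after the tokenizer; the field type name and surrounding filters are just an example:)

    <fieldType name="text" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>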
Re: Strange behavior when searching with accents
On Thu, 2007-09-20 at 13:33 +0200, Thierry Collogne wrote:
> We are using this schema definition

Thierry, try to move the solr.ISOLatin1AccentFilterFactory up the filter chain, so that it runs directly after the tokenizer, for both indexing and querying (see the analyzer sketch in my previous mail). This way you make sure that all accents are gone before you do any further filtering. You may need to reindex all documents to make sure the old index is not used.

HTH

salu2

> [schema definition mangled by the list archive; the visible fragments show a
> WordDelimiterFilter with generateWordParts="1" generateNumberParts="1"
> catenateWords="1" catenateNumbers="1" catenateAll="0" at index time, a
> SynonymFilter with ignoreCase="true" expand="true" and a WordDelimiterFilter
> with catenateWords="0" catenateNumbers="0" at query time]
>
> I will take a look at the analyzer tool.
>
> Thank you both for the quick response.
>
> On 20/09/2007, Bertrand Delacretaz <[EMAIL PROTECTED]> wrote:
> >
> > On 9/20/07, Thierry Collogne <[EMAIL PROTECTED]> wrote:
> >
> > > ..when we search for "matthé" or for "matthe", we get two totally
> > > different results
> >
> > The analyzer admin tool should help you find out what's happening, see
> > http://wiki.apache.org/solr/FAQ#head-b25df8c8393bbcca28f1f344c432975002e29ca9
> >
> > -Bertrand
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
Re: Strange behavior when searching with accents
On Thu, 2007-09-20 at 14:01 +0200, Thierry Collogne wrote:
> I entered the "matthé" term in the analyzer, but as far as I understand,
> it should be ok. I have made some screenshots of the results.
>
> http://farm2.static.flickr.com/1407/1412619772_0b697789cd_o.jpg
>
> http://farm2.static.flickr.com/1245/1412619774_3351b287bc_o.jpg
>
> I find it strange that the second screenshot doesn't give any matches.
>
> Can someone take a look at them and perhaps clarify why it does not work?

See my other response, but the second screenshot changed the "query" field to the non-accented form. Also, you want to use the "verbose output" option for better analysis.

salu2

> Thank you.
>
> [... rest of the quoted thread snipped, see the previous mails ...]
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
Re: Strange behavior when searching with accents
On Thu, 2007-09-20 at 15:27 +0200, Bertrand Delacretaz wrote:
> On 9/20/07, Thierry Collogne <[EMAIL PROTECTED]> wrote:
>
> > ...Thank you very much. Moving the filter up in the chain fixed it
>
> Yes, the problem was the EnglishPorterFilterFactory before the accent
> removal: the stemmer doesn't know about accents, so no stemming
> occurred on "matthé", whereas "matthe" was stemmed to "matth".
>
> BTW, your "rené" example makes me think you're indexing French; if
> that's the case you might want to use a stemmer configured for that
> language, for example
>
>   <filter class="solr.SnowballPorterFilterFactory" language="French"/>

Bertrand, does the French Snowball work fine? A colleague of mine exchanged mails with Porter about the Spanish stemmer, and he came to the conclusion that it is not really working well for Spanish:

"So -orio on the whole changes meaning too much (acceso = access, accesorio = accessory differ as much in Spanish as in English; -atorio similarly (aclarar = to rinse, clear (in a very general sense), brighten up; aclaratorio = explanatory). Diminutives and augmentatives usually fall under (a) and (c). -illo, -ote, -isimo are in this category. -al and -iz look like plausible candidates for ending removal but, unlike their English counterparts, removing them makes little difference or improvement. Similarly with -ion removal after -s. There is a difficulty with pure vowel endings, and the stemmer can't always get this right. So in English 'academic' is stemmed to 'academ' but 'academy' does not lose the final -y (or -i). This explains the residual vowels with -io, -ia endings etc."

salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
RE: Strange behavior when searching with accents
On Thu, 2007-09-20 at 11:13 -0700, Lance Norskog wrote:
> English and French are messy, so heuristic methods are the only option.
> Spanish is rigorously clean, and stemming should be done from the declension
> rules and irregular conjugation tables. This involves large (fast) tables in
> RAM rather than small (slow) string-shuffling.

Interesting, do you have a link to some documentation on how to implement this?

salu2

> Lance Norskog
>
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
> Bertrand Delacretaz
> Sent: Thursday, September 20, 2007 8:11 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Strange behavior when searching with accents
>
> On 9/20/07, Thorsten Scherler <[EMAIL PROTECTED]> wrote:
> > ...Bertrand, does the French Snowball work fine?...
>
> I've seen some weirdnesses, like "tennis" and "tenir" (means to hold) both
> stemmed to "ten", but in all of our (simple) tests it was ok.
>
> The application where we're using it does not require high precision though,
> so it looked good enough and we didn't create very extensive tests for it.
>
> -Bertrand
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
Re: Scripts not working on cron - always asking for password
[Only quoted text from the original thread survives in the archive; the solrconfig.xml listener is reconstructed from its visible values.]

> Hi, there,
>
> I used an absolute path for the "dir" param in the solrconfig.xml as below:
>
>   <listener event="postCommit" class="solr.RunExecutableListener">
>     <str name="exe">snapshooter</str>
>     <str name="dir">/var/SolrHome/solr/bin</str>
>     <bool name="wait">true</bool>
>     <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
>     <arr name="env"> <str>MYVAR=val1</str> </arr>
>   </listener>
>
> However, I got a "snapshooter: not found" error thrown in catalina.out.
> I don't see why this doesn't work. Anything I'm missing?
>
> Many thanks,
>
> -Hui
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
Re: How to get all the search results - python
On Mon, 2007-09-24 at 14:34 +0530, Roopesh P Raj wrote:
> Hi,
>
> I am using a Solr setup in Tomcat 5.5 with Python 2.4, using the Python
> client solr.py.
>
> When I search, not all of the results are returned.
>
> The method call for searching is as follows (rows specifies the number of rows):
> data = c.search(q='query', fl='id score unique_id Message-ID To From Subject', rows=50, wt='python')
>
> I want to specify that I want all the rows. How can I do that?

Hi Roopesh,

I am not sure whether I understand your problem. Is it the limitation of rows/pagination? If so, why not use a really high number (like rows=100)?

salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
Re: How to get all the search results - python
On Mon, 2007-09-24 at 16:29 +0530, Roopesh P Raj wrote:
> > Hi Roopesh,
> > I am not sure whether I understand your problem.
> > Is it the limitation of rows/pagination?
> > If so, why not use a really high number (like rows=100)?
> > salu2
>
> Hi,
>
> Assigning a high number will solve my problem. (I thought that there would be
> something like rows='all' to do it.)
>
> Can I do pagination using the Python client?

I am not a Python expert, but I think so.

> How can I specify the starting position, offset etc. for
> pagination through the Python client?

http://wiki.apache.org/solr/CommonQueryParameters

It should work as described in the above document, with the start parameter, e.g.:

data = c.search(q='query', fl='id score unique_id Message-ID To From Subject', rows=50, wt='python', start=50)

HTH
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
Re: How to get all the search results - python
On Tue, 2007-09-25 at 10:03 +0530, Roopesh P Raj wrote:

DISCLAIMER: Please note that I am subscribed to the user list, and there is no need to write to me directly nor to cc me in your response. Moreover, since we are an open source project, off-list communication is suboptimal and harmful to the community. The community has many eyes which can spot possible problems with a solution and propose better ones. Furthermore, the mailing list has an archive where proven solutions can be searched. If we all exchange off-list mails, no solutions go into the archive and we always have to repeat the same mails. PLEASE write to the ml!

> > http://wiki.apache.org/solr/CommonQueryParameters
> >
> > It should work as described in the above document, with the start
> > parameter, e.g.
> > data = c.search(q='query', fl='id score unique_id Message-ID To From
> > Subject', rows=50, wt='python', start=50)
> >
> > HTH
>
> Hi,
>
> In my application there is a provision to copy the archive based on the date
> indexed. In this case the number of search results may exceed the high
> number I have assigned to rows, say rows=1000. I wanted to avoid this
> situation; in this situation I don't want paginated queries.
>
> Can you please tell me how to approach this particular situation.

I think the best way is to:
1) get the first response document (rows=50, start=0)
2) parse the response to see how many results you have
3) do a loop (rows=50, start=50*x) and call Solr till you have all results; see the sketch below.

Like Jérôme stated:

On Mon, 2007-09-24 at 12:45 +0100, Jérôme Etévé wrote:
> By design, it's not very efficient to ask for a large number of
> results with solr/lucene. I think you will face performance and memory
> problems if you do that.

HTH

salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
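A sketch of that loop with the solr.py client, reusing the call from earlier in this thread (it assumes `c` is the solr.py connection and that the wt='python' payload can be eval()'d into a dict, which is what the python response writer emits):

    rows, start, all_docs = 50, 0, []
    while True:
        data = c.search(q='query', fl='id score unique_id',
                        rows=rows, start=start, wt='python')
        rsp = eval(data)  # wt='python' returns a python dict literal
        all_docs.extend(rsp['response']['docs'])
        start += rows
        if start >= rsp['response']['numFound']:
            break

But keep Jérôme's warning in mind: pulling the whole result set this way gets expensive as the index grows.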
Re: Problem with html code inside xml
On Tue, 2007-09-25 at 12:06 +0100, Jérôme Etévé wrote:
> If I understand correctly, you want to keep the raw HTML code in Solr
> like that (in your posting XML file).
>
> I think you should encode your content to protect these XML entities:
>
>   <  ->  &lt;
>   >  ->  &gt;
>   "  ->  &quot;
>   &  ->  &amp;
>
> If you use Perl, have a look at HTML::Entities.

AFAIR you cannot use tags; they always get transformed to entities. The solution is to have an XSL transformation after the response that transforms the entities back into tags. Have a look at the thread

http://marc.info/?t=11677583791&r=1&w=2

and especially at

http://marc.info/?l=solr-user&m=116782664828926&w=2

HTH

salu2

> On 9/25/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > I've got some problems with HTML code that is embedded in an XML file.
> >
> > [... sample XHTML document snipped; the markup was mangled by the list
> > archive ...]
> >
> > When I make a query on Solr, I get something like the following in the
> > source code of the XML result:
> >
> > [... escaped HTML snipped ...]
> >
> > It is not exactly what I want. I want to keep the HTML tags, that's all,
> > without formatting. The br tags and a tags are well formed in the XML and
> > JSON results, but the div tags are not kept.
> >
> > In the schema.xml I've got this for the HTML content:
> >
> > [... field definition snipped; it was declared stored="true"
> > multiValued="true" ...]
> >
> > Any help would be appreciated.
> >
> > Thanks in advance.
> >
> > S. Christin
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
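A minimal sketch of such a stylesheet (the field name "content" is an assumption, and disable-output-escaping depends on the XSLT processor honoring it):

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- copy everything as-is -->
      <xsl:template match="@*|node()">
        <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
      </xsl:template>
      <!-- but emit the escaped HTML field as raw markup again -->
      <xsl:template match="str[@name='content']">
        <xsl:value-of select="." disable-output-escaping="yes"/>
      </xsl:template>
    </xsl:stylesheet>

Dropped into conf/xslt/, it can then be applied with wt=xslt&tr=... as usual.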
Re: Converting German special characters / umlaute
On Thu, 2007-09-27 at 13:26 -0400, J.J. Larrea wrote:
> At 12:13 PM -0400 9/27/07, Steven Rowe wrote:
> >Chris Hostetter wrote: ...
>
> As for implementation, the first part could easily and flexibly be
> accomplished with the current PatternReplaceFilter, and I'm thinking the
> second could be done with an extension to that, or better yet a new Filter
> which allows parsing synonymous tokens from a flat to an overlaid format,
> e.g. something on the order of:
>
>   pattern="(.*)(ü|ue)(.*)"
>   replacement="$1ue$3|$1u$3"
>   tokensep="|"
>   replace="first"
>
> [... a second, slightly different variant of the config was mangled by the
> archive ...]
>
> which in my fantasy implementation would map:
>
>   Müller  -> Mueller|Muller
>   Mueller -> Mueller|Muller
>   Muller  -> Muller
>
> and could be run at index-time and/or query-time as appropriate.
>
> >Does anyone know if there are other (Latin-1-utilizing) languages
> >besides German with standardized diacritic substitutions that involve
> >something other than just stripping the diacritics?
>
> I'm curious about this too.

I am German, but working in Spain, so I have not faced the problem so far. Anyhow, IMO

Müller  -> Mueller
Mueller -> Mueller

is right; shortening the word further does not seem right, since one changes the meaning too much. Further:

groß  -> gross
gross -> gross

ß is pronounced 'sz' but only replaced by 'ss'.

salu2

> - J.J.
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
Re: Search results problem
On Wed, 2007-10-17 at 20:44 +1000, Pieter Berkel wrote:
> There is a configuration option called "maxFieldLength" in solrconfig.xml
> with a default value of 10,000. You may need to increase this value if
> you are indexing fields that are longer.

Is there a way to define an unlimited value? Like -1?

TIA

salu2

> On 17/10/2007, Maximilian Hütter <[EMAIL PROTECTED]> wrote:
> >
> > Daniel Naber schrieb:
> > > On Tuesday 16 October 2007 12:03, Maximilian Hütter wrote:
> > >
> > >> the content of one document is completely contained in another,
> > >> but searching for a special word I only get one document as a result.
> > >> I am absolutely sure it is contained in the other document, but I will
> > >> only get the "parent" doc if I add a word.
> > >
> > > You should try debugging the problem with Luke, e.g. use "reconstruct &
> > > edit" to see if the term is really indexed in both documents.
> > >
> > > Regards
> > > Daniel
> >
> > Thank you for the tip. After using Luke I can see that the term is
> > really missing in the other document.
> > Is there a size restriction for field content in Solr/Lucene? Because
> > a lot of strings I expected to find seem to be missing from the
> > "fulltext" field I use as the default field (after Luke reconstruction).
> >
> > Best regards,
> >
> > Max
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
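For the record, there does not seem to be a documented "unlimited" sentinel; the usual workaround appears to be simply setting the limit to Integer.MAX_VALUE (an assumption, not verified against the code):

    <maxFieldLength>2147483647</maxFieldLength>

Note that documents indexed while the old limit was active stay truncated until they are reindexed.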
Re: Tagging in solr
On Fri, 2007-10-19 at 11:01 +0100, Spas Poptchev wrote:
> Hi,
>
> What I want to do is to store tags that belong to products. Each tag should
> also store information about how often it was used with a certain product.
> So for example:
>
> product1
> cool 5  => product1 was tagged 5 times with "cool"
>
> What would be the best way to implement this kind of stuff in Solr?

There is a wiki page with some brainstorming on how to implement tagging within Solr:

<http://wiki.apache.org/solr/UserTagDesign>

It's easy enough to have a tag_keywords field, but updating a single tag_keywords field is not so straightforward without sending the entire document to Solr every time it is tagged. See SOLR-139's extensive comments and patches to see what you're getting into.

salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
Re: escaping characters and security
On Tue, 2007-11-06 at 11:52 -0500, Micah Wedemeyer wrote:
> Are there any security risks in passing a query directly to Solr without
> doing any sort of escaping? I am using URL encoding, so '&' and such
> are being encoded into their %XX equivalents.
>
> Still, should I be doing anything else? Is there such a thing as a
> Solr-injection attack?

http://wiki.apache.org/solr/mySolr

"Typically it's not recommended to have your front end users/clients hitting Solr directly as part of an HTML form submit ... the more conventional way to think of it is that Solr is a backend service, which your application can talk to over HTTP -- if you were dealing with a database, you wouldn't expect that you could generate an HTML form for your clients and then have them submit that form in some way that resulted in their browser using JDBC (or ODBC) to communicate directly with your database; their client would communicate with your app, which would validate their input, impose some security checks on the input, and then execute the underlying query against your database -- working with Solr should be very similar. It just so happens that instead of using JDBC or some other binary protocol, Solr uses HTTP, and you *can* talk to it directly from a web browser, but that's really more of a debugging feature than anything else."

HTH

salu2

> Thanks,
> Micah
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
Re: Help with Debian solr/jetty install?
On Tue, 2007-11-20 at 22:50 -0800, Otis Gospodnetic wrote:
> Phillip,
>
> I won't go into details, but I'll point out that the Java compiler is called
> javac and, if memory serves me well, it is defined in one of Jetty's XML
> config files in its etc/ dir. The Java compiler is used to compile the JSPs
> that Solr uses for the admin UI. So, make sure you have javac and make sure
> Jetty can find it.

e.g.

cd ~
vim .bashrc
...
export JAVA_HOME=/home/thorsten/opt/java
export PATH=$JAVA_HOME/bin:$PATH

The important thing is that $JAVA_HOME points to the JDK and that it is first in your PATH!

salu2

> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
> From: Phillip Farber <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, November 20, 2007 5:55:27 PM
> Subject: Help with Debian solr/jetty install?
>
> Hi,
>
> I've successfully run as far as the example admin page on Debian linux 2.6.
>
> So I installed the solr-jetty packaged for Debian testing, which gives me
> Jetty 5.1.14-1 and Solr 1.2.0+ds1-1. Jetty starts fine and so does the
> Solr home page at http://localhost:8280/solr
>
> But I get an error when I try to run http://localhost:8280/solr/admin
>
> HTTP ERROR: 500
> No Java compiler available
>
> I have the sun-java6-jre and sun-java6-jdk packages installed. I'm new to
> servlet containers and java webapps. What should I be looking for to
> fix this, or what information could I provide the list to get me moving
> forward from here?
>
> I've included the trace from the Jetty log and the Java properties dump
> from the example below.
>
> Thanks,
> Phil
>
> Java properties (from the example):
> [... full properties dump snipped; notably, java.home =
> /usr/lib/jvm/java-6-sun-1.6.0.00/jre, i.e. the JRE, not the JDK ...]
>
> Jetty log (from the error under Debian Solr/Jetty):
>
> org.apache.jasper.JasperException: No Java compiler available
>     at org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:460)
>     at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:367)
>     at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:329)
>     at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
>     [... remaining servlet/valve frames snipped; message truncated in the archive ...]
Get last updated/committed document
Hi all,

I need to ask Solr to return the id of the last committed document.

Is there a way to achieve this via a standard Lucene query, or do I need a custom connector that gives me this information?

TIA for any information

salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
Re: Get last updated/committed document
On Sat, 2007-11-24 at 00:17 +1100, climbingrose wrote:
> Assuming that you have the timestamp field defined:
> q=*:*&sort=timestamp desc

Thanks.

salu2

> On Nov 23, 2007 10:43 PM, Thorsten Scherler
> <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > I need to ask Solr to return the id of the last committed document.
> > [... rest snipped, see above ...]
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
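For the archive: the example schema.xml ships with exactly such a field, stamped automatically at add time via its default value:

    <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>

With that in place, a request along the lines of

    http://localhost:8983/solr/select?q=*:*&sort=timestamp+desc&rows=1&fl=id

returns just the id of the most recently added document (committed changes only, since uncommitted documents are not visible to searches).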
Re: solr to work for my web application
On Wed, 2008-02-13 at 00:06 -0800, newBea wrote:
> Hi,
>
> I am new to Solr/Lucene. I have installed the Solr nightly version and it is
> working fine.
>
> But it only works for the exampledocs present in the example folder of the
> nightly version of Solr. I need Solr to work for my current web
> application. I am using Tomcat 5.5.23 for the application (Windows) and
> jetty to start Solr outside of the webapps folder.
>
> Is there any way to start jetty using Tomcat?
>
> Help would be appreciated...

Some links to get you started:

http://wiki.apache.org/solr
http://wiki.apache.org/solr/mySolr
http://wiki.apache.org/solr/SolrTomcat

salu2
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
Re: solr to work for my web application
On Wed, 2008-02-13 at 03:42 -0800, newBea wrote:
> Hi Thorsten,
>
> I have my application running on port 8080 with Tomcat 5.5.23. I am
> starting Solr on port 8983 with the jetty server, using the command
> "java -jar start.jar".
>
> Both servers start; any search I make from the Tomcat application
> interacts with Solr very well. The problem is that "schema.xml" and
> "solrconfig.xml" in the conf directory are the default ones. After adding a
> customized schema name parameter and required fields, Solr is not working
> as required.

Can you post the modifications you made to both files?

> Customized code for parsing the XML generated by Solr is working
> fine, but it is unable to find the unique key field which we set for all
> the documents in the schema, and thus the result is 0, meaning nothing.

Hmm, what is your update command and your unique key? We would need to see these modifications to tell you what may be wrong.

Did you try http://YOUR_HOST:8983/solr/admin/luke?wt=xslt&tr=luke.xsl ?

What does this give?

salu2

> I am not able to find the solution for this one... any suggestions would be
> appreciated. Thanks in advance.
>
> [... earlier mails in the thread snipped, see above ...]
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
Re: solr to work for my web application
On Wed, 2008-02-13 at 05:04 -0800, newBea wrote:
> I haven't used luke.xsl, but the link provided by you gives me the "Solr
> Luke Request Handler Response"...
>
> The unique key is a simple string: csid

So you have:

<uniqueKey>csid</uniqueKey>

and a field definition along the lines of

<field name="csid" type="string" indexed="true" stored="true" required="true"/>

> Till now I am updating docs through the command prompt as: post.jar *.xml
> http://localhost:8983/update

What do the docs look like? I mean, since you changed the sample config, you send changed documents as well, right? How do they look?

> I am not clear on how I should post XML docs,

Well, like you said, with the post.jar, and then you send your modified docs. But there are many ways to trigger an add command to Solr.

> or would XML docs be posted while I request Solr through Tomcat at the time
> of searching text...

To search text from Tomcat you will need a servlet or something similar that contacts the Solr server for the search result and then handles the response (e.g. applies a custom XSL to the results).

> This manual procedure, where I update the XML docs in the exampledocs
> folder inside the distribution package, restricts it to exampledocs itself.

No, either copy the jar to the folder where you have your documents or add it to the PATH.

> ...I am not getting a way for my site's text to be searched by Solr... Do I
> need to copy start.jar and the relevant folders into the working directory
> of my web application?

Hmm, it seems that you have not understood the second paragraph of http://wiki.apache.org/solr/mySolr:

"Typically it's not recommended to have your front end users/clients hitting Solr directly as part of an HTML form submit ... the more conventional way to think of it is that Solr is a backend service, which your application can talk to over HTTP ..."

Meaning, you have two different servers running. Alternatively you can run Solr in the same Tomcat as your application. If you follow SolrTomcat from the wiki, it will be installed as a "solr" webapp. Your application will then communicate with this servlet.

salu2

> any help?
>
> [... earlier mails in the thread snipped, see above ...]
--
Thorsten Scherler thorsten.at.apache.org
Open Source Java consulting, training and solutions
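As an aside, the update URL in the command quoted above is probably missing the webapp path; with the stock example server, posting normally goes to the /solr/update handler, along these lines:

    java -Durl=http://localhost:8983/solr/update -jar post.jar *.xml

(This assumes the default example setup; if the webapp is deployed under another name, the path changes accordingly.)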
Re: solr to work for my web application
On Thu, 2008-02-14 at 23:16 -0800, newBea wrote: > Hi Thorsten... > > SOrry for giving u much trouble but I need some answer regarding solr...plz > help... > > Question1 > I am using tomcat 5.5.23 so for JNDI setup of solr, adding solr.xml with > context fragment as below in the tomcat5.5/...catalina/localhost. > > > value="D:/Projects/csdb/solr" override="true" /> > > > Is it the correct way of doing it? Yes as I understand the wiki page. > Or do I need to add context fragment in > the server.xml of tomcat5.5? > > Question2 > I am starting solr server using start.jar from another location on C: > drive...whereas my home location indicated on D: drive. Is it the root coz I > am not getting the search result? Hmm, as I understand it you are starting two instance of solr! One as a tomcat and the other as jetty. Why do you want that? If you have solr on tomcat you do not need the jetty anymore. I does make 0 sense under normal circumstances to do this. > > Question3 > I have added parameter as C:\solr\data in > solrconfig.xml... That seems to be wrong. It should read ${solr.data.dir:C:\solr \dat} but I am not using windows so I am not sure whether you may need to escape the path. salu2 > but the indexes are not getting stored there...indexes for > search are getting stored in the default dir of solr...any suggestions > > Thanks in advance... > > > Thorsten Scherler wrote: > > > > On Wed, 2008-02-13 at 05:04 -0800, newBea wrote: > >> I havnt used luke.xsl. Ya but the link provided by u gives me "Solr Luke > >> Request Handler Response"... > >> > >> is simple string as: csid > > > > So you have: > > csid > > > > and > > > required="true" /> > > > > > >> > >> till now I am updating docs thru command prompt as : post.jar *.xml > >> http://localhost:8983/update > > > > how do the docs look like? I mean since you changed the sample config > > you send changed documents as well, right? How do they look? > > > >> > >> I am not clear on how do I post xml docs > > > > Well like you said, with the post.jar and then you will send your > > modified docs but there are many ways to trigger an add command to solr. > > > >> or wud xml docs be posted while I > >> request solr thru tomcat at the time of searching text... > > > > To search text from tomcat you will need to have a servlet or something > > similar that contacts the solr server for the search result and the > > handle the response (e.g. apply custom xsl to the results). > > > > > > > >> > >> This manually procedure when I update the xml docs on exampledocs folder > >> inside distribution package restrict it to exampledocs itself > > > > No, either copy the jar to the folder where you have your documents or > > add it to the PATH. > > > >> ...I am not > >> getting a way where my sites text get searched by solr...Do I need to > >> copy > >> start.jar and relevant folders in my working directory for web > >> application. > > > > Hmm, it seems that you not have understood the second paragraph of > > http://wiki.apache.org/solr/mySolr > > > > "Typically it's not recommended to have your front end users/clients > > hitting Solr directly as part of an HTML form submit ... the more > > conventional way to think of it is that Solr is a backend service, which > > your application can talk to over HTTP ..." > > > > Meaning you have two different server running. Alternatively you can run > > solr in the same tomcat as you application. If you follow SolrTomcat > > from the wiki it will be install as "solr" servlet. 
Your application > > will then communicate with this servlet. > > > > salu2 > > > >> > >> any help? > >> > >> Thorsten Scherler-3 wrote: > >> > > >> > On Wed, 2008-02-13 at 03:42 -0800, newBea wrote: > >> >> Hi Thorsten, > >> >> > >> >> I have my application running on 8080 port with tomcat 5.5.23. I am > >> >> starting solr on port 8983 with jetty server using command "java -jar > >> >> start.jar". > >> >> > >> >> Both the servers get started...now any search I make on tomcat > >> >> application > >> >> is interacting with solr very well. The problem is "schema.xml" and > >> >> &q
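To have both pieces from this thread in one place, a minimal sketch; the docBase value is an assumption, the rest is the setup discussed above:

<!-- tomcat5.5/conf/Catalina/localhost/solr.xml -->
<Context docBase="D:/Projects/csdb/solr.war" debug="0" crossContext="true">
  <!-- solr/home tells the webapp where schema.xml and solrconfig.xml live -->
  <Environment name="solr/home" type="java.lang.String" value="D:/Projects/csdb/solr" override="true"/>
</Context>

<!-- in solrconfig.xml: the index goes to C:\solr\data unless solr.data.dir overrides it -->
<dataDir>${solr.data.dir:C:\solr\data}</dataDir>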
Re: How do I secure solr server?
On Thu, 2008-02-21 at 01:46 -0500, Mel Brand wrote: > Hi guys, > > I run solr on a separate server from the application server and I'd > like to know how to protect it. Best with a firewall. > I'd like to know how to prevent > someone from communicating to the server and also prevent unauthorized > access (through the web) to admin page. I would not expose http://yourServer:8983 at all. I would use an Apache httpd server as proxy and implement the access control there. salu2 > > Any help is extremely appreciated!! :) > > Thanks, > > Mel -- Thorsten Scherler thorsten.at.apache.org Open Source Java consulting, training and solutions
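A minimal sketch of that httpd setup; the hostname, paths and auth file are assumptions, the point is that only the proxy is exposed while the firewall keeps 8983 closed to everything else:

# httpd.conf: forward searches to the solr box, expose nothing else
ProxyPass /search http://solrbox:8983/solr/select
ProxyPassReverse /search http://solrbox:8983/solr/select
<Location /search>
  AuthType Basic
  AuthName "search"
  AuthUserFile /etc/httpd/passwords
  Require valid-user
</Location>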
Re: solr to work for my web application
On Fri, 2008-02-22 at 04:11 -0800, newBea wrote: > Hi Thorsten, > > Many thanks for ur replies so far...finally i set up correct environment for > Solr. Its working:clap: :) Congrats, glad you got it running. > > Solr Rocks! Indeed. :) salu2 > > Thorsten Scherler wrote: > > > > On Thu, 2008-02-14 at 23:16 -0800, newBea wrote: > >> Hi Thorsten... > >> > >> SOrry for giving u much trouble but I need some answer regarding > >> solr...plz > >> help... > >> > >> Question1 > >> I am using tomcat 5.5.23 so for JNDI setup of solr, adding solr.xml with > >> context fragment as below in the tomcat5.5/...catalina/localhost. > >> > >> > >> >> value="D:/Projects/csdb/solr" override="true" /> > >> > >> > >> Is it the correct way of doing it? > > > > Yes as I understand the wiki page. > > > >> Or do I need to add context fragment in > >> the server.xml of tomcat5.5? > >> > >> Question2 > >> I am starting solr server using start.jar from another location on C: > >> drive...whereas my home location indicated on D: drive. Is it the root > >> coz I > >> am not getting the search result? > > > > Hmm, as I understand it you are starting two instance of solr! One as a > > tomcat and the other as jetty. Why do you want that? If you have solr on > > tomcat you do not need the jetty anymore. I does make 0 sense under > > normal circumstances to do this. > > > >> > >> Question3 > >> I have added parameter as C:\solr\data in > >> solrconfig.xml... > > > > That seems to be wrong. It should read ${solr.data.dir:C:\solr > > \dat} but I am not using windows so I am not sure whether you > > may need to escape the path. > > > > salu2 > > > >> but the indexes are not getting stored there...indexes for > >> search are getting stored in the default dir of solr...any suggestions > >> > >> Thanks in advance... > >> > >> > >> Thorsten Scherler wrote: > >> > > >> > On Wed, 2008-02-13 at 05:04 -0800, newBea wrote: > >> >> I havnt used luke.xsl. Ya but the link provided by u gives me "Solr > >> Luke > >> >> Request Handler Response"... > >> >> > >> >> is simple string as: csid > >> > > >> > So you have: > >> > csid > >> > > >> > and > >> > >> > required="true" /> > >> > > >> > > >> >> > >> >> till now I am updating docs thru command prompt as : post.jar *.xml > >> >> http://localhost:8983/update > >> > > >> > how do the docs look like? I mean since you changed the sample config > >> > you send changed documents as well, right? How do they look? > >> > > >> >> > >> >> I am not clear on how do I post xml docs > >> > > >> > Well like you said, with the post.jar and then you will send your > >> > modified docs but there are many ways to trigger an add command to > >> solr. > >> > > >> >> or wud xml docs be posted while I > >> >> request solr thru tomcat at the time of searching text... > >> > > >> > To search text from tomcat you will need to have a servlet or something > >> > similar that contacts the solr server for the search result and the > >> > handle the response (e.g. apply custom xsl to the results). > >> > > >> > > >> > > >> >> > >> >> This manually procedure when I update the xml docs on exampledocs > >> folder > >> >> inside distribution package restrict it to exampledocs itself > >> > > >> > No, either copy the jar to the folder where you have your documents or > >> > add it to the PATH. > >> > > >> >> ...I am not > >> >> getting a way where my sites text get searched by solr...Do I need to > >> >> copy > >> >> start.jar and relevant folders in my working directory for web > >> >> application. 
> >> > > >> > Hmm, it seems that you not have understood the second paragraph of > >> > http://wiki.apache.org/solr/mySolr > >> > > >> > "Typically it's not recommended to have your
Re: out of memory every time
On Mon, 2008-03-03 at 21:43 +0200, Justin wrote: > I'm indexing a large number of documents. > > As a server I'm using the /solr/example/start.jar > > No matter how much memory I allocate it fails around 7200 documents. How do you allocate the memory? Something like: java -Xms512M -Xmx1500M -jar start.jar You may have a closer look as well at http://java.sun.com/j2se/1.5.0/docs/guide/vm/gc-ergonomics.html HTH salu2 > I am committing every 100 docs, and optimizing every 300. > > all of my xml's contain on doc, and can range in size from 2k to 700k. > > when I restart the start.jar it again reports out of memory. > > > a sample document looks like this: > > > > 1851 > TRAJ20 > 12049 >name="ft:external_ids.SourceAccession:15532">ENSG0211869 > 28735 > HUgn28735 > TRA_ > TRAJ20 > 9953837 >name="ft:external_ids.SourceAccession:15538">ENSG0211869 > T cell receptor alpha > joining 20 > 14q11.2 > 14q11 > 14q11.2 > AE000662.1 > M94081.1 > CH471078.2 > NC_14.7 > NT_026437.11 > NG_001332.2 > 8188290 > The human T-cell receptor > TCRAC/TCRDC (C alpha/C delta) region: organization,sequence, and evolution > of 97.6 kb of DNA. > Koop B.F. > Rowen L. > Hood L. > Wang K. > Kuo C.L. > Seto D. > Lenstra J.A. > Howard S. > Shan W. > Deshpande P. > 31311_at > > > > > > the schema is (in summary): > > multiValued="false" omitNorms="true"/> > multiValued="true" omitNorms="true"/> > > stored="true" omitNorms="true"/> > omitNorms="true"/> > > > > PK > text > > > > > > > and my conf is: >false > 100 > 900 > 2147483647 > 1 -- Thorsten Scherler thorsten.at.apache.org Open Source Java consulting, training and solutions
Re: Beginner questions: Jetty and solr with utf-8 + cached page + dedup
On Tue, 2008-03-25 at 10:56 -0700, Vinci wrote: > Hi, > > Thank for your reply. > Question for apply xslt: If I use saxon, where should the saxon.jar located > if I using the example jetty server? lib/ inside example/ or outside the > example/? http://wiki.apache.org/solr/mySolr "... Typically it's not recommended to have your front end users/clients hitting Solr directly as part of an HTML form submit ..." In the above page there you find answers to many of your questions. HTH salu2 -- Thorsten Scherler thorsten.at.apache.org Open Source Java consulting, training and solutions
search engine for regional bulletins
Hi all, I am developing a search engine for a governmental body. This search engine has to index pure xml documents which follow a custom xml schema. The xml documents contain information about laws and official announcements for Andalusia. I need to implement different filters for the search. The current search engine, which can be found here [1], would need to be extended by ranges about organizational bodies, kind of announcement (law, resolution,...), ... I played a bit with Nutch 0.8 and asked myself whether it is the best tool for the task. I got nutch to index the xml documents and I can as well search the index, but I would need to add filter conditions for the search. The alternative I see would be pure lucene, since I am actually not really "crawling" the site (the documents are not linked with each other) but put all the files (which have to be indexed) in the urls/bulletin file. Then Zaheed pointed me to Solr and I have played around with it a wee bit. To give you a better impression of the underlying architecture and xml documents: each weekday there is a new bulletin (containing approx. 100 - 200 pages), eg [2]. This bulletin is stored on the file system and needs to be indexed. We have two different document types: summaries and dispositions. The summary looks like: 1. DISPOSICIONES GENERALES Decreto 178/2006, de 10 de octubre, por el que se establecen normas de protección de la avifauna para las instalaciones eléctricas de alta tensión Resolución de 10 de octubre de 2006, de la Dirección General de Tesorería y Deuda Pública, por la que se realiza una convocatoria de subasta de carácter ordinario dentro del Programa de Emisión de Bonos y Obligaciones de la Junta de Andalucía. Following the tutorial and looking at the examples it seems that solr only supports one document type: <add> <doc> <field name="id">3007WFP</field> <field name="name">Dell Widescreen UltraSharp 3007WFP</field> </doc> </add> The root element add is "just" the command for the server that we want to add the document. Does that mean I would need to stick with this doctype and transform our internal format for adding the document information? Further, since the project is for a customer, I would need a released version when I put my engine in production. When does this community expect to make its first release, or better asked: which are the blockers? TIA for any information. salu2 [1] http://andaluciajunta.es/portal/aj-bojaBuscador/0,22815,,00.html [2] http://andaluciajunta.es/portal/boletines/2006/11/aj-bojaVerPagina-2006-11/0,23167,bi%253D693228039889,00.html
Re: search engine for regional bulletins
On Tue, 2006-11-28 at 10:00 +0100, Bertrand Delacretaz wrote: > Hi Thorsten, good to see you here! :) Hi Bertrand, thanks very much for this warm welcome and I am as well glad to meet you here. > > On 11/28/06, Thorsten Scherler > <[EMAIL PROTECTED]> wrote: > > > ...Following the tutorial and looking at the examples it seems that solr > > only supports one document type. > > > > > > 3007WFP > > Dell Widescreen UltraSharp 3007WFP > > > > ... > > That's right, to add documents to a Solr index you need to transform > them to this model. You're basically creating fields to be indexed, > and the Solr schema.xml allows you to define precisely how you want > each field to be indexed, including strict data types, pluggable > Lucene analyzers, etc. > > This means some work in converting your content model to an "indexing > model", but it's very worth it as it gives you very precise control > about what you index and how. > Yeah, I thought about it last night and I came to the same conclusion. The "extra" work involved is "just" a xsl transformation in my use case, so not really the biggest part of this project. > > ...Further since the project is for a customer I would need a released > > version when I put my engine in production. When does this community > > expect to make its first release, or better asked which are the > > blockers?... > > I'm relatively new here so I'll let others complete this info, but > IIUC the only work needed to do a first release is to make sure all > source files are "clean" w.r.t required Apache license notices. I > don't think there are any technical blockers for a release, many of us > are happily using Solr on production sites. That is good to hear, so if somebody (e.g. me) would check all files for cleanness then we could release, right? Perfect. > > You might want to look at these links for more info: > http://wiki.apache.org/solr/SolrResources > http://wiki.apache.org/solr/PublicServers Thanks very much Bertrand, I will look at this information. I am still evaluating what is best for this project, but solr sounds very interesting ATM. salu2 > > -Bertrand
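In this use case the xsl transformation could be as small as this; a minimal sketch, the disposicion/titulo element names are assumptions about the bulletin schema:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- turn one bulletin into one solr add command -->
  <xsl:template match="/">
    <add>
      <xsl:for-each select="//disposicion">
        <doc>
          <field name="id"><xsl:value-of select="@id"/></field>
          <field name="title"><xsl:value-of select="titulo"/></field>
          <field name="content"><xsl:value-of select="."/></field>
        </doc>
      </xsl:for-each>
    </add>
  </xsl:template>
</xsl:stylesheet>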
Re: search engine for regional bulletins
On Tue, 2006-11-28 at 11:30 -0500, Yonik Seeley wrote: > On 11/28/06, Thorsten Scherler > <[EMAIL PROTECTED]> wrote: > > That is good to hear, so if somebody (e.g. me) would check all files for > > cleanness then we could release, right? Perfect. > > Correct. All IP issues have been cleared, so It's just a matter of > taking the time to put the release into a form that will be accepted > by the incubator. I expect we will be making a release candidate > within a few weeks. Of course the incubator guys always finds > problems, so getting an actual release out takes longer. > Yeah, I have been in the incubator with lenya and we made some valuable experience back then. Further I see many committer here with some experience in different Apache PMC's so hopefully we get it straight right away and the incubator PMC does not find many issues. I will try to help the best I can. > -Yonik Thanks Yonik. salu2
solr index reusable with nutch?
Hi all, is it possible to directly use the solr index in nutch? My client is creating a portal search based on nutch. In this portal there is as well my project and ATM I prefer to go with solr instead of nutch since it is much better for my use case. Now the question is whether the portal search engine could use the solr index for my part of the portal. Can somebody point me to related documentation? TIA salu2 -- thorsten "Together we stand, divided we fall!" Hey you (Pink Floyd)
Re: solr index reusable with nutch?
On Wed, 2006-12-13 at 07:45 -0800, Otis Gospodnetic wrote: > Hi, > > Solr should be able to search any Lucene index, ok, good to know. :) So can I guess that the same is true for nutch? Meaning the index solr is creating could be used by a nutch searcher. > not just those created by Solr itself, as long as you configure it properly > via schema.xml. http://wiki.apache.org/solr/SchemaXml?highlight=%28schema%29 > Thus, you should be able to use Solr to search an index created by Nutch. In my use case I need the reverse. Nutch searches the index created by my solr application. The application is just one component in the portal and the portal will provide a "global" search engine which should use the index from solr. > Haven't tried it. It would be nice if you could contribute the > configuration for doing this. > As I figure it out I will keep you informed. Thanks for the feedback. salu2 > Otis > > - Original Message > From: Thorsten Scherler <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Wednesday, December 13, 2006 8:26:51 AM > Subject: solr index reusable with nutch? > > Hi all, > > is it possible to directly use the solr index in nutch? > > My client is creating a portal search based on nutch. In this portal > there is as well my project and ATM I prefer to go with solr instead of > nutch since it its much better for my use case. > > Now the question is whether the portal search engine could use the solr > index for my part of the portal. > > Can somebody point me to related documentation? > > TIA > > salu2
Re: solr index reusable with nutch?
On Thu, 2006-12-14 at 11:14 -0800, Chris Hostetter wrote: > : In my use case I need the reverse. Nutch searches the index created by > : my solr application. The application is just one component in the portal > : and the portal will provide a "global" search engine which should use > : the index from solr. > > If you have a compatible schema, then it should be possible ... but if > your goal is to make an index with a biz object specific schema and then > use it as a single collection/source in a nutch installation, that may not > sork ... Yeah, that makes sense. > i'm not sure how flexible Nutch is about the indexes it can > hanlde: it's probably a question best asked on the Nutch user list. > Yeah, you are right. Thanks for the feedback. salu2 -- thorsten "Together we stand, divided we fall!" Hey you (Pink Floyd)
Re: solr index reusable with nutch?
On Thu, 2006-12-14 at 11:14 -0800, Chris Hostetter wrote: > : In my use case I need the reverse. Nutch searches the index created by > : my solr application. The application is just one component in the portal > : and the portal will provide a "global" search engine which should use > : the index from solr. > > If you have a compatible schema, then it should be possible ... but if > your goal is to make an index with a biz object specific schema and then > use it as a single collection/source in a nutch installation, that may not > sork ... i'm not sure how flexible Nutch is about the indexes it can > hanlde: it's probably a question best asked on the Nutch user list. I did some testing with nutch searching over a solr index. Like Chris said, a "compatible schema" is the only important point on this issue. To put it in other words, nutch searches some fields by default and returns some fields by default. So if you are not keen to write your own nutch plugin for your custom solr schema, just make sure that you use the field name="content" to store your main text. You can further enhance the integration by using the "nutch" names for "important" fields. Further I have a field named "url" in my schema and it is the only field that I see in the response of nutch.
sh bin/nutch org.apache.nutch.searcher.NutchBean presidencia
Total hits: 3
0 null//2006/209/disposition/19923-a.html
1 null//2006/209/disposition/20246-a.html
2 null//2006/209/disposition/20034-a.html
This is good enough for my client and me since I can transform that afterward. :) Thanks Chris and Otis for your feedback. salu2 > > > > > -Hoss >
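Concretely, the minimal schema.xml fields for this kind of nutch compatibility would look something like this; a sketch with the type names from the example schema, not the schema from my project:

<field name="url" type="string" indexed="true" stored="true"/>
<field name="content" type="text" indexed="true" stored="true"/>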
Re: Realtime directory change...
On Thu, 2006-12-21 at 12:23 -0800, escher2k wrote: > Hi, > We currently use Lucene to do index user data every couple of hours - the > index is completely rebuilt, > the old index is archived and the new one copied over to the directory. > Example - > > /bin/cp ${LOG_FILE} ${CRON_ROOT}/index/help/ > /bin/rm -rf ${INDEX_ROOT}/archive/help.${DATE} > /bin/cp -R ${CRON_ROOT}/index/help ${INDEX_ROOT}/help.new > /bin/mv ${INDEX_ROOT}/help ${INDEX_ROOT}/archive/help.${DATE} > /bin/mv ${INDEX_ROOT}/help.new ${INDEX_ROOT}/help > > This works fine since the index is retrieved every time from the disk. Is it > possible to do the same with Solr ? > Assuming we also use caching to speed up the retrieval, is there a way to > invalidate some/all caches when > this done ? > Did you look into http://wiki.apache.org/solr/CollectionDistribution http://wiki.apache.org/solr/SolrCollectionDistributionScripts http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline I am still very new to solr but it sounds like it is exactly what you need (like as well said by others). HTH salu2 > Thanks. >
Re: Help with spellchecker integration
On Thu, 2006-12-21 at 21:27 -0800, Otis Gospodnetic wrote: > Hi, > I'm trying to integrate the Lucene-based spellchecker > (http://wiki.apache.org/jakarta-lucene/SpellChecker + contrib/spellchecker > under Lucene) with Solr (http://issues.apache.org/jira/browse/SOLR-81) in > order to provide a query spellchecking service (you enter Speers and it > suggest pant^H^H ... Spears). I've created a generic NGramTokenizer (+ > NGramTokenizerFactory + unit test) that I'll attach to SOLR-81 shortly. > > What I'm not yet sure about is: > 1) integration of this generic n-grammer with that Lucene SpellChecker code - > SpellChecker & TRStringDistance classes in particular. Hmm, reading SOLR-81, you actually have everything you need. > 2) mapping n-gram Tokens that come out of my NGramTokenizer to specific field > names, like 3start, 4start, gram1, gram2, gram3 is there is scheme.xml > trick one can use to accomplish this? It is in the issue: ... ... The above shows how to configure the second (spellcheck) index, however if you want to update both indexes at the same time you need to write your own implementation of the update servlet. > 3) once 2) is done, getting the request handler(?) to n-gram the query > appropriately and hit the SpellChecker index to try and find alternative > spelling suggestions. hmm, not sure, actually IMHO that highly depends on how you plan to use it in the end. I mean there is more than one way to use spell check. In the issue they talked about AJAX suggestions, but that would be IMO before the actual search request. If you want to have it in the request handler then you need to decide how and when the spellchecker comes into place. Like if the normal search does not return a result, or parallel. Parallel would search in the spell check index for alternatives, use these alternatives to dispatch the alternative word query and later on parse the result directly into the output writer. Here you have again different alternatives: you can attack the solr index directly (losing all the cool features), or you want the google thingy "Did you mean". ... in any form start with:

public class NGramRequestHandler extends StandardRequestHandler implements SolrRequestHandler, SolrInfoMBean {
  public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) {
    // Depending on the use case do your processing here
  }
}

This way you just need to implement the class specific methods. > > Damn, that's a lot of unknowns... on top of that my computer started freezing > every half an hour. Hi Murphy. > > > > Any pointers will be greatly appreciated. Thanks, HTH a wee bit. salu2 > Otis > > >
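Registering such a handler is then the usual solrconfig.xml entry; a minimal sketch, the name and package are placeholders, not anything from SOLR-81:

<!-- class is the hypothetical handler sketched above -->
<requestHandler name="spellcheck" class="org.example.NGramRequestHandler"/>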
Re: Solr 1.1 released
On Fri, 2006-12-22 at 17:07 -0500, Yonik Seeley wrote: > Solr 1.1 is now available for download! Very nice. :) Thanks a lot to this community and especially to Yonik who packed the release. salu2
Is there a BasicSummarizer for solr?
Hi all, I need to implement a summary function with solr like there is in nutch. Basically it returns x words before and after the query term to show the content where the term is embedded (like as google does). In nutch this functionality is provided by http://svn.apache.org/viewvc/lucene/nutch/trunk/src/plugin/summary-basic/ and especially http://svn.apache.org/viewvc/lucene/nutch/trunk/src/plugin/summary-basic/src/java/org/apache/nutch/summary/basic/BasicSummarizer.java?view=markup There is another similar plugin/class in http://svn.apache.org/viewvc/lucene/nutch/trunk/src/plugin/summary-lucene/ Is there something similar in solr? If not which is the best way to implement this functionality? TIA for any tips. salu2
Re: Is there a BasicSummarizer for solr?
On Tue, 2007-01-02 at 08:14 -0500, Erik Hatcher wrote: > Thorsten - there is support for the Lucene Highlighter built into > Solr. You can see details of how to use it here: > > <http://wiki.apache.org/solr/HighlightingParameters> > >Erik > :) Cheers Erik, with this information and a small change in my schema (changed stored="false" to stored="true" on my main content field) I get exactly what I needed. Now I have to see the effect of storing the content in the index regarding size and response time. Thanks again. salu2 > > On Jan 2, 2007, at 7:26 AM, Thorsten Scherler wrote: > > > Hi all, > > > > I need to implement a summary function with solr like there is in > > nutch. > > Basically it returns x words before and after the query term to > > show the > > content where the term is embedded (like as google does). > > > > In nutch this functionality is provided by > > http://svn.apache.org/viewvc/lucene/nutch/trunk/src/plugin/summary- > > basic/ > > and especially > > http://svn.apache.org/viewvc/lucene/nutch/trunk/src/plugin/summary- > > basic/src/java/org/apache/nutch/summary/basic/BasicSummarizer.java? > > view=markup > > > > There is another similar plugin/class in > > http://svn.apache.org/viewvc/lucene/nutch/trunk/src/plugin/summary- > > lucene/ > > > > Is there something similar in solr? > > > > If not which is the best way to implement this functionality? > > > > TIA for any tips. > > > > salu2 >
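For the archive, the kind of request this enables; a minimal sketch, the field name content is an assumption:

http://localhost:8983/solr/select?q=term&hl=true&hl.fl=content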
How to tell the highlighter not to escape?
Hi all, I am playing around with the highlighter and found that all highlight terms get escaped. I mean solr will return &lt;em&gt;TERM&lt;/em&gt; and not <em>TERM</em>. I am not sure where this escaping is happening but I would need the highlighting to NOT escape the hl.simple.pre and hl.simple.post tags since it is horror to work with cdata sections in xsl. I had a look in the lucene highlighter and it seems that it does not escape the tags. Can somebody point me to code which is responsible for escaping and maybe give me a tip how I can patch it to make it configurable. TIA salu2
Re: How to tell the highlighter not to escape?
On Wed, 2007-01-03 at 02:16 +0000, Edward Garrett wrote: > thorsten, > > see the following for discussion. your case is indeed an annoyance--the > thread below discusses motivations for it and ways of working around it. (i > too confess that i wish it were not so.) > > http://www.mail-archive.com/solr-user@lucene.apache.org/msg01483.html Thanks Edward, the problem with the suggestion in the above thread is that: "just create an XSL that generates XML and unescapes the fields you know will contain wellformed XML data -- then apply your second transform client side" is not possible with xsl. See e.g. http://www.biglist.com/lists/xsl-list/archives/200109/msg00318.html "> How can I match the Cdata Section?!? > You can't, the XPath data model regards CDATA as merely an input shortcut, not as an information-bearing part of the XML content. In other words, "<![CDATA[x]]>" and "x" look exactly the same to the XSLT processor. Mike Kay" Michael Kay is the xsl guru and I can say as well from my own experience one would need to write a custom parser, since &lt;em&gt;TERM&lt;/em&gt; is equal to <em>TERM</em> and this in xsl is a string (XPath would match text()). IMO the highlighter should really return pure xml and not escape it. I will have a look in the XmlResponseWriter, maybe I find a way to change this. salu2 > > -edward > > On 1/2/07, Mike Klaas <[EMAIL PROTECTED]> wrote: > > > > Hi Thorsten, > > > > The highlighter does not escape anything itself: you are seeing the > > results of solr's automatic escaping of xml data within its xml > > response. This should be transparent (your xml decoder should > > un-escape the values on the way out). I'm not really familiar with > > xslt so I'm unsure why that isn't so (perhaps it is automatically > > html-escaping the values after un-xml-escaping them?) > > > > Be careful of documents containing html fragments natively. > > > > cheers, > > -MIke > > > > On 1/2/07, Thorsten Scherler <[EMAIL PROTECTED]> > > wrote: > > > Hi all, > > > > > > I am playing around with the highlighter and found that all highlight > > > terms get escaped. > > > > > > I mean solr will return > > > &lt;em&gt;TERM&lt;/em&gt; and not > > > <em>TERM</em> > > > > > > I am not sure where this escaping is happening but I would need the > > > highlighting to NOT escape the hl.simple.pre and hl.simple.post tag > > > since it is horror to work with cdata sections in xsl. > > > > > > I had a look in the lucene highlighter and it seems that it does not > > > escape the tags. > > > > > > Can somebody point me to code which is responsible for escaping and > > > maybe give me a tip how I can patch it to make it configurable. > > > > > > TIA > > > > > > salu2 > > > > > > > > > > > --
Re: How to tell the highlighter not to escape?
On Wed, 2007-01-03 at 12:06 +, Edward Garrett wrote: > for what it's worth, i wrote a recursive template in xsl that replaces the > escaped characters with actual elements. here, the variable $val would be > the tag, e.g. "em". this has been working okay for me so far. Yeah, many thanks for posting this template. This is actually "imitating" a parser. However I still think the highlighter should return unescaped tags for highlighting. There is IMO no benefit for the current behavior. Thanks again Edward. salu2 > > > > > > > > select="substring($insideEm, string-length($preEm)+5)"/> > > > > > > > > > > > > > > On 1/3/07, Thorsten Scherler <[EMAIL PROTECTED]> wrote: > > > > On Wed, 2007-01-03 at 02:16 +, Edward Garrett wrote: > > > thorsten, > > > > > > see the following for discussion. your case is indeed an annoyance--the > > > thread below discusses motivations for it and ways of working around it. > > (i > > > too confess that i wish it were not so.) > > > > > > http://www.mail-archive.com/solr-user@lucene.apache.org/msg01483.html > > > > Thanks Edward, the problem is with the suggestion in the above thread is > > that: > > "just create an XSL that > > generates XML and unescapes the fields you know will contain wellformed > > XML data -- then apply your second transform client side" > > > > Is not possible with xsl. See e.g. > > http://www.biglist.com/lists/xsl-list/archives/200109/msg00318.html > > "> How can I match the Cdata Section?!? > > > > > You can't, the XPath data model regards CDATA as merely an input shortcut, > > not as an information-bearing part of the XML content. In other words, > > "" and "x" look exactly the same to the XSLT processor. > > > > Mike Kay" > > > > Michael Kay is the xsl guru and I can say as well from my own experience > > one would need to write a custom parser since > > is equal to <em>TERM</em> and this in xsl is a string (XPath > > would match text()). > > > > IMO the highlighter should really return pure xml and not escape it. > > I will have a look in the XmlResponseWriter maybe I find a way to change > > this. > > > > salu2 > > > > > > > > > > -edward > > > > > > On 1/2/07, Mike Klaas <[EMAIL PROTECTED]> wrote: > > > > > > > > Hi Thorsten, > > > > > > > > The highlighter does not escape anything itself: you are seeing the > > > > results of solr's automatic escaping of xml data within its xml > > > > response. This should be transparent (your xml decoder should > > > > un-escape the values on the way out). I'm not really familiar with > > > > xslt so I'm unsure why that isn't so (perhaps it is automatically > > > > html-escaping the values after un-xml-escaping them?) > > > > > > > > Be careful of documents containing html fragments natively. > > > > > > > > cheers, > > > > -MIke > > > > > > > > On 1/2/07, Thorsten Scherler < > > [EMAIL PROTECTED]> > > > > wrote: > > > > > Hi all, > > > > > > > > > > I am playing around with the highlighter and found that all > > highlight > > > > > terms get escaped. > > > > > > > > > > I mean solr will return > > > > > <em>TERM</em> and not > > > > > TERM > > > > > > > > > > I am not sure where this escaping is happening but I would need the > > > > > highlighting to NOT escape the hl.simple.pre and hl.simple.post tag > > > > > since it is horror to work with cdata sections in xsl. > > > > > > > > > > I had a look in the lucene highlighter and it seem that it does not > > > > > escape the tags. 
> > > > > > > > > > Can somebody point me to code which is responsible for escaping and > > > > > maybe give me a tip how I can patch to make it configurable. > > > > > > > > > > TIA > > > > > > > > > > salu2 > > > > > > > > > > > > > > > > > > > > > > > > > -- > > thorsten > > > > "Together we stand, divided we fall!" > > Hey you (Pink Floyd) > > > > > > > > -- thorsten "Together we stand, divided we fall!" Hey you (Pink Floyd)
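Edward's template was garbled in the archive; a reconstruction of the idea as a minimal sketch (not his exact code): recurse over the text, emitting a real element for every escaped pair, where $val would be the tag, e.g. "em":

<xsl:template name="unescape">
  <xsl:param name="text"/>
  <xsl:param name="val"/>
  <!-- the escaped markup arrives as the literal strings <em> and </em> -->
  <xsl:variable name="preEm" select="concat('&lt;', $val, '&gt;')"/>
  <xsl:variable name="postEm" select="concat('&lt;/', $val, '&gt;')"/>
  <xsl:choose>
    <xsl:when test="contains($text, $preEm)">
      <xsl:value-of select="substring-before($text, $preEm)"/>
      <xsl:element name="{$val}">
        <xsl:value-of select="substring-before(substring-after($text, $preEm), $postEm)"/>
      </xsl:element>
      <!-- recurse over the remainder -->
      <xsl:call-template name="unescape">
        <xsl:with-param name="text" select="substring-after($text, $postEm)"/>
        <xsl:with-param name="val" select="$val"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$text"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>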
[ANN] Apache Forrest/Cocoon based solr client plugin
Hi all, I am happy to announce that I just added an Apache Forrest based Apache Solr client plugin to the forrest whiteboard. It may be of interest for the ones using Apache Cocoon based Apache Forrest and Apache Lucene based Apache Solr. org.apache.forrest.plugin.output.solr generates Apache Solr documents from Apache Forrest xdocs. Further, when run with the Apache Forrest Dispatcher, it provides a GUI to manage your project in solr and a search interface to search your solr server. The documentation and a couple of screenshots can be found at http://forrest.apache.org/pluginDocs/plugins_0_80/org.apache.forrest.plugin.output.solr/ The source code can be found at http://svn.apache.org/viewvc/forrest/trunk/whiteboard/plugins/org.apache.forrest.plugin.output.solr/ Have fun with it and please provide feedback to this list. salu2 -- thorsten "Together we stand, divided we fall!" Hey you (Pink Floyd)
Re: Seeking FAQs
On Sat, 2007-01-06 at 10:25 -0500, David Halsted wrote: > I wonder what would happen if we used a clustering engine like Carrot > to categorize either the e-mails in the archive or the results of > searches against them? Perhaps we'd find some candidates for the FAQ > that way. Not sure about tools but IMO this works fine done by user/committer. I think the one that asked the question on the list is a likely candidate to add an entry in the faq. The typical scenario should be: user asks question -> user get answers from community -> user adds FAQ entry with the solution that worked for her This way the one asking the question can give a little something back to the community. If you follow the lists a bit one can identify some faq's right away: - Searching multiple indeces - Clustering solr (custom scorer, highlighter, ...) - ... > > Dave > > On 1/5/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > > > Hey everybody, > > > > I was lookin at the FAQ today, and I realized it hasn't really changed > > much in the past year ... in fact, only two people besides myself have > > added questions (thanks Thorsten and Darren) in the entire time Solr > > has been in incubation -- which is not to say that Erik and Respaldo's > > efforts to fix my typo's aren't equally helpful :) > > > > http://wiki.apache.org/solr/FAQ > > > > In my experience, FAQs are one of the few pieces of documentation that are > > really hard for developers to write, because we are so use to dealing with > > the systems we work on, we don't allways notice when a question has been > > asked more then once or twice (unless it gets asked over and over and > > *over*). The best source of FAQ updates tend to come from users who have > > a question, and either find the answer in the mailing list archives, or > > notice the same question asked by someone else later. > > Yes, I totally agree. Sometimes the content for the solution can be found in the wiki. One would just need to link to the wiki page from the FAQ. > > So If there are any "gotchas" you remember having when you first started > > using Solr, or questions you've noticed asked more then once please feel > > free to add them to the wiki. The Convention is to only add a question if > > you're also adding an answer, but even if you don't think a satisfactory > > answer has ever been given, or you're not sure how to best summarize > > multiple answers given in the past, just including links to > > instances in the mailing list archives where the question was asked is > > helpful -- both in the short term as pointers for people looking for help, > > and in the long term as starter points for people who want to flesh out a > > detailed answer. > > In the long run the content of the wiki that has proved solution should IMO go directly in the official documentation. salu2 -- thorsten "Together we stand, divided we fall!" Hey you (Pink Floyd)
Re: newbie question on determining fieldtype
On Mon, 2007-01-08 at 10:29 -0300, mike topper wrote: > Hi, > > I have a question that I couldn't find the exact answer to. > > I have some fields that I want to add to my schema but will never be > searched on. They are only used as additional information about a > document when retrieved. They are integers, so should i just have the > field be: > > stored="true"/> > > I'm pretty sure this is right, but I just wanted to check that I'm not > missing any speedups from using a different field > or adding some other parameters. > Seems pretty right to me. Did you read http://wiki.apache.org/solr/SchemaXml and saw the comment: HTH salu2 -- thorsten "Together we stand, divided we fall!" Hey you (Pink Floyd)
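Spelled out with a concrete (made-up) field name, such a declaration in schema.xml would be:

<field name="viewCount" type="integer" indexed="false" stored="true"/>

indexed="false" skips the inverted index entirely and stored="true" keeps the raw value for retrieval; that is about all the speedup there is to get here.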
Re: Performance tuning
On Thu, 2007-01-11 at 14:57 +, Stephanie Belton wrote: > Hello, > > > > Solr is now up and running on our production environment and working great. > However it is taking up a lot of extra CPU and memory (CPU usage has doubled > and memory is swapping). Is there any documentation on performance tuning? > There seems to be a lot of useful info in the server output but I don’t > understand it. > > > > E.g. > filterCache{lookups=0,hits=0,hitratio=0.00,inserts=537,evictions=0,size=337,cumulative_lookups=4723,cumulative_hits=3708,cumulative_hitratio=0.78,cumulative_inserts=4647,cumulative_evictions=72} > > > queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=256,evictions=0,size=256,cumulative_lookups=3779,cumulative_hits=552,cumulative_hitratio=0.14,cumulative_inserts=3632,cumulative_evictions=0} > > > documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=66005,cumulative_hits=2460,cumulative_hitratio=0.03,cumulative_inserts=63545,cumulative_evictions=4195} > > > > etc. what should I be watching out for? > Hi Stephanie, did you see http://wiki.apache.org/solr/SolrPerformanceFactors? Further you may consider to balance the load via http://wiki.apache.org/solr/CollectionDistribution HTH salu2 > > > Thanks > > Stephanie >
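The numbers quoted above map to the cache declarations in solrconfig.xml; a minimal sketch with roughly the stock values from the example config. Size is the first thing to raise when cumulative_evictions climbs, like the documentCache's 4195 above:

<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>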
Re: How can I update a specific field of an existing document?
On Thu, 2007-01-11 at 10:19 -0600, Iris Soto wrote: > Hello everybody, > I want update a specific field in a document, but i don't find how do it > in the documentation of Solr. > Is that posible?, I need to index only a field for a document, Do i have > to index all the document for this? > The problem is that i have to transform a bizdata object to a file > content xml in java, i should to build all the document xml step by > step, field by field, retrieving all the bizdata of database to be > passed to Solr. > On Thu, 2007-01-11 at 06:43 -0500, Erik Hatcher wrote: > In Lucene to update a document the operation is really a delete > followed by an add. You will need to add the complete document as > there is no such "update only a field" semantics in Lucene. This is from a thread in the dev list. So no it is not possible to just update one field. HTH salu2 > Thanks in advance. >
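Concretely, re-indexing the document just means sending the complete add again; when the uniqueKey matches an existing document, solr replaces it. A minimal sketch, field names assumed:

<add>
  <doc>
    <field name="id">doc-42</field>
    <field name="title">the corrected title</field>
    <field name="content">the full body, sent again even though only one field changed</field>
  </doc>
</add>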
Re: How can I update a specific field of an existing document?
On Thu, 2007-01-11 at 17:48 +0100, Thorsten Scherler wrote: > On Thu, 2007-01-11 at 10:19 -0600, Iris Soto wrote: > > Hello everybody, > > I want update a specific field in a document, but i don't find how do it > > in the documentation of Solr. > > Is that posible?, I need to index only a field for a document, Do i have > > to index all the document for this? No, just the one document. Let's say you have a CMS and you edit one document. You will need to re-index this document only by using the the add solr statement for the whole document (not one field only). > > The problem is that i have to transform a bizdata object to a file > > content xml in java, i should to build all the document xml step by > > step, field by field, retrieving all the bizdata of database to be > > passed to Solr. see above only for the document where the field are changed. I wrote a small cocoon based plugin in forrest doing the cms related example. It adds an document related solr gui for a cms like system. Maybe that gives you some ideas for your own app. > > > > On Thu, 2007-01-11 at 06:43 -0500, Erik Hatcher wrote: > > In Lucene to update a document the operation is really a delete > > followed by an add. You will need to add the complete document as > > there is no such "update only a field" semantics in Lucene. > > This is from a thread in the dev list. could not access the archive the first time: http://www.nabble.com/forum/ViewPost.jtp?post=8275908&framed=y HTH salu2 > > So no it is not possible to just update one field. > > HTH > > salu2 > > > Thanks in advance. > > > -- thorsten "Together we stand, divided we fall!" Hey you (Pink Floyd)
Re: [ANN] Apache Forrest/Cocoon based solr client plugin
On Tue, 2007-01-09 at 22:50 -0500, Yonik Seeley wrote: > Thanks Thorsten, > > Knowing nothing about cocoon and little about forrest, I'm not sure > exactly what this does :-) > jeje, fair enough. You know forrest from the solr webpage. What I did is a small generic way to access the solr server with cocoon/forrest. What it does is mainly solving (basic) SOLR-20 & SOLR-30 for cocoon. You can update and select content from the solr server connecting to the http interface. The nice thing is the power of cocoon that is Bertrand always talking about. ;) We use the output of the solr server as is and use it in the transformation pipeline. The update interface is http://forrest.apache.org/pluginDocs/plugins_0_80/org.apache.forrest.plugin.output.solr/images/gui-actionbar.png and it returns a small success/error page (depending of the solr response). This interface is half way url specific (add and delete) and you can execute the commit and optimize commands on ever page. It is based on the solr generator which is a wrapper of http://svn.apache.org/viewvc/forrest/trunk/whiteboard/plugins/org.apache.forrest.plugin.output.solr/src/java/org/apache/forrest/http/client/PostFile.java?view=markup Which is a simple class to post a file from one url to another. The response body is provide stream and as string. I wrote this simple class since the patches of SOLR-20 & SOLR-30 are not yet applied. > I'll take a guess in non-cocoon/forrest speech: does it allow you to > update a Solr server with the content of your website at the same time > you generate (or change) the site? Well, it is not working so for in the static build meaning "forrest" (not sure ATM why myself) which would exactly do what you say regarding generating the site. In "forrest run", the dynamic mode of forrest, however it lets ... >So it's a push model of web > indexing instead of spidering? Exactly. To finish above sentence ... you push update commands to the server based on each selected page. > The search-box I understand, but > presumably that needs to point to a running Solr server somewhere. Yes. http://forrest.apache.org/pluginDocs/plugins_0_80/org.apache.forrest.plugin.output.solr/index.html "... The host server urls can be configured by adding the following properties to your project forrest.properties.xml in case you do not use the default values. http://localhost:8983/solr/select"/> http://localhost:8983/solr/update"/> ..." The forrest.properties.xml is new in 0.8-dev. The result will be transformed to something like: http://forrest.apache.org/pluginDocs/plugins_0_80/org.apache.forrest.plugin.output.solr/images/result.png I added a transformer that adds the paginator part to the solr select result. The paginator is the "Result pages" part of above screenshot. Hmm, that makes me think whether that (the paginator) would be better directly in solr core. wdyt? salu2 > > -Yonik > > On 1/7/07, Thorsten Scherler <[EMAIL PROTECTED]> wrote: > > Hi all, > > > > I am happy to announce that I just add a Apache Forrest based Apache > > Solr client plugin to the forrest whiteboard. It may be from interest > > for the ones using Apache Cocoon based Apache Forrest and Apache Lucene > > based Apache Solr. > > > > org.apache.forrest.plugin.output.solr generates Apache Solr documents > > from Apache Forrest xdos. Further when run with the Apache Forrest > > Dispatcher it provides a GUI to manage your project in solr and a search > > interface to search your solr server. 
> > > > The documentation and a couple of screenshots can be found at > > http://forrest.apache.org/pluginDocs/plugins_0_80/org.apache.forrest.plugin.output.solr/ > > > > The source code can be found at > > http://svn.apache.org/viewvc/forrest/trunk/whiteboard/plugins/org.apache.forrest.plugin.output.solr/ > > > > Have fun with it and please provide feedback to this list. -- thorsten "Together we stand, divided we fall!" Hey you (Pink Floyd)
Re: XML querying
On Mon, 2007-01-15 at 12:23 +0000, Luis Neves wrote: > Hello. > What I do now to index XML documents it's to use a Filter to strip the > markup, > this works but it's impossible to know where in the document is the match > located. > What would it take to make possible to specify a filter query that accepts > xpath > expressions?... something like: > > fq=xmlField:/book/content/text() > > This way only the "/book/content/" element was searched. > > Did I make sense? Is this possible? AFAIK short answer: no. The field is ALWAYS plain text. There is no xmlField type. ...but why don't you just add your text to multiple fields when indexing? Instead of plainly stripping the markup, do above xpath on your document and create different fields, like <field name="content"><xsl:value-of select="/book/content/text()"/></field> Makes sense? HTH salu2 > > -- > Luis Neves
Re: Calling Solr requests from java code - examples?
On Tue, 2007-01-16 at 12:52 +0100, [EMAIL PROTECTED] wrote: > Thanks! > > and how would you do it calling it from another web application, let's > say from a servlet or so? I need to do some stuff in my web java code, > then call the Solr service and do some more stuff afterwards > Have a look at https://issues.apache.org/jira/browse/SOLR-86 HTH salu2
Re: Converting Solr response back to pojo's, experiences?
On Tue, 2007-01-16 at 14:58 +0100, [EMAIL PROTECTED] wrote: > Anyone having experience converting xml responses back to pojo's, > which technologies have you used? > > Anyone doing json <-> pojo's? Using pure xml myself but have a look at https://issues.apache.org/jira/browse/SOLR-20 and https://issues.apache.org/jira/secure/attachment/12348567/solr-client.zip HTH salu2 > > Grtz >
Re: solr + cocoon problem
On Tue, 2007-01-16 at 16:19 -0500, Walter Lewis wrote: > [EMAIL PROTECTED] wrote: > > Any ideas on how to implement a cocoon layer above solr? I just finished a forrest plugin (in the whiteboard, our testing ground in forrest) that is doing what you asked for and some pagination. Forrest is cocoon based so you just have to build the plugin jar and add it to your cocoon project. Please ask on the forrest list if you have problems. http://forrest.apache.org/pluginDocs/plugins_0_80/org.apache.forrest.plugin.output.solr/ > You're far from the only one approaching solr via cocoon ... :) > > The approach we took, passes the search parameters to a "solrsearch" > stylesheet, the heart of which is a block that embeds the > solr results. A further transformation prepares the results of the solr > query for display. That was my first version for above plugin as well, but since forrest makes use of the cocoon crawler I needed something with a default search string for offline generation. You should have a closer look on http://svn.apache.org/viewvc/forrest/trunk/whiteboard/plugins/org.apache.forrest.plugin.output.solr/output.xmap?view=markup and http://svn.apache.org/viewvc/forrest/trunk/whiteboard/plugins/org.apache.forrest.plugin.output.solr/input.xmap?view=markup For the original use case of this thread I added a generator: and as well a paginator transformer that calculates the next pages based on start, rows and numFound: We use it as follows: You may be interested in the update generator as well. Please give feedback to [EMAIL PROTECTED] It really needs more testing besides myself, you could be the first to provide feedback. HTH salu2 -- thorsten "Together we stand, divided we fall!" Hey you (Pink Floyd)
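Since the sitemap snippets above did not survive the mail archive, a sketch of the shape of such a pipeline; the generator/transformer type names and paths here are illustrative, the real declarations are in the linked input.xmap and output.xmap:

<map:match pattern="search">
  <map:generate type="solr-search" src="http://localhost:8983/solr/select"/>
  <map:transform type="solr-paginator"/>
  <map:transform src="resources/stylesheets/results2html.xsl"/>
  <map:serialize type="xhtml"/>
</map:match>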
Re: solr + cocoon problem
On Tue, 2007-01-16 at 16:02 -0500, [EMAIL PROTECTED] wrote: > Hi, > > I am trying to implement a cocoon based application using solr for searching. > In particular, I would like to forward the request from my response page to > solr. I have tried several alternatives, but none of them worked for me. > Please see http://wiki.apache.org/solr/SolrForrest. salu2 -- thorsten "Together we stand, divided we fall!" Hey you (Pink Floyd)
Re: Calling Solr requests from java code - examples?
On Tue, 2007-01-16 at 13:56 +0100, Bertrand Delacretaz wrote: > On 1/16/07, Thorsten Scherler <[EMAIL PROTECTED]> wrote: > > > ...Have a look at > > https://issues.apache.org/jira/browse/SOLR-86... > > Right, I should have mentioned this one as well. I have linked SOLR-20 > and SOLR-86 now, so that people can see the various options for Java > clients. Cheers, mate. :) salu2 -- thorsten "Together we stand, divided we fall!" Hey you (Pink Floyd)
Re: XML querying
On Mon, 2007-01-15 at 13:42 +, Luis Neves wrote: > Hi! > > Thorsten Scherler wrote: > > > On Mon, 2007-01-15 at 12:23 +, Luis Neves wrote: > >> Hello. > >> What I do now to index XML documents it's to use a Filter to strip the > >> markup, > >> this works but it's impossible to know where in the document is the match > >> located. > >> What would it take to make possible to specify a filter query that accepts > >> xpath > >> expressions?... something like: > >> > >> fq=xmlField:/book/content/text() > >> > >> This way only the "/book/content/" element was searched. > >> > >> Did I make sense? Is this possible? > > > > AFAIK short answer: no. > > > > The field is ALWAYS plain text. There is no xmlField type. > > > > ...but why don't you just add your text in multiple field when indexing. > > > > Instead of plain stripping the markup do above xpath on your document > > and create different fields. Like > > > select="/book/content/text()"/> > > > > > > Makes sense? > > Yes, but I have documents with different schemas on the same "xml field", > also, > that way I would have to know the schema of the documents being indexed > (which > I don't). > > The schema I use is something like: > > > > Where each distinct DocumentType has its own schema. > > I could revise this approach to use an Solr instance for each DocumentType > but I > would have to find a way to "merge" results from the different instances > because > I also need to search across different DocumentTypes... I guess I'm SOL :-( > I think you should explain your use case a wee bit more. >>> What I do now to index XML documents it's to use a Filter to strip the markup, > >> this works but it's impossible to know where in the document is the match > >> located. why do you need to know where? Maybe we can think of something. salu2 -- thorsten "Together we stand, divided we fall!" Hey you (Pink Floyd)
Re: XML querying
On Wed, 2007-01-17 at 09:36 +0000, Luis Neves wrote: > Hi, > > Thorsten Scherler wrote: > > On Mon, 2007-01-15 at 13:42 +0000, Luis Neves wrote: > > > > > I think you should explain your use case a wee bit more. > > > >>>> What I do now to index XML documents it's to use a Filter to strip > > the markup, > >>>> this works but it's impossible to know where in the document is the > >>>> match located. > > > > why do you need to know where? > > Poorly phrased from my part. Ideally I want to apply "lucene filters" to the > xml > content. > Something like what Nux does: > <http://dsd.lbl.gov/nux/api/nux/xom/pool/FullTextUtil.html> > http://dsd.lbl.gov/nux/#Google-like realtime fulltext search via Apache Lucene engine If you have a look at this you will see that the lucene search is plain and not xquery based. It is more that you can define relations like in SQL, connecting two tables via keys. Like I understand it, it will return the docs that match the xpath /books/book[author="James" and lucene:match(abstract, $query)], where the lucene match is based on a normal lucene query. I reckon it should be very easy to do something like this in a client environment like cocoon/forrest. See the nux code for getting an idea. If I needed to solve this I would look for a component that allows me XQuery like nux and a component that lets me do queries against a solr server. Then you "just" need to match the documents that return a result for both components with a custom method. salu2 > > -- > Luis Neves
Re: solr + cocoon problem
On Wed, 2007-01-17 at 10:25 -0500, [EMAIL PROTECTED] wrote: > Hi, > > I agree, this is not a legal URL. But the thing is that cocoon itself is > sending the unescaped URL. ...because you told it so. You use <map:part src="http://hostname/solr/select/?q={request-param:q}" type="file"/> The request param module will not escape the param by default. salu2
Re: Solr "autostart"
On Sun, 2007-01-28 at 10:34 -0500, Tim Archambault wrote: > Using Solr with Jetty on linux VPS server. When I ssh and run "start.jar" I > can go to a web browser and with success to the /solr/admin page. I can > query with the whole "nine" no problems. > However when I close out my terminal session (iBook) I cannot access the > solr web interface. My intuition is that when my terminal closes, port 8983 > is no longer available. Try starting the server with nohup java -jar start.jar > solr.log 2>&1 & That should keep it up after you disconnect (nohup makes the process ignore the hangup when the terminal closes). HTH salu2 > > How can I set my VPS up so that SOLR just works without manual prompting? > Server restart, program failure, etc. > > Thanks for any help. > > Tim -- thorsten "Together we stand, divided we fall!" Hey you (Pink Floyd)
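For the restart part of the question: if the cron on the VPS supports @reboot, one line takes care of reboots as well; the paths are assumptions:

# crontab -e
@reboot cd /opt/solr/example && nohup java -jar start.jar > solr.log 2>&1 &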
Re: Posting from Ant
On Thu, 2007-02-01 at 16:48 -0500, Erik Hatcher wrote: > The benefit to having a solution (now I'm beginning to > speak like a Rubyist, eh? Ever toyed with Rake, Peter?) is that you > can handle errors yourself. > > I never really expected the pipeline to be XML files -> XSLT *files* - > > HTTP POST -> Solr. > > The *files* part here is key. Can't ya get your Cocoon-skinned cap > on and roll a pipeline that does it all on the fly with badass > compiled style sheet performance, IoC configurable, da works. Right > Bess? I'd be happy to collaborate with Bess to wire in a Cocoon > kinda Ant task wrapper if the world would be a better place with it. I wrote something like this. http://wiki.apache.org/solr/SolrForrest I am using it in my project in an ant task that will call the forrest site target and request the indexing actions url (e.g. index.solr.add.do) http://forrest.apache.org/pluginDocs/plugins_0_80/org.apache.forrest.plugin.output.solr/ salu2 > > Erik > > On Feb 1, 2007, at 11:43 AM, Binkley, Peter wrote: > > > Thanks, I'll try that out. I hope there aren't any encoding issues... > > Nah, how likely is that? I'll report back. > > > > Peter > > > > -Original Message- > > From: Erik Hatcher [mailto:[EMAIL PROTECTED] > > Sent: Thursday, February 01, 2007 6:38 AM > > To: solr-user@lucene.apache.org > > Subject: Fwd: Posting from Ant > > > > Ok, we have it on good authority that is the way to go > > for Ant > > -> POST -> Solr. > > > > Erik > > > > > > Begin forwarded message: > > > >> From: Steve Loughran <[EMAIL PROTECTED]> > >> Date: February 1, 2007 8:34:33 AM EST > >> To: Erik Hatcher <[EMAIL PROTECTED]> > >> Subject: Re: Posting from Ant > >> > >> On 01/02/07, Erik Hatcher <[EMAIL PROTECTED]> wrote: > >>> cool, thanks. it only posts a single file, it looks like, but i > >>> suppose the ant-contrib task would be the way to go to > >>> post > >>> a directory full of .xml files? or is there now something in ant > >>> that can do that iteration that i'm unaware of? > >> > >> well, someone could add multifile post, but foreach makes mores sense > >> > >>> > >>> woefully ignorant of the latest stuff in ant, > >>> Erik > >>> > >>> On Feb 1, 2007, at 2:52 AM, Steve Loughran wrote: > >>> > >>>> yes, there is an antlib (not released, you need to build it > >>> yourself) > >>>> that does posts, including http forms posting. > >>>> > >>>> http://svn.apache.org/viewvc/ant/sandbox/antlibs/http/trunk/ > >>>> > >>>> On 01/02/07, Erik Hatcher <[EMAIL PROTECTED]> wrote: > >>>>> Steve, > >>>>> > >>>>> Know of any HTTP POST tasks that could take a directory .xml files > >>>>> and post them to Solr? We do it with curl like this, with Solr's > >>>>> post.sh: > >>>>> > >>>>>FILES=$* > >>>>>URL=http://localhost:8983/solr/update > >>>>> > >>>>>for f in $FILES; do > >>>>> echo Posting file $f to $URL > >>>>> curl $URL --data-binary @$f -H 'Content-type:text/xml; > >>>>> charset=utf-8' > >>>>> echo > >>>>>done > >>>>> > >>>>>#send the commit command to make sure all the changes are > >>> flushed > >>>>> and visible > >>>>>curl $URL --data-binary '' > >>>>> > >>>>> But something more Ant-centric would be tasty. 
> >>>>> > >>>>> Thanks, > >>>>> Erik > >>>>> > >>>>> > >>>>> > >>>>> Begin forwarded message: > >>>>> > >>>>>> From: "Binkley, Peter" <[EMAIL PROTECTED]> > >>>>>> Date: January 31, 2007 1:56:06 PM EST > >>>>>> To: > >>>>>> Subject: Posting from Ant > >>>>>> Reply-To: solr-user@lucene.apache.org > >>>>>> > >>>>>> Is there an Ant task out there somewhere that can POST > >>> bunches of > >>>>>> files > >>>>>> to Solr, doing what the post.sh script does but with filesets? > >>>>>> > >>>>>> I've found the http post task > >>>>>> (http://antelope.tigris.org/nonav/docs/manual/bk03ch17.html), > >>>>> but it > >>>>>> just posts name-value pairs, not files; and Slide's set of > >>> webdav > >>>>>> client > >>>>>> tasks > >>>>>> (http://gulus.usherbrooke.ca/pub/appl/apache/jakarta/slide/ > >>>>> binaries/ > >>>>>> jaka > >>>>>> rta-slide-ant-webdav-bin-2.1.zip) has PUT and GET but not > >>> POST. It > >>>>>> shouldn't be hard to adapt one of these, but something pre- > >>> existing > >>>>>> would be better. > >>>>>> > >>>>>> Peter > >>>>>> > >>>>>> Peter Binkley > >>>>>> Digital Initiatives Technology Librarian Information Technology > >>>>>> Services 4-30 Cameron Library University of Alberta Libraries > >>>>>> Edmonton, Alberta Canada T6G 2J8 > >>>>>> Phone: (780) 492-3743 > >>>>>> Fax: (780) 492-9243 > >>>>>> e-mail: [EMAIL PROTECTED] > >>>>> > >>>>> > >>> > >>> > -- Thorsten Scherler thorsten.at.apache.org Open Source Java & XMLconsulting, training and solutions
Re: Analyzers and Tokenizers?
On Tue, 2007-02-06 at 17:27 +0100, rubdabadub wrote: > Hi: > > Are there more filters/tokenizers than the ones mentioned here..? > > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters > > I have found some in the example/schema.xml which are new ... > > sortMissingLast="true" omitNorms="true"> > > > more > > > Is there any complete list somewhere .. or how can I find more info about them? The complete set of analysis factories is in the source tree: http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/analysis/ HTH salu2 > > Kind regards, -- Thorsten Scherler thorsten.at.apache.org Open Source Java & XML consulting, training and solutions
Re: crawler feed?
On Wed, 2007-02-07 at 11:09 +0100, rubdabadub wrote: > Hi: > > Are there relatively stand-alone crawlers that are > suitable/customizable for Solr? Has anyone done any trials.. I have > seen some discussion about the cocoon crawler.. was that successful? http://wiki.apache.org/solr/SolrForrest I am using this approach in a custom project that is Cocoon based, and it is working very well. However, Cocoon's crawler is not standalone; it relies on the Cocoon CLI. I am using the solr/forrest plugin for the commit and for dispatching the update. The indexing transformation in the plugin is a wee bit different than the one in my project, since I needed to extract more information from the documents to create better filters. However, since the Cocoon CLI is no longer in 2.2 (cocoon-trunk) and Forrest uses it as its main component, I am keen to write a simple crawler that could be reused for Cocoon, Forrest, Solr, Nutch, ... I may start something pretty soon (I guess I will open a project in Apache Labs) and will keep this list informed. My idea is to write a simple crawler which could be easily extended by plugins. So if a project/app needs special processing for a crawled URL, one could write a plugin to implement the functionality. A Solr plugin for this crawler would be very simple: basically it would parse e.g. the HTML page and dispatch an update command for the extracted fields (see the sketch below). I think one should try to reuse as much code from Nutch as possible for this parsing. If somebody is interested in such a standalone crawler project, I welcome any help, ideas, suggestions, feedback and/or questions. salu2 -- Thorsten Scherler thorsten.at.apache.org Open Source Java & XML consulting, training and solutions
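A minimal sketch of what such a Solr plugin could look like, under stated assumptions: the class and method names are hypothetical, the field extraction is a crude stand-in for a real HTML parser (e.g. one reused from Nutch), and only the update command format comes from Solr itself:

import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/**
 * Hypothetical crawler plugin that turns a crawled page into a Solr
 * add command. Class and method names are illustrative only; the
 * update XML itself follows Solr's update message format.
 */
public class SolrIndexingPlugin {

    private final String solrUpdateUrl;

    public SolrIndexingPlugin(String solrUpdateUrl) {
        this.solrUpdateUrl = solrUpdateUrl;
    }

    /** Extracts a crude title field and dispatches an add command for the page. */
    public void process(String pageUrl, InputStream page) throws Exception {
        String html = slurp(page);
        // naive extraction; a real plugin would reuse a proper HTML
        // parser (e.g. from Nutch) instead of string matching
        String title = between(html, "<title>", "</title>");

        String add = "<add><doc>"
                + "<field name=\"id\">" + escape(pageUrl) + "</field>"
                + "<field name=\"title\">" + escape(title) + "</field>"
                + "</doc></add>";
        post(add);
        // a real plugin would batch documents and commit once per crawl
        post("<commit/>");
    }

    private void post(String body) throws Exception {
        HttpURLConnection con =
                (HttpURLConnection) new URL(solrUpdateUrl).openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        OutputStream out = con.getOutputStream();
        out.write(body.getBytes("UTF-8"));
        out.close();
        con.getResponseCode(); // force execution of the request
    }

    private static String slurp(InputStream in) throws Exception {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] b = new byte[4096];
            for (int n; (n = in.read(b)) != -1;) {
                buf.write(b, 0, n);
            }
            return buf.toString("UTF-8");
        } finally {
            in.close();
        }
    }

    private static String between(String s, String open, String close) {
        int i = s.indexOf(open);
        int j = s.indexOf(close);
        return (i >= 0 && j > i) ? s.substring(i + open.length(), j) : "";
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}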
Re: crawler feed?
On Wed, 2007-02-07 at 18:03 +0200, Sami Siren wrote: > rubdabadub wrote: > > Hi: > > > > Are there relatively stand-alone crawlers that are > > suitable/customizable for Solr? Has anyone done any trials.. I have > > seen some discussion about the cocoon crawler.. was that successful? > > There's also an integration path available for Nutch[1] that I plan to > integrate after 0.9.0 is out. Sounds very nice, I just finished reading it. Thanks. Today I submitted a proposal for an Apache Labs project called Apache Druids. http://mail-archives.apache.org/mod_mbox/labs-labs/200702.mbox/browser The basic idea is to create a flexible crawler framework. The core should be a simple crawler which can easily be extended by plugins. So if a project/app needs special processing for a crawled URL, one could write a plugin to implement the functionality. salu2 > > -- > Sami Siren > > [1]http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html -- Thorsten Scherler thorsten.at.apache.org Open Source Java & XML consulting, training and solutions
[Droids] Re: crawler feed?
On Thu, 2007-02-08 at 14:40 +0100, rubdabadub wrote: > Thorsten: > > First of all I read your lab idea with great interest as I am in need > of such a crawler. However there are certain things that I would like to > discuss. I am not sure what forum will be appropriate for this but I > will do my idea shooting here first, then please tell me where I should > post further comments. Since it is not an official lab project yet, I am unsure myself, but I think we should discuss details on [EMAIL PROTECTED] Please reply to the labs ml. > > A vertical search engine that will focus on a specific set of data, i.e. > use solr for example cos it provides the maximum field flexibility, > would greatly benefit from such a crawler. I.e. the next big technorati > or the next big event finding solution can use your crawler to crawl > feeds using a feed-plugin (maybe nutch plugins) or scrape websites for > event info using some x-path/xquery stuff (personally I think xpath is > a pain in the a... :-) These, like you pointed out, are surely some use cases for the crawler in combination with plugins. Another is the wget-like crawl that an application can use to export a static site (e.g. CMS, etc.). > > What I worry about is those issues that have to deal with > > - updating crawls Actually, if you only look at the crawl itself, there is no difference between an update and any other crawl. > - how many threads per host should be configurable. > - scale etc. You mean a crawl cluster? > > All the maintainer's headaches! That is why Droids is a labs proposal. http://labs.apache.org/bylaws.html All Apache committers have write access, and when a lab is promoted, the files are moved over to the incubation area. > I know you will use as much code as > you can from Nutch plus are not planning to re-invent the wheel. But > wouldn't it be much easier to jump into Sami's idea and make it better > and more stand-alone and still benefit from the Nutch community? I will start a thread on nutch dev and see whether or not it is possible to extract the crawler from the core, but the main idea is to keep Droids simple. Imagine something like the following pseudo code:

public void crawl(String url) {
  // resolve the stream
  InputStream stream = new URL(url).openStream();
  // look up the plugin that is registered for the stream
  Plugin plugin = lookupPlugin(stream);
  // extract links (link pattern matcher)
  Link[] links = plugin.extractLinks(stream);
  // match pattern plugins for storing/excluding links
  links = plugin.handleLinks(links);
  // pass the stream to the plugin for further processing
  plugin.main(stream);
}

> I > wonder wouldn't it be easy to push/pursue a route where the nutch crawler > becomes a standalone crawler? no? I read a post about it on the list. > Can you provide some links to get some background information? TIA. > I would like to hear more about how your plan will evolve in terms of > druid and why not join forces with Sami and co.? I am more familiar with Solr than Nutch, I have to admit. Like I said, all committers have write access on Droids and everybody is welcome to join the effort. Who knows, maybe the first droid is a standalone Nutch crawler with plugin extension points, if some Nutch committer joins the lab. Thanks rubdabadub for your feedback. salu2 > > Regards > > On 2/7/07, Thorsten Scherler <[EMAIL PROTECTED]> wrote: > > On Wed, 2007-02-07 at 18:03 +0200, Sami Siren wrote: > > > rubdabadub wrote: > > > > Hi: > > > > > > > > Are there relatively stand-alone crawlers that are > > > > suitable/customizable for Solr? Has anyone done any trials.. 
I have > > > > seen some discussion about the cocoon crawler.. was that successful? > > > > > > There's also an integration path available for Nutch[1] that I plan to > > > integrate after 0.9.0 is out. > > > > Sounds very nice, I just finished reading it. Thanks. > > > > Today I submitted a proposal for an Apache Labs project called Apache > > Druids. > > > > http://mail-archives.apache.org/mod_mbox/labs-labs/200702.mbox/browser > > > > The basic idea is to create a flexible crawler framework. The core should be > > a simple crawler which can easily be extended by plugins. So if a > > project/app needs special processing for a crawled URL, one could write a > > plugin to implement the functionality. > > > > salu2 > > > > > > > > -- > > > Sami Siren > > > > > > [1]http://blog.foofactory.fi/2007/02/online-indexing-integrating-nutch-with.html > > -- > > Thorsten Scherler thorsten.at.apache.org > > Open Source Java & XML consulting, training and solutions > > > > -- Thorsten Scherler thorsten.at.apache.org Open Source Java & XML consulting, training and solutions
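Filled out into compilable form, the crawl() sketch from the mail above might look like the following. The Plugin and Link types and the lookupPlugin() registry are assumed interfaces, not existing code; buffering the stream and keeping a visited set fix the stream-reuse and cycle problems the pseudo code glosses over:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.net.URL;
import java.util.HashSet;
import java.util.Set;

/** Compilable version of the crawl() pseudo code; Plugin and Link are assumed. */
public class SimpleCrawler {

    /** Extension point: one plugin per kind of resource. */
    public interface Plugin {
        Link[] extractLinks(InputStream stream) throws Exception;
        Link[] handleLinks(Link[] links); // include/exclude pattern matching
        void main(InputStream stream) throws Exception; // content processing
    }

    public static class Link {
        public final String url;
        public Link(String url) { this.url = url; }
    }

    private final Plugin plugin;
    private final Set visited = new HashSet();

    public SimpleCrawler(Plugin plugin) {
        this.plugin = plugin;
    }

    public void crawl(String url) throws Exception {
        if (!visited.add(url)) {
            return; // already crawled; avoids endless loops on cyclic links
        }
        // resolve the stream once and buffer it, since both the link
        // extraction and the plugin itself need to read it
        byte[] content = slurp(new URL(url).openStream());

        // look up the plugin registered for this resource
        Plugin plugin = lookupPlugin(url);

        // extract the links and let the plugin include/exclude them
        Link[] links = plugin.extractLinks(new ByteArrayInputStream(content));
        links = plugin.handleLinks(links);

        // pass the stream to the plugin for further processing,
        // e.g. dispatching a Solr update command
        plugin.main(new ByteArrayInputStream(content));

        // follow the surviving links
        for (int i = 0; i < links.length; i++) {
            crawl(links[i].url);
        }
    }

    private Plugin lookupPlugin(String url) {
        // a real registry would pick a plugin based on the content type;
        // this sketch just returns the single injected plugin
        return plugin;
    }

    private static byte[] slurp(InputStream in) throws Exception {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] b = new byte[4096];
            for (int n; (n = in.read(b)) != -1;) {
                buf.write(b, 0, n);
            }
            return buf.toByteArray();
        } finally {
            in.close();
        }
    }
}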
RE: Using cocoon to update index
On Mon, 2007-03-26 at 09:30 -0400, Winona Salesky wrote: > Thanks Chris, I'll take another look at the Forrest plugin. Have a look as well at http://wiki.apache.org/solr/SolrForrest; it points out the Cocoon components. salu2 -- Thorsten Scherler thorsten.at.apache.org Open Source Java & XML consulting, training and solutions
Re: SolrSearchGenerator for Cocoon (2.1)
On Tue, 2007-03-27 at 10:53 -0400, [EMAIL PROTECTED] wrote: > Hi, > > I looked at the SolrSearchGenerator (this is the part which is of interest to > me), but I could not get it to work for Cocoon 2.1 yet. > > It seems that there is no getParameters method in the > org.apache.cocoon.environment Request interface: > http://cocoon.apache.org/2.1/apidocs/org/apache/cocoon/environment/Request.html > I guess using the getParameterNames and getParameter methods instead should > do the trick. > > Or am I missing something? No, you are right: "getParameters" is cocoon-trunk specific. I just changed the code to be cocoon-2.1.x compatible. http://svn.apache.org/viewvc?view=rev&rev=523081 Thanks for the feedback Mirko. Now, to use the plugin in a custom cocoon-2.1.x project, please do the following: 1) svn co http://svn.apache.org/repos/asf/forrest/trunk forrest (this checkout is our $FORREST_HOME) 2) cd $FORREST_HOME/main; ./build.sh 3) cd $FORREST_HOME/whiteboard/plugins/org.apache.forrest.plugin.output.solr 4) $FORREST_HOME/tools/ant/bin/ant local-deploy 5) cp $FORREST_HOME/whiteboard/plugins/org.apache.forrest.plugin.output.solr/build/org.apache.forrest.plugin.output.solr.jar $cocoon-2.1.x_webapp/WEB-INF/lib/ From there you can use the Cocoon components as usual in your project. salu2 -- Thorsten Scherler thorsten.at.apache.org Open Source Java & XML consulting, training and solutions
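For anyone doing a similar port, the 2.1-compatible parameter handling amounts to something like this sketch; only the getParameterNames()/getParameter() calls come from the mail above, the rest is illustrative:

import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;

import org.apache.cocoon.environment.Request;

/**
 * Sketch: collect request parameters in a Cocoon 2.1 generator.
 * Request.getParameters() only exists in trunk (2.2); on 2.1 you iterate
 * getParameterNames() and call getParameter() per name instead.
 * (Multi-valued parameters would need getParameterValues(); not shown.)
 */
public class RequestParams {

    public static Map getParameters(Request request) {
        Map params = new HashMap();
        for (Enumeration e = request.getParameterNames(); e.hasMoreElements();) {
            String name = (String) e.nextElement();
            params.put(name, request.getParameter(name));
        }
        return params;
    }
}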
Re: Solr logo poll
B Graffiti style. -- Thorsten Scherler thorsten.at.apache.org Open Source Java consulting, training and solutions
[Standings] Solr logo poll
Hi all, I did a small count; till now we have: a) 21 b) 13 salu2 -- Thorsten Scherler thorsten.at.apache.org Open Source Java consulting, training and solutions
Re: unsubscribe
On Thu, 2007-05-10 at 10:05 +0100, Kainth, Sachin wrote: > unsubscribe Hi Sachin, you need to send it to a different mailing address: [EMAIL PROTECTED] HTH salu2 -- Thorsten Scherler thorsten.at.apache.org Open Source Java consulting, training and solutions
Re: Packaging solr for Debian: using debian-supplied lucene-*.jar
On Sun, 2007-06-03 at 09:55 +0200, Jan-Pascal van Best wrote: > Hi all, > > I'm working on packaging Solr for Debian. Very nice. :) Since this is a developer topic, I think it should be discussed on our dev list. salu2 -- Thorsten Scherler thorsten.at.apache.org Open Source Java consulting, training and solutions
RE: storing the document URI in the index
On Tue, 2007-06-12 at 16:33 +0200, Ard Schrijvers wrote: > Thanks Yonik and Walter, > > putting it that way, it does make good sense to not store the transient xml > file, which is the case for most of the use cases (I was thinking differently because I > do have xml files on the file system or over http, like from a webdav call) > > Anyway, thx for all answers, and again, sry for mails not indenting properly > at the moment, it irritates me as well :-) > > Regards Ard Hi Ard, you may want to have a look at http://wiki.apache.org/solr/SolrForrest salu2 -- Thorsten Scherler thorsten.at.apache.org Open Source Java consulting, training and solutions