Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory
Hi,

Product: Solr (Embedded), Version: 1.2

Problem Description:
While trying to add to and search over the index, we are stumbling on this error again and again. Do note that the SolrCore is committed and closed suitably in our Embedded Solr.

Error (StackTrace):
Sep 19, 2007 9:41:41 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet jsp threw exception
java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.FSDirectory@/data/pub/index: files:
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:516)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:185)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:148)
        at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:87)
        at org.apache.solr.core.SolrCore.newSearcher(SolrCore.java:122)
        at com.serendio.diskoverer.core.entextor.CreateSolrIndex.<init>(CreateSolrIndex.java:70)
        at org.apache.jsp.AddToPubIndex_jsp._jspService(AddToPubIndex_jsp.java:57)
        at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
        at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)
        at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)
        at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)

Extra Information:
/data/pub is the Solr Home.
/data/pub/index contains the index.
CreateSolrIndex.java is our program that creates and searches over the index.

Regards,
Venkat
--
multithread update client causes exceptions and dropped documents
Attachment: TestJettyLargeVolume.java (binary data)

we were doing some performance testing for the updating aspects of solr and ran into what seems to be a large problem. we're creating small documents with an id and one field of 1 term only, submitting them in batches of 200 with commits every 5000 docs. when we run the client with 1 thread everything is fine. when we run it with >1 threads things go south (stack trace is below). i've attached the junit test which shows the problem. this happens on both a mac and a pc, and when running solr in both jetty and tomcat. i'll create a JIRA issue if necessary but i thought i'd see if anyone else had run into this problem first.

(output from junit test)
Started thread: 0
Started thread: 1
org.apache.solr.common.SolrException: Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__javalangIllegalStateException_Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__at_comsunorgapachexercesinternalimplXMLStreamReaderImplgetAttributeCountXMLStreamReaderImpljava598__at_orgapachesolrhandlerXmlUpdateRequestHandlerreadDocXmlUpdateRequestHandlerjava335__at_orgapachesolrhandlerXmlUpdateRequestHandlerprocessUpdateXmlUpdateRequestHandlerjava181__at_orgapachesolrhandlerXmlUpdateRequestHandlerhandleRequestBodyXmlUpdateRequestHandlerjava109__at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava78__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava804__at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava193__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava161__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__at_orgmortbayjettyServerhandleServerjava285__at_orgmortbayjettyHttpConnectionhandleRequestHttpConnectionjava502__at_orgmortbayjettyHttpConnection$RequestHandlercontentHttpConnectionjava835__at_orgmortbayjettyHttpParserparseNextHttpParserjava641__at_orgmortbayjettyHttpParserparseAvailableHttpParserjava208__at_orgmortbayje
[snip]
request: http://localhost:8983/solr/update?wt=xml&version=2.2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:230)
        at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:199)
        at org.apache.solr.client.solrj.impl.BaseSolrServer.add(BaseSolrServer.java:46)
        at org.apache.solr.client.solrj.impl.BaseSolrServer.add(BaseSolrServer.java:61)
        at org.apache.solr.client.solrj.embedded.TestJettyLargeVolume$DocThread.run(TestJettyLargeVolume.java:69)
Exception in thread "
Index/Update Problems with Solrj/Tomcat and Larger Files
I am using Tomcat 6 and Solr 1.2 on a Windows 2003 server using the following java code. I am trying to index pdf files, and I'm constantly getting errors on larger files (the same ones).

  SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
  SolrInputDocument addDoc = new SolrInputDocument();
  addDoc.addField("url", url);
  addDoc.addField("site", site);
  addDoc.addField("author", author);
  addDoc.addField("title", title);
  addDoc.addField("subject", subject);
  addDoc.addField("keywords", keywords);
  addDoc.addField("text", docText);
  UpdateRequest ur = new UpdateRequest();
  ur.setAction( UpdateRequest.ACTION.COMMIT, false, false ); //Auto Commits on Update...
  ur.add(addDoc);
  UpdateResponse rsp = ur.process(server);

The java error I received is:

  class org.apache.solr.client.solrj.SolrServerException
  (java.net.SocketException: Software caused connection abort: recv failed)

Tomcat Log:
SEVERE: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:716)
        at org.apache.coyote.http11.InternalInputBuffer$InputStreamInputBuffer.doRead(InternalInputBuffer.java:746)
        at org.apache.coyote.http11.filters.IdentityInputFilter.doRead(IdentityInputFilter.java:116)
        at org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:675)
        at org.apache.coyote.Request.doRead(Request.java:428)
        at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:297)
        at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:405)
        at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:312)
        at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:193)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
        at java.io.InputStreamReader.read(InputStreamReader.java:167)
        at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2972)
        at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)
        at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1384)
        at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
        at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1058)
        at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:332)
        at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)

This happens when I try to index a field containing the contents of the PDF file. Its string length is 189002. If I only do a substring on the field of, say, length 15, it usually will work. Does anyone have any idea on why this might be happening?

I have had this and other files index correctly using a different combination version of Tomcat/Solr without any problem (using similar code, I re-wrote it because I thought it would be better to use Solrj). I get the same error whether I use a simple StringBuilder to create the add manually or if I use Solrj. I have manually encoded each field before passing it in to the add function as well, so I don't believe it is a content problem. I have tried to change every setting in Tomcat and Solr that I can think of, but I'm newer to both of them.
Re: Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory
What files are there in your /data/pub/index directory?

Bill

On 9/19/07, Venkatraman S <[EMAIL PROTECTED]> wrote:
> Hi ,
>
> Product : Solr (Embedded) Version : 1.2
>
> Problem Description :
> While trying to add and search over the index, we are stumbling on this
> error again and again. Do note that the SolrCore is committed and closed
> suitably in our Embedded Solr. [snip]
Re: Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory
Quite interesting actually (this is for 5 documents that were indexed):

  _0.fdt  _0.fdx  _0.fnm  _0.frq  _0.nrm  _0.prx  _0.tii  _0.tis
  _1.fdt  _1.fdx  _1.fnm  _1.frq  _1.nrm  _1.prx  _1.tii  _1.tis
  _2.fdt  _2.fdx  _2.fnm  _2.frq  _2.nrm  _2.prx  _2.tii  _2.tis
  _3.fdt  _3.fdx  _3.fnm  _3.frq  _3.nrm  _3.prx  _3.tii  _3.tis
  _4.fdt  _4.fdx  _4.fnm  _4.frq  _4.nrm  _4.prx  _4.tii  _4.tis
  segments.gen  segments_6

On 9/19/07, Bill Au <[EMAIL PROTECTED]> wrote:
> What files are there in your /data/pub/index directory?
>
> Bill
> [snip]
--
Re: multithread update client causes exceptions and dropped documents
one other note: the errors pop up when running against the 1.3 trunk but do not appear to happen when run against 1.2.

- will

On 9/19/07, Will Johnson <[EMAIL PROTECTED]> wrote:
> we were doing some performance testing for the updating aspects of solr
> and ran into what seems to be a large problem. we're creating small
> documents with an id and one field of 1 term only, submitting them in
> batches of 200 with commits every 5000 docs. [snip]
Re: multithread update client causes exceptions and dropped documents
Can you start a JIRA issue and attach the patch?

I have not seen this happen, but I bet it is caused by something from:
https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.ext.subversion:subversion-commits-tabpanel

Can we add that test to trunk? By default it does not need to be a long running test, but it's nice to have in there so we can twiddle it for specific testing.

thanks
ryan

Will Johnson wrote:
> one other note: the errors pop up when running against the 1.3 trunk but
> do not appear to happen when run against 1.2.
> [snip]
Re: How can i make a distribute search on Solr?
On Wed, 19 Sep 2007 01:46:53 -0400, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> Stu is referring to Federated Search - where each index has some of the
> data and results are combined before they are returned. This is not yet
> supported out of the "box"

Maybe this is related. How does this compare to the map-reduce functionality in Nutch/Hadoop?

cheers,
B

_
{Beto|Norberto|Numard} Meijome

"With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead." [RFC1925 - section 2, subsection 3]

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: How can i make a distribute search on Solr?
On 9/19/07, Norberto Meijome <[EMAIL PROTECTED]> wrote:
> On Wed, 19 Sep 2007 01:46:53 -0400, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> > Stu is referring to Federated Search - where each index has some of the

It really should be Distributed Search I think (my mistake... I started out calling it Federated). I think Federated search is more about combining search results from different data sources.

> > data and results are combined before they are returned. This is not yet
> > supported out of the "box"
>
> Maybe this is related. How does this compare to the map-reduce functionality
> in Nutch/Hadoop ?

map-reduce is more for batch jobs. Nutch only uses map-reduce for parallel indexing, not searching.

-Yonik
Re: Index/Update Problems with Solrj/Tomcat and Larger Files
> I have had this and other files index correctly using a different
> combination version of Tomcat/Solr without any problem (using similar
> code, I re-wrote it because I thought it would be better to use Solrj).
> I get the same error whether I use a simple StringBuilder to create the
> add manually or if I use Solrj. I have manually encoded each field
> before passing it in to the add function as well, so I don't believe it
> is a content problem. I have tried to change every setting in Tomcat
> and Solr that I can think of, but I'm newer to both of them.

So it works if you build an XML file with the same content and send it to the server using the example post.sh/post.jar tool?

Have you tried messing with the connection settings?

  SolrServer server = new CommonsHttpSolrServer( url );
  ((CommonsHttpSolrServer)server).setConnectionTimeout(5);
  ((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
  ((CommonsHttpSolrServer)server).setMaxTotalConnections(100);

a timeout of 5ms is probably too short...

ryan
Re: How can i make a distribute search on So lr?
Nutch implements federated search separately from their index generation. My understanding is that MapReduce jobs generate the indexes (Nutch calls them segments) from raw data that has been downloaded, and then makes them available to be searched via remote procedure calls. Queries never pass through MapReduce in any shape or form; only the raw data and indexes do.

If you take a look at the "org.apache.nutch.searcher.DistributedSearch" class, specifically the #Client.search method, you can see how they handle the actual federation of results.

Thanks,
Stu

-----Original Message-----
From: Norberto Meijome
Sent: Wednesday, September 19, 2007 10:23am
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Subject: Re: How can i make a distribute search on Solr?
[snip]
Re: How can i make a distribute search on Solr?
Jarvis wrote:
> Thanks for your reply,
> I need the Federated Search. You mean this is not yet supported out of
> the "box". So I have a question: in this situation, what can Collection
> Distribution be used for?

The collection distribution scripts help you get duplicate copies of the same index distributed across many computers. This lets you put a load balancer in front of each server and share the load across N servers.

The collection distribution scripts are particularly useful since NFS and Lucene don't play well together.

ryan
RE: Index/Update Problems with Solrj/Tomcat and Larger Files
I have tried changing those settings, for example, as:

  SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
  ((CommonsHttpSolrServer)server).setConnectionTimeout(60);
  ((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
  ((CommonsHttpSolrServer)server).setMaxTotalConnections(100);

However, still no luck.

I took the SimplePostTool.java file from the wiki, changed the URL, compiled it, and ran it with the output of the command:

  UpdateRequest ur = new UpdateRequest();
  ur.add(addDoc);
  String xml = ur.getXML();

This works. It seems that it must be a communication setting, but I'm stumped.

-----Original Message-----
From: Ryan McKinley [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 19, 2007 10:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Index/Update Problems with Solrj/Tomcat and Larger Files
[snip]
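PS: For reference, the XML that ur.getXML() produces is the standard Solr <add> format, something along these lines (field values elided here):

  <add>
    <doc>
      <field name="url">...</field>
      <field name="title">...</field>
      <field name="text">...extracted PDF text...</field>
    </doc>
  </add>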
Re: Index/Update Problems with Solrj/Tomcat and Larger Files
Daley, Kristopher M. wrote:
> I have tried changing those settings, for example, as:
>
>   SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
>   ((CommonsHttpSolrServer)server).setConnectionTimeout(60);
>   ((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
>   ((CommonsHttpSolrServer)server).setMaxTotalConnections(100);
>
> However, still no luck.

Have you tried anything larger than 60? 60ms is not long...

try 10000 (10s) and see if it works.
RE: Index/Update Problems with Solrj/Tomcat and Larger Files
I tried 10000 and 60000, same result.

-----Original Message-----
From: Ryan McKinley [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 19, 2007 11:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Index/Update Problems with Solrj/Tomcat and Larger Files
[snip]
Re: Index/Update Problems with Solrj/Tomcat and Larger Files
I'm stabbing in the dark here, but try fiddling with some of the other connection settings:

  getConnectionManager().getParams().setSendBufferSize( big );
  getConnectionManager().getParams().setReceiveBufferSize( big );

http://jakarta.apache.org/httpcomponents/httpclient-3.x/apidocs/org/apache/commons/httpclient/params/HttpConnectionManagerParams.html

Daley, Kristopher M. wrote:
> I tried 10000 and 60000, same result.
> [snip]
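PS: spelled out, that would look roughly like the following -- assuming your solrj build exposes the underlying HttpClient via getHttpClient(), and with the 1MB sizes picked arbitrarily:

  import org.apache.commons.httpclient.HttpClient;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  CommonsHttpSolrServer server = new CommonsHttpSolrServer(solrPostUrl);
  HttpClient client = server.getHttpClient();
  // enlarge the TCP buffers used to send the update and read the response
  client.getHttpConnectionManager().getParams().setSendBufferSize(1024 * 1024);
  client.getHttpConnectionManager().getParams().setReceiveBufferSize(1024 * 1024);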
RE: Index/Update Problems with Solrj/Tomcat and Larger Files
Ok, I'll try to play with those. Any suggestion on the size?

Something else that is very interesting is that I just tried to do an aggregate add of a bunch of docs, including the one that always returned the error. I called a function to create a SolrInputDocument and return it. I then did the following:

  Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
  SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
  UpdateRequest ur = new UpdateRequest();
  ur.setAction( UpdateRequest.ACTION.COMMIT, false, false ); //Auto Commits on Update...
  ur.add(docs);
  UpdateResponse rsp = ur.process(server);

In doing this, the program simply hangs after the last command. If I let it sit there for a while, it eventually returns with the error:

  class org.apache.solr.client.solrj.SolrServerException
  (java.net.SocketException: Connection reset by peer: socket write error)

However, if I go to the tomcat server and restart it after I have issued the process command, the program returns and the documents are all posted correctly! Very strange behavior... am I somehow not closing the connection properly?

-----Original Message-----
From: Ryan McKinley [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 19, 2007 11:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Index/Update Problems with Solrj/Tomcat and Larger Files
[snip]
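PS: in case it matters, here is the same round trip written with the plain SolrServer convenience methods (just a sketch; docs and solrPostUrl are from the code above, and I'm assuming the add/commit methods present in our solrj build):

  SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
  server.add(docs);   // sends the <add> request
  server.commit();    // explicit commit instead of ACTION.COMMIT on the request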
Getting only size of getFacetCounts , to simulate count(group by( a field) ) using facets
Hi

We want to (mis)use facet search to get the number of (unique) field values appearing in a document resultset. I thought facet search perfect for this, because it already gives me all the (unique) field values. But for this special problem we don't want all the values listed in the response, as there might be over 10000, and we don't need the values at all, just the count of how many!

I looked at http://wiki.apache.org/solr/SimpleFacetParameters and hoped to find a parameter like facet.sizeOnly=true (or facet.showSize=true, combined with facet.limit=1 or another small value).

Would you accept a patch with such a feature? It should probably be relatively easy, though I'm not sure it fits into the concept of facets. I looked at the code; maybe add an extra value to the NamedList returned by getFacetCounts() in SimpleFacets?!

ps: Another user having the same request, AFAIU:
http://www.nabble.com/showing--range-facet-example-%3D-by-Range-%28-1-to-1000-%29-t3660704.html#a10229069

thanks,
Laurent Hoss
Re: Getting only size of getFacetCounts , to simulate count(group by( a field) ) using facets
> But for us to be used for this special problem, we don't want all the
> values listed in the response as there might be over 10000 and we don't
> need the values at all, just the count of how many!

check the LukeRequestHandler:
http://wiki.apache.org/solr/LukeRequestHandler

It gives you lots of field based stats.
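For example, if the LukeRequestHandler is registered at /admin/luke in your solrconfig, a request like this (host and field name are placeholders) returns per-field statistics, including a "distinct" count of unique terms:

  http://localhost:8983/solr/admin/luke?fl=myfield&numTerms=0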
RE: Triggering snapshooter through web admin interface
Is there a ticket for this yet? I have a bug report and request: I just did a snapshot while indexing 700 records/sec and got an inconsistency. I was tarring off the snapshot and tar reported that a file changed while it was being copied. The error rolled off my screen, so I cannot report the file name or extension.

If a solr command to do a snapshot is implemented, please make sure that it is 100% consistent.

Thanks,
Lance Norskog

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, September 18, 2007 11:11 AM
To: solr-user@lucene.apache.org
Subject: RE: Triggering snapshooter through web admin interface

: [Wu, Daniel] That sounds great. Do I need to create a JIRA ticket?

Sure, JIRA is a great way to track feature requests (since they can be "watched" and "voted" on), and if you want to start on an implementation you can attach patches...

http://wiki.apache.org/solr/HowToContribute

-Hoss
"Select distinct" in Solr
I believe I saw in the Javadocs for Lucene that there is the ability to return the unique values for one field for a search, rather than each record. Is it possible to add this feature to Solr? It is the equivalent of 'select distinct' in SQL. Thanks, Lance Norskog
Re: "Select distinct" in Solr
Lance Norskog wrote:
> I believe I saw in the Javadocs for Lucene that there is the ability to
> return the unique values for one field for a search, rather than each
> record. Is it possible to add this feature to Solr? It is the equivalent
> of 'select distinct' in SQL.

Look into faceting:
http://wiki.apache.org/solr/SimpleFacetParameters

or maybe the Luke request handler:
http://wiki.apache.org/solr/LukeRequestHandler

ryan
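PS: as a concrete sketch, a facet request like the following returns every distinct value of a field (with counts) across the documents matching q -- the host and field name here are placeholders:

  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=myfield&facet.limit=-1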
useColdSearcher = false... not working in 1.2?
Anyone else using this, and finding it not working in Solr 1.2? Since we've got an automated release process, I really need to be able to have the appserver not see itself as done warming up until the firstSearcher is ready to go... but with 1.2 this no longer seems to be the case. adam
Re: useColdSearcher = false... not working in 1.2?
On 9/19/07, Adam Goldband <[EMAIL PROTECTED]> wrote:
> Anyone else using this, and finding it not working in Solr 1.2? Since
> we've got an automated release process, I really need to be able to have
> the appserver not see itself as done warming up until the firstSearcher
> is ready to go... but with 1.2 this no longer seems to be the case.

I took a quick peek at the code, and it should still work (it's pretty simple). false is also the default.

How are you determining that it isn't working?

-Yonik
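PS: for reference, the setting in question, as it appears in solrconfig.xml:

  <!-- if false, requests that arrive before the firstSearcher has finished
       warming will block until warming is done -->
  <useColdSearcher>false</useColdSearcher>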
Re: Getting only size of getFacetCounts , to simulate count(group by( a field) ) using facets
On 9/19/07, Laurent Hoss <[EMAIL PROTECTED]> wrote:
> We want to (mis)use facet search to get the number of (unique) field
> values appearing in a document resultset.

We have paging of facets, so just like normal search results, it does make sense to list the total number of facets matching.

The main problem with implementing this is trying to figure out where to put the info in a backward compatible manner. Here is how the info is currently returned (JSON format):

  "facet_fields":{
    "cat":[
      "camera",1,
      "card",2,
      "connector",2,
      "copier",1,
      "drive",2
    ]
  },

Unfortunately, there's not a good place to put this extra info without older clients choking on it. Within "cat" there should have been another element called "values" or something... then we could easily add extra fields like "nvalues":

  "cat":{
    "nvalues":5042,
    "values":[
      "camera",1,
      "card",2,
      "connector",2,
      "copier",1,
      "drive",2
    ]
  }

-Yonik
Exact phrase highlighting
Hi out of there,

I just walked through the mailing list archive, but I did not find an appropriate answer for phrase highlighting. I do not have any highlighting section (and no dismax handler definition) in solrconfig.xml. This way (AFAIK :-)), the standard lucene query syntax should be supported in its full functionality. But in this case, double quoting the search expressions does not have any effect on highlighting. Assume we have the following text (of field type text):

  It is hard work to do the hard complex work

A query for "hard work" (with the double quotes) results in the highlighted section

  It is <em>hard</em> <em>work</em> to do the <em>hard</em> complex <em>work</em>

although I would guess that the correct answer should be

  It is <em>hard work</em> to do the hard complex work

Does anyone of the SOLR experts have a good answer for me? (I guess that I still did not understand the functional relationship between highlighting, query specification and index specification...)

Thanks for your help
marc
Re: Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory
: Product : Solr (Embedded) Version : 1.2

: java.io.FileNotFoundException: no segments* file found in
: org.apache.lucene.store.FSDirectory@/data/pub/index: files:

According to that, the FSDirectory was empty when it was opened (a file list is supposed to come after that "files: " part).

you imply that you are building your index using embedded solr, but based on your stack trace it seems you are using Solr in a servlet container ... i assume to search the index you've already built? Is the embedded core completely closed before your servlet container running Solr is started? what does the directory listing look like in between the finish of A (the embedded indexing) and the start of B (the servlet container)?

-Hoss
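PS: a quick way to capture that listing from Java if shell access is awkward -- the path is the one from your report:

  import java.io.File;

  public class ListIndexDir {
    public static void main(String[] args) {
      File idx = new File("/data/pub/index");
      String[] files = idx.list();
      if (files == null) {
        System.out.println("not a directory: " + idx);
      } else {
        for (String f : files) System.out.println(f); // a healthy index has a segments_N file
      }
    }
  }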
Re: Exact phrase highlighting
On 19-Sep-07, at 1:12 PM, Marc Bechler wrote:
> I just walked through the mailing list archive, but I did not find an
> appropriate answer for phrase highlighting. [snip]

It currently is not supported by Solr. There is work in lucene that supports this (see https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526803), but it is currently not integrated.

It would make a great project to get one's hands dirty contributing, though :)

-Mike
Re: DisMax queries referencing undefined fields
: I noticed that the "field list" (fl) parameter ignores field names that it
: cannot locate, while the "query fields" (qf) parameter throws an exception
: when fields cannot be located. Is there any way to override this behavior and
: have qf also ignore fields it cannot find?

Those parameters are radically different. FL isn't evaluated until after a query is executed and it's time to return documents ... just because the current range of documents being returned doesn't have a value doesn't mean there is a problem with the FL -- other documents in the same DocSet might have those values. It's not that Solr ignores fields in the "fl" that it can't locate, as it is that Solr tests each field a document to be returned has, and only returns it if the field is in the FL. In theory, field names in the FL should be tested to see if a matching field or dynamic field exists, and generate a warning/error if not -- i would consider that an FL bug.

The semantics of QF follow directly from the semantics of the standard query parser: if you tell it to query against a field which does not exist for any document, then something is wrong with the request. Unlike the FL case (which is lazy for not checking that the field exists), dismax has to check each field because it needs to know how to analyze the input for every field in the QF -- if the field doesn't exist, it can't do that.

: This would be pretty helpful for us, as we're going to have a large number of
: dynamic, user-specific fields defined in our schema. These fields will have
: canonical name formats (e.g. userid-comment), but they may not be defined for
: every document. In fact, some fields may be defined for no documents, which I
: gather would be the ones that would throw exceptions. It would be nice to
: provide solr a set of fields that could be searched and have it use the subset
: of those fields that exist.

i suppose it would be possible to make an option for dismax to ignore any field it can't find, but that would be fairly kludgy and would introduce some really confusing edge cases (ie: what happens if none of the fields in QF can be found?)

A better option would probably be to use something like this from the sample schema.xml...

  <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
  <dynamicField name="*" type="ignored" />

...then any field name you want will work regardless of whether you are using dismax or the standard request handler.

Hmmm: it might be better though if the "ignored" field type was a TextField with an Analyzer that produced no tokens ... then it would drop out of the query completely ... anyone want to submit a "NoOpAnalyzer" :)

-Hoss
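PS: a rough sketch of such a NoOpAnalyzer against the Lucene 2.x API of the day -- just a token stream that is always exhausted:

  import java.io.Reader;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.Token;
  import org.apache.lucene.analysis.TokenStream;

  public class NoOpAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
      return new TokenStream() {
        public Token next() {
          return null; // never yields a token, so query text against this field drops out
        }
      };
    }
  }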
analysis page and search not in sync - no result for "t-shirt"?
Hello,

I have an issue: "T-Shirt" is not found, even if there are documents with the title "T-Shirt". The analysis page shows that both the index-analyzer and the query-analyzer create "t" and "shirt" out of this. However, when I search for "t-shirt", I don't find anything.

The product title is copied to the field "text", which has the type "text".

What might be the reason for this?

Thanx && cheers,

Martin
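PS: in case it helps with diagnosing, the next thing I would look at is the parsed query (host here is our local test instance):

  http://localhost:8983/solr/select?q=text:t-shirt&debugQuery=on

The parsedquery entry in the debug section should show whether "t-shirt" is turned into a phrase query for "t shirt" or into something else.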
Re: Exact phrase highlighting
Hi Mike,

thanks for the quick response.

> It would make a great project to get one's hands dirty contributing, though :)

... sounds like giving a broad hint ;-) Sounds challenging...

Regards from Germany

marc
Re: Exact phrase highlighting
On 19-Sep-07, at 2:39 PM, Marc Bechler wrote:
> ... sounds like giving a broad hint ;-) Sounds challenging...

I'm not sure about that--it is supposed to be a drop-in replacement for Highlighter. I expect most of the work will consist of figuring out the right way of packaging it in a jar for solr inclusion.

-Mike
Re: Getting only size of getFacetCounts , to simulate count(group by( a field) ) using facets
: The main problem with implementing this is trying to figure out where
: to put the info in a backward compatible manner. Here is how the info

1) this seems like the kind of thing that would only be returned if requested -- so we probably don't have to be overly concerned about backwards compatibility. if people are going to request this information, they have to make their client code look for it, so they can also make their client code know how to distinguish it from the existing counts.

2) the counts themselves are "first order" faceting data, so i think it makes sense to leave them where they are ... "metadata" about the field could be included as a sub-list at the start of the field list (much like the missing count is included as an unnamed int at the end of the field list). this sub-list could be unnamed to help distinguish it from field term values -- but frankly i don't think that's a huge deal -- the counts will stand out for being integers, while this metadata would be a nested NamedList (aka: map, aka hash, aka however it's represented in the format used).

structure could be something like this (term names below are just placeholders)...

  ...&facet.field=cat&facet.limit=3&facet.mincount=5&facet.missing=true

  <lst name="cat">
    <lst>
      <int name="nTerms">42</int>
      <int name="nDocs">678</int>
    </lst>
    <int name="aaa">30</int>
    <int name="bbb">20</int>
    <int name="ccc">10</int>
    <int>5</int>
  </lst>

-Hoss
RE: Triggering snapshooter through web admin interface
lance: since the topic you are describing is not directly related to triggering a snapshot from the web interface, can you please start a new thread with a unique subject describing in more detail exactly what it was you were doing and the problem you encountered?

this will make it easier for your problem to get visibility (some people don't read every thread, and archive searching is frequently done by thread, so people looking for similar problems may not realize this new thread is buried inside an old one)

-Hoss

: Date: Wed, 19 Sep 2007 11:33:30 -0700
: From: Lance Norskog <[EMAIL PROTECTED]>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: RE: Triggering snapshooter through web admin interface
:
: Is there a ticket for this yet? I have a bug report and request: [snip]
rsync start and enable for multiple solr instances within one tomcat
Hi, there,

So we are using Tomcat's JNDI method to set up multiple solr instances within a tomcat server. Each instance has a solr home directory. Now we want to set up collection distribution for all these solr home indexes. My understanding is:

1. we only need to run rsyncd-start once, using the script under any of the solr home dirs.

2. we need to run each of the rsyncd-enable scripts under the solr homes' bin dirs.

3. the twiki page at http://wiki.apache.org/solr/SolrCollectionDistributionScripts keeps referring to solr/xxx. Is this "solr" the example solr home dir? If so, would it be hard-coded in any of the scripts? For example, I saw in snappuller line 226 (solr 1.2):

  ${stats} rsync://${master_host}:${rsyncd_port}/solr/${name}/ ${data_dir}/${name}-wip

Is the above "solr" a hard-coded solr home name? If so, it's not desirable since we have multiple solr homes with different names. If not, what is this "solr"?

thanks,
-Hui
Re: Index/Update Problems with Solrj/Tomcat and Larger Files
> However, if I go to the tomcat server and restart it after I have issued
> the process command, the program returns and the documents are all posted
> correctly! Very strange behavior... am I somehow not closing the
> connection properly?

What version is the solr you are connecting to? 1.2 or 1.3-dev? (I have not tested against 1.2)

Does this only happen with tomcat? If you run with jetty do you get the same behavior? (again, just stabs in the dark)

If you can make a small repeatable problem, post it in JIRA and I'll look into it.

ryan
setting absolute path for snapshooter in solrconfig.xml doesn't work
Hi, there,

I used an absolute path for the "dir" param in the solrconfig.xml as below (the XML markup was stripped by the mail archive, only the element values survive -- see the reconstruction after this message):

    snapshooter
    /var/SolrHome/solr/bin
    true
    arg1 arg2
    MYVAR=val1

However, I got a "snapshooter: not found" exception thrown in catalina.out. I don't see why this doesn't work. Anything I'm missing?

Many thanks,

-Hui

catalina.out logs:
...
Sep 19, 2007 6:17:20 PM org.apache.solr.handler.XmlUpdateRequestHandler update
INFO: added id={SOLR1000} in 67ms
Sep 19, 2007 6:17:20 PM org.apache.solr.core.SolrCore execute
INFO: /update 0 86
Sep 19, 2007 6:17:21 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
Sep 19, 2007 6:17:21 PM org.apache.solr.update.DirectUpdateHandler2 doDeletions
INFO: DirectUpdateHandler2 deleting and removing dups for 1 ids
Sep 19, 2007 6:17:21 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening [EMAIL PROTECTED] DirectUpdateHandler2
Sep 19, 2007 6:17:21 PM org.apache.solr.update.DirectUpdateHandler2 doDeletions
INFO: DirectUpdateHandler2 docs deleted=0
Sep 19, 2007 6:17:21 PM org.apache.solr.core.SolrException log
SEVERE: java.io.IOException: java.io.IOException: snapshooter: not found
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
        at java.lang.ProcessImpl.start(ProcessImpl.java:65)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
        at java.lang.Runtime.exec(Runtime.java:591)
        at org.apache.solr.core.RunExecutableListener.exec(RunExecutableListener.java:70)
        at org.apache.solr.core.RunExecutableListener.postCommit(RunExecutableListener.java:97)
        at org.apache.solr.update.UpdateHandler.callPostCommitCallbacks(UpdateHandler.java:99)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:514)
        at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:214)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:526)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:856)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:744)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
        at java.lang.Thread.run(Thread.java:595)
Sep 19, 2007 6:17:21 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening [EMAIL PROTECTED] main
Sep 19, 2007 6:17:21 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Sep 19, 2007 6:17:21 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Sep 19, 2007 6:17:21 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Sep 19, 2007 6:17:21 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
queryResultCache{lookups=1,hits=0,hitratio=0.00,inserts=1,evictions=0,size=1,cumulative_lookups=1,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=1,cumulative_eviction ...
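Pieced together from the stock RunExecutableListener example that ships with Solr, the stripped listener config above was presumably along these lines (a reconstruction -- the element names are assumed; only the values survive in the original mail):

    <listener event="postCommit" class="solr.RunExecutableListener">
      <str name="exe">snapshooter</str>
      <str name="dir">/var/SolrHome/solr/bin</str>
      <bool name="wait">true</bool>
      <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
      <arr name="env"> <str>MYVAR=val1</str> </arr>
    </listener>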
RE: How can i make a distribute search on Solr?
Nutch has two ways to make a distributed query: through HDFS (the Hadoop file system) or through the RPC call in the "org.apache.nutch.searcher.DistributedSearch" class. But I think neither is good enough.

If we use HDFS to serve the user's query, stability is a problem: we must do all of the crawling, indexing and querying on HDFS and use mapreduce. Can we trust in hadoop all the time? :)

If we use the RPC call in nutch, manually separating the index is required, and we will receive duplicate results if the same document is indexed on different servers. The data updating and single-server errors are also hard to deal with.

Thanks,
Jarvis

-----Original Message-----
From: Stu Hood [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 19, 2007 10:37 PM
To: solr-user@lucene.apache.org
Subject: Re: How can i make a distribute search on Solr?

Nutch implements federated search separately from their index generation. My understanding is that MapReduce jobs generate the indexes (Nutch calls them segments) from raw data that has been downloaded, and then makes them available to be searched via remote procedure calls. Queries never pass through MapReduce in any shape or form, only the raw data and indexes.

If you take a look at the "org.apache.nutch.searcher.DistributedSearch" class, specifically the #Client.search method, you can see how they handle the actual federation of results.

Thanks,
Stu

-----Original Message-----
From: Norberto Meijome
Sent: Wednesday, September 19, 2007 10:23am
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Subject: Re: How can i make a distribute search on Solr?

On Wed, 19 Sep 2007 01:46:53 -0400 Ryan McKinley wrote:
> Stu is referring to Federated Search - where each index has some of the
> data and results are combined before they are returned. This is not yet
> supported out of the "box"

Maybe this is related. How does this compare to the map-reduce functionality in Nutch/Hadoop?

cheers,
B
_
{Beto|Norberto|Numard} Meijome

"With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead." [RFC1925 - section 2, subsection 3]

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Filter by Group
Hey all,

Let's say I have an index of one hundred documents, and these documents are grouped into 4 groups A, B, C, and D. The groups do in fact overlap. What would people recommend as the best way to apply a search query and return only the documents that are in group A? Also, how about if we run the same search query but return only those documents in groups A, C and D?

I imagine that I could do this by indexing a text field populated with the group names and adding something like "groups:A" to the query but I'm wondering if there's a better solution.

Thanks in advance,
Mark

mark angelillo
snooth inc.
o: 646.723.4328
c: 484.437.9915
[EMAIL PROTECTED]
snooth -- 1.7 million ratings and counting...
Re: How can i make a distribute search on Solr?
On Wed, 19 Sep 2007 10:29:54 -0400 "Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> > Maybe this is related. How does this compare to the map-reduce
> > functionality in Nutch/Hadoop ?
>
> map-reduce is more for batch jobs. Nutch only uses map-reduce for
> parallel indexing, not searching.

I see... so in Nutch all the nodes have all the data indexed?

Thanks,
_
{Beto|Norberto|Numard} Meijome ... heading to read about nutch/hadoop

"Imagination is more important than knowledge." Albert Einstein, On Science

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Term extraction
I'm currently looking at methods of term extraction and automatic keyword generation from indexed documents. I've been experimenting with MoreLikeThis and the values returned by the "mlt.interestingTerms" parameter, and so far this approach has worked well. However, I'd like to be able to analyze documents more intelligently to recognize phrase keywords such as "open source", "Microsoft Office", "Bill Gates" rather than splitting each word into separate tokens (the field is never used in search queries so matching is not an issue).

I've been looking at SynonymFilterFactory as a possible solution to this problem but haven't been able to work out the specifics of how to configure it for phrase mappings. Has anybody else dealt with this problem before, or is anyone able to offer insights into achieving the desired results?

Thanks in advance,
Pieter
RE: How can i make a distribute search on Solr?
I think the index data stored in HDFS and generated by the map-reduce jobs is used for searching in Nutch 0.9. You can see the code in the "org.apache.nutch.searcher.NutchBean" class. :)

Jarvis

-----Original Message-----
From: Norberto Meijome [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 20, 2007 9:52 AM
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: How can i make a distribute search on Solr?

On Wed, 19 Sep 2007 10:29:54 -0400 "Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> > Maybe this is related. How does this compare to the map-reduce functionality in Nutch/Hadoop ?
>
> map-reduce is more for batch jobs. Nutch only uses map-reduce for
> parallel indexing, not searching.

I see... so in Nutch all the nodes have all the data indexed?

Thanks,
_
{Beto|Norberto|Numard} Meijome ... heading to read about nutch/hadoop

"Imagination is more important than knowledge." Albert Einstein, On Science

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: setting absolute path for snapshooter in solrconfig.xml doesn't work
See this recent thread for some helpful info:
http://www.nabble.com/solr-doesn%27t-find-exe-in-postCommit-event-tf4264879.html#a12167792

You'll probably want to put the absolute path in the "exe" param rather than relying on "dir", i.e. set exe to /var/SolrHome/solr/bin/snapshooter and dir to ".", in order to get the snapshooter working correctly. (The stripped XML is reconstructed after this message.)

cheers,
Piete

On 20/09/2007, Yu-Hui Jin <[EMAIL PROTECTED]> wrote:
>
> Hi, there,
>
> I used an absolute path for the "dir" param in the solrconfig.xml as
> below:
>
> snapshooter
> /var/SolrHome/solr/bin
> true
> arg1 arg2
> MYVAR=val1
>
> However, I got a "snapshooter: not found" exception thrown in catalina.out.
> I don't see why this doesn't work. Anything I'm missing?
>
> Many thanks,
>
> -Hui
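In concrete terms, and again reconstructing the XML that the mail archive stripped (element names assumed from the stock example config), the suggested change is:

    <str name="exe">/var/SolrHome/solr/bin/snapshooter</str>
    <str name="dir">.</str>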
Re: How can i make a distribute search on Solr?
On Thu, 20 Sep 2007 09:37:51 +0800 "Jarvis" <[EMAIL PROTECTED]> wrote:
> If we use the RPC call in nutch .

Hi, I wasn't suggesting to use nutch in solr... I'm only a young grasshopper in this league to be suggesting architecture stuff :) but I imagine there's nothing wrong with using what they've built if it addresses solr's needs.

> Manually separate the index is required .

hmm, I imagine this really depends on the application. In my case, this separation of which docs go where happens @ a completely different layer.

> We will receive reduplicate result if there is reduplicate index document on
> different servers.

Maybe I got this wrong... but isn't this what mapreduce is meant to deal with? e.g. (see the sketch after this message):

1) get the job (a query)
2) map it to workers (servers that provide search results from their own indexes)
3) wait for the results from all workers that reply within an acceptable timeframe
4) comb through the lot of results from all workers, reduce them according to your own biz rules (e.g. remove dupes, sort them by quality / priority... here possibly relying on the original parameters of the query in 1)
5) return the reduced results to the frontend

> And also the data updating and single server's error is
> hard to deal with.

this really depends on your infrastructure + design. Having the indexing, searching and providing of results in different layers should make for some interesting design options... If each searcher (or wherever the index resides) is really a small cluster of servers, the issue of data safety / server error is addressed @ that point. You can also have repeated data across indexes (again, independent indexes) and that's a more... randomised :) way of keeping the docs safe... For example, IIRC, googleFS keeps copies of each file on 3 servers or more...

cheers,
B
_
{Beto|Norberto|Numard} Meijome

"He uses statistics as a drunken man uses lamp-posts ... for support rather than illumination." Andrew Lang (1844-1912)

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
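For the archives, a minimal Java sketch of the scatter/gather flow in steps 1-5 above. It is purely illustrative: the shard URLs and the queryShard stub are hypothetical placeholders, not an existing Solr or Nutch API.

    import java.util.*;
    import java.util.concurrent.*;

    public class ScatterGatherSketch {

        // Hypothetical shard endpoints -- stand-ins for real search servers.
        static final List<String> SHARDS = Arrays.asList(
                "http://shard1:8983/solr", "http://shard2:8983/solr");

        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(SHARDS.size());

            // steps 1 + 2: map the query to every worker
            List<Future<List<String>>> futures = new ArrayList<>();
            for (final String shard : SHARDS) {
                Callable<List<String>> task = () -> queryShard(shard, "foo");
                futures.add(pool.submit(task));
            }

            // step 3: wait a bounded time for each worker's reply;
            // step 4: reduce -- de-dupe by document id, first hit wins
            Set<String> merged = new LinkedHashSet<>();
            for (Future<List<String>> f : futures) {
                try {
                    merged.addAll(f.get(2, TimeUnit.SECONDS));
                } catch (TimeoutException slowShard) {
                    // a worker that misses the deadline is simply skipped
                }
            }
            pool.shutdown();

            // step 5: hand the reduced result set back to the frontend
            System.out.println(merged);
        }

        // Stub standing in for an HTTP call to one shard's /select handler;
        // a real client would parse document ids out of the response.
        static List<String> queryShard(String shardUrl, String query) {
            return Collections.emptyList();
        }
    }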
Re: Term extraction
On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote:
> I'm currently looking at methods of term extraction and automatic
> keyword generation from indexed documents.

We do it manually (not in solr, but we put the results in solr.) We do it the usual way - chunk (into n-grams, named entities & noun phrases) and count (tf & df). It works well enough. There is a bevy of literature on the topic if you want to get "smart" -- but be warned smart and fast are likely not very good friends.

A lot depends on the provenance of your data -- is it clean text that uses a lot of domain specific terms? Is it webtext?
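A toy Java illustration of the chunk-and-count approach Brian describes (word n-grams plus term-frequency tallies; a real system would add named-entity / noun-phrase chunkers and corpus-wide document frequencies):

    import java.util.*;

    public class ChunkAndCount {

        // Split text into word n-grams and tally how often each occurs.
        public static Map<String, Integer> ngramCounts(String text, int n) {
            String[] words = text.toLowerCase().split("\\W+");
            Map<String, Integer> tf = new HashMap<>();
            for (int i = 0; i + n <= words.length; i++) {
                // join n consecutive words into one candidate phrase
                StringBuilder gram = new StringBuilder(words[i]);
                for (int j = 1; j < n; j++) {
                    gram.append(' ').append(words[i + j]);
                }
                tf.merge(gram.toString(), 1, Integer::sum);
            }
            return tf;
        }

        public static void main(String[] args) {
            String doc = "open source search with open source tools";
            // prints e.g. {open source=2, source search=1, ...} (map order unspecified)
            System.out.println(ngramCounts(doc, 2));
        }
    }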
Re: Filter by Group
Sounds like you're on the right track: if your groups overlap (i.e. a document can be in group A and B), then you should ensure your "groups" field is multivalued.

If you are searching for "foo" in documents contained in group "A", then it might be more efficient to use a filter query (fq) like:

q=foo&fq=groups:A

See the wiki page on common query parameters for more info (a schema and query sketch also follows this message):
http://wiki.apache.org/solr/CommonQueryParameters#head-6522ef80f22d0e50d2f12ec487758577506d6002

cheers,
Piete

On 20/09/2007, mark angelillo <[EMAIL PROTECTED]> wrote:
>
> Hey all,
>
> Let's say I have an index of one hundred documents, and these
> documents are grouped into 4 groups A, B, C, and D. The groups do in
> fact overlap. What would people recommend as the best way to apply a
> search query and return only the documents that are in group A? Also,
> how about if we run the same search query but return only those
> documents in groups A, C and D?
>
> I imagine that I could do this by indexing a text field populated
> with the group names and adding something like "groups:A" to the
> query but I'm wondering if there's a better solution.
>
> Thanks in advance,
> Mark
>
> mark angelillo
> snooth inc.
> o: 646.723.4328
> c: 484.437.9915
> [EMAIL PROTECTED]
> snooth -- 1.7 million ratings and counting...
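For illustration, a minimal setup along those lines (the field definition and queries are a sketch based on the question, not Mark's actual schema): a multivalued string field in schema.xml,

    <field name="groups" type="string" indexed="true" stored="true" multiValued="true"/>

and filter queries against it:

    q=foo&fq=groups:A                  documents in group A only
    q=foo&fq=groups:(A OR C OR D)      documents in any of groups A, C, D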
RE: How can i make a distribute search on Solr?
HI,

What you say is done by hadoop, which supports hardware failure handling, data replication and more. If we want to implement such a good system by ourselves with Solr but without HDFS, it's a very very complex work I think. :)

I just want to know whether there is an existing component that can do distributed search based on Solr.

Thanks,
Jarvis.

-----Original Message-----
From: Norberto Meijome [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 20, 2007 10:06 AM
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Subject: Re: How can i make a distribute search on Solr?

On Thu, 20 Sep 2007 09:37:51 +0800 "Jarvis" <[EMAIL PROTECTED]> wrote:
> If we use the RPC call in nutch .

Hi, I wasn't suggesting to use nutch in solr... I'm only a young grasshopper in this league to be suggesting architecture stuff :) but I imagine there's nothing wrong with using what they've built if it addresses solr's needs.

> Manually separate the index is required .

hmm, I imagine this really depends on the application. In my case, this separation of which docs go where happens @ a completely different layer.

> We will receive reduplicate result if there is reduplicate index document on
> different servers.

Maybe I got this wrong... but isn't this what mapreduce is meant to deal with? e.g.:

1) get the job (a query)
2) map it to workers (servers that provide search results from their own indexes)
3) wait for the results from all workers that reply within an acceptable timeframe
4) comb through the lot of results from all workers, reduce them according to your own biz rules (e.g. remove dupes, sort them by quality / priority... here possibly relying on the original parameters of the query in 1)
5) return the reduced results to the frontend

> And also the data updating and single server's error is
> hard to deal with.

this really depends on your infrastructure + design. Having the indexing, searching and providing of results in different layers should make for some interesting design options... If each searcher (or wherever the index resides) is really a small cluster of servers, the issue of data safety / server error is addressed @ that point. You can also have repeated data across indexes (again, independent indexes) and that's a more... randomised :) way of keeping the docs safe... For example, IIRC, googleFS keeps copies of each file on 3 servers or more...

cheers,
B
_
{Beto|Norberto|Numard} Meijome

"He uses statistics as a drunken man uses lamp-posts ... for support rather than illumination." Andrew Lang (1844-1912)

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Term extraction
Thanks Brian,

I think the "smart" approaches you refer to might be outside the scope of my current project. The documents I am indexing already have manually-generated keyword data; moving forward, I'd like to have these keywords automatically generated, selected from a pre-defined list of keywords (i.e. the "simple" approach). The data is fairly clean and domain-specific, so I don't expect there will be more than several hundred of these phrase terms to deal with, which is why I was exploring the SynonymFilterFactory option (see the configuration sketch after this message).

Pieter

On 20/09/2007, Brian Whitman <[EMAIL PROTECTED]> wrote:
>
> On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote:
>
> > I'm currently looking at methods of term extraction and automatic
> > keyword generation from indexed documents.
>
> We do it manually (not in solr, but we put the results in solr.) We
> do it the usual way - chunk (into n-grams, named entities & noun
> phrases) and count (tf & df). It works well enough. There is a bevy
> of literature on the topic if you want to get "smart" -- but be
> warned smart and fast are likely not very good friends.
>
> A lot depends on the provenance of your data -- is it clean text that
> uses a lot of domain specific terms? Is it webtext?
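Picking up the SynonymFilterFactory idea from the message above, a configuration sketch (the fieldType name, file name and mappings are illustrative, not from the thread): apply the filter at index time in schema.xml,

    <fieldType name="keywords" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="false"/>
      </analyzer>
    </fieldType>

and in synonyms.txt map each multi-word phrase to a single token so it survives tokenization:

    open source => open_source
    Microsoft Office => microsoft_office
    Bill Gates => bill_gates

The left-hand side of each rule matches a sequence of tokens, which is what lets phrase keywords come out as one term.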
Re: How can i make a distribute search on Solr?
On 19-Sep-07, at 7:21 PM, Jarvis wrote:

> HI, What you say is done by hadoop, which supports hardware failure
> handling, data replication and more. If we want to implement such a good
> system by ourselves with Solr but without HDFS, it's a very very complex
> work I think. :) I just want to know whether there is an existing
> component that can do distributed search based on Solr.

https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

regards,
-Mike
Re: setting absolute path for snapshooter in solrconfig.xml doesn't work
Hi, Pieter,

Thanks! Now the exception is gone. However, there's no snapshot file created in the data directory. Strangely, the snapshooter.log seems to complete successfully. Any idea what else I'm missing?

$ cat var/SolrHome/solr/logs/snapshooter.log
2007/09/19 20:16:17 started by solruser
2007/09/19 20:16:17 command: /var/SolrHome/solr/bin/snapshooter arg1 arg2
2007/09/19 20:16:17 taking snapshot var/SolrHome/solr/data/snapshot.20070919201617
2007/09/19 20:16:17 ended (elapsed time: 0 sec)

Thanks,

-Hui

On 9/19/07, Pieter Berkel <[EMAIL PROTECTED]> wrote:
>
> See this recent thread for some helpful info:
> http://www.nabble.com/solr-doesn%27t-find-exe-in-postCommit-event-tf4264879.html#a12167792
>
> You'll probably want to put the absolute path in the "exe" param rather
> than relying on "dir", i.e. set exe to /var/SolrHome/solr/bin/snapshooter
> and dir to ".", in order to get the snapshooter working correctly.
>
> cheers,
> Piete
>
> On 20/09/2007, Yu-Hui Jin <[EMAIL PROTECTED]> wrote:
> >
> > Hi, there,
> >
> > I used an absolute path for the "dir" param in the solrconfig.xml as
> > below:
> >
> > snapshooter
> > /var/SolrHome/solr/bin
> > true
> > arg1 arg2
> > MYVAR=val1
> >
> > However, I got a "snapshooter: not found" exception thrown in
> > catalina.out. I don't see why this doesn't work. Anything I'm missing?
> >
> > Many thanks,
> >
> > -Hui

--
Regards,

-Hui
Re: setting absolute path for snapshooter in solrconfig.xml doesn't work
If you don't need to pass any command-line arguments to snapshooter, remove (or comment out) this line from solrconfig.xml (the XML was stripped here; it's reconstructed after this message):

arg1 arg2

By the same token, if you're not setting environment variables either, remove the following line as well:

MYVAR=val1

Once you alter / remove those two lines, snapshooter should function as expected.

cheers,
Piete

On 20/09/2007, Yu-Hui Jin <[EMAIL PROTECTED]> wrote:
>
> Hi, Pieter,
>
> Thanks! Now the exception is gone. However, there's no snapshot file
> created in the data directory. Strangely, the snapshooter.log seems to
> complete successfully. Any idea what else I'm missing?
>
> $ cat var/SolrHome/solr/logs/snapshooter.log
> 2007/09/19 20:16:17 started by solruser
> 2007/09/19 20:16:17 command: /var/SolrHome/solr/bin/snapshooter arg1 arg2
> 2007/09/19 20:16:17 taking snapshot
> var/SolrHome/solr/data/snapshot.20070919201617
> 2007/09/19 20:16:17 ended (elapsed time: 0 sec)
>
> Thanks,
>
> -Hui
>
> On 9/19/07, Pieter Berkel <[EMAIL PROTECTED]> wrote:
> >
> > See this recent thread for some helpful info:
> > http://www.nabble.com/solr-doesn%27t-find-exe-in-postCommit-event-tf4264879.html#a12167792
> >
> > You'll probably want to put the absolute path in the "exe" param rather
> > than relying on "dir", i.e. set exe to /var/SolrHome/solr/bin/snapshooter
> > and dir to ".", in order to get the snapshooter working correctly.
> >
> > cheers,
> > Piete
> >
> > On 20/09/2007, Yu-Hui Jin <[EMAIL PROTECTED]> wrote:
> > >
> > > Hi, there,
> > >
> > > I used an absolute path for the "dir" param in the solrconfig.xml as
> > > below:
> > >
> > > snapshooter
> > > /var/SolrHome/solr/bin
> > > true
> > > arg1 arg2
> > > MYVAR=val1
> > >
> > > However, I got a "snapshooter: not found" exception thrown in
> > > catalina.out. I don't see why this doesn't work. Anything I'm missing?
> > >
> > > Many thanks,
> > >
> > > -Hui
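The mail archive stripped the markup from the two lines quoted above; matching the stock example solrconfig.xml, they were presumably:

    <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
    <arr name="env"> <str>MYVAR=val1</str> </arr>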
Re: How can i make a distribute search on Solr?
On Thu, 20 Sep 2007 10:02:08 +0800 "Jarvis" <[EMAIL PROTECTED]> wrote: > You can see the code in "org.apache.nutch.searcher.NutchBean" class . :) thx for the pointer. _ {Beto|Norberto|Numard} Meijome "In order to avoid being called a flirt, she always yielded easily." Charles, Count Talleyrand I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: How can i make a distribute search on Solr?
On Thu, 20 Sep 2007 10:21:39 +0800 "Jarvis" <[EMAIL PROTECTED]> wrote:
> What you say is done by hadoop, which supports hardware failure handling,
> data replication and more.
> If we want to implement such a good system by ourselves with Solr but
> without HDFS, it's a very very complex work I think. :)
> I just want to know whether there is an existing component that can do
> distributed search based on Solr.

Thanks for the info.

Risking starting up a flame war (which is not my intention :) ), what design reasons / features are there in Solr but not in hadoop/nutch that would make it compelling to use solr instead of h/n?

I know each case is different; the feeling I got from a shortish read into h/n was that H/N is geared towards webpage indexing, crawling, etc. But possibly I'm missing something... Solr is, from my point of view, far more flexible. In which case, maybe porting HDFS into Solr would add all these clustering / map-reduce options...

thanks for your time and insights :)
B
_
{Beto|Norberto|Numard} Meijome

Windows caters to everyone as though they are idiots. UNIX makes no such assumption. It assumes you know what you are doing, and presents the challenge of figuring it out for yourself if you don't.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: How can i make a distribute search on Solr?
Along similar lines: assuming that I have 2 indexes on the same box, say at /home/abc/data/index1 and /home/abc/data/index2, and I want the results from both indexes when I do a search -- how should this be 'optimally' designed? Basically these are different Solr homes, and I want the results to be clearly demarcated as coming from 2 different sources. (One possible approach is sketched after this message.)

-Venkat

On 9/20/07, Norberto Meijome <[EMAIL PROTECTED]> wrote:
>
> On Thu, 20 Sep 2007 10:21:39 +0800
> "Jarvis" <[EMAIL PROTECTED]> wrote:
>
> > What you say is done by hadoop, which supports hardware failure
> > handling, data replication and more.
> > If we want to implement such a good system by ourselves with Solr but
> > without HDFS, it's a very very complex work I think. :)
> > I just want to know whether there is an existing component that can do
> > distributed search based on Solr.
>
> Thanks for the info.
>
> Risking starting up a flame war (which is not my intention :) ), what
> design reasons / features are there in Solr but not in hadoop/nutch that
> would make it compelling to use solr instead of h/n?
>
> I know each case is different; the feeling I got from a shortish read
> into h/n was that H/N is geared towards webpage indexing, crawling, etc.
> But possibly I'm missing something... Solr is, from my point of view,
> far more flexible. In which case, maybe porting HDFS into Solr would add
> all these clustering / map-reduce options...
>
> thanks for your time and insights :)
> B
> _
> {Beto|Norberto|Numard} Meijome
>
> Windows caters to everyone as though they are idiots. UNIX makes no such
> assumption. It assumes you know what you are doing, and presents the
> challenge of figuring it out for yourself if you don't.
>
> I speak for myself, not my employer. Contents may be hot. Slippery when
> wet. Reading disclaimers makes you go blind. Writing them is worse. You
> have been Warned.
--
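One common way to get the source demarcation described above (an illustration only; the thread does not settle on a design): index both document sets into a single Solr instance, tag each document at index time with a stored "source" field, then filter or facet on it at query time. The field name is an assumption:

    <field name="source" type="string" indexed="true" stored="true"/>

    q=foo&fq=source:index1                  restrict results to one source
    q=foo&facet=true&facet.field=source     per-source counts alongside the results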
Re: Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory
On 9/20/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> you imply that you are building your index using embedded solr, but based
> on your stack trace it seems you are using Solr in a servlet container ...
> i assume to search the index you've already built?

I have a jsp that routes the info from a drupal module to my Embedded solr app. Does this exception arise when I do a search and no index exists yet? If yes, then I guess the exception could be made more meaningful.

> Is the embedded core completely closed before your servlet container
> running Solr is started? what does the directory listing look like in
> between the finish of A and the start of B?

Yes - it is closed; but I guess this problem arises when I do a search when no index has been created yet - can you confirm this? (A defensive check is sketched after this message.)

-Venkat
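For what it's worth, "no segments* file found" is exactly what Lucene throws when an IndexReader is opened over a directory that does not yet contain an index. A defensive sketch against the Lucene 2.x API of that era (the path is the one from the original report; whether this check fits the embedded setup is an assumption):

    import java.io.File;
    import org.apache.lucene.index.IndexReader;

    public class SafeOpen {
        public static void main(String[] args) throws Exception {
            // Path taken from the original report's Solr home.
            File indexDir = new File("/data/pub/index");
            if (IndexReader.indexExists(indexDir)) {
                IndexReader reader = IndexReader.open(indexDir.getPath());
                System.out.println("docs in index: " + reader.numDocs());
                reader.close();
            } else {
                // no index yet -- add documents and commit before searching
                System.out.println("index not created yet");
            }
        }
    }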
Re: setting absolute path for snapshooter in solrconfig.xml doesn't work
Thanks, it works now.

regards,
-Hui

On 9/19/07, Pieter Berkel <[EMAIL PROTECTED]> wrote:
>
> If you don't need to pass any command-line arguments to snapshooter,
> remove (or comment out) this line from solrconfig.xml:
>
> arg1 arg2
>
> By the same token, if you're not setting environment variables either,
> remove the following line as well:
>
> MYVAR=val1
>
> Once you alter / remove those two lines, snapshooter should function as
> expected.
>
> cheers,
> Piete
>
> On 20/09/2007, Yu-Hui Jin <[EMAIL PROTECTED]> wrote:
> >
> > Hi, Pieter,
> >
> > Thanks! Now the exception is gone. However, there's no snapshot file
> > created in the data directory. Strangely, the snapshooter.log seems to
> > complete successfully. Any idea what else I'm missing?
> >
> > $ cat var/SolrHome/solr/logs/snapshooter.log
> > 2007/09/19 20:16:17 started by solruser
> > 2007/09/19 20:16:17 command: /var/SolrHome/solr/bin/snapshooter arg1 arg2
> > 2007/09/19 20:16:17 taking snapshot
> > var/SolrHome/solr/data/snapshot.20070919201617
> > 2007/09/19 20:16:17 ended (elapsed time: 0 sec)
> >
> > Thanks,
> >
> > -Hui
> >
> > On 9/19/07, Pieter Berkel <[EMAIL PROTECTED]> wrote:
> > >
> > > See this recent thread for some helpful info:
> > > http://www.nabble.com/solr-doesn%27t-find-exe-in-postCommit-event-tf4264879.html#a12167792
> > >
> > > You'll probably want to put the absolute path in the "exe" param rather
> > > than relying on "dir", i.e. set exe to /var/SolrHome/solr/bin/snapshooter
> > > and dir to ".", in order to get the snapshooter working correctly.
> > >
> > > cheers,
> > > Piete
> > >
> > > On 20/09/2007, Yu-Hui Jin <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hi, there,
> > > >
> > > > I used an absolute path for the "dir" param in the solrconfig.xml as
> > > > below:
> > > >
> > > > snapshooter
> > > > /var/SolrHome/solr/bin
> > > > true
> > > > arg1 arg2
> > > > MYVAR=val1
> > > >
> > > > However, I got a "snapshooter: not found" exception thrown in
> > > > catalina.out. I don't see why this doesn't work. Anything I'm missing?
> > > >
> > > > Many thanks,
> > > >
> > > > -Hui
>
--
Regards,

-Hui