Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory

2007-09-19 Thread Venkatraman S
Hi,

Product: Solr (Embedded), Version: 1.2

Problem Description :
While trying to add and search over the index, we are stumbling on this
error again and again.
Do note that the SolrCore is committed and closed suitably in our Embedded
Solr.

Error (StackTrace):
Sep 19, 2007 9:41:41 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet jsp threw exception
java.io.FileNotFoundException: no segments* file found in
org.apache.lucene.store.FSDirectory@/data/pub/index: files:
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:516)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:185)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:148)
        at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:87)
        at org.apache.solr.core.SolrCore.newSearcher(SolrCore.java:122)
        at com.serendio.diskoverer.core.entextor.CreateSolrIndex.<init>(CreateSolrIndex.java:70)
        at org.apache.jsp.AddToPubIndex_jsp._jspService(AddToPubIndex_jsp.java:57)
        at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
        at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)
        at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)
        at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)

Extra Information :
/data/pub is the Solr Home.
/data/pub/index contains the index.
CreateSolrIndex.java is our program that creates and searches over the index

Regards,
Venkat
--


multithread update client causes exceptions and dropped documents

2007-09-19 Thread Will Johnson


Attachment: TestJettyLargeVolume.java (the JUnit test mentioned below)
we were doing some performance testing for the updating aspects of solr and ran into what seems to be a large problem.  we're creating small documents with an id and one field of 1 term, submitting them in batches of 200 with commits every 5000 docs.  when we run the client with 1 thread everything is fine.  when we run it with >1 threads things go south (stack trace is below).  i've attached the junit test which shows the problem.  this happens on both a mac and a pc and when running solr in both jetty and tomcat.  i'll create a jira issue if necessary but i thought i'd see if anyone else had run into this problem first.

(output from junit test)
Started thread: 0
Started thread: 1
org.apache.solr.common.SolrException:
Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__javalangIllegalStateException_Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__at_comsunorgapachexercesinternalimplXMLStreamReaderImplgetAttributeCountXMLStreamReaderImpljava598__at_orgapachesolrhandlerXmlUpdateRequestHandlerreadDocXmlUpdateRequestHandlerjava335__at_orgapachesolrhandlerXmlUpdateRequestHandlerprocessUpdateXmlUpdateRequestHandlerjava181__at_orgapachesolrhandlerXmlUpdateRequestHandlerhandleRequestBodyXmlUpdateRequestHandlerjava109__at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava78__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava804__at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava193__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava161__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__at_orgmortbayjettyServerhandleServerjava285__at_orgmortbayjettyHttpConnectionhandleRequestHttpConnectionjava502__at_orgmortbayjettyHttpConnection$RequestHandlercontentHttpConnectionjava835__at_orgmortbayjettyHttpParserparseNextHttpParserjava641__at_orgmortbayjettyHttpParserparseAvailableHttpParserjava208__at_orgmortbayje

Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__javalangIllegalStateException_Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__at_comsunorgapachexercesinternalimplXMLStreamReaderImplgetAttributeCountXMLStreamReaderImpljava598__at_orgapachesolrhandlerXmlUpdateRequestHandlerreadDocXmlUpdateRequestHandlerjava335__at_orgapachesolrhandlerXmlUpdateRequestHandlerprocessUpdateXmlUpdateRequestHandlerjava181__at_orgapachesolrhandlerXmlUpdateRequestHandlerhandleRequestBodyXmlUpdateRequestHandlerjava109__at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava78__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava804__at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava193__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava161__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__at_orgmortbayjettyServerhandleServerjava285__at_orgmortbayjettyHttpConnectionhandleRequestHttpConnectionjava502__at_orgmortbayjettyHttpConnection$RequestHandlercontentHttpConnectionjava835__at_orgmortbayjettyHttpParserparseNextHttpParserjava641__at_orgmortbayjettyHttpParserparseAvailableHttpParserjava208__at_orgmortbayje

request: http://localhost:8983/solr/update?wt=xml&version=2.2
	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:230)
	at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:199)
	at org.apache.solr.client.solrj.impl.BaseSolrServer.add(BaseSolrServer.java:46)
	at org.apache.solr.client.solrj.impl.BaseSolrServer.add(BaseSolrServer.java:61)
	at org.apache.solr.client.solrj.embedded.TestJettyLargeVolume$DocThread.run(TestJettyLargeVolume.java:69)
Exception in thread "
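(for reference, a minimal self-contained sketch of the kind of client loop the attached test performs -- class and field names here are illustrative, not the actual attachment:)

  import java.util.ArrayList;
  import java.util.Collection;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class UpdateLoadClient extends Thread {
      public void run() {
          try {
              SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
              for (int i = 0; i < 5000; i += 200) {
                  Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
                  for (int j = 0; j < 200; j++) {            // batches of 200 docs
                      SolrInputDocument doc = new SolrInputDocument();
                      doc.addField("id", getName() + "-" + (i + j));
                      doc.addField("field_s", "term");       // one field, one term
                      batch.add(doc);
                  }
                  server.add(batch);
              }
              server.commit();                               // commit every 5000 docs
          } catch (Exception e) {
              e.printStackTrace();
          }
      }

      public static void main(String[] args) throws Exception {
          for (int t = 0; t < 2; t++) {                      // >1 thread triggers the problem
              System.out.println("Started thread: " + t);
              new UpdateLoadClient().start();
          }
      }
  }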

Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Daley, Kristopher M.
I am using Tomcat 6 and Solr 1.2 on a Windows 2003 server, using the
following Java code.  I am trying to index PDF files, and I'm
constantly getting errors on larger files (the same ones every time).

 

  SolrServer server = new CommonsHttpSolrServer(solrPostUrl);

  SolrInputDocument addDoc = new SolrInputDocument();
  addDoc.addField("url", url);
  addDoc.addField("site", site);
  addDoc.addField("author", author);
  addDoc.addField("title", title);
  addDoc.addField("subject", subject);
  addDoc.addField("keywords", keywords);
  addDoc.addField("text", docText);

  UpdateRequest ur = new UpdateRequest();
  ur.setAction(UpdateRequest.ACTION.COMMIT, false, false);  // Auto commits on update...
  ur.add(addDoc);
  UpdateResponse rsp = ur.process(server);

 

The Java error I received is:
class org.apache.solr.client.solrj.SolrServerException (java.net.SocketException: Software caused connection abort: recv failed)

Tomcat Log:

SEVERE: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:716)
        at org.apache.coyote.http11.InternalInputBuffer$InputStreamInputBuffer.doRead(InternalInputBuffer.java:746)
        at org.apache.coyote.http11.filters.IdentityInputFilter.doRead(IdentityInputFilter.java:116)
        at org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:675)
        at org.apache.coyote.Request.doRead(Request.java:428)
        at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:297)
        at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:405)
        at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:312)
        at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:193)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
        at java.io.InputStreamReader.read(InputStreamReader.java:167)
        at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2972)
        at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)
        at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1384)
        at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
        at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1058)
        at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:332)
        at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)

 

This happens when I try to index a field containing the contents of the
PDF file.  Its string length is 189002.  If I only take a substring of
the field, of say length 15, it usually works.  Does anyone have any
idea why this might be happening?

 

I have had this and other files in

Re: Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory

2007-09-19 Thread Bill Au
What files are there in your /data/pub/index directory?

Bill

On 9/19/07, Venkatraman S <[EMAIL PROTECTED]> wrote:
>
> Hi ,
>
> Product: Solr (Embedded), Version: 1.2
>
> Problem Description :
> While trying to add and search over the index, we are stumbling on this
> error again and again.
> Do note that the SolrCore is committed and closed suitably in our Embedded
> Solr.
>
> Error (StackTrace):
> Sep 19, 2007 9:41:41 AM org.apache.catalina.core.StandardWrapperValve invoke
> SEVERE: Servlet.service() for servlet jsp threw exception
> java.io.FileNotFoundException: no segments* file found in
> org.apache.lucene.store.FSDirectory@/data/pub/index: files:
>         at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:516)
>         at org.apache.lucene.index.IndexReader.open(IndexReader.java:185)
>         at org.apache.lucene.index.IndexReader.open(IndexReader.java:148)
>         at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:87)
>         at org.apache.solr.core.SolrCore.newSearcher(SolrCore.java:122)
>         at com.serendio.diskoverer.core.entextor.CreateSolrIndex.<init>(CreateSolrIndex.java:70)
>         at org.apache.jsp.AddToPubIndex_jsp._jspService(AddToPubIndex_jsp.java:57)
>         at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
>         at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)
>         at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)
>         at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
>         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
>         at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
>         at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
>         at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>         at java.lang.Thread.run(Thread.java:619)
>
> Extra Information :
> /data/pub is the Solr Home.
> /data/pub/index contains the index.
> CreateSolrIndex.java is our program that creates and searches over the
> index
>
> Regards,
> Venkat
> --
>


Re: Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory

2007-09-19 Thread Venkatraman S
Quite interesting actually (this is for 5 documents that were indexed):

_0.fdt  _0.prx  _1.fnm  _1.tis  _2.nrm  _3.fdx  _3.tii  _4.frq  segments.gen
_0.fdx  _0.tii  _1.frq  _2.fdt  _2.prx  _3.fnm  _3.tis  _4.nrm  segments_6
_0.fnm  _0.tis  _1.nrm  _2.fdx  _2.tii  _3.frq  _4.fdt  _4.prx
_0.frq  _1.fdt  _1.prx  _2.fnm  _2.tis  _3.nrm  _4.fdx  _4.tii
_0.nrm  _1.fdx  _1.tii  _2.frq  _3.fdt  _3.prx  _4.fnm  _4.tis


On 9/19/07, Bill Au <[EMAIL PROTECTED]> wrote:
>
> What files are there in your /data/pub/index directory?
>
> Bill
>
> On 9/19/07, Venkatraman S <[EMAIL PROTECTED]> wrote:
> >
> > Hi ,
> >
> > Product: Solr (Embedded), Version: 1.2
> >
> > Problem Description :
> > While trying to add and search over the index, we are stumbling on this
> > error again and again.
> > Do note that the SolrCore is committed and closed suitably in our
> Embedded
> > Solr.
> >
> > Error (StackTrace):
> > Sep 19, 2007 9:41:41 AM org.apache.catalina.core.StandardWrapperValve invoke
> > SEVERE: Servlet.service() for servlet jsp threw exception
> > java.io.FileNotFoundException: no segments* file found in
> > org.apache.lucene.store.FSDirectory@/data/pub/index: files:
> >         at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:516)
> >         at org.apache.lucene.index.IndexReader.open(IndexReader.java:185)
> >         at org.apache.lucene.index.IndexReader.open(IndexReader.java:148)
> >         at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:87)
> >         at org.apache.solr.core.SolrCore.newSearcher(SolrCore.java:122)
> >         at com.serendio.diskoverer.core.entextor.CreateSolrIndex.<init>(CreateSolrIndex.java:70)
> >         at org.apache.jsp.AddToPubIndex_jsp._jspService(AddToPubIndex_jsp.java:57)
> >         at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
> >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
> >         at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)
> >         at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)
> >         at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
> >         at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
> >         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
> >         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> >         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
> >         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> >         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> >         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> >         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
> >         at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
> >         at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
> >         at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
> >         at java.lang.Thread.run(Thread.java:619)
> >
> > Extra Information :
> > /data/pub is the Solr Home.
> > /data/pub/index contains the index.
> > CreateSolrIndex.java is our program that creates and searches over the
> > index
> >
> > Regards,
> > Venkat
> > --
> >
>



--


Re: multithread update client causes exceptions and dropped documents

2007-09-19 Thread Will Johnson
one other note.  the errors pop up when running against the 1.3 trunk
but do not appear to happen when run against 1.2.

- will

On 9/19/07, Will Johnson <[EMAIL PROTECTED]> wrote:
>
>
>
>
>
> we were doing some performance testing for the updating aspects of solr and
> ran into what seems to be a large problem.  we're creating small documents
> with an id and one field of 1 term only submitting them in batches of 200
> with commits every 5000 docs.  when we run the client with 1 thread
> everything is fine.  when we run it with >1 threads things go south (stack
> trace is below).  i've attached the junit test which shows the problem.
> this happens on both a mac and a pc and when running solr in both jetty and
> tomcat.  i'll create a jira issue if necessary but i thought i'd see if
> anyone else had run into this problem first.
>
> (output from junit test)
> Started thread: 0
> Started thread: 1
> org.apache.solr.common.SolrException:
> Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__javalangIllegalStateException_Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__at_comsunorgapachexercesinternalimplXMLStreamReaderImplgetAttributeCountXMLStreamReaderImpljava598__at_orgapachesolrhandlerXmlUpdateRequestHandlerreadDocXmlUpdateRequestHandlerjava335__at_orgapachesolrhandlerXmlUpdateRequestHandlerprocessUpdateXmlUpdateRequestHandlerjava181__at_orgapachesolrhandlerXmlUpdateRequestHandlerhandleRequestBodyXmlUpdateRequestHandlerjava109__at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava78__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava804__at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava193__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava161__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__at_orgmortbayjettyServerhandleServerjava285__at_orgmortbayjettyHttpConnectionhandleRequestHttpConnectionjava502__at_orgmortbayjettyHttpConnection$RequestHandlercontentHttpConnectionjava835__at_orgmortbayjettyHttpParserparseNextHttpParserjava641__at_orgmortbayjettyHttpParserparseAvailableHttpParserjava208__at_orgmortbayje
>
> Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__javalangIllegalStateException_Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__at_comsunorgapachexercesinternalimplXMLStreamReaderImplgetAttributeCountXMLStreamReaderImpljava598__at_orgapachesolrhandlerXmlUpdateRequestHandlerreadDocXmlUpdateRequestHandlerjava335__at_orgapachesolrhandlerXmlUpdateRequestHandlerprocessUpdateXmlUpdateRequestHandlerjava181__at_orgapachesolrhandlerXmlUpdateRequestHandlerhandleRequestBodyXmlUpdateRequestHandlerjava109__at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava78__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava804__at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava193__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava161__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__at_orgmortbayjettyServerhandleServerjava285__at_orgmortbayjettyHttpConnectionhandleRequestHttpConnectionjava502__at_orgmortbayjettyHttpConnection$RequestHandlercontentHttpConnectionjava835__at_orgmortbayjettyHttpParserparseNextHttpParserjava641__at_orgmortbayjettyHttpParserparseAvailableHttpParserjava208__at_orgmortbayje
>
> request: http://localhost:8983/solr/update?wt=xml&version=2.2
>   at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:230)
>   at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:199)
>   at org.apache.solr.client.solrj.impl.BaseSolrServer.add(BaseSolrServer.java:46)
>   at or

Re: multithread update client causes exceptions and dropped documents

2007-09-19 Thread Ryan McKinley

Can you start a JIRA issue and attach the patch?

I have not seen this happen, but I bet it is caused by something from:
https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.ext.subversion:subversion-commits-tabpanel

Can we add that test to trunk?  By default it does not need to be a long 
running test, but it's nice to have in there so we can twiddle it for 
specific testing.


thanks
ryan


Will Johnson wrote:

one other note.  the errors pop up when running against the 1.3 trunk
but do not appear to happen when run against 1.2.

- will

On 9/19/07, Will Johnson <[EMAIL PROTECTED]> wrote:





we were doing some performance testing for the updating aspects of solr and
ran into what seems to be a large problem.  we're creating small documents
with an id and one field of 1 term only submitting them in batches of 200
with commits every 5000 docs.  when we run the client with 1 thread
everything is fine.  when we run it with >1 threads things go south (stack
trace is below).  i've attached the junit test which shows the problem.
this happens on both a mac and a pc and when running solr in both jetty and
tomcat.  i'll create a jira issue if necessary but i thought i'd see if
anyone else had run into this problem first.

(output from junit test)
Started thread: 0
Started thread: 1
org.apache.solr.common.SolrException:
Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__javalangIllegalStateException_Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__at_comsunorgapachexercesinternalimplXMLStreamReaderImplgetAttributeCountXMLStreamReaderImpljava598__at_orgapachesolrhandlerXmlUpdateRequestHandlerreadDocXmlUpdateRequestHandlerjava335__at_orgapachesolrhandlerXmlUpdateRequestHandlerprocessUpdateXmlUpdateRequestHandlerjava181__at_orgapachesolrhandlerXmlUpdateRequestHandlerhandleRequestBodyXmlUpdateRequestHandlerjava109__at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava78__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava804__at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava193__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava161__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__at_orgmortbayjettyServerhandleServerjava285__at_orgmortbayjettyHttpConnectionhandleRequestHttpConnectionjava502__at_orgmortbayjettyHttpConnection$RequestHandlercontentHttpConnectionjava835__at_orgmortbayjettyHttpParserparseNextHttpParserjava641__at_orgmortbayjettyHttpParserparseAvailableHttpParserjava208__at_orgmortbayje

Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__javalangIllegalStateException_Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__at_comsunorgapachexercesinternalimplXMLStreamReaderImplgetAttributeCountXMLStreamReaderImpljava598__at_orgapachesolrhandlerXmlUpdateRequestHandlerreadDocXmlUpdateRequestHandlerjava335__at_orgapachesolrhandlerXmlUpdateRequestHandlerprocessUpdateXmlUpdateRequestHandlerjava181__at_orgapachesolrhandlerXmlUpdateRequestHandlerhandleRequestBodyXmlUpdateRequestHandlerjava109__at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava78__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava804__at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava193__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava161__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__at_orgmortbayjettyServerhandleServerjava285__at_orgmortbayjettyHttpConnectionhandleRequestHttpConnectionjava502__at_orgmortbayjettyHttpConnection$RequestHandlercontentHttpConnectionjava835__at_orgmortbayjettyHttpParserparseNextHttpParserjava641__at_orgmortbayjettyHttpPa

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Wed, 19 Sep 2007 01:46:53 -0400
Ryan McKinley <[EMAIL PROTECTED]> wrote:

> Stu is referring to Federated Search - where each index has some of the 
> data and results are combined before they are returned.  This is not yet 
> supported out of the "box"

Maybe this is related. How does this compare to the map-reduce functionality in 
Nutch/Hadoop ? 
cheers,
B

_
{Beto|Norberto|Numard} Meijome

"With sufficient thrust, pigs fly just fine. However, this is not necessarily a 
good idea. 
It is hard to be sure where they are going to land, and it could be dangerous 
sitting under them as they fly overhead."
   [RFC1925 - section 2, subsection 3]

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: How can i make a distribute search on Solr?

2007-09-19 Thread Yonik Seeley
On 9/19/07, Norberto Meijome <[EMAIL PROTECTED]> wrote:
> On Wed, 19 Sep 2007 01:46:53 -0400
> Ryan McKinley <[EMAIL PROTECTED]> wrote:
>
> > Stu is referring to Federated Search - where each index has some of the

It really should be Distributed Search I think (my mistake... I
started out calling it Federated).  I think Federated search is more
about combining search results from different data sources.

> > data and results are combined before they are returned.  This is not yet
> > supported out of the "box"
>
> Maybe this is related. How does this compare to the map-reduce functionality 
> in Nutch/Hadoop ?

map-reduce is more for batch jobs.  Nutch only uses map-reduce for
parallel indexing, not searching.

-Yonik


Re: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Ryan McKinley


I have had this and other files index correctly using a different
combination of Tomcat/Solr versions without any problem (using similar
code; I re-wrote it because I thought it would be better to use Solrj).
I get the same error whether I use a simple StringBuilder to create the
add manually or whether I use Solrj.  I have manually encoded each field
before passing it in to the add function as well, so I don't believe it
is a content problem.  I have tried to change every setting in Tomcat
and Solr that I can think of, but I'm new to both of them.



So it works if you build an XML file with the same content and send it 
to the server using the example post.sh/post.jar tool?


Have you tried messing with the connection settings?
 SolrServer server = new CommonsHttpSolrServer( url );
  ((CommonsHttpSolrServer)server).setConnectionTimeout(5);
  ((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
  ((CommonsHttpSolrServer)server).setMaxTotalConnections(100);

a timeout of 5ms is probably too short...


ryan


Re: How can i make a distribute search on Solr?

2007-09-19 Thread Stu Hood
Nutch implements federated search separately from their index generation.

My understanding is that MapReduce jobs generate the indexes (Nutch calls them 
segments) from raw data that has been downloaded, and then make them available 
to be searched via remote procedure calls. Queries never pass through MapReduce 
in any shape or form; only the raw data and indexes do.

If you take a look at the "org.apache.nutch.searcher.DistributedSearch" class, 
specifically the #Client.search method, you can see how they handle the actual 
federation of results.

Thanks,
Stu


-Original Message-
From: Norberto Meijome 
Sent: Wednesday, September 19, 2007 10:23am
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Subject: Re: How can i make a distribute search on Solr?

On Wed, 19 Sep 2007 01:46:53 -0400
Ryan McKinley  wrote:

> Stu is referring to Federated Search - where each index has some of the 
> data and results are combined before they are returned.  This is not yet 
> supported out of the "box"

Maybe this is related. How does this compare to the map-reduce functionality in 
Nutch/Hadoop ? 
cheers,
B

_
{Beto|Norberto|Numard} Meijome

"With sufficient thrust, pigs fly just fine. However, this is not necessarily a 
good idea. 
It is hard to be sure where they are going to land, and it could be dangerous 
sitting under them as they fly overhead."
   [RFC1925 - section 2, subsection 3]

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: How can i make a distribute search on Solr?

2007-09-19 Thread Ryan McKinley
Jarvis wrote:
> Thanks for your reply,
> I need the Federated Search. You mean this is not yet 
> supported out of the "box". So I have a question that 
> in this situation what can Collection Distribution used for?
> 

The collection distribution scripts help you get duplicate copies of the
same index distributed across many computers.  This lets you put a load
balancer in front of each server and lets you share the load across N
servers.

The collection distribution scripts are particularly useful since NFS
and lucene don't play well together.

ryan


RE: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Daley, Kristopher M.
I have tried changing those settings, for example, as:

SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
((CommonsHttpSolrServer)server).setConnectionTimeout(60);
((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
((CommonsHttpSolrServer)server).setMaxTotalConnections(100);

However, still no luck.  

I took the SimplePostTool.java file from the wiki, changed the URL,
compiled it and ran it with the output of the command:
UpdateRequest ur = new UpdateRequest();
ur.add(addDoc);
String xml = ur.getXML();

This works.  It seems that it must be a communication setting, but I'm
stumped.  

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 19, 2007 10:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Index/Update Problems with Solrj/Tomcat and Larger Files

> 
> I have had this and other files index correctly using a different
> combination of Tomcat/Solr versions without any problem (using similar
> code; I re-wrote it because I thought it would be better to use Solrj).
> I get the same error whether I use a simple StringBuilder to create the
> add manually or whether I use Solrj.  I have manually encoded each field
> before passing it in to the add function as well, so I don't believe it
> is a content problem.  I have tried to change every setting in Tomcat
> and Solr that I can think of, but I'm new to both of them.
> 

So it works if you build an XML file with the same content and send it 
to the server using the example post.sh/post.jar tool?

Have you tried messing with the connection settings?
  SolrServer server = new CommonsHttpSolrServer( url );
   ((CommonsHttpSolrServer)server).setConnectionTimeout(5);
   ((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
   ((CommonsHttpSolrServer)server).setMaxTotalConnections(100);

a timeout of 5ms is probably too short...


ryan


Re: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Ryan McKinley

Daley, Kristopher M. wrote:

I have tried changing those settings, for example, as:

SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
((CommonsHttpSolrServer)server).setConnectionTimeout(60);
((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
((CommonsHttpSolrServer)server).setMaxTotalConnections(100);

However, still no luck.  



Have you tried anything larger than 60?  60ms is not long...

try 10000 (10s) and see if it works.



RE: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Daley, Kristopher M.
I tried 10000 and 60000, same result.

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 19, 2007 11:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Index/Update Problems with Solrj/Tomcat and Larger Files

Daley, Kristopher M. wrote:
> I have tried changing those settings, for example, as:
> 
> SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
> ((CommonsHttpSolrServer)server).setConnectionTimeout(60);
> ((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
> ((CommonsHttpSolrServer)server).setMaxTotalConnections(100);
> 
> However, still no luck.  
> 

Have you tried anything larger than 60?  60ms is not long...

try 10000 (10s) and see if it works.



Re: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Ryan McKinley
I'm stabbing in the dark here, but try fiddling with some of the other 
connection settings:


 getConnectionManager().getParams().setSendBufferSize( big );
 getConnectionManager().getParams().setReceiveBufferSize( big );

http://jakarta.apache.org/httpcomponents/httpclient-3.x/apidocs/org/apache/commons/httpclient/params/HttpConnectionManagerParams.html
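(a hedged sketch of how those params can be reached through the solrj client -- the getHttpClient() hop and the 1MB value are assumptions to experiment with, not tested advice:)

  CommonsHttpSolrServer server = new CommonsHttpSolrServer(solrPostUrl);
  HttpConnectionManagerParams params =
      server.getHttpClient().getHttpConnectionManager().getParams();
  int big = 1024 * 1024;             // 1MB; comfortably larger than the 189002-char doc
  params.setSendBufferSize(big);     // socket send buffer used when posting the update
  params.setReceiveBufferSize(big);  // socket receive buffer for the response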




Daley, Kristopher M. wrote:

I tried 10000 and 60000, same result.

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 19, 2007 11:18 AM

To: solr-user@lucene.apache.org
Subject: Re: Index/Update Problems with Solrj/Tomcat and Larger Files

Daley, Kristopher M. wrote:

I have tried changing those settings, for example, as:

SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
((CommonsHttpSolrServer)server).setConnectionTimeout(60);
((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
((CommonsHttpSolrServer)server).setMaxTotalConnections(100);

However, still no luck.  



Have you tried anything larger than 60?  60ms is not long...

try 10000 (10s) and see if it works.






RE: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Daley, Kristopher M.
Ok, I'll try to play with those.  Any suggestion on the size?

Something else that is very interesting is that I just tried to do an
aggregate add of a bunch of docs, including the one that always returned
the error.

I called a function to create a SolrInputDocument and return it.  I then
did the following:

Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
UpdateRequest ur = new UpdateRequest();
ur.setAction(UpdateRequest.ACTION.COMMIT, false, false);  // Auto commits on update...
ur.add(docs);  // docs is filled with the documents returned by that function
UpdateResponse rsp = ur.process(server);

In doing this, the program simply hangs after the last command.  If I
let it sit there for an amount of time, it eventually returns with the
error: class org.apache.solr.client.solrj.SolrServerException
(java.net.SocketException: Connection reset by peer: socket write error)

However, if I go to the tomcat server and restart it after I have issued
the process command, the program returns and the documents are all
posted correctly!

Very strange behavior... am I somehow not closing the connection
properly?

 
-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 19, 2007 11:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Index/Update Problems with Solrj/Tomcat and Larger Files

I'm stabbing in the dark here, but try fiddling with some of the other 
connection settings:

  getConnectionManager().getParams().setSendBufferSize( big );
  getConnectionManager().getParams().setReceiveBufferSize( big );

http://jakarta.apache.org/httpcomponents/httpclient-3.x/apidocs/org/apache/commons/httpclient/params/HttpConnectionManagerParams.html




Daley, Kristopher M. wrote:
> I tried 10000 and 60000, same result.
> 
> -Original Message-
> From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, September 19, 2007 11:18 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Index/Update Problems with Solrj/Tomcat and Larger Files
> 
> Daley, Kristopher M. wrote:
>> I have tried changing those settings, for example, as:
>>
>> SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
>> ((CommonsHttpSolrServer)server).setConnectionTimeout(60);
>> ((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
>> ((CommonsHttpSolrServer)server).setMaxTotalConnections(100);
>>
>> However, still no luck.  
>>
> 
> Have you tried anything larger than 60?  60ms is not long...
> 
> try 10000 (10s) and see if it works.
> 
> 



Getting only size of getFacetCounts , to simulate count(group by( a field) ) using facets

2007-09-19 Thread Laurent Hoss

Hi

We want to (mis)use facet search to get the number of (unique) field 
values appearing in a document resultset.
I thought facet search was perfect for this, because it already gives me 
all the (unique) field values.
But to use it for this special problem, we don't want all the values 
listed in the response, as there might be over 10000, and we don't need 
the values at all, just the count of how many there are.


I looked at
http://wiki.apache.org/solr/SimpleFacetParameters
and hoped to find a parameter like
facet.sizeOnly = true
(or facet.showSize=true  , combined with facet.limit=1 or other small value)

Would you accept a patch with such a feature ?

It should probably be relatively easy, though I am not sure if it fits 
into the concept of facets..

I looked at the code; maybe add an extra value to the NamedList returned 
by getFacetCounts() in SimpleFacets?!  Something like the sketch below.
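(purely a sketch of what I mean -- the names numValues/getTermCounts are my assumptions about the internals, not actual Solr API:)

  // Hypothetical sketch, not a patch: prepend the size to the per-field entry.
  NamedList termCounts = getTermCounts(field);   // assumed helper returning value->count pairs
  NamedList entry = new NamedList();
  entry.add("numValues", termCounts.size());     // the count we actually want
  entry.add("values", termCounts);               // could be skipped when facet.sizeOnly=true
  res.add(field, entry);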


ps: Other user having same request AFAIU :
http://www.nabble.com/showing--range-facet-example-%3D-by-Range-%28-1-to-1000-%29-t3660704.html#a10229069

thanks,

Laurent Hoss   






Re: Getting only size of getFacetCounts , to simulate count(group by( a field) ) using facets

2007-09-19 Thread Ryan McKinley


But for us to be used for this special problem, we don't want all the 
values listed in response as there might be over 10000 and we don't need 
the values at all, just the count of how many!




check the LukeRequestHandler

http://wiki.apache.org/solr/LukeRequestHandler

It gives you lots of field based stats.



RE: Triggering snapshooter through web admin interface

2007-09-19 Thread Lance Norskog
Is there a ticket for this yet? I have a bug report and request: I just did
a snapshot while indexing 700 records/sec. and got an inconsistency. I was
tarring off the snapshot and tar reported that a file changed while it was
being copied. The error rolled off my screen, so I cannot report the file
name or extension.

If a solr command to do a snapshot is implemented, please make sure that it
is 100% consistent.

Thanks,

Lance Norskog 

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 18, 2007 11:11 AM
To: solr-user@lucene.apache.org
Subject: RE: Triggering snapshooter through web admin interface


: [Wu, Daniel] That sounds great.  Do I need to create a JIRA ticket?

Sure, JIRA is a great way to track feature requests (since they can be
"watched" and "voted" on, and if you want to start on an implementation you
can attach patches...

http://wiki.apache.org/solr/HowToContribute



-Hoss



"Select distinct" in Solr

2007-09-19 Thread Lance Norskog
I believe I saw in the Javadocs for Lucene that there is the ability to
return the unique values for one field for a search, rather than each
record. Is it possible to add this feature to Solr?  It is the equivalent of
'select distinct' in SQL.
 
Thanks,
 
Lance Norskog


Re: "Select distinct" in Solr

2007-09-19 Thread Ryan McKinley

Lance Norskog wrote:

I believe I saw in the Javadocs for Lucene that there is the ability to
return the unique values for one field for a search, rather than each
record. Is it possible to add this feature to Solr?  It is the equivalent of
'select distinct' in SQL.
 


Look into faceting:
http://wiki.apache.org/solr/SimpleFacetParameters
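For example, a request along these lines (hypothetical field name) returns 
every distinct value of a field with its document count, without fetching 
any documents:

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=category&facet.limit=-1&facet.mincount=1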

or maybe the Luke request handler:
http://wiki.apache.org/solr/LukeRequestHandler

ryan



useColdSearcher = false... not working in 1.2?

2007-09-19 Thread Adam Goldband
Anyone else using this, and finding it not working in Solr 1.2?  Since
we've got an automated release process, I really need to be able to have
the appserver not see itself as done warming up until the firstSearcher
is ready to go... but with 1.2 this no longer seems to be the case.
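(For reference, the setting in question is the stock solrconfig.xml element:

  <useColdSearcher>false</useColdSearcher>

which, when false, is supposed to block requests until the first searcher 
has finished warming.)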

adam


Re: useColdSearcher = false... not working in 1.2?

2007-09-19 Thread Yonik Seeley
On 9/19/07, Adam Goldband <[EMAIL PROTECTED]> wrote:
> Anyone else using this, and finding it not working in Solr 1.2?  Since
> we've got an automated release process, I really need to be able to have
> the appserver not see itself as done warming up until the firstSearcher
> is ready to go... but with 1.2 this no longer seems to be the case.

I took a quick peek at the code, and it should still work (it's pretty simple).
false is also the default.

How are you determining that it isn't working?

-Yonik


Re: Getting only size of getFacetCounts , to simulate count(group by( a field) ) using facets

2007-09-19 Thread Yonik Seeley
On 9/19/07, Laurent Hoss <[EMAIL PROTECTED]> wrote:
> We want to (mis)use facet search to get the number of (unique) field
> values appearing in a document resultset.

We have paging of facets, so just like normal search results, it does
make sense to list the total number of facets matching.

The main problem with implementing this is trying to figure out where
to put the info in a backward compatible manner.  Here is how the info
is currently returned (JSON format):

 "facet_fields":{
"cat":[
   "camera",1,
   "card",2,
   "connector",2,
   "copier",1,
   "drive",2
  ]
},


Unfortunately, there's not a good place to put this extra info without
older clients choking on it.  Within "cat" there should have been
another element called "values" or something... then we could easily
add extra fields like "nvalues":

"cat":{
 "nvalues":5042,
 "values":[
   "camera",1,
   "card",2,
   "connector",2,
   "copier",1,
   "drive",2
  ]
 }

-Yonik


Exact phrase highlighting

2007-09-19 Thread Marc Bechler

Hi out of there,

I just walked through the mailing list archive, but I did not find an 
appropriate answer for phrase highlighting.


I do not have any highlighting section (and no dismax handler 
definition) in solrconfig.xml. This way (AFAIK :-)), the standard lucene 
query syntax should be supported in its full functionality. But in 
this case double-quoting the search expression does not have any effect 
on highlighting, i.e.


Assume we have the following text (of field type text):
It is hard work to do the hard complex work

A query for
"hard work"
(with the double quotes) results in the highlighted section
It is <em>hard</em> <em>work</em> to do the <em>hard</em> complex <em>work</em>

Although I would guess that the correct answer should be
It is <em>hard work</em> to do the hard complex work
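(the request used is essentially of this form -- field name assumed:
http://localhost:8983/solr/select?q=text:%22hard+work%22&hl=true&hl.fl=text )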

Do any of the Solr experts have a good answer for me? (I guess that 
I still have not understood the functional relationship between 
highlighting, query specification and index specification...)


Thanks for your help

 marc


Re: Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory

2007-09-19 Thread Chris Hostetter

: Product : Solr  (Embedded)Version : 1.2


: java.io.FileNotFoundException: no segments* file found in
: org.apache.lucene.store.FSDirectory@/data/pub/index: files:

According to that, the FSDirectory was empty when it was opened (a file 
list is supposed to come after that "files:" part)

you imply that you are building your index using embedded solr, but based 
on your stack trace it seems you are using Solr in a servlet container ... 
i assume to search the index you've already built?

Is the embedded core completely closed before your servlet container running 
Solr is started?  what does the directory listing look like in between the 
finish of A (the embedded indexing) and the start of B (the servlet container)?
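(a minimal sketch of that ordering, assuming the Solr 1.2 embedded singleton 
API rather than your actual CreateSolrIndex code:)

  // Hypothetical sketch only; mirrors wiki-style Solr 1.2 embedded usage.
  SolrCore core = SolrCore.getSolrCore();
  UpdateHandler updater = core.getUpdateHandler();
  // ... AddUpdateCommands for your documents here ...
  updater.commit(new CommitUpdateCommand(false));  // false = commit without optimize
  core.close();  // A: embedded core fully closed here ...
  // B: ... before the servlet container's Solr ever opens /data/pub/index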




-Hoss



Re: Exact phrase highlighting

2007-09-19 Thread Mike Klaas

On 19-Sep-07, at 1:12 PM, Marc Bechler wrote:


Hi out of there,

I just walked through the mailing list archive, but I did not find  
an appropriate answer for phrase highlighting.


I do not have any highlighting section (and no dismax handler  
definition) in solrconfig.xml. This way (AFAIK :-)), the standard  
lucene query syntax should be supported in it's full functionality.  
But, in this case double quoting the search expressions does not  
have any effect on highlighting, i.e.


Assume we have the following text (of field type text):
It is hard work to do the hard complex work

A query for
"hard work"
(with the double quotes) results in the highlighted section
It is <em>hard</em> <em>work</em> to do the <em>hard</em> complex <em>work</em>

Although I would guess that the correct answer should be
It is <em>hard work</em> to do the hard complex work

Does anyone of the SOLR experts have a good answer for me? (I guess  
that I still did not understand the functional relationship between  
highlighting, query specification and index specification...)


It currently is not supported by Solr.  There is work in lucene that 
supports this (see 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526803), 
but it is currently not integrated.


It would make a great project to get one's hands dirty contributing,  
though :)


-Mike


Re: DisMax queries referencing undefined fields

2007-09-19 Thread Chris Hostetter

: I noticed that the "field list" (fl) parameter ignores field names that it
: cannot locate, while the "query fields" (qf) parameter throws an exception
: when fields cannot be located.  Is there any way to override this behavior and
: have qf also ignore fields it cannot find?

Those parameters are radically different.  FL isn't evaluated until after 
a query is executed and it's time to return documents ... just because the 
current range of documents being returned doesn't have a value doesn't 
mean there is a problem with the FL -- other documents in the same DocSet 
might have those values.  It's not that Solr ignores fields in the "fl" 
that it can't locate, as it is that Solr tests each field a document to be 
returned has, and only returns it if the field is in the FL.

In theory, field names in the FL should be tested to see if a matching 
field or dynamic field exists that would match and generate a 
warning/error if it's not -- i would consider that an FL bug.

The semantics of QF follow directly from the semantics of the standard 
query parser: if you tell it to query against a field which does not exist 
for any document, then something is wrong with the request.  Unlike the 
FL case (which is lazy for not checking that the field exists) dismax 
has to check each field because it needs to know how to analyze the input 
for every field in the QF -- if the field doesn't exist, it can't do that.

: This would be pretty helpful for us, as we're going to have a large number of
: dynamic, user-specific fields defined in our schema.  These fields will have
: canonical name formats (e.g. userid-comment), but they may not be defined for
: every document.  In fact, some fields may be defined for no documents, which I
: gather would be the ones that would throw exceptions.  It would be nice to
: provide solr a set of fields that could be searched and have it use the subset
: of those fields that exist.

i suppose it would be possible to make an option for dismax to ignore any 
field it can't find, but that would be fairly kludgy and would introduce 
some really confusing edge cases (ie: what happens if none of the fields in 
QF can be found?)

A better option would probably be to use something like this from the 
sample schema.xml...

   <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField" />
   <dynamicField name="*" type="ignored" />

...then any field name you want will work regardless of whether you are 
using dismax or the standard request handler.


Hmmm: it might be better though if the "ignored" field type was a 
TextField with an Analyzer that produced no tokens ... then it would drop 
out of the query completely ... anyone want to submit a "NoOpAnalyzer"? :)
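(a minimal sketch of such an analyzer against the Lucene 2.x API of the day -- 
the class name and exact placement are hypothetical:)

  import java.io.Reader;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.Token;
  import org.apache.lucene.analysis.TokenStream;

  /** Produces no tokens, so any field using it drops out of queries. */
  public class NoOpAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, final Reader reader) {
      return new TokenStream() {
        public Token next() { return null; }  // end-of-stream immediately
      };
    }
  }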


-Hoss



analysis page and search not in sync - no result for "t-shirt"?

2007-09-19 Thread Martin Grotzke
Hello,

I have an issue, that "T-Shirt" is not found, even if there
are documents with the title "T-Shirt".

The analysis page shows that both the index-analyzer and the
query-analyzer create "t" and "shirt" of this.

However, when I search for "t-shirt", I don't find anything.

The product title is copied to the field "text", this has the
type "text", which is configured like this:

[fieldtype definition stripped by the list archive]

What might be the reason for this?

Thanx && cheers,
Martin






Re: Exact phrase highlighting

2007-09-19 Thread Marc Bechler

Hi Mike,

thanks for the quick response.

> It would make a great project to get one's hands dirty contributing, though :)


... sounds like giving a broad hint ;-) Sounds challenging...

Regards from Germany

 marc




Re: Exact phrase highlighting

2007-09-19 Thread Mike Klaas

On 19-Sep-07, at 2:39 PM, Marc Bechler wrote:


Hi Mike,

thanks for the quick response.

> It would make a great project to get one's hands dirty contributing, though :)


... sounds like giving a broad hint ;-) Sounds challenging...


I'm not sure about that--it is supposed to be a drop-in replacement 
for Highlighter.  I expect most of the work will consist of figuring 
out the right way of packaging it in a jar for solr inclusion.


-Mike


Re: Getting only size of getFacetCounts , to simulate count(group by( a field) ) using facets

2007-09-19 Thread Chris Hostetter

: The main problem with implementing this is trying to figure out where
: to put the info in a backward compatible manner.  Here is how the info

1) this seems like the kind of thing that would only be returned if 
requested -- so we probably don't have to be overly concerned about 
backwards compatibility. if people are going to request this information, 
they have to make their client code look for it, so they can also 
make their client code know how to distinguish it from the existing 
counts.

2) the counts themselves are "first order" faceting data, so i think it 
makes sense to leave them where they are ... "metadata" about the field 
could be included as a sub-list at the start of the field list 
(much like the missing count is included as an unnamed int at the end of 
the field list)  this sub-list could be unnamed to help distinguish it 
from field term values -- but frankly i don't think that's a huge deal -- 
the counts will stand out for being integers, while this metadata would 
be a nested NamedList (aka: map, aka: hash, aka: however it's represented 
in the format used)

structure could be something like...

...&facet.field=cat&facet.limit=3&facet.mincount=5&facet.missing=true

 <lst name="facet_fields">
  <lst name="cat">
   <lst>
    <int name="...">42</int>
    <int name="...">678</int>
   </lst>
   <int name="...">30</int>
   <int name="...">20</int>
   <int name="...">10</int>
   <int>5</int>
  </lst>
 </lst>



-Hoss



RE: Triggering snapshooter through web admin interface

2007-09-19 Thread Chris Hostetter

lance: since the topic you are describing is not directly related to 
triggering a snapshot from the web interface, can you please start a new 
thread with a unique subject describing in more detail exactly what it 
was you were doing and the problem you encountered?

this will make it easier for your problem to get visibility (some people 
don't read every thread, and archive searching is frequently done by 
thread, so people looking for similar problems may not realize this new 
thread is buried inside an old one)

-Hoss

: Date: Wed, 19 Sep 2007 11:33:30 -0700
: From: Lance Norskog <[EMAIL PROTECTED]>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: RE: Triggering snapshooter through web admin interface
: 
: Is there a ticket for this yet? I have a bug report and request: I just did
: a snapshot while indexing 700 records/sec. and got an inconsistency. I was
: tarring off the snapshot and tar reported that a file changed while it was
: being copied. The error rolled off my screen, so I cannot report the file
: name or extension.
: 
: If a solr command to do a snapshot is implemented, please make sure that it
: is 100% consistent.
: 
: Thanks,
: 
: Lance Norskog 



rsync start and enable for multiple solr instances within one tomcat

2007-09-19 Thread Yu-Hui Jin
Hi, there,

So we are using the Tomcat's JNDI method to set up multiple solr instances
within a tomcat server. Each instance has a solr home directory.

Now we want to set up collection distribution for all these solr home
indexes. My understanding is:

1.  we only need to run rsync-start once, using the script under any of the
solr home dirs.
2.  we need to run each of the rsync-enable scripts under the solr homes'
bin dirs.
3.  the twiki page at
http://wiki.apache.org/solr/SolrCollectionDistributionScripts  keeps
referring to solr/xxx. Is this "solr" the example solr home dir?  If so,
would it be hard-coded in any of the scripts?  For example, I saw in
snappuller line 226 (solr 1.2):

${stats} rsync://${master_host}:${rsyncd_port}/solr/${name}/
${data_dir}/${name}-wip

Is the above "solr" a hard-coded solr home name? If so, it's not desirable
since we have multiple solr homes with different names.  If not, what is
this "solr"?


thanks,

-Hui


Re: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Ryan McKinley


However, if I go to the tomcat server and restart it after I have issued
the process command, the program returns and the documents are all
posted correctly!

Very strange behavior. Am I somehow not closing the connection
properly?



What version is the solr you are connecting to? 1.2 or 1.3-dev?  (I have 
not tested against 1.2)


Does this only happen with tomcat?  If you run with jetty do you get the 
same behavior?  (again, just stabs in the dark)


If you can make a small repeatable problem, post it in JIRA and I'll 
look into it.


ryan



setting absolute path for snapshooter in solrconfig.xml doesn't work

2007-09-19 Thread Yu-Hui Jin
Hi, there,

I used an absolute path for the "dir" param in the solrconfig.xml as below:


<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir">/var/SolrHome/solr/bin</str>
  <bool name="wait">true</bool>
  <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
  <arr name="env"> <str>MYVAR=val1</str> </arr>
</listener>


However, I got a "snapshooter: not found" exception thrown in catalina.out.
I don't see why this doesn't work. Anything I'm missing?


Many thanks,

-Hui

catalina.out logs:
=
..
Sep 19, 2007 6:17:20 PM org.apache.solr.handler.XmlUpdateRequestHandler update
INFO: added id={SOLR1000} in 67ms
Sep 19, 2007 6:17:20 PM org.apache.solr.core.SolrCore execute
INFO: /update  0 86
Sep 19, 2007 6:17:21 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
Sep 19, 2007 6:17:21 PM org.apache.solr.update.DirectUpdateHandler2 doDeletions
INFO: DirectUpdateHandler2 deleting and removing dups for 1 ids
Sep 19, 2007 6:17:21 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening [EMAIL PROTECTED] DirectUpdateHandler2
Sep 19, 2007 6:17:21 PM org.apache.solr.update.DirectUpdateHandler2 doDeletions
INFO: DirectUpdateHandler2 docs deleted=0
Sep 19, 2007 6:17:21 PM org.apache.solr.core.SolrException log
SEVERE: java.io.IOException: java.io.IOException: snapshooter: not found
at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
at java.lang.Runtime.exec(Runtime.java:591)
at org.apache.solr.core.RunExecutableListener.exec(
RunExecutableListener.java:70)
at org.apache.solr.core.RunExecutableListener.postCommit(
RunExecutableListener.java:97)
at org.apache.solr.update.UpdateHandler.callPostCommitCallbacks(
UpdateHandler.java:99)
at org.apache.solr.update.DirectUpdateHandler2.commit(
DirectUpdateHandler2.java:514)
at org.apache.solr.handler.XmlUpdateRequestHandler.update(
XmlUpdateRequestHandler.java:214)
at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody
(XmlUpdateRequestHandler.java:84)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(
RequestHandlerBase.java:77)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
at org.apache.solr.servlet.SolrDispatchFilter.execute(
SolrDispatchFilter.java:191)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
SolrDispatchFilter.java:159)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
ApplicationFilterChain.java:202)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(
ApplicationFilterChain.java:173)
at org.apache.catalina.core.StandardWrapperValve.invoke(
StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(
StandardContextValve.java:178)
at org.apache.catalina.core.StandardHostValve.invoke(
StandardHostValve.java:126)
at org.apache.catalina.valves.ErrorReportValve.invoke(
ErrorReportValve.java:105)
at org.apache.catalina.valves.AccessLogValve.invoke(
AccessLogValve.java:526)
at org.apache.catalina.core.StandardEngineValve.invoke(
StandardEngineValve.java:107)
at org.apache.catalina.connector.CoyoteAdapter.service(
CoyoteAdapter.java:148)
at org.apache.coyote.http11.Http11Processor.process(
Http11Processor.java:856)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:744)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(
PoolTcpEndpoint.java:527)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(
LeaderFollowerWorkerThread.java:80)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(
ThreadPool.java:684)
at java.lang.Thread.run(Thread.java:595)


Sep 19, 2007 6:17:21 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening [EMAIL PROTECTED] main
Sep 19, 2007 6:17:21 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Sep 19, 2007 6:17:21 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
filterCache{lookups=0,hits=0,hitratio=0.00
,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=
0.00,cumulative_inserts=0,cumulative_evictions=0}
Sep 19, 2007 6:17:21 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
filterCache{lookups=0,hits=0,hitratio=0.00
,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=
0.00,cumulative_inserts=0,cumulative_evictions=0}
Sep 19, 2007 6:17:21 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
queryResultCache{lookups=1,hits=0,hitratio=0.00
,inserts=1,evictions=0,size=1,cumulative_lookups=1,cumulative_hits=0,cumulative_hitratio=
0.00,cumulative_inserts=1,cumulative_eviction

RE: How can i make a distribute search on Solr?

2007-09-19 Thread Jarvis
Nutch has two ways to make a distributed query - through HDFS (hadoop file
system) or an RPC call that lives in the
"org.apache.nutch.searcher.DistributedSearch" class.

But I think neither of these is good enough.

If we use HDFS to service the user's query, stability is a problem. We must
do all of the crawling, indexing and querying on HDFS and use mapreduce. Can
we trust hadoop all the time? :)

If we use the RPC call in nutch, manually separating the index is required.
We will receive duplicate results if the same document is indexed on
different servers. And also data updating and single-server errors are
hard to deal with.

Thanks,
Jarvis


-Original Message-
From: Stu Hood [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 19, 2007 10:37 PM
To: solr-user@lucene.apache.org
Subject: Re: How can i make a distribute search on Solr?

Nutch implements federated search separately from their index generation.

My understanding is that MapReduce jobs generate the indexes (Nutch calls
them segments) from raw data that has been downloaded, and then makes them
available to be searched via remote procedure calls. Queries never pass
through MapReduce in any shape or form, only the raw data and indexes.

If you take a look at the "org.apache.nutch.searcher.DistributedSearch"
class, specifically the #Client.search method, you can see how they handle
the actual federation of results.

Thanks,
Stu


-Original Message-
From: Norberto Meijome 
Sent: Wednesday, September 19, 2007 10:23am
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Subject: Re: How can i make a distribute search on Solr?

On Wed, 19 Sep 2007 01:46:53 -0400
Ryan McKinley  wrote:

> Stu is referring to Federated Search - where each index has some of the 
> data and results are combined before they are returned.  This is not yet 
> supported out of the "box"

Maybe this is related. How does this compare to the map-reduce functionality
in Nutch/Hadoop ? 
cheers,
B

_
{Beto|Norberto|Numard} Meijome

"With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. 
It is hard to be sure where they are going to land, and it could be
dangerous sitting under them as they fly overhead."
   [RFC1925 - section 2, subsection 3]

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.



Filter by Group

2007-09-19 Thread mark angelillo

Hey all,

Let's say I have an index of one hundred documents, and these  
documents are grouped into 4 groups A, B, C, and D. The groups do in  
fact overlap. What would people recommend as the best way to apply a  
search query and return only the documents that are in group A? Also,  
how about if we run the same search query but return only those  
documents in groups A, C and D?


I imagine that I could do this by indexing a text field populated  
with the group names and adding something like "groups:A" to the  
query but I'm wondering if there's a better solution.


Thanks in advance,
Mark

mark angelillo
snooth inc.
o: 646.723.4328
c: 484.437.9915
[EMAIL PROTECTED]
snooth -- 1.7 million ratings and counting...




Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Wed, 19 Sep 2007 10:29:54 -0400
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:

> > Maybe this is related. How does this compare to the map-reduce 
> > functionality in Nutch/Hadoop ?  
> 
> map-reduce is more for batch jobs.  Nutch only uses map-reduce for
> parallel indexing, not searching.

I see... so in nutch all nodes have all the data indexed?

Thanks,
_
{Beto|Norberto|Numard} Meijome...heading to read about nutch/hadoop

"Imagination is more important than knowledge."
  Albert Einstein, On Science

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Term extraction

2007-09-19 Thread Pieter Berkel
I'm currently looking at methods of term extraction and automatic keyword
generation from indexed documents.  I've been experimenting with
MoreLikeThis and values returned by the "mlt.interestingTerms" parameter and
so far this approach has worked well.  However, I'd like to be able to
analyze documents more intelligently to recognize phrase keywords such as
"open source", "Microsoft Office", "Bill Gates" rather than splitting each
word into separate tokens (the field is never used in search queries so
matching is not an issue).  I've been looking at SynonymFilterFactory as a
possible solution to this problem but haven't been able to work out the
specifics of how to configure it for phrase mappings.
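
For illustration, a minimal sketch of this kind of phrase mapping (the file
name "keyphrases.txt" is hypothetical, and this is untested against 1.2):
map each multi-word phrase to a single token in the synonyms file,

open source => open_source
microsoft office => microsoft_office
bill gates => bill_gates

and apply the filter at index time, after a whitespace tokenizer:

<filter class="solr.SynonymFilterFactory" synonyms="keyphrases.txt"
        ignoreCase="true" expand="false"/>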

Has anybody else dealt with this problem before, or is anyone able to offer
any insights into achieving the desired results?

Thanks in advance,
Pieter


RE: How can i make a distribute search on Solr?

2007-09-19 Thread Jarvis
I think the index data stored in HDFS and generated by the map-reduce jobs
is what is used for searching in Nutch 0.9.

You can see the code in the "org.apache.nutch.searcher.NutchBean" class. :)

Jarvis

-Original Message-
From: Norberto Meijome [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 20, 2007 9:52 AM
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: How can i make a distribute search on Solr?

On Wed, 19 Sep 2007 10:29:54 -0400
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:

> > Maybe this is related. How does this compare to the map-reduce
functionality in Nutch/Hadoop ?  
> 
> map-reduce is more for batch jobs.  Nutch only uses map-reduce for
> parallel indexing, not searching.

I see... so in nutch all nodes have all the data indexed?

Thanks,
_
{Beto|Norberto|Numard} Meijome...heading to read about nutch/hadoop

"Imagination is more important than knowledge."
  Albert Einstein, On Science

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.



Re: setting absolute path for snapshooter in solrconfig.xml doesn't work

2007-09-19 Thread Pieter Berkel
See this recent thread for some helpful info:
http://www.nabble.com/solr-doesn%27t-find-exe-in-postCommit-event-tf4264879.html#a12167792

You'll probably want to configure your exe with an absolute path rather than
the dir:

  <str name="exe">/var/SolrHome/solr/bin/snapshooter</str>
  <str name="dir">.</str>

in order to get the snapshooter working correctly.

cheers,
Piete



On 20/09/2007, Yu-Hui Jin <[EMAIL PROTECTED]> wrote:
>
> Hi, there,
>
> I used an absolute path for the "dir" param in the solrconfig.xml as
> below:
>
> <listener event="postCommit" class="solr.RunExecutableListener">
>   <str name="exe">snapshooter</str>
>   <str name="dir">/var/SolrHome/solr/bin</str>
>   <bool name="wait">true</bool>
>   <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
>   <arr name="env"> <str>MYVAR=val1</str> </arr>
> </listener>
>
> However, I got "snapshooter: not found"  exception thrown in catalina.out.
> I don't see why this doesn't work. Anything I'm missing?
>
>
> Many thanks,
>
> -Hui
>


Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Thu, 20 Sep 2007 09:37:51 +0800
"Jarvis" <[EMAIL PROTECTED]> wrote:

> If we use the RPC call in nutch .
Hi,
I wasn't suggesting using nutch in solr... I'm only a young grasshopper in this
league to be suggesting architecture stuff :) but i imagine there's nothing
wrong with using what they've built if it addresses solr's needs.

>  Manually separate the index is required .

hmm i imagine this really depends on the application. In my case, this
separation of which docs go where happens @ a completely different layer.

> We will receive reduplicate result if there is reduplicate index document on
> different servers. 

Maybe I got this wrong...but isn't this what mapreduce is meant to deal with?
eg, 

1) get the job (a query)
2) map it to workers ( servers that provide search results from their own
indexing)
3) wait for the results from all workers that reply within acceptable timeframe.
4) comb through the lot of  results from all workers, reduce them according to
your own biz rules (eg, remove dupes, sort them by quality / priority... here 
possibly relying on the original parameters of the query in 1)
5) return the reduced results to the frontend.

> And also the data updating and single server's error is
> hard to deal with.

this really depends on your infrastructure + design. 

Having the indexing , searching and providing of results in different layers
should make for some interesting design options...

If each searcher (or wherever the index resides) is really a small cluster of
servers , the issue of data safety / server error is addressed @ that point.
You can also have repeated data across indexes (again, independent indexes) and
that's a more ... randomised :) way of keeping the docs safe... For example,
IIRC, googleFS keeps copies of each file in 3 servers or more...

cheers,
B
_
{Beto|Norberto|Numard} Meijome

"He uses statistics as a drunken man uses lamp-posts ... for support rather
than illumination." Andrew Lang (1844-1912)

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Term extraction

2007-09-19 Thread Brian Whitman

On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote:

I'm currently looking at methods of term extraction and automatic  
keyword

generation from indexed documents.


We do it manually (not in solr, but we put the results in solr.) We  
do it the usual way - chunk (into n-grams, named entities & noun  
phrases) and count (tf & df). It works well enough. There is a bevy  
of literature on the topic if you want to get "smart" -- but be  
warned smart and fast are likely not very good friends.


A lot depends on the provenance of your data -- is it clean text that  
uses a lot of domain specific terms? Is it webtext?




Re: Filter by Group

2007-09-19 Thread Pieter Berkel
Sounds like you're on the right track: if your groups overlap (i.e. a
document can be in both group A and group B), then you should ensure your
"groups" field is multivalued.

If you are searching for "foo" in documents contained in group "A", then it
might be more efficient to use a filter query (fq) like:

q=foo&fq=groups:A
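
The field definition itself could be as simple as this (a sketch; the field
name and type are assumed):

<field name="groups" type="string" indexed="true" stored="true" multiValued="true"/>

For the second case (groups A, C and D), a boolean filter query should work:

q=foo&fq=groups:(A OR C OR D)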

See the wiki page on common query parameters for more info:
http://wiki.apache.org/solr/CommonQueryParameters#head-6522ef80f22d0e50d2f12ec487758577506d6002

cheers,
Piete



On 20/09/2007, mark angelillo <[EMAIL PROTECTED]> wrote:
>
> Hey all,
>
> Let's say I have an index of one hundred documents, and these
> documents are grouped into 4 groups A, B, C, and D. The groups do in
> fact overlap. What would people recommend as the best way to apply a
> search query and return only the documents that are in group A? Also,
> how about if we run the same search query but return only those
> documents in groups A, C and D?
>
> I imagine that I could do this by indexing a text field populated
> with the group names and adding something like "groups:A" to the
> query but I'm wondering if there's a better solution.
>
> Thanks in advance,
> Mark
>
> mark angelillo
> snooth inc.
> o: 646.723.4328
> c: 484.437.9915
> [EMAIL PROTECTED]
> snooth -- 1.7 million ratings and counting...
>
>
>


RE: How can i make a distribute search on Solr?

2007-09-19 Thread Jarvis
Hi,
What you describe is done by hadoop, which supports hardware failure
handling, data replication and more.
If we want to implement such a good system by ourselves with Solr but
without HDFS, it's very, very complex work, I think. :)
I just want to know whether there is an existing component that can do
distributed search based on Solr.

Thanks 
Jarvis.

-Original Message-
From: Norberto Meijome [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 20, 2007 10:06 AM
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Subject: Re: How can i make a distribute search on Solr?

On Thu, 20 Sep 2007 09:37:51 +0800
"Jarvis" <[EMAIL PROTECTED]> wrote:

> If we use the RPC call in nutch .
Hi,
I wasn't suggesting to use nutch in solr...I'm only a young grasshopper in
this
league to be suggesting architecture stuff :) but i imagine there's nothing
wrong with using what they've built if it addresses solr's needs.

>  Manually separate the index is required .

hmm i imagine this really depends on the application. In my case, this
separation of which docs go where happens @ a completely different layer.

> We will receive reduplicate result if there is reduplicate index document
on
> different servers. 

Maybe I got this wrong...but isn't this what mapreduce is meant to deal
with?
eg, 

1) get the job (a query)
2) map it to workers ( servers that provide search results from their own
indexing)
3) wait for the results from all workers that reply within acceptable
timeframe.
4) comb through the lot of  results from all workers, reduce them according
to
your own biz rules (eg, remove dupes, sort them by quality / priority...
here possibly relying on the original parameters of the query in 1)
5) return the reduced results to the frontend.

> And also the data updating and single server's error is
> hard to deal with.

this really depends on your infrastructure + design. 

Having the indexing , searching and providing of results in different layers
should make for some interesting design options...

If each searcher (or wherever the index resides) is really a small cluster
of
servers , the issue of data safety / server error is addressed @ that point.
You can also have repeated data across indexes (again, independent indexes)
and
that's a more ... randomised :) way of keeping the docs safe... For example,
IIRC, googleFS keeps copies of each file in 3 servers or more...

cheers,
B
_
{Beto|Norberto|Numard} Meijome

"He uses statistics as a drunken man uses lamp-posts ... for support rather
than illumination." Andrew Lang (1844-1912)

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.



Re: Term extraction

2007-09-19 Thread Pieter Berkel
Thanks Brian, I think the "smart" approaches you refer to might be outside
the scope of my current project.  The documents I am indexing already have
manually-generated keyword data; moving forward, I'd like to have these
keywords automatically generated, selected from a pre-defined list of
keywords (i.e. the "simple" approach).

The data is fairly clean and domain-specific so I don't expect there will be
more than several hundred of these phrase terms to deal with, which is why I
was exploring the SynonymFilterFactory option.

Pieter



On 20/09/2007, Brian Whitman <[EMAIL PROTECTED]> wrote:
>
> On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote:
>
> > I'm currently looking at methods of term extraction and automatic
> > keyword
> > generation from indexed documents.
>
> We do it manually (not in solr, but we put the results in solr.) We
> do it the usual way - chunk (into n-grams, named entities & noun
> phrases) and count (tf & df). It works well enough. There is a bevy
> of literature on the topic if you want to get "smart" -- but be
> warned smart and fast are likely not very good friends.
>
> A lot depends on the provenance of your data -- is it clean text that
> uses a lot of domain specific terms? Is it webtext?
>
>


Re: How can i make a distribute search on Solr?

2007-09-19 Thread Mike Klaas

On 19-Sep-07, at 7:21 PM, Jarvis wrote:


Hi,
What you describe is done by hadoop, which supports hardware failure
handling, data replication and more.
If we want to implement such a good system by ourselves with Solr but
without HDFS, it's very, very complex work, I think. :)
I just want to know whether there is an existing component that can do
distributed search based on Solr.


https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel


regards,
-Mike

Re: setting absolute path for snapshooter in solrconfig.xml doesn't work

2007-09-19 Thread Yu-Hui Jin
Hi, Pieter,

Thanks!  Now the exception is gone. However, there's no snapshot file
created in the data directory. Strangely, the snapshooter.log seems to
complete successfully.  Any idea what else I'm missing?

$ cat var/SolrHome/solr/logs/snapshooter.log
2007/09/19 20:16:17 started by solruser
2007/09/19 20:16:17 command: /var/SolrHome/solr/bin/snapshooter arg1 arg2
2007/09/19 20:16:17 taking snapshot
var/SolrHome/solr/data/snapshot.20070919201617
2007/09/19 20:16:17 ended (elapsed time: 0 sec)

Thanks,

-Hui




On 9/19/07, Pieter Berkel <[EMAIL PROTECTED]> wrote:
>
> See this recent thread for some helpful info:
>
> http://www.nabble.com/solr-doesn%27t-find-exe-in-postCommit-event-tf4264879.html#a12167792
>
> You'll probably want to configure your exe with an absolute path rather
> than
> the dir:
>
>   <str name="exe">/var/SolrHome/solr/bin/snapshooter</str>
>   <str name="dir">.</str>
>
> In order to get the snapshooter working correctly.
>
> cheers,
> Piete
>
>
>
> On 20/09/2007, Yu-Hui Jin <[EMAIL PROTECTED]> wrote:
> >
> > Hi, there,
> >
> > I used an absolute path for the "dir" param in the solrconfig.xml as
> > below:
> >
> > <listener event="postCommit" class="solr.RunExecutableListener">
> >   <str name="exe">snapshooter</str>
> >   <str name="dir">/var/SolrHome/solr/bin</str>
> >   <bool name="wait">true</bool>
> >   <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
> >   <arr name="env"> <str>MYVAR=val1</str> </arr>
> > </listener>
> >
> > However, I got "snapshooter: not found"  exception thrown in
> catalina.out.
> > I don't see why this doesn't work. Anything I'm missing?
> >
> >
> > Many thanks,
> >
> > -Hui
> >
>



-- 
Regards,

-Hui


Re: setting absolute path for snapshooter in solrconfig.xml doesn't work

2007-09-19 Thread Pieter Berkel
If you don't need to pass any command line arguments to snapshooter, remove
(or comment out) this line from solrconfig.xml:

 <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>

By the same token, if you're not setting environment variables either,
remove the following line as well:

 <arr name="env"> <str>MYVAR=val1</str> </arr>

Once you alter / remove those two lines, snapshooter should function as
expected.
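
Putting the suggestions from this thread together, a minimal listener config
(a sketch using Hui's paths; untested) would look like:

<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">/var/SolrHome/solr/bin/snapshooter</str>
  <str name="dir">.</str>
  <bool name="wait">true</bool>
</listener>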

cheers,
Piete



On 20/09/2007, Yu-Hui Jin <[EMAIL PROTECTED]> wrote:
>
> Hi, Pieter,
>
> Thanks!  Now the exception is gone. However, There's no snapshot file
> created in the data directory. Strangely, the snapshooter.log seems to
> complete successfully.  Any idea what else I'm missing?
>
> $ cat var/SolrHome/solr/logs/snapshooter.log
> 2007/09/19 20:16:17 started by solruser
> 2007/09/19 20:16:17 command: /var/SolrHome/solr/bin/snapshooter arg1 arg2
> 2007/09/19 20:16:17 taking snapshot
> var/SolrHome/solr/data/snapshot.20070919201617
> 2007/09/19 20:16:17 ended (elapsed time: 0 sec)
>
> Thanks,
>
> -Hui
>
>
>
>
> On 9/19/07, Pieter Berkel <[EMAIL PROTECTED]> wrote:
> >
> > See this recent thread for some helpful info:
> >
> >
> http://www.nabble.com/solr-doesn%27t-find-exe-in-postCommit-event-tf4264879.html#a12167792
> >
> > You'll probably want to configure your exe with an absolute path rather
> > than
> > the dir:
> >
> >   <str name="exe">/var/SolrHome/solr/bin/snapshooter</str>
> >   <str name="dir">.</str>
> >
> > In order to get the snapshooter working correctly.
> >
> > cheers,
> > Piete
> >
> >
> >
> > On 20/09/2007, Yu-Hui Jin <[EMAIL PROTECTED]> wrote:
> > >
> > > Hi, there,
> > >
> > > I used an absolute path for the "dir" param in the solrconfig.xml as
> > > below:
> > >
> > > <listener event="postCommit" class="solr.RunExecutableListener">
> > >   <str name="exe">snapshooter</str>
> > >   <str name="dir">/var/SolrHome/solr/bin</str>
> > >   <bool name="wait">true</bool>
> > >   <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
> > >   <arr name="env"> <str>MYVAR=val1</str> </arr>
> > > </listener>
> > >
> > > However, I got "snapshooter: not found"  exception thrown in
> > catalina.out.
> > > I don't see why this doesn't work. Anything I'm missing?
> > >
> > >
> > > Many thanks,
> > >
> > > -Hui
> > >
> >
>
>
>
> --
> Regards,
>
> -Hui
>


Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Thu, 20 Sep 2007 10:02:08 +0800
"Jarvis" <[EMAIL PROTECTED]> wrote:

> You can see the code in "org.apache.nutch.searcher.NutchBean" class . :)

thx for the pointer.

_
{Beto|Norberto|Numard} Meijome

"In order to avoid being called a flirt, she always yielded easily."
  Charles, Count Talleyrand

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Thu, 20 Sep 2007 10:21:39 +0800
"Jarvis" <[EMAIL PROTECTED]> wrote:

> What you describe is done by hadoop, which supports hardware failure
> handling, data replication and more.
> If we want to implement such a good system by ourselves with Solr but
> without HDFS, it's very, very complex work, I think. :)
> I just want to know whether there is an existing component that can do
> distributed search based on Solr.

Thanks for the info.

Risking starting a flame war (which is not my intention :) ), what
design reasons / features are there in Solr but not in hadoop/nutch that
would make it compelling to use solr instead of h/n ?

I know each case is different, but the feeling i got from a shortish read
into h/n was that H/N is geared towards webpage indexing, crawling, etc.
But possibly i'm missing something...

Whereas Solr is, from my point of view, far more flexible. In which case,
maybe porting HDFS into Solr would add all these clustering / map-reduce
options...

thanks for your time and insights :)
B
_
{Beto|Norberto|Numard} Meijome

Windows caters to everyone as though they are idiots. UNIX makes no such
assumption. It assumes you know what you are doing, and presents the challenge
of figuring it out for yourself if you don't.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: How can i make a distribute search on Solr?

2007-09-19 Thread Venkatraman S
Along similar lines:

assuming that I have 2 indexes on the same box, say at
/home/abc/data/index1 and /home/abc/data/index2,
and I want the results from both indexes when I do a search - how
should this be 'optimally' designed? Basically these are different Solr
homes, and I want the results to be clearly demarcated as coming from 2
different sources.

-Venkat

On 9/20/07, Norberto Meijome <[EMAIL PROTECTED]> wrote:
>
> On Thu, 20 Sep 2007 10:21:39 +0800
> "Jarvis" <[EMAIL PROTECTED]> wrote:
>
> > What you describe is done by hadoop, which supports hardware failure
> > handling, data replication and more.
> > If we want to implement such a good system by ourselves with Solr but
> > without HDFS, it's very, very complex work, I think. :)
> > I just want to know whether there is an existing component that can do
> > distributed search based on Solr.
>
> Thanks for the info.
>
> Risking starting up  a flame war (which is not my intention :) ), what
> design reasons / features are there in Solr but not in hadoop/nutch that
> would make it compelling to use solr instead of h/n ?
>
> I know, each case is
> different the feeling i got from a shortish read into h/n was that H/N
> is
> geared towards webpage indexing, crawling,etc.  But possibly i'm missing
> something...
>
> Where Solr is , from my point of view, far more flexible. In which case,
> maybe
> porting HDFS into Solr to add all this clustering / map/reduce options...
>
> thanks for your time and insights :)
> B
> _
> {Beto|Norberto|Numard} Meijome
>
> Windows caters to everyone as though they are idiots. UNIX makes no such
> assumption. It assumes you know what you are doing, and presents the
> challenge
> of figuring it out for yourself if you don't.
>
> I speak for myself, not my employer. Contents may be hot. Slippery when
> wet.
> Reading disclaimers makes you go blind. Writing them is worse. You have
> been
> Warned.
>



--
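
One way to keep Venkat's two sources separate (a sketch: it assumes each
index lives under its own Solr home, and the war path is hypothetical) is to
deploy the same solr.war twice via Tomcat context fragments, with one JNDI
solr/home per instance:

<!-- conf/Catalina/localhost/solr1.xml -->
<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="/home/abc/data/solr1" override="true"/>
</Context>

Each instance is then queried at its own URL (/solr1/select, /solr2/select),
so results are naturally demarcated by source.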


Re: Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory

2007-09-19 Thread Venkatraman S
On 9/20/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
>
> you imply that you are building your index using embedded solr, but based
> on your stack trace it seems you are using Solr in a servlet container ...
> i assume to search the index you've already built?


I have a jsp that routes the info from a drupal module to my Embedded
Solr app.

Does this case arise when I do a search when there is no index? If yes,
then I guess the exception can be made more meaningful.

Is the embedded core completely closed before your servlet container running
> Solr is started?  what does the directory listing look like in between the
> finish of A and the start of B?


Yes - it is closed; but I guess this problem arises when I do a search when
no index has been created - can you confirm this?

-Venkat


Re: setting absolute path for snapshooter in solrconfig.xml doesn't work

2007-09-19 Thread Yu-Hui Jin
Thanks, it works now.


regards,
-Hui


On 9/19/07, Pieter Berkel <[EMAIL PROTECTED] > wrote:
>
> If you don't need to pass any command line arguments to snapshooter,
> remove
> (or comment out) this line from solrconfig.xml:
>
>  <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
>
> By the same token, if you're not setting environment variables either,
> remove the following line as well:
>
>  <arr name="env"> <str>MYVAR=val1</str> </arr>
>
> Once you alter / remove those two lines, snapshooter should function as
> expected.
>
> cheers,
> Piete
>
>
>
> On 20/09/2007, Yu-Hui Jin <[EMAIL PROTECTED]> wrote:
> >
> > Hi, Pieter,
> >
> > Thanks!  Now the exception is gone. However, There's no snapshot file
> > created in the data directory. Strangely, the snapshooter.log seems to
> > complete successfully.  Any idea what else I'm missing?
> >
> > $ cat var/SolrHome/solr/logs/snapshooter.log
> > 2007/09/19 20:16:17 started by solruser
> > 2007/09/19 20:16:17 command: /var/SolrHome/solr/bin/snapshooter arg1
> arg2
> > 2007/09/19 20:16:17 taking snapshot
> > var/SolrHome/solr/data/snapshot.20070919201617
> > 2007/09/19 20:16:17 ended (elapsed time: 0 sec)
> >
> > Thanks,
> >
> > -Hui
> >
> >
> >
> >
> > On 9/19/07, Pieter Berkel <[EMAIL PROTECTED]> wrote:
> > >
> > > See this recent thread for some helpful info:
> > >
> > >
> > http://www.nabble.com/solr-doesn%27t-find-exe-in-postCommit-event-tf4264879.html#a12167792
>
> > >
> > > You'll probably want to configure your exe with an absolute path
> rather
> > > than
> > > the dir:
> > >
> > >   <str name="exe">/var/SolrHome/solr/bin/snapshooter</str>
> > >   <str name="dir">.</str>
> > >
> > > In order to get the snapshooter working correctly.
> > >
> > > cheers,
> > > Piete
> > >
> > >
> > >
> > > On 20/09/2007, Yu-Hui Jin <[EMAIL PROTECTED]> wrote:
> > > >
> > > > Hi, there,
> > > >
> > > > I used an absolute path for the "dir" param in the solrconfig.xml as
> > > > below:
> > > >
> > > > <listener event="postCommit" class="solr.RunExecutableListener">
> > > >   <str name="exe">snapshooter</str>
> > > >   <str name="dir">/var/SolrHome/solr/bin</str>
> > > >   <bool name="wait">true</bool>
> > > >   <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
> > > >   <arr name="env"> <str>MYVAR=val1</str> </arr>
> > > > </listener>
> > > >
> > > > However, I got "snapshooter: not found"  exception thrown in
> > > catalina.out.
> > > > I don't see why this doesn't work. Anything I'm missing?
> > > >
> > > >
> > > > Many thanks,
> > > >
> > > > -Hui
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > -Hui
> >
>



-- 
Regards,

-Hui