Re: Own Similarity Class in Solr
: I would like to alter the similarity behaviour of solr to remove
: the fieldnorm factor in the similarity calculations. As far as I
: read, I need to recreate my own similarity class and import it into
: solr using the config in schema.xml.
:
: Has anybody already tweaked or played with this topic, and might
: give me some code or advice?

as you've already noticed, you can specify the Similarity class at runtime via the schema.xml -- the only Solr-specific aspect of this is making sure your Similarity class is in your servlet container's classpath (exactly how you do this depends on your servlet container)

searching the java-dev and java-user Lucene mailing lists is the best bet for finding discussions on writing your own Similarity; there are also some examples in the main Lucene code base...

  contrib/miscellaneous/src/java/org/apache/lucene/misc/SweetSpotSimilarity.java
  src/test/org/apache/lucene/search/TestDisjunctionMaxQuery.java

...if your main interest is just eliminating norms, there is a special option for that in Lucene Fields called "Omit Norms" (it not only eliminates the effects of norms on scoring, but it saves space in your index as well). In Solr you can turn it on/off per <field> or <fieldtype> using the omitNorms="true" option in the schema.xml

-Hoss
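For illustration, a minimal schema.xml sketch of both options Hoss describes -- the class and field names here are made up, but the <similarity> element and the omitNorms attribute are the standard schema.xml hooks:

  <schema name="example">
    <types>
      ... fieldtype definitions ...
    </types>
    <fields>
      <!-- omitNorms drops the fieldNorm factor for this field at
           index time (and saves the space the norms would take) -->
      <field name="text" type="text" indexed="true" stored="true"
             omitNorms="true"/>
    </fields>
    <!-- hypothetical custom Similarity; the class must be on the
         servlet container's classpath -->
    <similarity class="com.example.MySimilarity"/>
  </schema>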
Re: Own Similarity Class in Solr
Hi Chris, thanks for the details. I am meanwhile poking around with my own class, which I defined in the schema.xml; everything is working perfectly there. But I still have a problem with the normalization: I tried changing several parameters to fix it to 1.0, and this does indeed change the scoring, but still not in the way I need. It seems that it is always the "fieldNorm" factor that comes into play, but where does this value really come from? In the Similarity class I don't find this term to alter.

Let me give a short example of what goes wrong: I have a field "searchname" with a boost of "3.0" set during document.add. Another field "text" is a copyField of several entries; this one does not have a boost factor, but it does have more data in it. In this text is a copy of a field where the searched text occurs 3 times. This entry has the score 5.5930133. But I also have entries where the searchname has the same word in it, and these have a score of 1.9975047.

Currently my class is like this (I took DefaultSimilarity as a basis):
- lengthNorm is fixed to 1.0
- tf fixed to 1.0
- idf fixed to 1.0

With these changes, might it be possible that I've deactivated the boost on the different fields? What I need is a search which treats each document the same regardless of term frequency and document size, and calculates the score only from the boost factors, so that a document with a high boost factor and the same text in it as another document with a lower factor ranks before the others. Something I do might be completely wrong; perhaps you have an idea?

Thanks, Tom
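For reference, a sketch of the kind of Similarity Tom describes -- not his actual class; the name is made up, and the method signatures follow the Lucene 1.9/2.0 Similarity API:

  import org.apache.lucene.search.DefaultSimilarity;

  public class BoostOnlySimilarity extends DefaultSimilarity {
    // lengthNorm is multiplied with the field boost at index time to
    // produce the stored fieldNorm; returning 1.0f leaves only the boost
    public float lengthNorm(String fieldName, int numTerms) {
      return 1.0f;
    }
    // ignore how often the term occurs within a document
    public float tf(float freq) {
      return 1.0f;
    }
    // ignore how rare the term is across the index
    public float idf(int docFreq, int numDocs) {
      return 1.0f;
    }
  }

One thing worth noting: fieldNorm (lengthNorm times the index-time boosts) is computed when a document is indexed and is stored in the norms, so changing lengthNorm only affects documents indexed afterwards -- an existing index keeps its old fieldNorm values until it is rebuilt, which may be why the factor still appears to "play".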
add/update index
Hi, I have created a process which uses xsl to convert my data to the form indicated in the examples so that it can be added to the index as the solr tutorial indicates:

  <add><doc>
    <field name="fieldname">value</field>
    ...
  </doc></add>

In some cases the xsl process will create a field element with no data (ie <field name="fieldname"/>). Is this considered bad input and will not be accepted? Or is this something that solr should deal with? Currently for each field element with no data I receive the message:

java.lang.NullPointerException
        at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:78)
        at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:74)
        at org.apache.solr.core.SolrCore.readDoc(SolrCore.java:917)
        at org.apache.solr.core.SolrCore.update(SolrCore.java:685)
        at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:52)
        ...

Just curious if the gurus out there think I should deal with the null values in my xsl process or if this can be dealt with in solr itself?

Thanks, Tricia

ps. Thanks for the timely fix for the UTF-8 issue!
Re: add/update index
On 7/27/06, Tricia Williams <[EMAIL PROTECTED]> wrote:
> Hi, I have created a process which uses xsl to convert my data to the
> form indicated in the examples so that it can be added to the index as
> the solr tutorial indicates:
>
>   <add><doc>
>     <field name="fieldname">value</field>
>     ...
>   </doc></add>
>
> In some cases the xsl process will create a field element with no data
> (ie <field name="fieldname"/>). Is this considered bad input and will
> not be accepted?

If the desired semantics are "the field doesn't exist" or "null value" then yes. There isn't a way to represent a field without a value in Lucene except to not add the field for that document. If it's totally ignored, it probably shouldn't be in the XML.

Now, one might think we could drop fields with no value, but that's problematic because it goes against the XML standard:
http://www.w3.org/TR/REC-xml/#sec-starttags

[Definition: An element with no content is said to be empty.] The representation of an empty element is either a start-tag immediately followed by an end-tag, or an empty-element tag. [Definition: An empty-element tag takes a special form:]

So <field name="fieldname"></field> and <field name="fieldname"/> are supposed to be equivalent. Given that, it does look like Solr should treat <field name="fieldname"/> like a zero-length string (but that's not what you wanted, right?)

-Yonik
Re: add/update index
Thanks Yonik, That's exactly what I needed to know. I'll adapt my xsl process to omit null values.

Tricia

On Thu, 27 Jul 2006, Yonik Seeley wrote:
> If the desired semantics are "the field doesn't exist" or "null value"
> then yes. There isn't a way to represent a field without a value in
> Lucene except to not add the field for that document. If it's totally
> ignored, it probably shouldn't be in the XML.
> (rest of quote snipped)
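For illustration, a minimal XSLT sketch of the kind of guard Tricia describes -- the source element name "myfield" is made up; the idea is simply to skip empty values rather than emit an empty <field>:

  <!-- emit a <field> only when the source element has real content -->
  <xsl:template match="myfield">
    <xsl:if test="normalize-space(.) != ''">
      <field name="myfield"><xsl:value-of select="."/></field>
    </xsl:if>
  </xsl:template>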
Solr's JSON, Python, Ruby output format
Solr now has a JSON response format, in addition to Python and Ruby versions that can be directly eval'd. http://wiki.apache.org/solr/SolJSON -Yonik
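Per the SolJSON wiki page, the output format is selected with the wt request parameter. A minimal Python sketch against the stock example server might look like this (the query string is arbitrary):

  import httplib, urllib

  # ask the example server for JSON output; wt=python and wt=ruby
  # select the directly eval'able variants
  params = urllib.urlencode({'q': 'solr', 'wt': 'json'})
  conn = httplib.HTTPConnection('localhost:8983')
  conn.request('GET', '/solr/select?' + params)
  print conn.getresponse().read()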
Re: Doc add limit
On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> I removed everything from the Add xml so the docs looked like this:
>
> <add><doc>
>   <field name="id">187880</field>
> </doc></add>
>
> <add><doc>
>   <field name="id">187852</field>
> </doc></add>
>
> and it still hung at 6,144...

Maybe you can try the following simple Python client to try and rule out some kind of different client interactions... the attached script adds 10,000 documents and works fine for me in WinXP w/ Tomcat 5.5.17 and Jetty

-Yonik

solr.py --
import httplib
import socket

class SolrConnection:
    def __init__(self, host='localhost:8983', solrBase='/solr'):
        self.host = host
        self.solrBase = solrBase
        #a connection to the server is not opened at this point.
        self.conn = httplib.HTTPConnection(self.host)
        #self.conn.set_debuglevel(100)
        self.postheaders = {"Connection":"close"}

    def doUpdateXML(self, request):
        try:
            self.conn.request('POST', self.solrBase+'/update', request, self.postheaders)
        except (socket.error,httplib.CannotSendRequest) :
            #reconnect in case the connection was broken from the server going down,
            #the server timing out our persistent connection, or another
            #network failure.
            #Also catch httplib.CannotSendRequest because the HTTPConnection object
            #can get in a bad state.
            self.conn.close()
            self.conn.connect()
            self.conn.request('POST', self.solrBase+'/update', request, self.postheaders)

        rsp = self.conn.getresponse()
        #print rsp.status, rsp.reason
        data = rsp.read()
        #print "data=",data
        self.conn.close()

    def delete(self, id):
        xstr = '<delete><id>'+id+'</id></delete>'
        self.doUpdateXML(xstr)

    def add(self, **fields):
        #todo: XML escaping
        flist=['<field name="%s">%s</field>' % f for f in fields.items() ]
        flist.insert(0,'<add><doc>')
        flist.append('</doc></add>')
        xstr = ''.join(flist)
        self.doUpdateXML(xstr)

c = SolrConnection()
#for i in range(10000):
#    c.delete(str(i))
for i in range(10000):
    c.add(id=i)
Re: Doc add limit
Yonik, It looks like the problem is with the way I'm posting to the SolrUpdate servlet. I am able to use curl to post the data to my tomcat instance without a problem. It only fails when I try to handle the http post from java... my code is below:

URL url = new URL("http://localhost:8983/solr/update");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
conn.setRequestProperty("Content-Type", "application/octet-stream");
conn.setDoOutput(true);
conn.setDoInput(true);
conn.setUseCaches(false);

// Write to server
log.info("About to post to SolrUpdate servlet.");
DataOutputStream output = new DataOutputStream(conn.getOutputStream());
output.writeBytes(sw);
output.flush();
log.info("Finished posting to SolrUpdate servlet.");

-Sangraal

On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> > I removed everything from the Add xml so the docs looked like this:
> > (snipped)
> > and it still hung at 6,144...
>
> Maybe you can try the following simple Python client to try and rule
> out some kind of different client interactions... the attached script
> adds 10,000 documents and works fine for me in WinXP w/ Tomcat 5.5.17
> and Jetty
>
> -Yonik
>
> (solr.py snipped)
Re: Doc add limit
On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> class SolrConnection:
>     def __init__(self, host='localhost:8983', solrBase='/solr'):
>         self.host = host
>         self.solrBase = solrBase
>         #a connection to the server is not opened at this point.
>         self.conn = httplib.HTTPConnection(self.host)
>         #self.conn.set_debuglevel(100)
>         self.postheaders = {"Connection":"close"}
>
>     def doUpdateXML(self, request):
>         try:
>             self.conn.request('POST', self.solrBase+'/update', request,
>                 self.postheaders)

Digressive note: I'm not sure if it is necessary with tomcat, but in my experience driving solr with python using Jetty, it was necessary to specify the content-type when posting utf-8 data:

self.postheaders.update({'Content-Type': 'text/xml; charset=utf-8'})

-Mike
Re: Doc add limit
Mike, I've been posting with the content type set like this:

conn.setRequestProperty("Content-Type", "application/octet-stream");

I tried your suggestion though, and unfortunately there was no change.

conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");

-Sangraal

On 7/27/06, Mike Klaas <[EMAIL PROTECTED]> wrote:
> (quoted python snipped)
>
> Digressive note: I'm not sure if it is necessary with tomcat, but in
> my experience driving solr with python using Jetty, it was necessary
> to specify the content-type when posting utf-8 data:
>
> self.postheaders.update({'Content-Type': 'text/xml; charset=utf-8'})
>
> -Mike
Re: Doc add limit
Hi Sangraal:

Sorry--I tried not to imply that this might affect your issue. You may have to crank up the solr logging to determine where it is freezing (and what might be happening).

It is certainly worth investigating why this occurs, but I wonder about the advantages of using such huge batches. Assuming a few hundred bytes per document, 6100 docs produces a POST over 1MB in size.

-Mike

On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> Mike,
> I've been posting with the content type set like this:
> conn.setRequestProperty("Content-Type", "application/octet-stream");
>
> I tried your suggestion though, and unfortunately there was no change.
> conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
>
> -Sangraal
> (rest of quote snipped)
Re: Doc add limit
I think you're right... I will probably work on splitting the batches up into smaller pieces at some point in the future. I think I will need the capability to do large batches at some point though, so I want to make sure the system can handle it. I also want to make sure this problem doesn't pop up and bite me later.

-Sangraal

On 7/27/06, Mike Klaas <[EMAIL PROTECTED]> wrote:
> Hi Sangraal:
>
> Sorry--I tried not to imply that this might affect your issue. You
> may have to crank up the solr logging to determine where it is
> freezing (and what might be happening).
>
> It is certainly worth investigating why this occurs, but I wonder
> about the advantages of using such huge batches. Assuming a few
> hundred bytes per document, 6100 docs produces a POST over 1MB in size.
>
> -Mike
> (rest of quote snipped)
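For illustration, a minimal sketch of the kind of batch splitting discussed above, built on the SolrConnection class from Yonik's solr.py -- the batch size of 500 and the add_batched name are made up for the example:

  # post documents in several small <add> requests instead of one huge one
  def add_batched(conn, docs, batch_size=500):
      for start in range(0, len(docs), batch_size):
          parts = ['<add>']
          for doc in docs[start:start+batch_size]:
              parts.append('<doc>')
              for name, value in doc.items():
                  #todo: XML escaping, as in solr.py
                  parts.append('<field name="%s">%s</field>' % (name, value))
              parts.append('</doc>')
          parts.append('</add>')
          conn.doUpdateXML(''.join(parts))

  # usage: add_batched(SolrConnection(), [{'id': i} for i in range(10000)])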
Re: Doc add limit
Are you reading the response and closing the connection? If not, you are probably running out of socket connections.

-Yonik

On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> Yonik,
> It looks like the problem is with the way I'm posting to the SolrUpdate
> servlet. I am able to use curl to post the data to my tomcat instance
> without a problem. It only fails when I try to handle the http post from
> java... my code is below:
>
> URL url = new URL("http://localhost:8983/solr/update");
> HttpURLConnection conn = (HttpURLConnection) url.openConnection();
> conn.setRequestMethod("POST");
> conn.setRequestProperty("Content-Type", "application/octet-stream");
> conn.setDoOutput(true);
> conn.setDoInput(true);
> conn.setUseCaches(false);
>
> // Write to server
> log.info("About to post to SolrUpdate servlet.");
> DataOutputStream output = new DataOutputStream(conn.getOutputStream());
> output.writeBytes(sw);
> output.flush();
> log.info("Finished posting to SolrUpdate servlet.");
>
> -Sangraal
> (rest of quote snipped)
Re: Doc add limit
Yeah, I'm closing them. Here's the method:

private String doUpdate(String sw) {
    StringBuffer updateResult = new StringBuffer();
    try {
        // open connection
        log.info("Connecting to and preparing to post to SolrUpdate servlet.");
        URL url = new URL("http://localhost:8080/update");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        conn.setDoOutput(true);
        conn.setDoInput(true);
        conn.setUseCaches(false);

        // Write to server
        log.info("About to post to SolrUpdate servlet.");
        DataOutputStream output = new DataOutputStream(conn.getOutputStream());
        output.writeBytes(sw);
        output.flush();
        output.close();
        log.info("Finished posting to SolrUpdate servlet.");

        // Read response
        log.info("Ready to read response.");
        BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        log.info("Got reader");
        String line;
        while ((line = rd.readLine()) != null) {
            log.info("Writing to result...");
            updateResult.append(line);
        }
        rd.close();

        // close connections
        conn.disconnect();
        log.info("Done updating Solr for site " + updateSite);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return updateResult.toString();
}

-Sangraal

On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> Are you reading the response and closing the connection?
> If not, you are probably running out of socket connections.
>
> -Yonik
> (rest of quote snipped)
Re: Doc add limit
I haven't been following the thread closely, but... Not sure if you are using Tomcat or Jetty; Jetty has a POST size limit (set somewhere in its configs) that may be the source of the problem.

Otis

P.S. Just occurred to me. Tomcat. Jetty. Tom & Jerry. Jetty guys should have called their thing Jerry or Jerrymouse.

- Original Message
From: Mike Klaas <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, July 27, 2006 6:33:16 PM
Subject: Re: Doc add limit

Hi Sangraal:

Sorry--I tried not to imply that this might affect your issue. You may have to crank up the solr logging to determine where it is freezing (and what might be happening).

It is certainly worth investigating why this occurs, but I wonder about the advantages of using such huge batches. Assuming a few hundred bytes per document, 6100 docs produces a POST over 1MB in size.

-Mike

(rest of quoted thread snipped)
Re: Doc add limit
I'm running on Tomcat... and I've verified that the complete post is making it through the SolrUpdate servlet and into the SolrCore object... thanks for the info though.

--

So the code is hanging on this call in SolrCore.java:

writer.write("");

The thread dump:

"http-8080-Processor24" Id=32 in RUNNABLE (running in native)
total cpu time=40698.0440ms user time=38646.1680ms
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:746)
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:433)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:348)
at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:769)
at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:125)
at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:579)
at org.apache.coyote.Response.doWrite(Response.java:559)
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:361)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:324)
at org.apache.tomcat.util.buf.IntermediateOutputStream.write(C2BConverter.java:235)
at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(StreamEncoder.java:336)
at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer(StreamEncoder.java:404)
at sun.nio.cs.StreamEncoder$CharsetSE.implFlush(StreamEncoder.java:408)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:152)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:213)
at org.apache.tomcat.util.buf.WriteConvertor.flush(C2BConverter.java:184)
at org.apache.tomcat.util.buf.C2BConverter.flushBuffer(C2BConverter.java:127)
at org.apache.catalina.connector.OutputBuffer.realWriteChars(OutputBuffer.java:536)
at org.apache.tomcat.util.buf.CharChunk.flushBuffer(CharChunk.java:439)
at org.apache.tomcat.util.buf.CharChunk.append(CharChunk.java:370)
at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:491)
at org.apache.catalina.connector.CoyoteWriter.write(CoyoteWriter.java:161)
at org.apache.catalina.connector.CoyoteWriter.write(CoyoteWriter.java:170)
at org.apache.solr.core.SolrCore.update(SolrCore.java:695)
at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:52)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
at java.lang.Thread.run(Thread.java:613)

On 7/27/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> I haven't been following the thread closely, but... Not sure if you
> are using Tomcat or Jetty; Jetty has a POST size limit (set somewhere
> in its configs) that may be the source of the problem.
>
> Otis
> (rest of quote snipped)
Re: Doc add limit
You might also try the Java update client here: http://issues.apache.org/jira/browse/SOLR-20 -Yonik
Re: Doc add limit
Commenting out the following line in SolrCore fixes my problem... but of course I don't get the result status info... but this isn't a problem for me really.

-Sangraal

writer.write("");

On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> I'm running on Tomcat... and I've verified that the complete post is
> making it through the SolrUpdate servlet and into the SolrCore
> object... thanks for the info though.
>
> So the code is hanging on this call in SolrCore.java:
>
> writer.write("");
>
> (thread dump and quoted messages snipped)
Re: Doc add limit
On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> Commenting out the following line in SolrCore fixes my problem... but
> of course I don't get the result status info... but this isn't a
> problem for me really.
>
> -Sangraal
>
> writer.write("");

While it's possible you hit a Tomcat bug, I think it's more likely a client problem.

-Yonik
Re: Doc add limit
I'll give that a shot... Thanks again for all your help. -S On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: You might also try the Java update client here: http://issues.apache.org/jira/browse/SOLR-20 -Yonik
Re: Doc add limit
I'm sure... it seems like solr is having trouble writing to a tomcat response that's been inactive for a bit. It's only 30 seconds though, so I'm not entirely sure why that would happen. I use the same client code for DL'ing XSL sheets from external servers and it works fine, but in those instances the server responds much faster to the request. This is an elusive bug for sure.

-S

On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> > Commenting out the following line in SolrCore fixes my problem... but
> > of course I don't get the result status info... but this isn't a
> > problem for me really.
> >
> > writer.write("");
>
> While it's possible you hit a Tomcat bug, I think it's more likely a
> client problem.
>
> -Yonik
Re: Doc add limit
: I'm sure... it seems like solr is having trouble writing to a tomcat
: response that's been inactive for a bit. It's only 30 seconds though,
: so I'm not entirely sure why that would happen.

but didn't you say you don't have this problem when you use curl -- just your java client code? Did you try Yonik's python test client? or the java client in Jira?

looking over the java client code you sent, it's not clear if you are reading the response back, or closing the connections ... can you post a more complete sample app that exhibits the problem for you?

-Hoss