Re: Own Similarity Class in Solr

2006-07-27 Thread Chris Hostetter

: I would like to alter the similarity behaviour of solr to remove
: the fieldnorm factor in  the similarity calculations. As far as I
: read, I need to recreate my own similarity class and import it into
: solr using the  config in schema.xml.
:
: Has anybody already tweaked or played with this topic, and might
: give me some code or advice?

as you've already noticed, you can specify the Similarity class at runtime
via the schema.xml -- the only Solr-specific aspect of this is making sure
your Similarity class is in your servlet container's classpath (exactly how
you do this depends on your servlet container)

searching the java-dev and java-user Lucene mailing lists is the best bet
for finding discussions on writing your own Similarity; there are also
some examples in the main Lucene code base...

contrib/miscellaneous/src/java/org/apache/lucene/misc/SweetSpotSimilarity.java
src/test/org/apache/lucene/search/TestDisjunctionMaxQuery.java

...if your main interest is just eliminating norms, there is a special
option for that in Lucene Fields called "omit norms" (it not only
eliminates the effects of norms on scoring, but it saves space in your
index as well).  In Solr you can turn it on/off per <field> or <fieldtype>
using the omitNorms="true" option in the schema.xml
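
for example, a field declaration with norms turned off might look something
like this in the schema.xml (field name and type here are just placeholders):

  <field name="searchname" type="text" indexed="true" stored="true" omitNorms="true"/>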



-Hoss



Re: Own Similarity Class in Solr

2006-07-27 Thread Tom Weber

Hi Chris,

  thanks for the details. I am meanwhile poking around with my own
class, which I defined in the schema.xml; everything is working
perfectly there.


  But I still have a problem with the normalization. I have tried
changing several parameters to fix it to 1.0; this does indeed change
the scoring, but still not in the way I need. It seems that it is
always the "fieldNorm" factor that comes into play, but where does this
value really come from? In the Similarity class I don't find this term to alter.


  Let me give a short example of what goes wrong:

  I have a field "searchname" with a boost of "3.0" applied during
the document.add. Another field "text" is a copyField of several entries;
this one does not have a boost factor, but it does contain more data. In
this text there is a copy of a field in which the searched text occurs
three times. This entry has the score 5.5930133.

  But I also have entries where the searchname contains the same word,
and these have a score of 1.9975047.


  Currently my class is like this (I took the DefaultSimilarity as a
basis):

  - lengthNorm is fixed to 1.0
  - tf is fixed to 1.0
  - idf is fixed to 1.0
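
In code, roughly (a sketch against the Lucene 2.x DefaultSimilarity API;
the class name here is just a placeholder):

import org.apache.lucene.search.DefaultSimilarity;

public class FlatSimilarity extends DefaultSimilarity {
  // all length/frequency factors return a constant, so only the
  // index- and query-time boosts influence the score
  public float lengthNorm(String fieldName, int numTokens) { return 1.0f; }
  public float tf(float freq) { return 1.0f; }
  public float idf(int docFreq, int numDocs) { return 1.0f; }
}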

  With these changes, might it be that I have deactivated the boost on
the different fields?


  What I need is a search which treats each document the same,
regardless of term frequency and document size; it shall calculate
the score only from the boost factors, so a document with a high
boost factor and the same text in it as another one with a lower
factor shall rank before the others.


  Something I do might be completely wrong; perhaps you have an idea?

  Thanks,

   Tom


add/update index

2006-07-27 Thread Tricia Williams

Hi,

   I have created a process which uses xsl to convert my data to the form
indicated in the examples so that it can be added to the index, as the solr
tutorial indicates:

<add>
  <doc>
    <field name="name">value</field>
    ...
  </doc>
</add>


   In some cases the xsl process will create a field element with no data
(ie <field name="name"></field>).  Is this considered bad input that will not be
accepted?  Or is this something that solr should deal with?  Currently for
each field element with no data I receive the message:

java.lang.NullPointerException
 at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:78)
 at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:74)
 at org.apache.solr.core.SolrCore.readDoc(SolrCore.java:917)
 at org.apache.solr.core.SolrCore.update(SolrCore.java:685)
 at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:52)
 ...


   Just curious if the gurus out there think I should deal with the null 
values in my xsl process or if this can be dealt with in solr itself?


Thanks,
Tricia

ps.  Thanks for the timely fix for the UTF-8 issue!


Re: add/update index

2006-07-27 Thread Yonik Seeley

On 7/27/06, Tricia Williams <[EMAIL PROTECTED]> wrote:

Hi,

I have created a process which uses xsl to convert my data to the form
indicated in the examples so that it can be added to the index, as the solr
tutorial indicates:

<add>
  <doc>
    <field name="name">value</field>
    ...
  </doc>
</add>


In some cases the xsl process will create a field element with no data
(ie <field name="name"></field>).  Is this considered bad input that will not be
accepted?


If the desired semantics are "the field doesn't exist" or "null value"
then yes.  There isn't a way to represent a field without a value in
Lucene except to not add the field for that document.  If it's totally
ignored, it probably shouldn't be in the XML.

Now, one might think we could drop fields with no value, but that's
problematic because it goes against the XML standard:

http://www.w3.org/TR/REC-xml/#sec-starttags
[Definition: An element with no content is said to be empty.] The
representation of an empty element is either a start-tag immediately
followed by an end-tag, or an empty-element tag. [Definition: An
empty-element tag takes a special form:]

So <foo></foo> and <foo/> are supposed to be equivalent.  Given that, it
does look like Solr should treat <field name="foo"></field> like a
zero-length string (but that's not what you wanted, right?)

-Yonik


Re: add/update index

2006-07-27 Thread Tricia Williams

Thanks Yonik,

   That's exactly what I needed to know.  I'll adapt my xsl process to 
omit null values.
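
Something along these lines should do it -- guarding each field with a
non-empty test (element and field names here are just placeholders):

<xsl:if test="string-length(normalize-space(title)) &gt; 0">
  <field name="title"><xsl:value-of select="title"/></field>
</xsl:if>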


Tricia

On Thu, 27 Jul 2006, Yonik Seeley wrote:


On 7/27/06, Tricia Williams <[EMAIL PROTECTED]> wrote:

Hi,

I have created a process which uses xsl to convert my data to the form
indicated in the examples so that it can be added to the index, as the solr
tutorial indicates:

<add>
  <doc>
    <field name="name">value</field>
    ...
  </doc>
</add>


In some cases the xsl process will create a field element with no data
(ie <field name="name"></field>).  Is this considered bad input that will not be
accepted?


If the desired semantics are "the field doesn't exist" or "null value"
then yes.  There isn't a way to represent a field without a value in
Lucene except to not add the field for that document.  If it's totally
ignored, it probably shouldn't be in the XML.

Now, one might think we could drop fields with no value, but that's
problematic because it goes against the XML standard:

http://www.w3.org/TR/REC-xml/#sec-starttags
[Definition: An element with no content is said to be empty.] The
representation of an empty element is either a start-tag immediately
followed by an end-tag, or an empty-element tag. [Definition: An
empty-element tag takes a special form:]

So <foo></foo> and <foo/> are supposed to be equivalent.  Given that, it
does look like Solr should treat <field name="foo"></field> like a
zero-length string (but that's not what you wanted, right?)

-Yonik



Solr's JSON, Python, Ruby output format

2006-07-27 Thread Yonik Seeley

Solr now has a JSON response format, in addition to Python and Ruby
versions that can be directly eval'd.

http://wiki.apache.org/solr/SolJSON
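
For example, against the example server from the tutorial, the output
writer is selected with the wt request parameter:

http://localhost:8983/solr/select?q=solr&wt=json

(wt=python and wt=ruby select the other two formats.)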

-Yonik


Re: Doc add limit

2006-07-27 Thread Yonik Seeley

On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:

I removed everything from the Add xml so the docs looked like this:

<doc>
<field name="id">187880</field>
</doc>
<doc>
<field name="id">187852</field>
</doc>

and it still hung at 6,144...


Maybe you can try the following simple Python client, to rule out some
kind of client interaction difference... the attached script adds 10,000
documents and works fine for me on WinXP w/ Tomcat 5.5.17 and Jetty.

-Yonik


---- solr.py ----
import httplib
import socket

class SolrConnection:
  def __init__(self, host='localhost:8983', solrBase='/solr'):
    self.host = host
    self.solrBase = solrBase
    #a connection to the server is not opened at this point.
    self.conn = httplib.HTTPConnection(self.host)
    #self.conn.set_debuglevel(100)
    self.postheaders = {"Connection":"close"}

  def doUpdateXML(self, request):
    try:
      self.conn.request('POST', self.solrBase+'/update', request, self.postheaders)
    except (socket.error,httplib.CannotSendRequest) :
      #reconnect in case the connection was broken from the server going down,
      #the server timing out our persistent connection, or another
      #network failure.
      #Also catch httplib.CannotSendRequest because the HTTPConnection object
      #can get in a bad state.
      self.conn.close()
      self.conn.connect()
      self.conn.request('POST', self.solrBase+'/update', request, self.postheaders)

    rsp = self.conn.getresponse()
    #print rsp.status, rsp.reason
    data = rsp.read()
    #print "data=",data
    self.conn.close()

  def delete(self, id):
    xstr = '<delete><id>'+id+'</id></delete>'
    self.doUpdateXML(xstr)

  def add(self, **fields):
    #todo: XML escaping
    flist=['<field name="%s">%s</field>' % f for f in fields.items()]
    flist.insert(0,'<add><doc>')
    flist.append('</doc></add>')
    xstr = ''.join(flist)
    self.doUpdateXML(xstr)

c = SolrConnection()
#for i in range(10000):
#  c.delete(str(i))
for i in range(10000):
  c.add(id=i)
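
Side note: the adds only become visible to searches after a commit; with
this client that could be sent as, e.g.:

c.doUpdateXML('<commit/>')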


Re: Doc add limit

2006-07-27 Thread sangraal aiken

Yonik,
It looks like the problem is with the way I'm posting to the SolrUpdate
servlet. I am able to use curl to post the data to my tomcat instance
without a problem. It only fails when I try to handle the http post from
java... my code is below:

  URL url = new URL("http://localhost:8983/solr/update");
 HttpURLConnection conn = (HttpURLConnection) url.openConnection();
 conn.setRequestMethod("POST");
 conn.setRequestProperty("Content-Type", "application/octet-stream");
 conn.setDoOutput(true);
 conn.setDoInput(true);
 conn.setUseCaches(false);

 // Write to server
 log.info("About to post to SolrUpdate servlet.");
  DataOutputStream output = new DataOutputStream(conn.getOutputStream());
 output.writeBytes(sw);
 output.flush();
 log.info("Finished posting to SolrUpdate servlet.");

-Sangraal

On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> I removed everything from the Add xml so the docs looked like this:
>
> <doc>
> <field name="id">187880</field>
> </doc>
> <doc>
> <field name="id">187852</field>
> </doc>
>
> and it still hung at 6,144...

Maybe you can try the following simple Python client, to rule out some
kind of client interaction difference... the attached script adds 10,000
documents and works fine for me on WinXP w/ Tomcat 5.5.17 and Jetty.

-Yonik


---- solr.py ----
import httplib
import socket

class SolrConnection:
  def __init__(self, host='localhost:8983', solrBase='/solr'):
    self.host = host
    self.solrBase = solrBase
    #a connection to the server is not opened at this point.
    self.conn = httplib.HTTPConnection(self.host)
    #self.conn.set_debuglevel(100)
    self.postheaders = {"Connection":"close"}

  def doUpdateXML(self, request):
    try:
      self.conn.request('POST', self.solrBase+'/update', request, self.postheaders)
    except (socket.error,httplib.CannotSendRequest) :
      #reconnect in case the connection was broken from the server going down,
      #the server timing out our persistent connection, or another
      #network failure.
      #Also catch httplib.CannotSendRequest because the HTTPConnection object
      #can get in a bad state.
      self.conn.close()
      self.conn.connect()
      self.conn.request('POST', self.solrBase+'/update', request, self.postheaders)

    rsp = self.conn.getresponse()
    #print rsp.status, rsp.reason
    data = rsp.read()
    #print "data=",data
    self.conn.close()

  def delete(self, id):
    xstr = '<delete><id>'+id+'</id></delete>'
    self.doUpdateXML(xstr)

  def add(self, **fields):
    #todo: XML escaping
    flist=['<field name="%s">%s</field>' % f for f in fields.items()]
    flist.insert(0,'<add><doc>')
    flist.append('</doc></add>')
    xstr = ''.join(flist)
    self.doUpdateXML(xstr)

c = SolrConnection()
#for i in range(10000):
#  c.delete(str(i))
for i in range(10000):
  c.add(id=i)



Re: Doc add limit

2006-07-27 Thread Mike Klaas

On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


class SolrConnection:
  def __init__(self, host='localhost:8983', solrBase='/solr'):
self.host = host
self.solrBase = solrBase
#a connection to the server is not opened at this point.
self.conn = httplib.HTTPConnection(self.host)
#self.conn.set_debuglevel(100)
self.postheaders = {"Connection":"close"}

  def doUpdateXML(self, request):
try:
  self.conn.request('POST', self.solrBase+'/update', request,
self.postheaders)


Digressive note: I'm not sure if it is necessary with tomcat, but in
my experience driving solr with python using Jetty, it was necessary
to specify the content-type when posting utf-8 data:

self.postheaders.update({'Content-Type': 'text/xml; charset=utf-8'})

-Mike


Re: Doc add limit

2006-07-27 Thread sangraal aiken

Mike,
I've been posting with the content type set like this:
 conn.setRequestProperty("Content-Type", "application/octet-stream");

I tried your suggestion though, and unfortunately there was no change.
 conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");

-Sangraal


On 7/27/06, Mike Klaas <[EMAIL PROTECTED]> wrote:


On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> class SolrConnection:
>   def __init__(self, host='localhost:8983', solrBase='/solr'):
> self.host = host
> self.solrBase = solrBase
> #a connection to the server is not opened at this point.
> self.conn = httplib.HTTPConnection(self.host)
> #self.conn.set_debuglevel(100)
> self.postheaders = {"Connection":"close"}
>
>   def doUpdateXML(self, request):
> try:
>   self.conn.request('POST', self.solrBase+'/update', request,
> self.postheaders)

Digressive note: I'm not sure if it is necessary with tomcat, but in
my experience driving solr with python using Jetty, it was necessary
to specify the content-type when posting utf-8 data:

self.postheaders.update({'Content-Type': 'text/xml; charset=utf-8'})

-Mike



Re: Doc add limit

2006-07-27 Thread Mike Klaas

Hi Sangraal:

Sorry--I tried not to imply that this might affect your issue.  You
may have to crank up the solr logging to determine where it is
freezing (and what might be happening).

It is certainly worth investigating why this occurs, but I wonder
about the advantages of using such huge batches.  Assuming a few
hundred bytes per document, 6100 docs produces a POST over 1MB in
size.

-Mike

On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:

Mike,
 I've been posting with the content type set like this:
  conn.setRequestProperty("Content-Type", "application/octet-stream");

I tried your suggestion though, and unfortunately there was no change.
  conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");

-Sangraal


On 7/27/06, Mike Klaas <[EMAIL PROTECTED]> wrote:
>
> On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> > class SolrConnection:
> >   def __init__(self, host='localhost:8983', solrBase='/solr'):
> > self.host = host
> > self.solrBase = solrBase
> > #a connection to the server is not opened at this point.
> > self.conn = httplib.HTTPConnection(self.host)
> > #self.conn.set_debuglevel(100)
> > self.postheaders = {"Connection":"close"}
> >
> >   def doUpdateXML(self, request):
> > try:
> >   self.conn.request('POST', self.solrBase+'/update', request,
> > self.postheaders)
>
> Digressive note: I'm not sure if it is necessary with tomcat, but in
> my experience driving solr with python using Jetty, it was necessary
> to specify the content-type when posting utf-8 data:
>
> self.postheaders.update({'Content-Type': 'text/xml; charset=utf-8'})
>
> -Mike
>




Re: Doc add limit

2006-07-27 Thread sangraal aiken

I think you're right... I will probably work on splitting the batches up
into smaller pieces at some point in the future. I think I will need the
capability to do large batches at some point though, so I want to make sure
the system can handle it. I also want to make sure this problem doesn't pop
up and bite me later.
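
Probably something simple like this sketch, posting fixed-size batches
instead of one huge POST (doc_xml() here stands in for whatever renders a
single <doc> element -- a hypothetical helper):

def add_in_batches(conn, docs, batch_size=500):
  #conn is a SolrConnection; each iteration sends one bounded <add> request
  for i in range(0, len(docs), batch_size):
    batch = docs[i:i + batch_size]
    xml = '<add>' + ''.join([doc_xml(d) for d in batch]) + '</add>'
    conn.doUpdateXML(xml)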

-Sangraal

On 7/27/06, Mike Klaas <[EMAIL PROTECTED]> wrote:


Hi Sangraal:

Sorry--I tried not to imply that this might affect your issue.  You
may have to crank up the solr logging to determine where it is
freezing (and what might be happening).

It is certainly worth investigating why this occurs, but I wonder
about the advantages of using such huge batches.  Assuming a few
hundred bytes per document, 6100 docs produces a POST over 1MB in
size.

-Mike

On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> Mike,
>  I've been posting with the content type set like this:
>   conn.setRequestProperty("Content-Type", "application/octet-stream");
>
> I tried your suggestion though, and unfortunately there was no change.
>   conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
>
> -Sangraal
>
>
> On 7/27/06, Mike Klaas <[EMAIL PROTECTED]> wrote:
> >
> > On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> >
> > > class SolrConnection:
> > >   def __init__(self, host='localhost:8983', solrBase='/solr'):
> > > self.host = host
> > > self.solrBase = solrBase
> > > #a connection to the server is not opened at this point.
> > > self.conn = httplib.HTTPConnection(self.host)
> > > #self.conn.set_debuglevel(100)
> > > self.postheaders = {"Connection":"close"}
> > >
> > >   def doUpdateXML(self, request):
> > > try:
> > >   self.conn.request('POST', self.solrBase+'/update', request,
> > > self.postheaders)
> >
> > Digressive note: I'm not sure if it is necessary with tomcat, but in
> > my experience driving solr with python using Jetty, it was necessary
> > to specify the content-type when posting utf-8 data:
> >
> > self.postheaders.update({'Content-Type': 'text/xml; charset=utf-8'})
> >
> > -Mike
> >
>
>



Re: Doc add limit

2006-07-27 Thread Yonik Seeley

Are you reading the response and closing the connection?  If not, you
are probably running out of socket connections.
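
I.e., something like this after writing the POST body (a minimal sketch,
assuming "conn" is the HttpURLConnection from your code):

  // always consume the response body and release the connection;
  // otherwise the server can block while writing its response
  BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
  while (rd.readLine() != null) { /* drain */ }
  rd.close();
  conn.disconnect();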

-Yonik

On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:

Yonik,
It looks like the problem is with the way I'm posting to the SolrUpdate
servlet. I am able to use curl to post the data to my tomcat instance
without a problem. It only fails when I try to handle the http post from
java... my code is below:

   URL url = new URL("http://localhost:8983/solr/update");
  HttpURLConnection conn = (HttpURLConnection) url.openConnection();
  conn.setRequestMethod("POST");
  conn.setRequestProperty("Content-Type", "application/octet-stream");
  conn.setDoOutput(true);
  conn.setDoInput(true);
  conn.setUseCaches(false);

  // Write to server
  log.info("About to post to SolrUpdate servlet.");
   DataOutputStream output = new DataOutputStream(conn.getOutputStream());
  output.writeBytes(sw);
  output.flush();
  log.info("Finished posting to SolrUpdate servlet.");

-Sangraal

On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
>
> On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> > I removed everything from the Add xml so the docs looked like this:
> >
> > <doc>
> > <field name="id">187880</field>
> > </doc>
> > <doc>
> > <field name="id">187852</field>
> > </doc>
> >
> > and it still hung at 6,144...
>
> Maybe you can try the following simple Python client, to rule out some
> kind of client interaction difference... the attached script adds 10,000
> documents and works fine for me on WinXP w/ Tomcat 5.5.17 and Jetty.
>
> -Yonik
>
>
> ---- solr.py ----
> import httplib
> import socket
>
> class SolrConnection:
>   def __init__(self, host='localhost:8983', solrBase='/solr'):
>     self.host = host
>     self.solrBase = solrBase
>     #a connection to the server is not opened at this point.
>     self.conn = httplib.HTTPConnection(self.host)
>     #self.conn.set_debuglevel(100)
>     self.postheaders = {"Connection":"close"}
>
>   def doUpdateXML(self, request):
>     try:
>       self.conn.request('POST', self.solrBase+'/update', request, self.postheaders)
>     except (socket.error,httplib.CannotSendRequest) :
>       #reconnect in case the connection was broken from the server going down,
>       #the server timing out our persistent connection, or another
>       #network failure.
>       #Also catch httplib.CannotSendRequest because the HTTPConnection object
>       #can get in a bad state.
>       self.conn.close()
>       self.conn.connect()
>       self.conn.request('POST', self.solrBase+'/update', request, self.postheaders)
>
>     rsp = self.conn.getresponse()
>     #print rsp.status, rsp.reason
>     data = rsp.read()
>     #print "data=",data
>     self.conn.close()
>
>   def delete(self, id):
>     xstr = '<delete><id>'+id+'</id></delete>'
>     self.doUpdateXML(xstr)
>
>   def add(self, **fields):
>     #todo: XML escaping
>     flist=['<field name="%s">%s</field>' % f for f in fields.items()]
>     flist.insert(0,'<add><doc>')
>     flist.append('</doc></add>')
>     xstr = ''.join(flist)
>     self.doUpdateXML(xstr)
>
> c = SolrConnection()
> #for i in range(10000):
> #  c.delete(str(i))
> for i in range(10000):
>   c.add(id=i)


Re: Doc add limit

2006-07-27 Thread sangraal aiken

Yeah, I'm closing them.  Here's the method:

-
 private String doUpdate(String sw) {
   StringBuffer updateResult = new StringBuffer();
   try {
     // open connection
     log.info("Connecting to and preparing to post to SolrUpdate servlet.");
     URL url = new URL("http://localhost:8080/update");
     HttpURLConnection conn = (HttpURLConnection) url.openConnection();
     conn.setRequestMethod("POST");
     conn.setRequestProperty("Content-Type", "application/octet-stream");
     conn.setDoOutput(true);
     conn.setDoInput(true);
     conn.setUseCaches(false);

     // Write to server
     log.info("About to post to SolrUpdate servlet.");
     DataOutputStream output = new DataOutputStream(conn.getOutputStream());
     output.writeBytes(sw);
     output.flush();
     output.close();
     log.info("Finished posting to SolrUpdate servlet.");

     // Read response
     log.info("Ready to read response.");
     BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
     log.info("Got reader");
     String line;
     while ((line = rd.readLine()) != null) {
       log.info("Writing to result...");
       updateResult.append(line);
     }
     rd.close();

     // close connections
     conn.disconnect();

     log.info("Done updating Solr for site " + updateSite);
   } catch (Exception e) {
     e.printStackTrace();
   }

   return updateResult.toString();
 }

-Sangraal

On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


Are you reading the response and closing the connection?  If not, you
are probably running out of socket connections.

-Yonik

On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> Yonik,
> It looks like the problem is with the way I'm posting to the SolrUpdate
> servlet. I am able to use curl to post the data to my tomcat instance
> without a problem. It only fails when I try to handle the http post from
> java... my code is below:
>
>   URL url = new URL("http://localhost:8983/solr/update");
>   HttpURLConnection conn = (HttpURLConnection) url.openConnection();
>   conn.setRequestMethod("POST");
>   conn.setRequestProperty("Content-Type", "application/octet-stream");
>   conn.setDoOutput(true);
>   conn.setDoInput(true);
>   conn.setUseCaches(false);
>
>   // Write to server
>   log.info("About to post to SolrUpdate servlet.");
>   DataOutputStream output = new DataOutputStream(conn.getOutputStream());
>   output.writeBytes(sw);
>   output.flush();
>   log.info("Finished posting to SolrUpdate servlet.");
>
> -Sangraal
>
> On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> >
> > On 7/26/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> > > I removed everything from the Add xml so the docs looked like this:
> > >
> > > <doc>
> > > <field name="id">187880</field>
> > > </doc>
> > > <doc>
> > > <field name="id">187852</field>
> > > </doc>
> > >
> > > and it still hung at 6,144...
> >
> > Maybe you can try the following simple Python client, to rule out some
> > kind of client interaction difference... the attached script adds 10,000
> > documents and works fine for me on WinXP w/ Tomcat 5.5.17 and Jetty.
> >
> > -Yonik
> >
> >
> > ---- solr.py ----
> > import httplib
> > import socket
> >
> > class SolrConnection:
> >   def __init__(self, host='localhost:8983', solrBase='/solr'):
> >     self.host = host
> >     self.solrBase = solrBase
> >     #a connection to the server is not opened at this point.
> >     self.conn = httplib.HTTPConnection(self.host)
> >     #self.conn.set_debuglevel(100)
> >     self.postheaders = {"Connection":"close"}
> >
> >   def doUpdateXML(self, request):
> >     try:
> >       self.conn.request('POST', self.solrBase+'/update', request, self.postheaders)
> >     except (socket.error,httplib.CannotSendRequest) :
> >       #reconnect in case the connection was broken from the server going down,
> >       #the server timing out our persistent connection, or another
> >       #network failure.
> >       #Also catch httplib.CannotSendRequest because the HTTPConnection object
> >       #can get in a bad state.
> >       self.conn.close()
> >       self.conn.connect()
> >       self.conn.request('POST', self.solrBase+'/update', request, self.postheaders)
> >
> >     rsp = self.conn.getresponse()
> >     #print rsp.status, rsp.reason
> >     data = rsp.read()
> >     #print "data=",data
> >     self.conn.close()
> >
> >   def delete(self, id):
> >     xstr = '<delete><id>'+id+'</id></delete>'
> >     self.doUpdateXML(xstr)
> >
> >   def add(self, **fields):
> >     #todo: XML escaping
> >     flist=['<field name="%s">%s</field>' % f for f in fields.items()]
> >     flist.insert(0,'<add><doc>')
> >     flist.append('</doc></add>')
> >     xstr = ''.join(flist)
> >     self.doUpdateXML(xstr)
> >
> > c = SolrConnection()
> > #for i in range(10000):
> > #  c.delete(str(i))
> > for i in range(10000):
> >   c.add(id=i)



Re: Doc add limit

2006-07-27 Thread Otis Gospodnetic
I haven't been following the thread, but...
Not sure if you are using Tomcat or Jetty; Jetty has a POST size limit (set
somewhere in its configs) that may be the source of the problem.

Otis
P.S.
Just occurred to me.
Tomcat.  Jetty.  Tom & Jerry.  Jetty guys should have called their thing Jerry 
or Jerrymouse.

- Original Message 
From: Mike Klaas <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, July 27, 2006 6:33:16 PM
Subject: Re: Doc add limit

Hi Sangraal:

Sorry--I tried not to imply that this might affect your issue.  You
may have to crank up the solr logging to determine where it is
freezing (and what might be happening).

It is certainly worth investigating why this occurs, but I wonder
about the advantages of using such huge batches.  Assuming a few
hundred bytes per document, 6100 docs produces a POST over 1MB in
size.

-Mike

On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> Mike,
>  I've been posting with the content type set like this:
>   conn.setRequestProperty("Content-Type", "application/octet-stream");
>
> I tried your suggestion though, and unfortunately there was no change.
>   conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
>
> -Sangraal
>
>
> On 7/27/06, Mike Klaas <[EMAIL PROTECTED]> wrote:
> >
> > On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> >
> > > class SolrConnection:
> > >   def __init__(self, host='localhost:8983', solrBase='/solr'):
> > > self.host = host
> > > self.solrBase = solrBase
> > > #a connection to the server is not opened at this point.
> > > self.conn = httplib.HTTPConnection(self.host)
> > > #self.conn.set_debuglevel(100)
> > > self.postheaders = {"Connection":"close"}
> > >
> > >   def doUpdateXML(self, request):
> > > try:
> > >   self.conn.request('POST', self.solrBase+'/update', request,
> > > self.postheaders)
> >
> > Digressive note: I'm not sure if it is necessary with tomcat, but in
> > my experience driving solr with python using Jetty, it was necessary
> > to specify the content-type when posting utf-8 data:
> >
> > self.postheaders.update({'Content-Type': 'text/xml; charset=utf-8'})
> >
> > -Mike
> >
>
>





Re: Doc add limit

2006-07-27 Thread sangraal aiken

I'm running on Tomcat... and I've verified that the complete post is making
it through the SolrUpdate servlet and into the SolrCore object... thanks for
the info though.
--
So the code is hanging on this call in SolrCore.java

   writer.write("<result status=\"0\"/>");

The thread dump:

"http-8080-Processor24" Id=32 in RUNNABLE (running in native) total cpu
time=40698.0440ms user time=38646.1680ms
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(
InternalOutputBuffer.java:746)
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:433)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:348)
at
org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite
(InternalOutputBuffer.java:769)
at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(
ChunkedOutputFilter.java:125)
at org.apache.coyote.http11.InternalOutputBuffer.doWrite(
InternalOutputBuffer.java:579)
at org.apache.coyote.Response.doWrite(Response.java:559)
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(
OutputBuffer.java:361)
at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:324)
at org.apache.tomcat.util.buf.IntermediateOutputStream.write(
C2BConverter.java:235)
at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(StreamEncoder.java
:336)
at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer(
StreamEncoder.java:404)
at sun.nio.cs.StreamEncoder$CharsetSE.implFlush(StreamEncoder.java:408)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:152)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:213)
at org.apache.tomcat.util.buf.WriteConvertor.flush(C2BConverter.java
:184)
at org.apache.tomcat.util.buf.C2BConverter.flushBuffer(
C2BConverter.java:127)
at org.apache.catalina.connector.OutputBuffer.realWriteChars(
OutputBuffer.java:536)
at org.apache.tomcat.util.buf.CharChunk.flushBuffer(CharChunk.java:439)
at org.apache.tomcat.util.buf.CharChunk.append(CharChunk.java:370)
at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java
:491)
at org.apache.catalina.connector.CoyoteWriter.write(CoyoteWriter.java
:161)
at org.apache.catalina.connector.CoyoteWriter.write(CoyoteWriter.java
:170)
at org.apache.solr.core.SolrCore.update(SolrCore.java:695)
at org.apache.solr.servlet.SolrUpdateServlet.doPost(
SolrUpdateServlet.java:52)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
ApplicationFilterChain.java:252)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(
ApplicationFilterChain.java:173)
at org.apache.catalina.core.StandardWrapperValve.invoke(
StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(
StandardContextValve.java:178)
at org.apache.catalina.core.StandardHostValve.invoke(
StandardHostValve.java:126)
at org.apache.catalina.valves.ErrorReportValve.invoke(
ErrorReportValve.java:105)
at org.apache.catalina.core.StandardEngineValve.invoke(
StandardEngineValve.java:107)
at org.apache.catalina.connector.CoyoteAdapter.service(
CoyoteAdapter.java:148)
at org.apache.coyote.http11.Http11Processor.process(
Http11Processor.java:869)
at
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection
(Http11BaseProtocol.java:664)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(
PoolTcpEndpoint.java:527)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(
LeaderFollowerWorkerThread.java:80)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(
ThreadPool.java:684)
at java.lang.Thread.run(Thread.java:613)

On 7/27/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:


I haven't been following the thread, but...
Not sure if you are using Tomcat or Jetty; Jetty has a POST size limit
(set somewhere in its configs) that may be the source of the problem.

Otis
P.S.
Just occurred to me.
Tomcat.  Jetty.  Tom & Jerry.  Jetty guys should have called their thing
Jerry or Jerrymouse.

- Original Message 
From: Mike Klaas <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, July 27, 2006 6:33:16 PM
Subject: Re: Doc add limit

Hi Sangraal:

Sorry--I tried not to imply that this might affect your issue.  You
may have to crank up the solr logging to determine where it is
freezing (and what might be happening).

It is certainly worth investigating why this occurs, but I wonder
about the advantages of using such huge batches.  Assuming a few
hundred bytes per document, 6100 docs produces a POST over 1MB in
size.

-Mike

On 7/27/06, sangraal aiken <[EMAIL P

Re: Doc add limit

2006-07-27 Thread Yonik Seeley

You might also try the Java update client here:
http://issues.apache.org/jira/browse/SOLR-20

-Yonik


Re: Doc add limit

2006-07-27 Thread sangraal aiken

Commenting out the following line in SolrCore fixes my problem... but of
course I don't get the result status info... but this isn't a problem for me
really.

-Sangraal

writer.write("<result status=\"0\"/>");
On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:


I'm running on Tomcat... and I've verified that the complete post is
making it through the SolrUpdate servlet and into the SolrCore object...
thanks for the info though.
--
So the code is hanging on this call in SolrCore.java

writer.write("<result status=\"0\"/>");

The thread dump:

"http-8080-Processor24" Id=32 in RUNNABLE (running in native) total cpu
time= 40698.0440ms user time=38646.1680ms

 at java.net.SocketOutputStream.socketWrite0(Native Method)
 at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java
:92)
 at java.net.SocketOutputStream.write (SocketOutputStream.java:136)
 at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(
InternalOutputBuffer.java:746)
 at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java
:433)
 at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:348)
 at
org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite
(InternalOutputBuffer.java:769)
 at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite (
ChunkedOutputFilter.java:125)
 at org.apache.coyote.http11.InternalOutputBuffer.doWrite(
InternalOutputBuffer.java:579)
 at org.apache.coyote.Response.doWrite(Response.java:559)
 at org.apache.catalina.connector.OutputBuffer.realWriteBytes (
OutputBuffer.java:361)
 at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:324)
 at org.apache.tomcat.util.buf.IntermediateOutputStream.write(
C2BConverter.java:235)
 at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes (StreamEncoder.java
:336)
 at sun.nio.cs.StreamEncoder$CharsetSE.implFlushBuffer(
StreamEncoder.java:404)
 at sun.nio.cs.StreamEncoder$CharsetSE.implFlush(StreamEncoder.java
:408)
 at sun.nio.cs.StreamEncoder.flush (StreamEncoder.java:152)
 at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:213)
 at org.apache.tomcat.util.buf.WriteConvertor.flush(C2BConverter.java
:184)
 at org.apache.tomcat.util.buf.C2BConverter.flushBuffer (
C2BConverter.java:127)
 at org.apache.catalina.connector.OutputBuffer.realWriteChars(
OutputBuffer.java:536)
 at org.apache.tomcat.util.buf.CharChunk.flushBuffer(CharChunk.java
:439)
 at org.apache.tomcat.util.buf.CharChunk.append (CharChunk.java:370)
 at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java
:491)
 at org.apache.catalina.connector.CoyoteWriter.write(CoyoteWriter.java
:161)
 at org.apache.catalina.connector.CoyoteWriter.write (
CoyoteWriter.java:170)
 at org.apache.solr.core.SolrCore.update(SolrCore.java:695)
 at org.apache.solr.servlet.SolrUpdateServlet.doPost(
SolrUpdateServlet.java:52)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)

 at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(
ApplicationFilterChain.java :252)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(
ApplicationFilterChain.java:173)
 at org.apache.catalina.core.StandardWrapperValve.invoke(
StandardWrapperValve.java:213)
 at org.apache.catalina.core.StandardContextValve.invoke (
StandardContextValve.java:178)
 at org.apache.catalina.core.StandardHostValve.invoke(
StandardHostValve.java:126)
 at org.apache.catalina.valves.ErrorReportValve.invoke(
ErrorReportValve.java:105)
 at org.apache.catalina.core.StandardEngineValve.invoke(
StandardEngineValve.java:107)
 at org.apache.catalina.connector.CoyoteAdapter.service(
CoyoteAdapter.java:148)
 at org.apache.coyote.http11.Http11Processor.process (
Http11Processor.java:869)
 at
org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection
(Http11BaseProtocol.java:664)
 at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(
PoolTcpEndpoint.java:527)
 at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(
LeaderFollowerWorkerThread.java:80)
 at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(
ThreadPool.java :684)

 at java.lang.Thread.run(Thread.java:613)

On 7/27/06, Otis Gospodnetic <[EMAIL PROTECTED] > wrote:
>
> I haven't been following the thread, but...
> Not sure if you are using Tomcat or Jetty; Jetty has a POST size limit
> (set somewhere in its configs) that may be the source of the problem.
>
> Otis
> P.S.
> Just occurred to me.
> Tomcat.  Jetty.  Tom & Jerry.  Jetty guys should have called their thing
> Jerry or Jerrymouse.
>
> - Original Message 
> From: Mike Klaas < [EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, July 27, 2006 6:33:16 PM
> Subject: Re: Doc add limit
>
> Hi Sangraal:
>
> Sorry--I tried not to imply that this might affect your issue.  You
> may 

Re: Doc add limit

2006-07-27 Thread Yonik Seeley

On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:

Commenting out the following line in SolrCore fixes my problem... but of
course I don't get the result status info... but this isn't a problem for me
really.

-Sangraal

writer.write("<result status=\"0\"/>");


While it's possible you hit a Tomcat bug, I think it's more likely a
client problem.

-Yonik


Re: Doc add limit

2006-07-27 Thread sangraal aiken

I'll give that a shot...

Thanks again for all your help.

-S

On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


You might also try the Java update client here:
http://issues.apache.org/jira/browse/SOLR-20

-Yonik



Re: Doc add limit

2006-07-27 Thread sangraal aiken

I'm sure... it seems like solr is having trouble writing to a tomcat
response that's been inactive for a bit. It's only 30 seconds though, so I'm
not entirely sure why that would happen.

I use the same client code for DL'ing XSL sheets from external servers and
it works fine, but in those instances the server responds much faster to the
request.

This is an elusive bug for sure.

-S

On 7/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:


On 7/27/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> Commenting out the following line in SolrCore fixes my problem... but of
> course I don't get the result status info... but this isn't a problem
for me
> really.
>
> -Sangraal
>
> writer.write("<result status=\"0\"/>");

While it's possible you hit a Tomcat bug, I think it's more likely a
client problem.

-Yonik



Re: Doc add limit

2006-07-27 Thread Chris Hostetter

: I'm sure... it seems like solr is having trouble writing to a tomcat
: response that's been inactive for a bit. It's only 30 seconds though, so I'm
: not entirely sure why that would happen.

but didn't you say you don't have this problem when you use curl -- just
your java client code?

Did you try Yonik's python test client? or the java client in Jira?

looking over the java client code you sent, it's not clear if you are
reading the response back, or closing the connections ... can you post a
more complete sample app that exhibits the problem for you?



-Hoss