Hi all,
So the Solr tutorial recommends batching operations to improve performance by avoiding frequent, costly commits.

To implement this, I originally had a couple of methods in my python app reading from or writing to Solr, with a scheduled task blindly committing every 15 seconds.

However, my logs were chock full of errors such as:
  File "/mnt/yelteam/server_dev/YelServer/yel/yel_search.py", line 73, in __add
    self.conn.add(**params)
  File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 159, in add
    return self.doUpdateXML(xstr)
  File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 106, in doUpdateXML
    rsp = self.doPost(self.solrBase+'/update', request, self.xmlheaders)
  File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 94, in doPost
    return self.__errcheck(self.conn.getresponse())
  File "/usr/lib64/python2.4/httplib.py", line 856, in getresponse
    raise ResponseNotReady()
ResponseNotReady

and:
  File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 159, in add
    return self.doUpdateXML(xstr)
  File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 106, in doUpdateXML
    rsp = self.doPost(self.solrBase+'/update', request, self.xmlheaders)
  File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 102, in doPost
    return self.__errcheck(self.conn.getresponse())
  File "/usr/lib64/python2.4/httplib.py", line 866, in getresponse
    response.begin()
  File "/usr/lib64/python2.4/httplib.py", line 336, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.4/httplib.py", line 294, in _read_status
    line = self.fp.readline()
  File "/usr/lib64/python2.4/socket.py", line 317, in readline
    data = recv(1)
error: (104, 'Connection reset by peer')

and a few other variations.

I thought it might be caused by commit operations conflicting with reads or writes, so I wrote an even dumber queueing system to hold back pending reads/writes while a commit went through.

However, my logs are still full of those errors :) I doubt that either Python's httplib library or Solr is buggy, so is it something to do with the way I'm using the API?
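One thing I'm now wondering about: httplib connection objects are not thread-safe, and ResponseNotReady is what you get when a second request is issued on a connection before the previous response has been read. If my commit thread and my request handlers really are interleaving request/response cycles on one shared connection, that alone could explain both errors. A minimal sketch of what I'm considering -- serializing every operation on the shared connection behind a single lock (LockedSolr and the wrapped add()/commit() names are mine, assuming the connection exposes those methods):

```python
import threading

class LockedSolr(object):
    """Hypothetical wrapper: serialize all access to one shared Solr
    connection, so each request/response pair completes before the
    next one starts (httplib connections are not thread-safe)."""

    def __init__(self, conn):
        self._conn = conn
        self._lock = threading.Lock()

    def add(self, **params):
        self._lock.acquire()
        try:
            # whole request/response cycle happens under the lock
            return self._conn.add(**params)
        finally:
            self._lock.release()

    def commit(self):
        self._lock.acquire()
        try:
            return self._conn.commit()
        finally:
            self._lock.release()
```

(Using acquire()/release() with try/finally rather than a `with` block, since I'm still on Python 2.4.) Of course a pool of connections, one per thread, might be the more usual fix -- which is partly what I'm asking below.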

How do people generally approach the deferred commit issue? Do I need to queue index and search requests myself or does Solr handle it? My app indexes about 100 times more than it searches, but searching is more time critical. Does that change anything?

Thanks!
James
