Hi all,
So the Solr tutorial recommends batching operations to improve
performance by avoiding multiple costly commits.
To implement this, I originally had a couple of methods in my Python
app reading from or writing to Solr, with a scheduled task blindly
committing every 15 seconds.
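In outline, the scheduled commit looks something like this (a simplified sketch; `schedule_commit`, `conn`, and `COMMIT_INTERVAL` are illustrative names, not my real module):

```python
import threading

COMMIT_INTERVAL = 15.0  # seconds between blind commits

def schedule_commit(conn):
    # Blindly commit whatever has been added since the last pass,
    # then re-arm the timer so this runs again in COMMIT_INTERVAL
    # seconds. `conn` stands in for my thin Solr HTTP wrapper.
    conn.commit()
    timer = threading.Timer(COMMIT_INTERVAL, schedule_commit, args=(conn,))
    timer.daemon = True  # don't keep the process alive just for commits
    timer.start()
    return timer
```

The reads and writes happen on other threads, against the same connection object, with no coordination beyond this.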
However, my logs were chock full of errors such as:
  File "/mnt/yelteam/server_dev/YelServer/yel/yel_search.py", line 73, in __add
    self.conn.add(**params)
  File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 159, in add
    return self.doUpdateXML(xstr)
  File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 106, in doUpdateXML
    rsp = self.doPost(self.solrBase+'/update', request, self.xmlheaders)
  File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 94, in doPost
    return self.__errcheck(self.conn.getresponse())
  File "/usr/lib64/python2.4/httplib.py", line 856, in getresponse
    raise ResponseNotReady()
ResponseNotReady
and:
  File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 159, in add
    return self.doUpdateXML(xstr)
  File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 106, in doUpdateXML
    rsp = self.doPost(self.solrBase+'/update', request, self.xmlheaders)
  File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 102, in doPost
    return self.__errcheck(self.conn.getresponse())
  File "/usr/lib64/python2.4/httplib.py", line 866, in getresponse
    response.begin()
  File "/usr/lib64/python2.4/httplib.py", line 336, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.4/httplib.py", line 294, in _read_status
    line = self.fp.readline()
  File "/usr/lib64/python2.4/socket.py", line 317, in readline
    data = recv(1)
error: (104, 'Connection reset by peer')
and a few other variations.
I thought it might be to do with commit operations conflicting with
reads or writes, so I wrote an even dumber queueing system to hold
onto pending reads/writes while a commit went through.
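The serializing layer is essentially one lock around every call, so a commit never overlaps a pending read or write (a sketch; the class and method names here are illustrative, not my actual code):

```python
import threading

class SerializedSolr:
    """Dumb serialization: a single lock ensures a commit never
    runs concurrently with a read or write. `backend` stands in
    for the real Solr connection wrapper."""

    def __init__(self, backend):
        self.backend = backend
        self.lock = threading.Lock()

    def add(self, **params):
        # Writes wait for any in-flight commit to finish.
        with self.lock:
            return self.backend.add(**params)

    def search(self, query):
        # Reads are serialized the same way.
        with self.lock:
            return self.backend.search(query)

    def commit(self):
        # The commit holds the lock, blocking reads/writes meanwhile.
        with self.lock:
            return self.backend.commit()
```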
However, my logs are still full of those errors :) I doubt that
either Python's httplib library or Solr is buggy, so is it something
to do with the way I'm using the API?
How do people generally approach the deferred commit issue? Do I need
to queue index and search requests myself, or does Solr handle it? My
app indexes about 100 times more often than it searches, but searching
is more time-critical. Does that change anything?
Thanks!
James