I had a similar problem and was able to fix it in Solr by manually buffering the response to a StringWriter before sending it to Tomcat. Essentially, Tomcat's output buffer only holds so much; once it fills, the write blocks until the client starts reading, and a client that is still busy sending never does (which is why it always hangs at the same number of documents). A better solution (still to be implemented) is to have the client read the response while it is sending its input. That is not too difficult, though it is best done with two threads (i.e. fire off a thread to read the response before you send any data). Since the HttpClient code probably does this already, I'll most likely end up using that.
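For illustration, here is a minimal sketch of the two-thread idea. It is not the Solr or HttpClient code: the class name ConcurrentExchange and the method exchange are made up for this example, and it assumes you already hold a plain stream pair to the server (for instance from a raw socket), since that is the case where reading and writing can overlap.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, named for this sketch only.
public class ConcurrentExchange {

    /**
     * Sends body on toServer while a second thread drains fromServer,
     * so the server never blocks on a full response buffer.
     */
    public static String exchange(OutputStream toServer, InputStream fromServer, String body)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Start the reader BEFORE sending any data.
            Future<String> response = pool.submit(() -> {
                ByteArrayOutputStream collected = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = fromServer.read(chunk)) != -1) {
                    collected.write(chunk, 0, n);
                }
                return new String(collected.toByteArray(), StandardCharsets.UTF_8);
            });

            // Send the request on the calling thread.
            toServer.write(body.getBytes(StandardCharsets.UTF_8));
            toServer.flush();
            // Closing signals end-of-input in this sketch. With a real Socket you
            // would call shutdownOutput() instead, because closing the socket's
            // output stream closes the whole socket.
            toServer.close();

            // Wait for the reader to see end-of-stream and hand back the response.
            return response.get();
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The reader is submitted first so that the response is being consumed for the whole time the request is written; the order of those two steps is the entire point.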
On 7/31/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
Those are some great ideas Chris... I'm going to try some of them out. I'll
post the results when I get a chance to do more testing. Thanks.

At this point I can work around the problem by ignoring Solr's response, but
this is obviously not ideal. I would feel better knowing what is causing the
issue as well.

-Sangraal

On 7/29/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> : Sure, the method that does all the work updating Solr is the
> : doUpdate(String s) method in the GanjaUpdate class I'm pasting below.
> : It's hanging when I try to read the response... the last output I
> : receive in my log is Got Reader...
>
> I don't have the means to try out this code right now ... but i can't see
> any obvious problems with it (there may be somewhere that you are opening
> a stream or reader and not closing it, but i didn't see one) ... i notice
> you are running this client on the same machine as Solr (hence the
> localhost URLs) did you by any chance try running the client on a separate
> machine to see if the number of updates before it hangs changes?
>
> my money is still on a filehandle resource limit somewhere ... if you are
> running on a system that has "lsof" (on some Unix/Linux installations you
> need sudo/su root permissions to run it) you can use "lsof -p ####" to
> look up what files/network connections are open for a given process. You
> can try running that on both the client pid and the Solr server pid once
> it's hung -- You'll probably see a lot of Jar files in use for both, but
> if you see more than a few XML files open by the client, or more than
> 1 TCP connection open by either the client or the server, there's your
> culprit.
>
> I'm not sure what Windows equivalent of lsof may exist.
>
> Wait ... i just had another thought....
>
> You are using InputStreamReader to deal with the InputStreams of your
> remote XML files -- but you aren't specifying a charset, so it's using
> your system default which may be different from the charset of the
> original XML files you are pulling from the URL -- which (i *think*)
> means that your InputStreamReader may in some cases fail to read all of
> the bytes of the stream, which might leave some dangling filehandles (i'm
> just guessing on that part ... i'm not actually sure what happens in that
> case).
>
> What if you simplify your code (for the purposes of testing) and just put
> the post-transform version of ganja-full.xml in a big ass String variable in
> your java app and just call GanjaUpdate.doUpdate(bigAssString) over and
> over again ... does that cause the same problem?
>
>
> :
> : ----------
> :
> : package com.iceninetech.solr.update;
> :
> : import com.iceninetech.xml.XMLTransformer;
> :
> : import java.io.*;
> : import java.net.HttpURLConnection;
> : import java.net.URL;
> : import java.util.logging.Logger;
> :
> : public class GanjaUpdate {
> :
> :   private String updateSite = "";
> :   private String XSL_URL = "http://localhost:8080/xsl/ganja.xsl";
> :
> :   private static final File xmlStorageDir = new File("/source/solr/xml-dls/");
> :
> :   final Logger log = Logger.getLogger(GanjaUpdate.class.getName());
> :
> :   public GanjaUpdate(String siteName) {
> :     this.updateSite = siteName;
> :     log.info("GanjaUpdate is primed and ready to update " + siteName);
> :   }
> :
> :   public void update() {
> :     StringWriter sw = new StringWriter();
> :
> :     try {
> :       // transform gawkerInput XML to SOLR update XML
> :       XMLTransformer transform = new XMLTransformer();
> :       log.info("About to transform ganjaInput XML to Solr Update XML");
> :       transform.transform(getXML(), sw, getXSL());
> :       log.info("Completed ganjaInput/SolrUpdate XML transform");
> :
> :       // Write transformed XML to Disk.
> :       File transformedXML = new File(xmlStorageDir, updateSite+".sml");
> :       FileWriter fw = new FileWriter(transformedXML);
> :       fw.write(sw.toString());
> :       fw.close();
> :
> :       // post to Solr
> :       log.info("About to update Solr for site " + updateSite);
> :       String result = this.doUpdate(sw.toString());
> :       log.info("Solr says: " + result);
> :       sw.close();
> :     } catch (Exception e) {
> :       e.printStackTrace();
> :     }
> :   }
> :
> :   public File getXML() {
> :     String XML_URL = "http://localhost:8080/" + updateSite + "/ganja-full.xml";
> :
> :     // check for file
> :     File localXML = new File(xmlStorageDir, updateSite + ".xml");
> :
> :     try {
> :       if (localXML.createNewFile() && localXML.canWrite()) {
> :         // open connection
> :         log.info("Downloading: " + XML_URL);
> :         URL url = new URL(XML_URL);
> :         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
> :         conn.setRequestMethod("GET");
> :
> :         // Read response to File
> :         log.info("Storing XML to File" + localXML.getCanonicalPath());
> :         FileOutputStream fos = new FileOutputStream(new File(xmlStorageDir,
> :             updateSite + ".xml"));
> :
> :         BufferedReader rd = new BufferedReader(new InputStreamReader(
> :             conn.getInputStream()));
> :         String line;
> :         while ((line = rd.readLine()) != null) {
> :           line = line + '\n'; // add break after each line. It preserves formatting.
> :           fos.write(line.getBytes("UTF8"));
> :         }
> :
> :         // close connections
> :         rd.close();
> :         fos.close();
> :         conn.disconnect();
> :         log.info("Got the XML... File saved.");
> :       }
> :     } catch (Exception e) {
> :       e.printStackTrace();
> :     }
> :
> :     return localXML;
> :   }
> :
> :   public File getXSL() {
> :     StringBuffer retVal = new StringBuffer();
> :
> :     // check for file
> :     File localXSL = new File(xmlStorageDir, "ganja.xsl");
> :
> :     try {
> :       if (localXSL.createNewFile() && localXSL.canWrite()) {
> :         // open connection
> :         log.info("Downloading: " + XSL_URL);
> :         URL url = new URL(XSL_URL);
> :         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
> :         conn.setRequestMethod("GET");
> :         // Read response
> :         BufferedReader rd = new BufferedReader(new InputStreamReader(
> :             conn.getInputStream()));
> :         String line;
> :         while ((line = rd.readLine()) != null) {
> :           line = line + '\n';
> :           retVal.append(line);
> :         }
> :         // close connections
> :         rd.close();
> :         conn.disconnect();
> :
> :         log.info("Got the XSLT.");
> :
> :         // output file
> :         log.info("Storing XSL to File" + localXSL.getCanonicalPath());
> :         FileOutputStream fos = new FileOutputStream(new File(xmlStorageDir,
> :             "ganja.xsl"));
> :         fos.write(retVal.toString().getBytes());
> :         fos.close();
> :         log.info("File saved.");
> :       }
> :     } catch (Exception e) {
> :       e.printStackTrace();
> :     }
> :     return localXSL;
> :   }
> :
> :   private String doUpdate(String sw) {
> :     StringBuffer updateResult = new StringBuffer();
> :     try {
> :       // open connection
> :       log.info("Connecting to and preparing to post to SolrUpdate servlet.");
> :       URL url = new URL("http://localhost:8080/update");
> :       HttpURLConnection conn = (HttpURLConnection) url.openConnection();
> :       conn.setRequestMethod("POST");
> :       conn.setRequestProperty("Content-Type", "application/octet-stream");
> :       conn.setDoOutput(true);
> :       conn.setDoInput(true);
> :       conn.setUseCaches(false);
> :
> :       // Write to server
> :       log.info("About to post to SolrUpdate servlet.");
> :       DataOutputStream output = new DataOutputStream(conn.getOutputStream());
> :       output.writeBytes(sw);
> :       output.flush();
> :       output.close();
> :       log.info("Finished posting to SolrUpdate servlet.");
> :
> :       // Read response
> :       log.info("Ready to read response.");
> :       BufferedReader rd = new BufferedReader(new InputStreamReader(
> :           conn.getInputStream()));
> :       log.info("Got reader....");
> :       String line;
> :       while ((line = rd.readLine()) != null) {
> :         log.info("Writing to result...");
> :         updateResult.append(line);
> :       }
> :       rd.close();
> :
> :       // close connections
> :       conn.disconnect();
> :
> :       log.info("Done updating Solr for site" + updateSite);
> :     } catch (Exception e) {
> :       e.printStackTrace();
> :     }
> :
> :     return updateResult.toString();
> :   }
> : }
> :
> :
> : On 7/28/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> : >
> : > : I'm sure... it seems like solr is having trouble writing to a tomcat
> : > : response that's been inactive for a bit. It's only 30 seconds though,
> : > : so I'm not entirely sure why that would happen.
> : >
> : > but didn't you say you don't have this problem when you use curl -- just
> : > your java client code?
> : >
> : > Did you try Yonik's python test client? or the java client in Jira?
> : >
> : > looking over the java client code you sent, it's not clear if you are
> : > reading the response back, or closing the connections ... can you post a
> : > more complete sample app that exhibits the problem for you?
> : >
> : > -Hoss
> : >
>
> -Hoss