Hi there, I've been working on this issue for a while and I really don't know what the root cause is. Any insight would be great!
I have 14 million records in a MySQL DB. I grab 100,000 records from the DB at a time and then use ConcurrentUpdateSolrServer (queue size = 50, thread count = 4, using the internally managed HTTP client) to write the documents to the Solr index. If I build metadata only (i.e., only from the DB to Solr), the index build takes 4 hours with no errors. But if I build metadata + OCR text (the OCR text is stored on the file system and can be very large), the index build takes 15-16 hours and I often get a few "early EOF" errors on the Solr server. (A stripped-down sketch of the indexing client is in the P.S. at the end of this message.)

From solr.log:

INFO  - 2014-06-13 06:28:27.113; org.apache.solr.update.processor.LogUpdateProcessor; [ltdl3testperf] webapp=/solr path=/update params={wt=javabin&version=2} {add=[trpy0136 (1470801743195406336), nfhc0136 (1470801743199600640), sfhc0136 (1470801743205892096), kghc0136 (1470801743218475008), zfhc0136 (1470801743220572160), jghc0136 (1470801743237349376), rghc0136 (1470801743268806656), ffhc0136 (1470801743270903808), pghc0136 (1470801743285583872), sghc0136 (1470801743286632448), ... (14165 adds)]} 0 260102
ERROR - 2014-06-13 06:28:27.114; org.apache.solr.common.SolrException; java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early EOF
        at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
        ...

We tried increasing the Solr server from 4 to 6 CPUs. We moved the Solr server to a faster disk. I reduced the queue size for the ConcurrentUpdateSolrServer from 100 to 50. But we cannot consistently get a full index build through without any EOF errors. In my past three builds (I build them overnight):

1. The first one succeeded.
2. The second one had one early EOF error and dropped 3 records out of 14 million.
3. The third one had many early EOFs and dropped around 200,000 records.

One cluster of the errors occurred at around 6:28 AM. I looked at the CPU and file I/O stats around that time and didn't see anything out of the ordinary.

> sar
                CPU   %user  %nice  %system  %iowait  %steal   %idle
06:00:01 AM     all   42.13   0.00     1.54     2.13    0.00   54.20
06:10:01 AM     all   43.30   0.00     1.68     2.77    0.00   52.24
06:20:01 AM     all   47.73   0.00     1.83     2.43    0.00   48.01
06:30:01 AM     all   47.71   0.00     1.76     3.15    0.00   47.38
06:40:01 AM     all   47.01   0.00     1.68     2.55    0.00   48.76

> sar -d
06:00:01 AM        DEV      tps  rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz    await  svctm  %util
06:20:01 AM     dev8-0     1.84      2.35     370.95    203.01      0.05    27.60   9.58   1.76
06:20:01 AM    dev8-16    83.05    464.90   44384.81    540.05     13.25   160.17   2.53  21.03
06:20:01 AM    dev8-32     0.00      0.00       0.00      0.00      0.00     0.00   0.00   0.00
06:20:01 AM   dev253-0     1.41      1.71      10.90      8.95      0.01    10.16   3.03   0.43
06:20:01 AM   dev253-1    45.09      0.64     360.06      8.00      2.46    54.66   0.30   1.37
06:20:01 AM   dev253-2  5513.98    464.90   44092.00      8.08   1623.60   295.54   0.04  21.04
06:30:01 AM     dev8-0     2.52    100.62      83.64     72.99      0.03    10.42   6.59   1.66
06:30:01 AM    dev8-16    52.56   1502.75   18736.64    385.06      5.67   107.95   2.17  11.42
06:30:01 AM    dev8-32    42.55      0.01   38923.71    914.83     15.33   360.27   3.84  16.35
06:30:01 AM   dev253-0     3.03     98.24      13.55     36.93      0.03     9.44   2.99   0.90
06:30:01 AM   dev253-1     9.06      2.38      70.09      8.00      0.26    29.19   0.84   0.77
06:30:01 AM   dev253-2  7216.35   1502.76   57660.35      8.20   2599.49   360.22   0.04  26.58

Does anyone have any suggestions of where I can dig for the root cause?

Thanks!

Rebecca Tang
Applications Developer, UCSF CKM
Legacy Tobacco Document Library <legacy.library.ucsf.edu/>
E: rebecca.t...@ucsf.edu
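P.S. In case the exact client setup matters, here is a stripped-down sketch of how the indexing side is wired up. The queue size (50) and thread count (4) are the real values; the Solr URL, the "ocr_text" field name, and the fetchBatchOfIds / readOcrText methods are placeholders standing in for the real MySQL batch query and the OCR file read, not my actual code.

import java.util.Arrays;
import java.util.List;

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class OcrIndexer {

    public static void main(String[] args) throws Exception {
        // Queue size = 50, thread count = 4, internally managed HTTP client.
        // The URL/core name here is a placeholder.
        ConcurrentUpdateSolrServer server =
                new ConcurrentUpdateSolrServer("http://solrhost:8983/solr/ltdl3testperf", 50, 4);

        // In the real build this loop runs over batches of 100,000 rows from MySQL.
        for (String id : fetchBatchOfIds()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", id);
            // ... metadata fields from the DB ...
            doc.addField("ocr_text", readOcrText(id));  // OCR text from the file system; can be very large
            server.add(doc);                            // queued and flushed by the background threads
        }

        server.commit();
        server.shutdown();
    }

    // Placeholder for the MySQL batch query.
    private static List<String> fetchBatchOfIds() {
        return Arrays.asList("trpy0136", "nfhc0136");
    }

    // Placeholder for reading the OCR text file for a record.
    private static String readOcrText(String id) {
        return "ocr text for " + id;
    }
}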