I've been running into a frequent problem with the 2.0-rc2 release in the
spider I'm working on: HttpParser throws an exception when a web server
returns an extra byte after the headers. When this exception is thrown, none
of the headers are returned, even though they all contain valid data.
An example packet from Ethereal is attached.
As you can see, there is an extraneous byte (0x00) being sent that is causing
the problem.
I've attached a quick and dirty patch to fix this. There was already a test
for a length < 1 in order to skip processing. Rather than checking
specifically for this case, I simply changed the check to look for a
length < 2, on the grounds that there could never be a valid header of one
character anyway. The patch is against HEAD, but would probably apply
cleanly to the 2.0-rc2 release.
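To illustrate the idea outside of HttpClient, here is a minimal standalone
sketch. It is not the actual HttpParser code: readLine and readHeaderLines
are hypothetical helpers written only for this example, and the input is a
shortened stand-in for the tail of the response in the capture. It shows that
a lone 0x00 after the last CRLF comes back as a one-character "line", which a
length < 1 test lets through and a length < 2 test stops on.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class HeaderLoopSketch {

    // Hypothetical line reader: returns the next line without its trailing
    // CR/LF, or null if the stream is already at end of input.
    static String readLine(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        while (b >= 0 && b != '\n') {
            sb.append((char) b);
            b = in.read();
        }
        int len = sb.length();
        if (len > 0 && sb.charAt(len - 1) == '\r') {
            sb.setLength(len - 1);
        }
        return sb.toString();
    }

    // Collects raw header lines until end of input or until a line shorter
    // than minLength is seen (1 mimics the old check, 2 the patched one).
    static List<String> readHeaderLines(InputStream in, int minLength)
            throws IOException {
        List<String> lines = new ArrayList<>();
        for (;;) {
            String line = readLine(in);
            if (line == null || line.length() < minLength) {
                break;
            }
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Two headers followed by a stray 0x00 and then end of stream.
        byte[] tail = "Connection: close\r\nServer: pctrackd/0.9\r\n\0"
                .getBytes(StandardCharsets.ISO_8859_1);

        // Old check: the 0x00 "line" is passed on to header parsing.
        System.out.println(readHeaderLines(new ByteArrayInputStream(tail), 1).size());
        // Patched check: the loop stops before the 0x00 "line".
        System.out.println(readHeaderLines(new ByteArrayInputStream(tail), 2).size());
    }
}
```

With the old threshold the sketch hands three lines to the header parser (the
third being the single 0x00 character); with the new threshold it hands over
only the two real headers.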
Let me know what you think.
Let me know if this is the wrong place to post this!
Andrew Buchanan
00000000 47 45 54 20 2f 3f 66 6e 3d 31 26 73 69 3d 38 34 GET /?fn =1&si=84
00000010 32 33 36 20 48 54 54 50 2f 31 2e 30 0d 0a 55 73 236 HTTP /1.0..Us
00000020 65 72 2d 41 67 65 6e 74 3a 20 48 75 67 68 43 72 er-Agent : HughCr
00000030 61 77 6c 65 72 2f 30 2e 36 0d 0a 48 6f 73 74 3a awler/0. 6..Host:
00000040 20 69 6e 2e 70 61 79 63 6f 75 6e 74 65 72 2e 63 in.payc ounter.c
00000050 6f 6d 0d 0a 0d 0a om....
00000000 48 54 54 50 2f 31 2e 30 20 33 30 32 20 52 65 73 HTTP/1.0 302 Res
00000010 6f 75 72 63 65 20 6d 6f 76 65 64 0d 0a 43 6f 6e ource mo ved..Con
00000020 6e 65 63 74 69 6f 6e 3a 20 63 6c 6f 73 65 0d 0a nection: close..
00000030 53 65 72 76 65 72 3a 20 70 63 74 72 61 63 6b 64 Server: pctrackd
00000040 2f 30 2e 39 0d 0a 50 33 50 3a 20 70 6f 6c 69 63 /0.9..P3 P: polic
00000050 79 72 65 66 3d 22 68 74 74 70 3a 2f 2f 77 77 77 yref="ht tp://www
00000060 2e 70 61 79 63 6f 75 6e 74 65 72 2e 63 6f 6d 2f .paycoun ter.com/
00000070 77 33 63 2f 70 33 70 2e 78 6d 6c 22 2c 20 43 50 w3c/p3p. xml", CP
00000080 3d 22 4e 4f 4e 20 44 53 50 20 43 4f 52 20 44 45 ="NON DS P COR DE
00000090 56 20 50 53 41 20 4f 55 52 20 42 55 53 20 4e 41 V PSA OU R BUS NA
000000A0 56 20 53 54 41 20 50 52 45 22 0d 0a 53 65 74 2d V STA PR E"..Set-
000000B0 43 6f 6f 6b 69 65 3a 20 70 63 74 72 61 63 6b 64 Cookie: pctrackd
000000C0 3d 30 30 30 4b 61 43 30 31 34 47 68 72 30 30 32 =000KaC0 14Ghr002
000000D0 30 30 3b 20 70 61 74 68 3d 2f 3b 20 64 6f 6d 61 00; path =/; doma
000000E0 69 6e 3d 2e 70 61 79 63 6f 75 6e 74 65 72 2e 63 in=.payc ounter.c
000000F0 6f 6d 3b 20 65 78 70 69 72 65 73 3d 54 75 65 2c om; expi res=Tue,
00000100 20 33 31 20 44 65 63 20 32 30 33 30 20 30 31 3a 31 Dec 2030 01:
00000110 30 30 3a 30 30 20 47 4d 54 0d 0a 4c 6f 63 61 74 00:00 GM T..Locat
00000120 69 6f 6e 3a 20 68 74 74 70 3a 2f 2f 77 77 77 2e ion: htt p://www.
00000130 70 63 61 64 75 6c 74 2e 63 6f 6d 2f 3f 73 69 3d pcadult. com/?si=
00000140 38 34 32 33 36 26 63 61 74 3d 32 0d 0a 00 84236&ca t=2...
Index: src/java/org/apache/commons/httpclient/HttpParser.java
===================================================================
RCS file: /home/cvspublic/jakarta-commons/httpclient/src/java/org/apache/commons/httpclient/HttpParser.java,v
retrieving revision 1.8
diff -u -r1.8 HttpParser.java
--- src/java/org/apache/commons/httpclient/HttpParser.java 15 Jul 2003 02:19:58 -0000 1.8
+++ src/java/org/apache/commons/httpclient/HttpParser.java 9 Jan 2004 20:26:42 -0000
@@ -170,7 +170,7 @@
StringBuffer value = null;
for (; ;) {
String line = HttpParser.readLine(is);
- if ((line == null) || (line.length() < 1)) {
+ if ((line == null) || (line.length() < 2)) {
break;
}