rzo1 commented on code in PR #1587:
URL: https://github.com/apache/stormcrawler/pull/1587#discussion_r2178397592
##########
core/pom.xml:
##########
@@ -264,6 +265,12 @@ under the License.
<scope>test</scope>
</dependency>
+ <dependency>
+ <groupId>org.apache.httpcomponents.core5</groupId>
Review Comment:
StormCrawler is currently using HttpClient version 4, along with core
components also on version 4. We should either upgrade to version 5 or maintain
consistency by sticking with version 4 throughout.
##########
core/src/main/java/org/apache/stormcrawler/protocol/okhttp/HttpProtocol.java:
##########
@@ -352,7 +352,7 @@ public ProtocolResponse getProtocolOutput(String url, final
Metadata metadata)
final String lastModified =
metadata.getFirstValue(HttpHeaders.LAST_MODIFIED);
if (StringUtils.isNotBlank(lastModified)) {
- rb.header("If-Modified-Since",
HttpHeaders.formatHttpDate(lastModified));
+ rb.header("If-Modified-Since", formatHttpDate(lastModified));
Review Comment:
Can this be replaced by
https://hc.apache.org/httpcomponents-core-4.4.x/current/httpcore/apidocs/org/apache/http/HttpHeaders.html#IF_MODIFIED_SINCE
?
##########
core/src/main/java/org/apache/stormcrawler/protocol/okhttp/HttpProtocol.java:
##########
@@ -352,7 +352,7 @@ public ProtocolResponse getProtocolOutput(String url, final
Metadata metadata)
final String lastModified =
metadata.getFirstValue(HttpHeaders.LAST_MODIFIED);
if (StringUtils.isNotBlank(lastModified)) {
- rb.header("If-Modified-Since",
HttpHeaders.formatHttpDate(lastModified));
+ rb.header("If-Modified-Since", formatHttpDate(lastModified));
}
final String ifNoneMatch = metadata.getFirstValue("etag",
protocolMDprefix);
Review Comment:
https://hc.apache.org/httpcomponents-core-4.4.x/current/httpcore/apidocs/org/apache/http/HttpHeaders.html#ETAG
?
##########
core/src/main/java/org/apache/stormcrawler/Metadata.java:
##########
@@ -109,14 +109,15 @@ public String[] getValues(String key, String prefix) {
}
public String[] getValues(String key) {
- String[] values = md.get(key);
+ if (key == null || key.isEmpty()) return null;
+ String[] values = md.getOrDefault(key, md.get(key.toLowerCase()));
Review Comment:
I would be fine to drop the backward compatibility here. We could mention in
the changelog, that the next SC release have fixed odd http headers (or we
stick with the compatibility, but we should drop it some day because it makes
our code a bit more complex to re-think / track why it was added). If people
would like to have the backward compatibility for those headers, we should
maybe add a code comment referencing the issue? wdyt?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]