rzo1 commented on code in PR #1843:
URL: https://github.com/apache/stormcrawler/pull/1843#discussion_r2999827818


##########
docs/src/main/asciidoc/configuration.adoc:
##########
@@ -196,20 +196,26 @@ implementation.
 | http.proxy.pass | - | Proxy password.
 | http.proxy.port | 8080 | Proxy port.
 | http.proxy.user | - | Proxy username.
-| http.robots.403.allow | true | Defines behavior when robots.txt returns HTTP 403.
+| http.retry.on.connection.failure | true | Retry fetching on connection failure.
+| http.robots.403.allow | true | Allow crawling when robots.txt returns HTTP 403.
+| http.robots.5xx.allow | false | Allow crawling when robots.txt returns a server error (5xx).
 | http.robots.agents | '' | Additional user-agent strings for interpreting robots.txt.
-| http.robots.file.skip | false | Ignore robots.txt rules (1.17+).
+| http.robots.content.limit | -1 | Maximum bytes to fetch for robots.txt. -1 uses http.content.limit.
+| http.robots.file.skip | false | Ignore robots.txt rules entirely.
+| http.robots.headers.skip | false | Ignore robots directives from HTTP headers.
+| http.robots.meta.skip | false | Ignore robots directives from HTML meta tags.
 | http.skip.robots | false | Deprecated (replaced by http.robots.file.skip).
+| robots.noFollow.strict | true | If true, remove all outlinks from pages marked as noFollow.
 | http.store.headers | false | Whether to store response headers.
-| http.store.responsetime | true | Not yet implemented — store response time in Metadata.
 | http.timeout | 10000 | Connection timeout (ms).
 | http.use.cookies | false | Use cookies in subsequent requests.
 | https.protocol.implementation | org.apache.stormcrawler.protocol.httpclient.HttpProtocol | HTTPS Protocol
 implementation.
 | partition.url.mode | byHost | Defines how URLs are partitioned: byHost, byDomain, or byIP.
-| protocols | http,https | Supported protocols.
-| redirections.allowed | true | Allow URL redirects.
+| protocols | http,https,file | Supported protocols.
+| http.allow.redirects | false | Allow URL redirects.

Review Comment:
   Did that now.



