2006/6/7, Mark Nottingham <[EMAIL PROTECTED]>:
The degree of precision that FH currently provides isn't desirable for search results. Feed History also requires that the server maintain state about a particular feed, which is unworkable for search results; e.g., to implement feed history for search results, a server would have to mint a whole new set of feed documents for every query, and keep them around. That's not workable for most search engines (Yahoo, Google, Amazon, whatever), so they need another option -- one that needs to be clearly distinct from FH.

This brings me to my other motivation -- I found that most people who use "previous" and "next" don't understand the assumptions that FH makes about archive stability, and point them at URIs like "http://example.org/feed.atom?page=3". That will break the FH algorithm badly, reducing the value of the mechanism as a whole, because people will stop trusting it. The link relation for implementing the incremental approach needs to have the stability semantics baked in and explicit.
As I said previously, the current FH algorithm isn't workable for all use cases (notably paged non-incremental feeds, whether those feeds are snapshots or "live feeds"), but that doesn't mean other rel values are needed. I think FH could:

- (more explicitly) RECOMMEND using stable "chunks" so that caching can be used (maybe using an "overview" section similar to section 5 of the APP);
- provide algorithms as a "how to optimize bandwidth etc. and not retrieve the whole set of pages/chunks", not as a "you should/must do that to comply with this spec";
- provide different algorithms for incremental and non-incremental feeds: retrieval of a non-incremental feed has to go through all the pages/chunks and not stop as soon as there's no more change (but the overall algorithm remains almost the same);
- RECOMMEND using RFC3229 w/ feeds (problem: this is not an I-D), particularly for non-incremental feeds. Atom does not define an order for the entries in a feed (order has to be conveyed by extensions, e.g. Feed Rank), so there's no problem retrieving only a subset of the whole feed, even if the new/updated entries would be scattered throughout the feed when retrieved without RFC3229 w/ feeds. Paging should be NOT RECOMMENDED (or even a "MUST NOT") for RFC3229 responses: either you return the whole set of changes, or you ignore the "IM" request and return the feed as if the client hadn't used RFC3229;
- use 304 (Not Modified) instead of IRI comparison, highlighting the fact that web servers like Apache provide such responses out of the box when serving files (this should encourage people to use stable chunks/pages by saving them to files, instead of using CGI applications with database requests).
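To make the third and last suggestions above more concrete, here is a rough Python sketch of a client walking "previous" links with conditional requests. It is only an illustration under stated assumptions, not the algorithm from the FH draft: `Page`, `Fetch`, `walk` and the `cache` dictionary are hypothetical names, and `fetch` stands in for whatever HTTP layer sends If-None-Match and reports a 304.

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class Page(NamedTuple):
    """One feed document as seen by the client (hypothetical structure)."""
    status: int              # 200, or 304 when the validator matched
    etag: Optional[str]      # validator to send back on the next visit
    entries: List[str]       # entry ids in this page (empty on a 304)
    previous: Optional[str]  # URI of the next-older page, if any


# Assumed transport: fetch(uri, etag) sends If-None-Match when etag is given.
Fetch = Callable[[str, Optional[str]], Page]


def walk(start: str, fetch: Fetch, cache: Dict[str, Page],
         incremental: bool) -> List[str]:
    """Follow "previous" links from start and return the entry ids found.

    Incremental feed: stop at the first page answered with 304, since
    stable chunks mean everything older is already known.
    Non-incremental feed: visit every page, but reuse the cached copy of
    any page answered with 304 so its body is not transferred again.
    """
    found: List[str] = []
    visited = set()
    uri: Optional[str] = start
    while uri is not None and uri not in visited:  # guard against link loops
        visited.add(uri)
        cached = cache.get(uri)
        page = fetch(uri, cached.etag if cached else None)
        if page.status == 304 and cached is not None:
            if incremental:
                break          # nothing changed from here on back
            page = cached      # unchanged page: fall back to the stored copy
        else:
            cache[uri] = page  # new or modified page: remember it
        found.extend(page.entries)
        uri = page.previous
    return found
```

Both modes share the same loop; the only difference is whether a 304 ends the walk or merely saves a transfer, which is why the two algorithms remain almost the same.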
The fact that many implementations currently return feeds (whether or not they use paging/history) without processing preconditions (If-Modified-Since or If-None-Match) shouldn't influence the FH design: many of these implementations are deployments of CMSes like WordPress, and those can be fixed. Maybe the Feed Validator could have an option to test this: it would retrieve the feed (at least) twice, using If-Modified-Since and/or If-None-Match for the second request, and show a warning if the feed hasn't changed and the second GET didn't result in a 304 (Not Modified).

Please note that I haven't gone back and reread the FH draft in weeks, so some of my comments might not be accurate (I have a pretty good memory, but data loss might still happen ;-) )

--
Thomas Broyer
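For illustration, the validator option suggested above might look roughly like the following Python sketch. This is not how the Feed Validator actually works: `check_conditional_get` and the `get` callable are hypothetical, and `get` is assumed to return a status code, a dictionary of response headers with canonical names, and the body.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumed transport: get(uri, request_headers) -> (status, headers, body).
Get = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


def check_conditional_get(uri: str, get: Get) -> Optional[str]:
    """Fetch a feed twice; return a warning string, or None if all is well."""
    status, headers, body = get(uri, {})
    if status != 200:
        return f"first GET returned {status}; preconditions cannot be tested"

    # Echo back whichever validators the server offered.
    conditional: Dict[str, str] = {}
    if "ETag" in headers:
        conditional["If-None-Match"] = headers["ETag"]
    if "Last-Modified" in headers:
        conditional["If-Modified-Since"] = headers["Last-Modified"]
    if not conditional:
        return "no ETag or Last-Modified header; conditional GET is impossible"

    status2, _, body2 = get(uri, conditional)
    if status2 == 304:
        return None  # the server processed the preconditions
    if status2 == 200 and body2 == body:
        return "feed is unchanged but the second GET did not return 304"
    return None      # the feed changed between requests: inconclusive
```

Comparing the two bodies is one simple way to decide that the feed "hasn't changed"; a real check would probably have to tolerate volatile details such as generated timestamps.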
