2006/6/7, Mark Nottingham <[EMAIL PROTECTED]>:

The degree of precision that FH currently provides isn't desirable
for search results. Feed History also requires that the server
maintain state about a particular feed, which is unworkable for
search results; e.g., to implement feed history for search results, a
server would have to mint a whole new set of feed documents for every
query, and keep them around. That's not workable for most search
engines (Yahoo, Google, Amazon, whatever), so they need another
option -- one that needs to be clearly distinct from FH.

This brings me to my other motivation -- I found that most people who
use "previous" and "next" don't understand the assumptions that FH
makes about archive stability, and point them at URIs like
"http://example.org/feed.atom?page=3". That will break the FH algorithm
badly, reducing the value of the mechanism as a whole, because people
will stop trusting it. The link relation for implementing the
incremental approach needs to have the stability semantics baked in
and explicit.

As I previously said, the current FH algorithm isn't workable for all
the use cases (notably with paged non-incremental feeds, those feeds
being snapshots or "live feeds") but that doesn't mean there's a need
for other rel values.

I think FH could:
- (more explicitly) RECOMMEND using stable "chunks" so that caching
can be used (maybe using an "overview" section similar to the APP's
section 5)
- provide algorithms as a "how to optimize bandwidth et al. and not
retrieve the whole set of pages/chunks", not as a "you should/must do
that to comply with this spec"
- provide different algorithms for incremental and non-incremental
feeds, non-incremental feeds retrieval having to go through all the
pages/chunks and not stop as soon as there's no more change (but the
whole algorithm remains almost the same)
- RECOMMEND using RFC3229 w/ feeds (problem: this is not an I-D),
particularly for non-incremental feeds (Atom does not define an order
for the entries in a feed; order has to be conveyed by extensions, e.g.
Feed Rank, so there's no problem retrieving only a subset of the whole
feed, even if the new/updated entries would be scattered across the
feed when retrieved without RFC3229 w/ feeds), and NOT RECOMMEND (or
even make it a "MUST NOT") using paging for RFC3229 responses: either
you return the whole set of changes, or you choose to ignore the "IM"
request and return the feed as if the client didn't use RFC3229
- use 304 (Not Modified) instead of IRI-comparison, highlighting the
fact that web servers like Apache provide such a response out of the
box when delivering files (this should encourage people to use stable
chunks/pages by saving them to files, instead of using CGI
applications with database requests). The fact that many
implementations currently return feeds (whether or not they're using
paging/history) without processing preconditions (If-Modified-Since or
If-None-Match) shouldn't influence the FH design; many of these
implementations are deployments of CMSes like WordPress, and those can
be fixed. Maybe the Feed Validator can have an option to test this: it
would retrieve the feed twice (at least), using If-Modified-Since
and/or If-None-Match for the second request, and show a warning if
the feed hasn't changed and the second GET didn't result in a 304 (Not
Modified).
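
To make the validator idea concrete, here is a rough sketch in Python.
It is only an illustration, not anything the Feed Validator does today:
check_conditional_get and the fetch callable are names I made up, fetch
is assumed to return (status, headers, body), and header names are
assumed to be spelled exactly "ETag" and "Last-Modified".

```python
def conditional_headers(first_headers):
    """Build precondition headers from the validators of the first response."""
    headers = {}
    if "ETag" in first_headers:
        headers["If-None-Match"] = first_headers["ETag"]
    if "Last-Modified" in first_headers:
        headers["If-Modified-Since"] = first_headers["Last-Modified"]
    return headers


def check_conditional_get(fetch, url):
    """Fetch url twice; return a warning string, or None if nothing to report.

    fetch(url, headers) is a hypothetical callable returning
    (status, response_headers, body).
    """
    status, first_headers, first_body = fetch(url, {})
    if status != 200:
        return "first GET did not return 200"

    preconditions = conditional_headers(first_headers)
    if not preconditions:
        return "no ETag or Last-Modified: conditional GET is not possible"

    # Second request: same URL, this time with the preconditions.
    status, _, second_body = fetch(url, preconditions)
    if status == 304:
        return None
    if status == 200 and second_body == first_body:
        return "feed unchanged but second GET returned 200 instead of 304"
    # The feed really changed between the two requests: nothing to warn about.
    return None
```

Passing fetch in as a parameter keeps the check independent of any
particular HTTP library, so it can be tried against a fake server first.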

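Likewise, the point above about different retrieval for incremental
and non-incremental feeds could look roughly like this. Again just a
sketch under my own assumptions: collect_pages and get_page are made-up
names, get_page is assumed to return (entries, previous_url), and
"already seen" is tracked by page URL here purely for simplicity.

```python
def collect_pages(get_page, start_url, seen, incremental):
    """Follow "previous" links from start_url; return the pages to process.

    get_page(url) is a hypothetical callable returning
    (entries, previous_url), with previous_url set to None on the last page.
    seen is the set of page URLs already processed on an earlier run.
    """
    pages = []
    visited = set()  # guards against loops in the "previous" chain
    url = start_url
    while url is not None and url not in visited:
        # Incremental feed: older pages are stable, so stop at the first
        # one we already have. The starting document is always fetched.
        if incremental and url != start_url and url in seen:
            break
        visited.add(url)
        entries, previous_url = get_page(url)
        pages.append((url, entries))
        url = previous_url
    return pages
```

With incremental=False the loop simply never stops early, which is the
"almost the same algorithm" I had in mind.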

Please note that I haven't gone back and reread the FH draft in weeks,
so some of my comments might not be accurate (I have a pretty good
memory, but data loss might still happen ;-) )

--
Thomas Broyer
