Re: Paging, Feed History, etc.

Thomas Broyer Wed, 07 Jun 2006 06:17:15 -0700


[sorry, autocompletion got it to atom-protocol]


2006/6/7, Mark Nottingham <[EMAIL PROTECTED]>:


I've been talking to a number of people face-to-face about Feed
History and the use cases for it, and I've come to a position where I
believe there are two major use cases out there for putting together
multiple feeds to form one big, virtual feed.

1) So-called "incremental" feeds, where changes are happening at the
"front end" of the feed, while less recent entries are pretty much
static. If a change is required in an old entry, either a new revised
entry with the same ID, or something like a tombstone is required.
Publishers can easily make stable archives of old entries available,
and clients wish to take advantage of caching, etc. to avoid re-
transferring old entries again and again. A high degree of fidelity
may be required; e.g., it should be possible to accurately
reconstruct the entire state of the feed with no missed entries.
E.g., a blog feed, but this could also be seen as an "event feed",
where the entries are changes to the state of the underlying resources.

2) So-called "paging" feeds, where the entries are often the results
to a query, being paged through in groups so as to not overwhelm the
server and/or communications link. Entries may be arbitrarily added,
deleted and reordered. Clients expect to access the portions
they need in relatively quick succession, and do not require absolute
fidelity. E.g., OpenSearch query results. The entries in the feed
directly correspond to the underlying resources, 1-to-1.

These are very different things. Incremental feeds require that the
server keep state around per-feed, which isn't viable for something
like query results, but fine for a blog. Paging feeds can lose
entries (e.g., if http://example.org/index.page?page=1 refers to
http://example.org/index.atom?page=2, page 2's contents can change
between the two fetches), which is OK for some applications, and not
for others.

As such, I'm pretty much convinced that they need to be dealt with
separately.


I'm not sure they're as different as you're saying…

From the client point of view, isn't an incremental feed just the same
as a paging feed, except that you're assured you won't lose entries?
(Well, that's not really true actually: the "live feed document" can
change while you're following next/previous links, and could even
result in a new "archived feed document".)

If applications need paging around snapshots to avoid losing entries,
then they (servers) can create a snapshot of a query result set that
the client will page through (with some TTL before the snapshot is
deleted).
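
A minimal sketch of that server-side snapshot idea, for illustration only. The SnapshotStore class, its method names and the in-memory storage are all hypothetical; nothing here comes from the Feed History draft. It freezes a query result set, serves pages from the frozen copy, and drops the copy once the TTL has passed.

```python
import time
import uuid


class SnapshotStore:
    """Hypothetical in-memory store of frozen query result sets."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._snapshots = {}        # snapshot id -> (expires_at, entries)

    def create(self, entries):
        """Freeze a copy of the result set and return its snapshot id."""
        snapshot_id = uuid.uuid4().hex
        self._snapshots[snapshot_id] = (self.clock() + self.ttl, list(entries))
        return snapshot_id

    def page(self, snapshot_id, page, page_size):
        """Return one page (1-based) of a snapshot that is still alive."""
        self._purge()
        if snapshot_id not in self._snapshots:
            raise KeyError("snapshot expired or unknown")
        _, entries = self._snapshots[snapshot_id]
        start = (page - 1) * page_size
        return entries[start:start + page_size]

    def _purge(self):
        """Delete every snapshot whose TTL has elapsed."""
        now = self.clock()
        expired = [k for k, (exp, _) in self._snapshots.items() if exp <= now]
        for k in expired:
            del self._snapshots[k]
```

Because pages are cut from the frozen copy, later changes to the live result set cannot make entries appear or disappear between two page fetches; the cost is the per-snapshot state the server has to keep until the TTL runs out.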

Actually, what changes is how clients deal with them, but those
different feeds probably won't be consumed by the same clients: an
incremental feed is designed to be subscribed to by a "feed reader" while
others are for more specific uses (search results, shared lists with
or without ranking info, etc.)
I'm not saying you won't use the same client application to process
those feeds, but the client is assumed to behave differently depending
on whether it's dealing with an incremental feed or another paging feed
(i.e. a snapshot feed split into small "pages"). That behaviour does not
depend on the relations between the "chunks" but on how to deal
with the feed: if it's an incremental feed, I'll use a "feed state
reconstruction algorithm" to avoid downloading entries that have not
changed, and will probably also keep local copies of those entries; if
it's a "paged snapshot feed", the set of entries in all the
"chunks" replaces the previous set, and some "algorithm"
(preconditions and/or IM) can still be used to try to avoid
downloading chunks that have not changed.
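
The two client behaviours described above can be sketched as follows. This is an illustration under assumptions, not an algorithm from any draft: entries are plain dicts with "id" and "updated" keys, the "updated" values are timestamps in one consistent format so that string comparison orders them, and the function names are made up.

```python
def apply_incremental(local, fetched):
    """Incremental feed: merge fetched entries into the local copies.

    Entries already held locally are kept; a fetched entry replaces a
    local one only if it carries the same id and a newer timestamp.
    """
    merged = dict(local)
    for entry in fetched:
        known = merged.get(entry["id"])
        if known is None or entry["updated"] > known["updated"]:
            merged[entry["id"]] = entry
    return merged


def apply_snapshot(local, fetched):
    """Paged snapshot feed: the fetched set replaces the previous set."""
    return {entry["id"]: entry for entry in fetched}
```

In both cases "fetched" is the set of entries gathered from all the chunks; only what the client does with that set differs.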

The only difference that *might* exist in dealing with the relations
is that, for an incremental feed, as soon as a request returns a 304
(Not Modified), you can stop "paging", whereas for paged snapshot
feeds the next chunk might have changed…

But the real relationship between these pages/chunks/etc. is the same:
one comes after/before the other.
A "dumb" client could download the whole set of pages/chunks/etc. each
time, again and again, and will still have the same set of entries.
The "feed state reconstruction algorithm" is some kind of optimization
to avoid wasting bandwidth: it doesn't change the way you follow
links, just the way you optimize retrievals.
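
To illustrate that the link-following is the same and only the optimization differs, here is a rough sketch. Everything in it is an assumption made for the example: fetch is a caller-supplied function returning (status, entries, next_url) for a conditional request, cache maps a URL to the (entries, next_url) last seen for it, and the incremental flag stands in for whatever marker the feed carries.

```python
def walk(start_url, fetch, cache, incremental):
    """Follow the chunk links from start_url; return the URLs requested.

    The traversal is identical for both kinds of feed. The only
    difference is what a 304 (Not Modified) allows the client to do.
    """
    requested = []
    url = start_url
    while url is not None:
        status, entries, next_url = fetch(url)
        requested.append(url)
        if status == 304:
            if incremental:
                # Older chunks of an incremental feed are static: stop here.
                break
            # Paged snapshot: this chunk is unchanged, but a later one may
            # not be, so carry on using the link remembered in the cache.
            entries, next_url = cache[url]
        else:
            cache[url] = (entries, next_url)
        url = next_url
    return requested
```

A "dumb" client corresponds to a fetch that never returns 304: it requests every chunk every time and still ends up with the same set of entries.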

So a "flag" in the feed telling which kind of optimization you can use
without losing information is enough; something like the
fh:incremental element in version -04 of your draft.

Originally, I proposed that Feed History use "prev-archive" link
relations, but many people pushed back on that in favour of the more
generic "previous" and "next".


I think I originally pushed the other way around [1], but for
different reasons. If it's clear that previous/next deals with "pages"
or "chunks" of the same feed, then I'm OK with them, and other use
cases will need new rel values.

[1] http://www.imc.org/atom-syntax/mail-archive/msg17372.html

Given the above, I'd like to see if anyone would still object to
having separate relation sets for incremental feeds ("prev-archive"
and friends) and paging feeds ("previous", "next" and friends).


Given the above, yes. Consider it a "-0" though, not a "-1", as I
might need some more thinking (or people trying to convince me).

--
Thomas Broyer
