On Friday, 21.07.2006, at 06:39 -0700, Daniel Burrows wrote:
> On Fri, Jul 21, 2006 at 01:01:56PM +0200, Lars Lindner <[EMAIL PROTECTED]>
> was heard to say:
> > On 7/21/06, Daniel Burrows <[EMAIL PROTECTED]> wrote:
> > > On Thu, Jul 20, 2006 at 06:44:02PM +0200, Lars Lindner
> > > <[EMAIL PROTECTED]> was heard to say:
> > > > On 7/20/06, Daniel Burrows <[EMAIL PROTECTED]> wrote:
> > > > > Package: liferea
> > > > > Severity: wishlist
> > > > >
> > > > > A lot of RSS feeds seem to just be a blurb with a link to the full
> > > > > article (in a "link" tag, if I'm reading the XML correctly). Since I
> > > > > mostly use liferea for reading blogs while I'm disconnected from the
> > > > > Internet, it would be nice if it had the ability to cache the full
> > > > > article and associated images as well.
> > > >
> > > > How would you retrieve this "missing" information? The publisher
> > > > of the feed explicitly does not deliver the content and will have
> > > > no interest in a feed reader easily collecting the information
> > > > from the website itself.
> > > >
> > > > Liferea has a feature to use external programs/scripts as a feed
> > > > source. Writing a website scraping script is the only solution
> > > > for this problem. It cannot be done automatically.
> > >
> > > Um, there's a <link> tag with that URL in the RSS feed. Here's a
> > > headline from the feed for the Beeb, for instance (with the formatting
> > > cleaned up):
> >
> > The link itself is rendered when displaying the item. Following the
> > link, retrieving the relevant information from the website behind it,
> > and presenting it without all the website structure, layout and
> > advertisements is not possible. But if you mean that Liferea should
> > just load the link inside the HTML pane, so that you automatically
> > surf the website the item link points to, that would be something
> > worth considering. It is currently not planned, though.
>
> Saving the file at that link and viewing it in a Web browser works just
> fine, although obviously I lose embedded images if I'm not online. So,
> although I feel a bit silly having to make this point, it's clearly
> possible to download all the files associated with the Web page; and even
> just blindly downloading the file behind the <link> gets you most of the
> way there.
It is possible. But I see no use case. The publisher clearly does not want
you to see the full contents of the headline without going to his website.
Also, if Liferea pre-downloaded web content it would effectively become an
offline browser, while it is a feed reader and, by definition, presents
feed contents.

There are also technical reasons not to pre-download websites. First, the
space needed to do so, even if such websites are single HTML pages. As soon
as CSS or frames are used, the pages become useless without recursive
downloads, even when ignoring multimedia content. Further, how long would
such a page stay valid? The online page might update regularly. Next, how
should the content be integrated into the item view? How should the user be
told that this content is offline, how could the display mode be switched
back to online, and how should links be followed?

I don't want to say that those problems cannot be solved, but one can
clearly see the additional complexity behind such a seemingly simple
request. Realizing complex features also means setting a default way of
using the program which won't suit many people, so such a feature would
need a lot of consideration. The stated project goal is still "a _simple_
news aggregator". Therefore: won't fix.

> PS: looks like I lost the BTS; sorry about that. Feel free to bounce
> my replies back there.

No problem.

Lars
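
For anyone who wants the behaviour anyway, a minimal sketch of the
external-script workaround mentioned earlier in the thread might look like
the following. It assumes Liferea's external program/script feed source
reads the converted feed from the script's standard output, that the feed
is plain RSS 2.0 with unnamespaced <item>, <link> and <description>
elements, and it fetches only the single HTML page behind each <link>;
images, CSS and frames are not downloaded, which is exactly the limitation
discussed above. The script and its invocation are illustrative, not part
of Liferea.

#!/usr/bin/env python3
# Hypothetical conversion script: fetch the page behind each item's
# <link> and inline it into <description>, so the article text is
# readable offline. The rewritten feed is written to stdout.
import sys
import urllib.request
import xml.etree.ElementTree as ET

feed_url = sys.argv[1]  # the feed URL is passed as an argument

with urllib.request.urlopen(feed_url, timeout=30) as resp:
    tree = ET.parse(resp)

for item in tree.iter("item"):
    link = item.findtext("link")
    if not link:
        continue
    try:
        with urllib.request.urlopen(link, timeout=30) as page:
            html = page.read().decode("utf-8", errors="replace")
    except OSError:
        continue  # on failure keep the original blurb untouched
    desc = item.find("description")
    if desc is None:
        desc = ET.SubElement(item, "description")
    # ElementTree escapes the markup on serialization, so the page ends
    # up entity-encoded inside <description>, as feed readers expect.
    desc.text = html

tree.write(sys.stdout.buffer, encoding="utf-8", xml_declaration=True)

Subscribing to the output of such a command, rather than to the feed URL
directly, is presumably how the external-source feature would be used here;
the obvious trade-off is that every feed update refetches each linked page.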