On 10/05/2021 07:06, Andrei POPESCU wrote:
> On Lu, 10 mai 21, 01:44:32, Emanuel Berg wrote:
>> Charles Curley wrote:
>>
>>> Right. However, as I found out asking elsewhere, you can
>>> include HTML in Markdown.
>> Hehehe, let's see, first write HTML, then include it in
>> Markdown, then have the static site generator generate
>> HTML... brilliant :)
> Surely there must be some site generator with RSS support that takes
> "plain" HTML as input.

I would guess that there isn't, purely because figuring out what information to extract is relatively awkward. OK, some questions are easy: "What is the title of the page?" (the <title> tag) or "What is the publication date of the page?" (the mtime of the file). Others are trickier: "Who was the author of this page?" (we could hope for a meta tag, and perhaps fall back to the user running the tool) and "What's the copyright of the page?" (I'm fairly certain there's no standard tag for that in HTML).

Finally, there comes the tricky bit: the page summary. Most feeds provide a summary of the page content to entice readers to read the whole article; one or two paragraphs should be sufficient. But if you've ever used the "Reader Mode" of a web browser, or pointed a screen reader at a web page, you'll know that finding the body of a page can't be done with 100% accuracy. This is why so many site generators prefer that you provide the pieces and let them build up the final HTML.

HTML *is* supposed to be a semantic language rather than a presentation language (that is, one could argue that the first few <p> tags are the first few paragraphs of the page), but if you're asking for a tool that can parse arbitrary HTML (including machine-generated HTML), then I don't think it's going to be easy.

>
> Kind regards,
> Andrei
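
To make the easy and hard parts concrete, here is a minimal sketch using only Python's standard library. It is not taken from any existing site generator: the names FeedFieldParser, extract_feed_fields and page_mtime are made up for illustration, and "the first two <p> elements are the summary" is exactly the optimistic assumption discussed above.

```python
import os
from datetime import datetime, timezone
from html.parser import HTMLParser


class FeedFieldParser(HTMLParser):
    """Hypothetical helper: collects <title>, <meta name="author">,
    and the text of the first few <p> elements."""

    def __init__(self, max_paragraphs=2):
        super().__init__(convert_charrefs=True)
        self.max_paragraphs = max_paragraphs
        self.title = None
        self.author = None
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
            self._buf = []
        elif tag == "meta" and (attrs.get("name") or "").lower() == "author":
            self.author = attrs.get("content")
        elif tag == "p":
            # An unclosed <p> is implicitly ended by the next <p>.
            self._flush_paragraph()
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self.title = " ".join("".join(self._buf).split())
            self._in_title = False
        elif tag == "p":
            self._flush_paragraph()

    def handle_data(self, data):
        if self._in_title or self._in_p:
            self._buf.append(data)

    def _flush_paragraph(self):
        if self._in_p:
            # Collapse runs of whitespace and keep only non-empty text.
            text = " ".join("".join(self._buf).split())
            if text and len(self.paragraphs) < self.max_paragraphs:
                self.paragraphs.append(text)
            self._in_p = False


def extract_feed_fields(html, fallback_author=None):
    """Return a dict with the title, author and a guessed summary."""
    parser = FeedFieldParser()
    parser.feed(html)
    parser.close()
    parser._flush_paragraph()  # pick up a trailing, unclosed <p>
    return {
        "title": parser.title,
        "author": parser.author or fallback_author,
        "summary": "\n\n".join(parser.paragraphs),
    }


def page_mtime(path):
    """Publication date guessed from the file's mtime, as a UTC datetime."""
    return datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
```

The title, author and mtime parts work about as well as you'd expect. The summary is where it falls over: if the first <p> on the page sits in a navigation bar or a cookie banner, that is what ends up in the feed, and nothing in the sketch can tell the difference.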