On Sun, 9 May 2021 Emanuel Berg wrote:
> How can I generate a rss.xml from a bunch of HTML files?
> Tho one would think this to be quite a simple tool of parsing
> the HTML and outputting the RSS XML dialect, I can't find any
> tool...
XSLT is a language that is sort of made for describing this kind of
transformation.
My degree of XSLT-clue is quite low, but occasionally I find a small
project pitched to my rudimentary ability, and try to level up a
little.
Whenever I do that, I find this Debian package useful:
  xsltproc - XSLT 1.0 command line processor
    XSLT is an XML language for defining transformations of XML files from
    XML to some other arbitrary format, such as XML, HTML, plain text, etc.
    using standard XSLT version 1.0 stylesheets.

    This package contains a command line tool that facilitates XSLT
    transformations.
    Homepage: http://xmlsoft.org/xslt/
Sometimes I need this one too, to tweak HTML into something xsltproc
can deal with:
  tidy - HTML/XML syntax checker and reformatter
    Tidy corrects and cleans up HTML and XML documents by fixing
    markup errors and upgrading legacy code to modern standards.

    This package contains a command line tool 'tidy'.
    Homepage: http://www.html-tidy.org/
If I wanted to do what you want to do, those are the tools I'd use.
My couple of round logs, so to speak ;)
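
To make the tidy-then-xsltproc idea a bit more concrete, here is a rough
sketch in Python of how the two tools could be chained. The function
names and the stylesheet name rss.xsl are made up for illustration, and
the flags are the ones I understand tidy and xsltproc to accept
(-asxml and -numeric for tidy, --novalid and "-" for stdin with
xsltproc), so check them against your installed versions before
relying on this.

```python
import subprocess


def build_commands(html_path, stylesheet):
    """Return the (tidy, xsltproc) argument lists for one HTML file.

    Both names here are illustrative; nothing is executed by this function.
    """
    tidy_cmd = [
        "tidy",
        "-quiet",    # suppress nonessential output
        "-asxml",    # emit well-formed XHTML that an XML parser can read
        "-numeric",  # numeric entities instead of named ones such as &nbsp;
        html_path,
    ]
    xslt_cmd = [
        "xsltproc",
        "--novalid",  # skip loading the DTD named in the DOCTYPE
        stylesheet,
        "-",          # read the document from standard input
    ]
    return tidy_cmd, xslt_cmd


def html_to_rss(html_path, stylesheet):
    """Run tidy, pipe its output into xsltproc, and return the result as text."""
    tidy_cmd, xslt_cmd = build_commands(html_path, stylesheet)
    # tidy exits with a nonzero status when it merely printed warnings,
    # so its return code is deliberately not checked here.
    tidied = subprocess.run(tidy_cmd, capture_output=True)
    result = subprocess.run(
        xslt_cmd, input=tidied.stdout, capture_output=True, check=True
    )
    return result.stdout.decode("utf-8")
```

Something like html_to_rss("page.html", "rss.xsl") would then give you
the transformed text, provided both tools are installed and rss.xsl is a
stylesheet you have written. One thing to watch for: the XHTML that tidy
produces puts its elements in the XHTML namespace, so the stylesheet has
to match namespaced element names.
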
> tt-rss maybe, but when I install it it tries to setup a MySQL
> database which fails.
> I don't know why, but it seems too involved anyway, there
> isn't a webpile2rss tool like this or something:
> $ webpile2rss *.html > rss.xml # sweet
It sounds to me like you want to make a script that calls xsltproc to
apply some XSLT transformation of your own devising. I think if I were
in your place, I would study a few examples like this simple one...
"The XSLT used by html2rss-web"
https://github.com/gildesmarais/html2rss-web/blob/master/public/rss.xsl#start-of-content
...and this somewhat more complex-looking one...
"W3C RSS 1.0 News Feed Creation How-To"
https://www.w3.org/2001/10/glance/doc/howto
...and model my efforts on something in between.
Maybe you could find more appropriate models yourself, since you know
what you're looking for better than I do. But those are the two that
caught my eye first.
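
If XSLT turns out to be more than you want to learn for this, the
webpile2rss idea can also be roughed out without it. The following is
not the XSLT approach described above but a plain Python stand-in that
uses only the standard library. It takes each page's <title> as the item
title and builds the item link from the file name. The base URL, the
feed title, and all function names are placeholders I invented for the
sketch.

```python
import sys
from html.parser import HTMLParser
from xml.etree import ElementTree as ET


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def page_title(html_text):
    """Return the stripped <title> text, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip()


def build_rss(pages, base_url, feed_title):
    """Build an RSS 2.0 document from (filename, html_text) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 expects title, link and description on the channel.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = feed_title
    for filename, html_text in pages:
        item = ET.SubElement(channel, "item")
        # Fall back to the file name when the page has no usable title.
        ET.SubElement(item, "title").text = page_title(html_text) or filename
        ET.SubElement(item, "link").text = base_url + filename
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical usage: python3 webpile2rss.py *.html > rss.xml
    pages = []
    for name in sys.argv[1:]:
        with open(name, encoding="utf-8", errors="replace") as f:
            pages.append((name, f.read()))
    # Placeholder base URL and feed title; replace with your own.
    print(build_rss(pages, "https://example.org/", "My pages"))
```

It leaves out dates, descriptions and anything else a real feed would
want, so treat it as a starting point rather than a finished tool.
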
Anyway, good luck with your project.
--
What is important is seldom urgent
and what is urgent is seldom important
-- Dwight David Eisenhower