Re: [dev] suckless html to markdown (text)

2019-01-06 Thread Fischers Fritz
Although implementations usually get this wrong, Markdown is supposed to be an extension of HTML; that is, any HTML document is also a Markdown document. Consequently, you can use cat(1) to convert.

    cat webpage.html > webpage.md

You likely also want to remove some of the HTML tags and use the M

Re: [dev] suckless html to markdown (text)

2019-01-06 Thread Nick
Quoth Alexander Krotov:
> > Ideally, with sed/awk, or better in C.
>
> "Parsing" HTML with sed is simply wrong.

This is a good point that I should have mentioned. I spent years using sed and awk to extract things from HTML, writing crawlers and suchlike, for personal projects. It can work, of c

Re: [dev] suckless html to markdown (text)

2019-01-06 Thread Alexander Krotov
> Ideally, with sed/awk, or better in C.

"Parsing" HTML with sed is simply wrong. You need to use a decent HTML parsing library, as parsing HTML is complex. There is https://github.com/yujiahaol68/downmark, which uses the Go html library, but I have not tried it. Seriously though, if you are not g
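
To illustrate the "use a parsing library" advice, here is a minimal sketch that lets a real tokenizer find the tags and only decides what Markdown to emit for each one. It is not taken from downmark or from anything posted in this thread: it assumes Python's standard html.parser module, and the names MarkdownSketch and to_markdown are hypothetical. It covers only headings, paragraphs, emphasis and links; every other tag is dropped and its text is kept.

```python
import re
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class MarkdownSketch(HTMLParser):
    """Hypothetical converter: handles only h1-h6, p, em/i, strong/b and a."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []      # output fragments, joined in to_markdown()
        self.href = None   # href of the <a> element currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            # <h2> becomes "## ", and so on.
            self.out.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in ("em", "i"):
            self.out.append("*")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag == "a":
            self.out.append("](%s)" % (self.href or ""))
            self.href = None

    def handle_data(self, data):
        # Collapse whitespace runs to one space, roughly as a browser would.
        self.out.append(re.sub(r"\s+", " ", data))


def to_markdown(html):
    """Convert a small subset of HTML to Markdown (assumption-laden sketch)."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    text = "".join(parser.out)
    lines = [line.strip() for line in text.splitlines()]
    # Keep at most one blank line between blocks.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```

Calling to_markdown() on a heading followed by a paragraph yields the heading line, a blank line, and the paragraph text with *emphasis* and [link](url) markup. Because the tokenizer, not a regular expression, decides where tags begin and end, unclosed <p> tags and character references do not need special cases here. Lists, code blocks, tables and escaping of Markdown metacharacters are left out of this sketch.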