On Thursday, 2018-05-24 17:27:05 -0700, Laura Ekstrand wrote:
> Use Beautiful Soup to fix bad html, then use pandoc for converting to
> rst.
> ---
>  docs/rstConverter.py | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
>  create mode 100755 docs/rstConverter.py
>
> diff --git a/docs/rstConverter.py b/docs/rstConverter.py
> new file mode 100755
> index 0000000000..5321fdde8b
> --- /dev/null
> +++ b/docs/rstConverter.py
> @@ -0,0 +1,23 @@
> +#!/usr/bin/python3
> +import glob
> +import subprocess
> +from bs4 import BeautifulSoup
> +
> +pages = glob.glob("*.html")
> +pages += glob.glob("relnotes/*.html")
> +for filename in pages:
> +    # Fix some annoyingly bad html.
> +    with open(filename) as f:
> +        soup = BeautifulSoup(f, 'html5lib')
> +    soup.find("div", "header").extract() # Get rid of old header
> +    soup.iframe.extract() # Get rid of old contents bar.
> +    soup.find("div", "content").unwrap() # Strip the content div.

Good call on using Beautiful Soup to clean up the html before converting it!

> +
> +    # Write out the better html.
> +    with open(filename, 'wt') as f:
> +        f.write(str(soup))
> +
> +    # Convert to rst with pandoc.
> +    name = filename.split(".html")[0]
> +    bashCmd = "pandoc " + filename + " -o " + name + ".rst"
> +    subprocess.run(bashCmd.split())

Idea: remove the old html in the same commit that introduces the rst, so
that git picks it up as a rename with changes. That would hopefully make
each conversion easier to check 1:1.

(In case this is as unclear as I think it is: I'm thinking about how we
can review individual page conversions, say index.html -> index.rst, to
see that no release has been dropped in the process. If git shows this
as a rename with changes, I expect it will be easier to check than if one
commit creates all the rst files and another deletes all the html.)

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
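
To make the rename idea above a bit more concrete, here is a rough sketch of what the per-page step could look like. It is not part of the patch: `conversion_commands` and `convert_and_stage` are hypothetical helper names, and the `-f` on `git rm` assumes the cleanup step has already rewritten the html in place (without it, git would refuse to remove a locally modified file). Passing argument lists instead of splitting a command string also keeps file names containing spaces intact.

```python
import os
import subprocess


def conversion_commands(html_path):
    """Build the commands that convert one page and stage it as a rename.

    Hypothetical helper for illustration, not part of the patch.
    """
    # splitext only strips the final extension, so "a.b.html" -> "a.b.rst".
    rst_path = os.path.splitext(html_path)[0] + ".rst"
    return [
        # Convert the (already cleaned) html to rst.
        ["pandoc", html_path, "-o", rst_path],
        # Stage the new rst page.
        ["git", "add", rst_path],
        # Stage the removal of the old html for the same commit.
        # -f is an assumption: the html was rewritten in place by the cleanup.
        ["git", "rm", "-f", "--quiet", html_path],
    ]


def convert_and_stage(html_path):
    """Run the commands in order, stopping at the first failure."""
    for argv in conversion_commands(html_path):
        subprocess.run(argv, check=True)
```

After staging, `git diff --cached -M --name-status` should list a page as `R` (renamed) rather than as a separate `A` and `D`, provided the rst is similar enough to the html. Rename detection is similarity-based, so heavily changed pages may need a lower threshold such as `-M30%`.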