How to clean up HTML...

Using shell with xmllint (yes, ugly shortcuts below):

    export cmd="xmllint --html"
    find . -name '*.html' -exec sh -c '$cmd "$1" > "$1.new"' sh \{\} \;
    find . -name '*.html' -exec cp \{\}.new \{\} \;
    svn status | egrep '^\?' | sed -e 's/^\? *//g' | xargs rm

The redirect sits inside sh -c so that every file gets its own .new
output; $cmd is visible there because it is exported. Careful: the
svn status line removes every unversioned file in the working copy,
not only the .new files.

(--html is not strictly necessary if you have valid XML, e.g. you can
format XML as follows; use "xmllint --format" if you also want the
output reindented:

    export cmd="xmllint"
    find . -name '*.xml' -exec sh -c '$cmd "$1" > "$1.new"' sh \{\} \;
    find . -name '*.xml' -exec cp \{\}.new \{\} \;
    svn status | egrep '^\?' | sed -e 's/^\? *//g' | xargs rm
)

Using shell with tidy:

    export cmd="tidy -m -i -c"
    find . -name '*.html' -exec $cmd \{\} \;

(-m modifies the files in place, -i indents, -c cleans up presentational
markup. Adding -e would make tidy only report errors and warnings,
without writing anything back.)

In ant you would create a <fileset> and then do an <exec> (or rather
an <apply>, which is the variant that takes a fileset) of much the same.

Both tools have some more interesting options.

- LSD

On Wed, Feb 01, 2006 at 09:29:56AM +1100, David Crossley wrote:
> Martin Sebor wrote:
> >
> > I'm a little distressed to see the conversion process has messed
> > up the formatting of the original HTML that I manually maintained
> > for readability. Specifically, many of the terminating tags (such
> > as </p>) are not indented as they ought to be and instead are in
> > column 1. I don't suppose there is an easy way to regenerate the
> > page so as to preserve more of the original formatting, is there?
>
> I tried my best to format stuff automatically
> as part of the Forrest output process. If it
> was raw xml serialiser output then it would have
> been even worse. No we cannot retain original
> formatting.
>
> I know that it is not good enough.
>
> Someone could run all documents through something
> like HTML Tidy or Henning's CodeWrestler or perhaps
> some XSL.
>
> I would be pleased to see how they do this, because
> i want to add the ability to our future tools.
>
> On many projects i have seen messy source documents
> cause grief with svn diffs - too much clutter and
> inconsistent whitespace.
>
> -David
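
PS: for anyone who wants the xmllint loop from the top of this message
as something reusable, here is a rough sketch rather than a tested
tool. The function names rewrite_in_place and rewrite_tree are made up
for illustration; the only assumption is that the formatter takes the
file name as its last argument and writes the result to stdout, as
xmllint does. A file is replaced only when the formatter exits
successfully and produces non-empty output.

```shell
#!/bin/sh
# Sketch only: rewrite_in_place and rewrite_tree are hypothetical helpers.

# rewrite_in_place FILE CMD [ARGS...]
# Runs "CMD ARGS... FILE", captures stdout in FILE.new, and copies that
# over FILE only if the command succeeded and wrote something.
rewrite_in_place() {
    file=$1
    shift
    if "$@" "$file" < /dev/null > "$file.new" && [ -s "$file.new" ]; then
        # cat into the original (instead of mv) so its permissions stay put
        cat "$file.new" > "$file"
    else
        echo "skipped: $file" >&2
    fi
    # always clean up the temporary file, so no svn status sweep is needed
    rm -f "$file.new"
}

# rewrite_tree DIR PATTERN CMD [ARGS...]
# Applies rewrite_in_place to every regular file under DIR matching PATTERN.
# Assumes file names do not contain newlines.
rewrite_tree() {
    dir=$1
    pattern=$2
    shift 2
    find "$dir" -type f -name "$pattern" | while IFS= read -r f; do
        rewrite_in_place "$f" "$@"
    done
}
```

Typical calls would look like "rewrite_tree . '*.html' xmllint --html"
or "rewrite_tree . '*.xml' xmllint --format". Whether the formatter
returns a failure status on merely untidy input depends on the tool, so
check what ends up skipped before committing.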