On Jan 10, 2008 1:10 PM, Denver Gingerich <[EMAIL PROTECTED]> wrote:
> On Jan 6, 2008 10:40 PM, Andrew Clausen <[EMAIL PROTECTED]> wrote:
> > Hi Denver,
> >
> > On Sun, Jan 06, 2008 at 01:58:18PM -0500, Denver Gingerich wrote:
> > > Thanks for the patch. I'm not sure what the conventions are for GNU
> > > command-line tools with respect to outputting HTML. It seems that it
> > > might be better to have an HTML post-processor that takes the normal
> > > output of wdiff and converts it to HTML. That way wdiff only has to
> > > worry about one type of formatting (plain text). This appears to be
> > > what the GNU diff people expect, since GNU diff doesn't natively
> > > support HTML output.
> >
> > From a usability point of view, I think it's desirable to have a single
> > wdiff front-end command to handle everything. (It's easier to find
> > what you want with a single front-end, and the options in all the various
> > output formats are likely to overlap substantially.)
> >
> > From a maintainability point of view, I don't see a big advantage in
> > having HTML output generated via a wdiff post-processor. Most of the
> > code would be parsing wdiff's output rather than generating HTML.
>
> One should consider where HTML output would be used most. In most
> cases, HTML-ized diff output is used in web-based version control
> viewing systems (such as ViewVC). Since wdiff works better than diff
> for long lines (which are more common in written text than in source
> code), it might also be used in web-based document histories, such as
> those provided by MediaWiki.
>
> In both cases, the request is made via a web interface (e.g. by
> clicking a link that says "compare with previous revision") and the
> response is provided via a web interface (the HTML-ized diff output).
> As a result, implementing HTML-ized output in wdiff does not make
> sense, because its input comes from a command line and its output goes
> to a command line.
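
For concreteness, the post-processor approach debated above might look like the following minimal Python sketch. It assumes wdiff's default plain-text markers ([-deleted-] and {+inserted+}) and maps them to HTML <del> and <ins> elements; the function name wdiff_to_html is hypothetical and not part of wdiff. As Andrew points out, nearly all of the work is parsing wdiff's output rather than generating HTML.

```python
import html
import re

# Assumes wdiff's default markers: [-deleted-] and {+inserted+}.
# Non-greedy matching with re.DOTALL lets a change span several lines.
_CHANGE = re.compile(r"\[-(.*?)-\]|\{\+(.*?)\+\}", re.DOTALL)


def wdiff_to_html(wdiff_output: str) -> str:
    """Convert wdiff's plain-text output into an HTML fragment (hypothetical helper)."""
    parts = []
    pos = 0
    for match in _CHANGE.finditer(wdiff_output):
        # Unchanged text between markers is HTML-escaped and copied through.
        parts.append(html.escape(wdiff_output[pos:match.start()]))
        deleted, inserted = match.group(1), match.group(2)
        if deleted is not None:
            parts.append("<del>" + html.escape(deleted) + "</del>")
        else:
            parts.append("<ins>" + html.escape(inserted) + "</ins>")
        pos = match.end()
    # Copy whatever follows the last marker.
    parts.append(html.escape(wdiff_output[pos:]))
    return "".join(parts)
```

One weakness of this design: if the compared texts themselves contain "[-" or "{+", the markers become ambiguous. wdiff's --start-delete, --end-delete, --start-insert and --end-insert options can select less collision-prone delimiters, but the regular expression would then have to be kept in sync with whatever was chosen.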
>
> Now you could make the argument that wdiff could be run from within a
> web scripting language (e.g. in PHP:
> "$diff = `wdiff --html a.txt b.txt`"). However, this is generally
> considered to be a hack, and for good reason. First, it adds
> significant processing overhead: the input data has to be written to
> files and a new process started on the web server. Second, it makes
> dependencies difficult to trace, because a web application using wdiff
> needs to specify that it requires wdiff to be installed on the web
> server. Generally, web server administrators prefer installing web
> server plugins to installing command-line applications. Additionally,
> for security reasons, not all web servers will allow running
> command-line applications (i.e. the above PHP command will not work).
>
> I believe it is best for this sort of thing to be done in a library or
> a dynamically loadable web server module. Examples of this are
> ViewVC's use of Python's difflib [1] and MediaWiki's use of wikidiff2
> [2] (a dynamically loadable module). wikidiff2 does include a
> command-line version that prints HTML to standard output, but it is
> exclusively for testing purposes.
>
> So the best way to get this into wdiff is to abstract the diffing part
> of wdiff out into a library, make wdiff use that library, and write a
> dynamically loadable module that uses the library to produce HTML
> output. Unfortunately, this is unlikely to work properly until wdiff
> uses diff as a library: wdiff currently exec()s the diff command
> directly (which I consider a hack), so using the current wdiff code as
> a dynamically loaded module would be just as ugly as running wdiff
> from the command line (the only difference being that your web server
> would depend on having the command-line "diff" tool installed instead
> of "wdiff").
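
The in-process alternative can be sketched with Python's standard difflib, the module ViewVC relies on [1]. This is a minimal sketch, not ViewVC's actual code: it assumes whitespace-separated words are adequate tokens, and the function name word_diff_html is hypothetical. No temporary files are written and no external process is started; comparison and HTML generation both happen inside the calling program.

```python
import difflib
import html


def word_diff_html(old: str, new: str) -> str:
    """Word-wise diff of two texts, rendered as an HTML fragment (hypothetical helper)."""
    # Simple tokenizer: split on whitespace (the original spacing is lost).
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    out = []
    # Each opcode describes how old_words[i1:i2] maps onto new_words[j1:j2].
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_chunk = html.escape(" ".join(old_words[i1:i2]))
        new_chunk = html.escape(" ".join(new_words[j1:j2]))
        if tag == "equal":
            out.append(old_chunk)
        elif tag == "delete":
            out.append(f"<del>{old_chunk}</del>")
        elif tag == "insert":
            out.append(f"<ins>{new_chunk}</ins>")
        else:  # "replace": show the removed words, then the added ones
            out.append(f"<del>{old_chunk}</del> <ins>{new_chunk}</ins>")
    return " ".join(out)
```

For example, word_diff_html("the quick brown fox", "the slow brown fox") yields "the <del>quick</del> <ins>slow</ins> brown fox". Because str.split() discards the original whitespace, line breaks and runs of spaces are not preserved; a production version would need a tokenizer that keeps them.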
>
> If you are interested in doing this (making wdiff use a diff library
> instead of calling diff directly), I would encourage you and could
> probably provide some help. This is on my long-term todo list for
> wdiff anyway.
If you're still interested in making a tool that produces HTML output,
I suggest looking at the diffseq module in gnulib [4], which I learned
about after asking how one might split diff into a library and a
command-line tool [5]. I'm not sure whether diffseq supports word-wise
diffing, but it shouldn't be too hard to modify it so that it does.

I will likely move wdiff over to diffseq after I finish all the other
cleanup that needs doing. You may want to check back on the diffseq
module after that happens, since by then it (or a similar module) will
definitely support word-wise diffing.

Sorry I won't be adding your patch to wdiff. I wish you all the best in
your work on free software projects.

Denver

> 1. http://viewvc.tigris.org/svn/viewvc/trunk/lib/idiff.py
> 2. http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/wikidiff2/
> 3. http://packages.debian.org/changelogs/pool/main/w/wdiff/wdiff_0.5-16/changelog

4. http://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=commit;h=a29749fa308fb3e489f2678f1acbff3877501479
5. http://lists.gnu.org/archive/html/bug-gnu-utils/2008-01/msg00040.html
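
Adapting a generic sequence-comparison routine such as diffseq to word-wise diffing should indeed be straightforward, because the core routine only compares elements for equality: it can diff lines or words alike, depending on how the input is tokenized. The sketch below illustrates this with a plain dynamic-programming longest-common-subsequence table, which is simpler (and slower, O(nm) in time and memory) than the algorithm a production module like diffseq uses. The names lcs_diff and word_diff are hypothetical, and the output borrows wdiff's default [-...-] and {+...+} markers.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def lcs_diff(a: Sequence[T], b: Sequence[T]) -> List[Tuple[str, T]]:
    """Diff two sequences using a longest-common-subsequence table.

    Returns (op, item) pairs where op is " " (kept), "-" (only in a)
    or "+" (only in b). Elements may be lines, words, or anything
    comparable with ==.
    """
    n, m = len(a), len(b)
    # lcs[i][j] = length of the LCS of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    # Walk the table from the front, emitting one edit per step.
    result: List[Tuple[str, T]] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append((" ", a[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            result.append(("-", a[i]))
            i += 1
        else:
            result.append(("+", b[j]))
            j += 1
    # Whatever remains in either sequence has no counterpart in the other.
    result.extend(("-", x) for x in a[i:])
    result.extend(("+", y) for y in b[j:])
    return result


def word_diff(old: str, new: str) -> str:
    """Word-wise diff rendered with wdiff-style [-...-] / {+...+} markers."""
    out = []
    for op, word in lcs_diff(old.split(), new.split()):
        if op == "-":
            out.append(f"[-{word}-]")
        elif op == "+":
            out.append(f"{{+{word}+}}")
        else:
            out.append(word)
    return " ".join(out)
```

Passing lists of lines to lcs_diff gives a line-oriented diff; passing lists of words, as word_diff does, gives a word-oriented one. Only the tokenization changes, which is the kind of modification suggested above.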