Hi Chris,

first off, I don't want to dispute that your program is useful. I just do not think that it is quite ready to be in a stable release, and I listed some things that I believe to be stoppers. BTW: the main use case I remember is the Python 2.x CHM documentation. I didn't wait for chm2pdf to finish with that because it took too long.
Just a quick comment about the links and the efficiency:

- For the removal of anchors: I would really prefer to have these preserved by default and maybe have an option to remove them (and in this direction, not the reverse). Alternatively, how about bookkeeping in the following way: compile a hash (dict or anydbm) mapping linked anchor URLs to the file positions where they are linked from,

    unverified links -> [(file, pos1a, pos1b, pos1c), (file2, pos2a, pos2b), ...]

  and a Python set (or dict with foo -> None entries) of verified anchors. As you go through the files, record each link and anchor (throwing out unverified links once you hit the matching anchor). After that, you need to go through the hash and delete the anchor at those positions. Because 1) links get shorter by this and 2) you can just have more whitespace in the <a href=" "WHITESPACEHERE>, you only need to overwrite very specific locations and do not even have to rewrite the whole file.

- For the actual replacement: just do the replacements in one go, then you don't need all of that. You can easily do this by declaring a function that does the link replacement and the above accounting, and then passing it as the replacement argument to re.sub.

For the security issues: the shell-escaping when using system is really important, because you might accidentally overwrite or remove important stuff when using filenames with spaces or so (imagine calling chm2pdf from your home dir on a file "MyInfoAbout Mail andStuff.chm" and then you remove "Mail" because of the spaces).

As for the TMP dir: it is best to use TMP, as patched, unless the user explicitly specifies a different work dir (instead of /tmp/SOMETHING). Using HOME can be very costly (e.g. when HOME is networked), and unlike TMP, which is cleared on reboot, it is not cleaned up and ends up cluttered should your program die for some reason without cleaning up. All other programs (well, almost; some have the same bug) work in safely created TMP subdirs unless the user very specifically elects not to. Yours should, too.
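To make the three suggestions above a bit more concrete, here is a rough sketch in current Python. It is not taken from chm2pdf and not tested against it. All names (HREF_RE, rewrite_links, extract_command, shell_string, make_workdir) and the "extract-tool" program name are made up for illustration, and the href pattern is deliberately simplistic.

```python
import re
import shlex
import shutil
import tempfile

# Assumed, simplified pattern for <a ... href="..."> attributes.
HREF_RE = re.compile(r'(<a\s[^>]*?href=")([^"]*)(")', re.IGNORECASE)


def rewrite_links(html, mapping, seen):
    """Rewrite all hrefs in one re.sub pass, recording each link in `seen`."""
    def replace(match):
        url = match.group(2)
        seen.append((url, match.start(2)))      # the accounting step
        base, sep, anchor = url.partition("#")  # the anchor is kept by default
        new_base = mapping.get(base, base)      # unknown targets stay unchanged
        return match.group(1) + new_base + sep + anchor + match.group(3)
    return HREF_RE.sub(replace, html)


def extract_command(chm_path, out_dir):
    """Build an argument list; passed to subprocess.run(), no shell parses it."""
    # "extract-tool" is a placeholder, not a real program name.
    return ["extract-tool", chm_path, out_dir]


def shell_string(args):
    """Quote each argument, for the cases where a shell string is unavoidable."""
    return " ".join(shlex.quote(arg) for arg in args)


def make_workdir(user_dir=None):
    """Create a private work dir under the system temp dir unless user_dir is given."""
    # mkdtemp honours TMPDIR and creates the directory for the current user only.
    return tempfile.mkdtemp(prefix="chm2pdf-", dir=user_dir)


if __name__ == "__main__":
    workdir = make_workdir()
    try:
        seen = []
        page = '<a href="a.html#sec1">x</a>'
        print(rewrite_links(page, {"a.html": "page1.html"}, seen))
        print(shell_string(extract_command("MyInfoAbout Mail andStuff.chm", workdir)))
    finally:
        shutil.rmtree(workdir)  # clean up even if something above fails
```

Passing the argument list straight to subprocess.run() avoids the shell altogether, so the quoting helper is only needed if a single command string really cannot be avoided.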
Consistent temp dir handling may not be of importance to you individually, but when a user installs 1000 packages, it is something that one should be able to rely on being the same across all of them.

Kind regards

T.

--
Thomas Viehmann, http://thomas.viehmann.net/