It should be possible to use the contents of index.html to set the order for concatenation too.

On Thu, 25 Aug 2011, Bob Proulx wrote:
> RiverWind wrote:
> > The idea was to concat a large html file and then convert it to
> > text. The pdf can be converted to text, and it so far seems like a
> > pretty viable translation.
>
> If I were going to do that for myself I would convert each individual
> html file to text first and then concatenate the individual text
> files. The reason being that the individual html files are at that
> moment completely consistent. Individually they should be able to
> convert to text cleanly with no problems. And then the text can be
> concatenated. But once you concatenate the html then you have created
> a Frankenstein html file that is almost certainly going to be
> problematic to convert to text.
>
> Also, my naive experience with this is that converting html to text is
> a lot easier than converting pdf to text. With html it is already a
> text type. The mime type is "text/html" after all. But pdf has been
> less accessible for conversions for me. The mime type is
> "application/pdf" and isn't a text type. That leaves more room
> for error.
>
> Bob

Jude <jdash...@shellworld.net>
"I love the Pope, I love seeing him in his Pope-Mobile, his three feet of bullet proof plexi-glass. That's faith in action folks! You know he's got God on his side." ~ Bill Hicks
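
For what it's worth, here is a rough sketch of how the two ideas above might fit together: read the link order out of index.html, convert each page to text on its own, and only then concatenate the text. Everything in it is an assumption made for illustration, not something from the thread: the helper names (ordered_pages, html_to_text, build_text), the idea that index.html links to the pages with plain relative hrefs ending in .html, the UTF-8 encoding, and the very crude tag stripping, which uses only Python's standard html.parser. A dedicated converter would give nicer text.

```python
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        href = href.split("#", 1)[0]  # drop any fragment such as "#top"
        if href and href not in self.links:
            self.links.append(href)


class TextExtractor(HTMLParser):
    """Crude HTML-to-text: keep character data, skip script and style."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def ordered_pages(index_html):
    """Return local .html links from the index, in document order."""
    collector = LinkCollector()
    collector.feed(index_html)
    collector.close()
    return [h for h in collector.links
            if h.endswith(".html") and "://" not in h]


def html_to_text(html):
    """Convert one self-contained HTML document to plain text."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return "\n".join(extractor.parts)


def build_text(directory, index_name="index.html"):
    """Convert each linked page separately, then join the text."""
    root = Path(directory)
    index_html = (root / index_name).read_text(encoding="utf-8")
    texts = []
    for name in ordered_pages(index_html):
        page = root / name
        if page.is_file():  # silently skip links with no local file
            texts.append(html_to_text(page.read_text(encoding="utf-8")))
    return "\n\n".join(texts)
```

Calling build_text("some/dir") would return one string with the pages in index order, which could then be written out to a single text file. Each page is parsed while it is still a complete document, so no combined html file is ever created.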