On Tue, 18 Oct 2005 00:35:33 +0200 Nicolas François <[EMAIL PROTECTED]> wrote:
> Given the results, I'm pretty sure it is ready for a public release.
> Maybe you should contact the debian-i18n@lists.debian.org list. Someone(s)
> are already developing a spellchecker for the PO (used in translation) in
> the various languages.

Very glad to know it's well received, but it'll be a little while yet
before I split the code up for public use. By "split", I mean abstract
it into useful parts, as with the "software tools" school of
programming.

> Your checker could be important to translate the man pages in a correct
> English.

That's one of the reasons for it. It's funny, because I sometimes feel
as though it were too petty to submit a one-word bug, but when
translation is factored in, that one word is multiplied many times;
e.g. some poor translator whose second language is English may have to
waste minutes verifying that "resursively" isn't a word and should have
been "recursively".

> > (**for example, it needs a way to know what typos have already been
> > reported, otherwise the BTS could be clogged with redundant reports.)
>
> This is "just" an infrastructure issue.

Typo bugs have an interesting property that's unlike most other bugs:
they're unambiguous -- every report must include the typo, its location,
and its correction. Even using the BTS we now have, it should be
possible to search for any given typo bug. It wouldn't be an efficient
search, but it needs no infrastructure, and until such an infrastructure
exists it would be better than nothing.

> I can give you an example of what is done in the Debian French translation
> team:
> * A robot checks which files need to be translated
> * Another robot checks what has been done so far (a mailing list, with
>   special tags is used for that), and whether the patch reported on the
>   BTS was applied or not. It also permits to assign files to
>   translators.
>
> I'm pretty sure we can setup such an infrastructure for the spellchecking
> of the man pages (e.g. on Alioth).
> And you could get help from other peoples in this HUGE task.
> (Unless you are trying to be the first bug submitter ;)

If I can help by making a tool or two, that'd be good. By "HUGE task"
you mean translation, right? I'm not sure spellchecking is so huge, or
rather maybe it needn't be once the proper tools exist -- but I agree
that brute force (manual proofreading) is a huge task.

BTW, lately I've been thinking about translation, and had this crazy
idea for machine-aided translation of man pages (which may be old and
rejected for all I know -- I'm not an expert). The idea, or ideas, as
they come to mind:

1) Technical docs are an UNUSUAL type of language because they have no
   need for ambiguity. All terms and their relations should be
   unambiguous. So provided we know the "one meaning" of a tech noun,
   it can be given an exact translation, when one exists. The same
   should be true of any unambiguous grammatical relation.

2) Computers are (currently) lousy at ambiguity anyway.

3) Make up an ad hoc, unambiguous, extra-detailed, tech-oriented
   meta-language, have humans translate a given source text into this
   meta-language, and have the computer translate that to other human
   languages, which could then be refined by human translators.

What might such a meta-language look like? I'm thinking that if it were
based on English (it needn't be), we'd start with a source line like:

    "My dog's name is Fido, he likes people."

...and get something like (only better):

    "My(possessive pronoun=A Costa, of next noun 'dog')
     dog(noun, 5 Webster)'s(possessive of next noun 'name')
     name(noun, 3 Webster) is(present tense verb, 2 Webster)
     Fido(proper noun, leave alone), he(pronoun masculine=Fido)
     likes(present tense singular verb, 2 Webster)
     people(plural noun, 6 Webster)."

Every first-order pronoun would be identified with its noun.
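
To make that a bit more concrete, here is a minimal Python sketch of how
such an annotated sentence might be stored. The `Token` class, its field
names and the `unresolved` helper are all made up for illustration (not
part of any existing tool), and the sense numbers are the same fake ones
as in the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Token:
    """One annotated word of the hypothetical meta-language."""
    text: str
    pos: str                          # part of speech, e.g. "noun"
    sense: Optional[int] = None       # sense number in the reference dictionary
    refers_to: Optional[str] = None   # antecedent, for pronouns and possessives
    keep: bool = False                # True: leave untranslated (proper nouns)

    def render(self) -> str:
        """Format the token roughly like the hand-written example."""
        notes = [self.pos]
        if self.sense is not None:
            notes.append(f"{self.sense} Webster")
        if self.refers_to is not None:
            notes.append(f"={self.refers_to}")
        if self.keep:
            notes.append("leave alone")
        return f"{self.text}({', '.join(notes)})"


def unresolved(tokens: List[Token]) -> List[str]:
    """Return the words that are still ambiguous.

    A token counts as resolved if it is kept verbatim, is tied to an
    antecedent, or is tied to a numbered dictionary sense.
    """
    return [t.text for t in tokens
            if not t.keep and t.refers_to is None and t.sense is None]


# The example sentence; sense numbers are placeholders, not real lookups.
SENTENCE = [
    Token("My", "possessive pronoun", refers_to="A Costa"),
    Token("dog", "noun", sense=5),
    Token("name", "noun", sense=3),
    Token("is", "present tense verb", sense=2),
    Token("Fido", "proper noun", keep=True),
    Token("he", "pronoun masculine", refers_to="Fido"),
    Token("likes", "present tense verb", sense=2),
    Token("people", "plural noun", sense=6),
]

if __name__ == "__main__":
    print(" ".join(t.render() for t in SENTENCE))
    print("still ambiguous:", unresolved(SENTENCE))
```

A check like `unresolved` is the point of the exercise: the text only
goes to the computer once that list is empty.
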
Any word with more than one sense would be attached to a specific
definition in a big dictionary; in the above case "dog(noun, 5 Webster)"
would mean "look in Webster's under 'dog', the noun definition (not the
verb), and sense #5 is what was meant." (Note: I don't know if sense #5
in any Webster's is what's needed; it's a fake example.)

Obviously just translating it to a meta-language would be like five
times harder than plain old translation. What is the gain if it's more
work than the old way? The gain ought to be that the computer would
then have an unambiguous text it could work with from there, so the
time spent translating "upstream" to the meta-language would save much
more time translating a text "downstream" to each real language.

So it first translates the grammar (I'm assuming that's possible, or
that we reserve this technique only for languages where it's possible,
and that there are enough such languages to make it worthwhile), then
it translates each unambiguous term. When it turns out that, say, a
French dictionary has no equivalent for "dog(noun, 5 Webster)", it
prompts the human translator for one, adds it to the dictionary, and
now it "knows" how to translate it. When it turns out the destination
language itself has no equivalent, the translator can highlight the
term somehow (italics, quotes, or whatnot), or attempt to coin their
own term.

Fringe benefit: if the meta-language were logical enough, algorithms to
simplify needlessly complex expressions might be used on a poorly
written, redundant sentence, and have it translated back to the source
language better than the original.

> I'm definitely still interested;)
> And it could be interesting to make it speak French.

Thanks for the feedback and interest! I'll certainly keep you posted
when there's something fit to post.
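
P.S. A rough Python sketch of the "downstream" term step described
above: look each unambiguous term up in the target dictionary, and ask
the human translator only the first time a term is missing. The names
`translate_terms` and `ask_translator` are invented for illustration,
the French entries are placeholders, and the grammar step is left out
entirely.

```python
from typing import Callable, Dict, List, Tuple

# An unambiguous term: (word, part of speech, sense number).
Term = Tuple[str, str, int]


def translate_terms(
    terms: List[Term],
    dictionary: Dict[Term, str],
    ask_translator: Callable[[Term], str],
) -> List[str]:
    """Translate each term, growing the dictionary as gaps are found.

    When a term has no entry, the human translator is asked once; the
    answer is stored, so the same term is never asked about again.
    """
    translated = []
    for term in terms:
        if term not in dictionary:
            dictionary[term] = ask_translator(term)
        translated.append(dictionary[term])
    return translated


def prompt_on_console(term: Term) -> str:
    """Hypothetical interactive fallback: ask at the terminal."""
    word, pos, sense = term
    return input(f"No entry for {word}({pos}, {sense} Webster); translation? ")


if __name__ == "__main__":
    # Placeholder dictionary with a single known entry.
    french = {("dog", "noun", 5): "chien"}
    source = [("dog", "noun", 5), ("name", "noun", 3), ("name", "noun", 3)]
    print(translate_terms(source, french, prompt_on_console))
```

With this arrangement the translator's effort goes into the dictionary
rather than into any one document, which is where the saving would have
to come from.
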