> User-Agent: Roundcube Webmail/1.2.3
> Date: Wed, 08 Nov 2017 13:25:59 +0100
> From: Francis Tyers <[email protected]>
> To: [email protected]
> Reply-To: [email protected]
> Subject: [Apertium-stuff] is treating newlines the same as spaces the right thing to do ?
>
> Very many times people come to me and ask why they get
> a different number of lines out of lt-proc than they
> put in.
>
> The answer is invariably that there is some multiword
> that is gobbling up a newline.
>
> My question is: Is this ever the right thing to do ? I
> struggle to come up with use cases for this. I'm not
> sure how hard it would be to fix. But I thought I'd start
> a discussion.
>
> Fran
>
Sorry for not joining this discussion sooner, though I am often surprised by how quickly people answer emails on this list, and posts on web forums, during working hours on subjects far removed from their jobs.

I think the problem being pointed out is something like this:

    the red flower the red flower

To Spanish with Apertium:

    La flor roja La flor roja

First, this is more coherent than what other translators produce:

    Google:  la flor roja el rojo flor
    Systran: la flor roja rojo flor

Second, translation is not needed only for raw text. With Apertium, a nicely presented HTML file is translated into another HTML file that is just as nicely presented. It is important to keep that, so line feeds in the source file, and also runs of spaces, must be preserved in the translation as they are at present.

As for text analysis: yes, treating a line feed as a word separator, just as a space would be, is a good solution.

Several years ago, I wrote two deformatters (and a common reformatter similar to apertium-retxt): one for man pages (where several lines start with a . in the first column) and one for mnemonic files (see apertium-c-formatters). Mnemonic files look like this.

Original in French:

    1_HEURE 1 heure
    ACCUEIL Accueil
    ANALYSER Analyser
    ANA_FSURF Analyse d'une forme de surface
    ATTRIBUTS Attributs
    ..

Translation to Spanish with Apertium and the deformatter desmnemo:

    1_HEURE 1 hora
    ACCUEIL Recepción
    ANALYSER Analizar
    ANA_FSURF Análisis de una forma de superficie
    ATTRIBUTS Atributos
    ..

The keyword starting in the first column stays unchanged and the right part is translated.

At present, a silly example of a mnemonic file can give strange results:

    LA la
    FLEUR fleur
    ROUGE rouge

In English:

    LA the
    FLEUR red
    ROUGE flower

Several years ago, I did not think of this problem, because in the mnemonic files I translated, the right part was between " ".

    LA "la"
    FLEUR "fleur"
    ROUGE "rouge"

Translation to English:

    LA "the"
    FLEUR "Flower"
    ROUGE "Red"

I found some other simple ways to ask Apertium not to take previous lines into account when translating the current one. For instance, a . can be added at the end of each line:

    LA la.
    FLEUR fleur.
    ROUGE rouge.

    LA The.
    FLEUR Flower.
    ROUGE Red.

But it is better to add a , at the end of each line:

    LA la,
    FLEUR fleur,
    ROUGE rouge,

    LA The,
    FLEUR flower,
    ROUGE red,

Unknown words can also be added to break a possible sequence of pattern-items:

    LA la xxx yyy
    FLEUR fleur xxx yyy
    ROUGE rouge xxx yyy

is translated into:

    LA the *xxx *yyy
    FLEUR flower *xxx *yyy
    ROUGE red *xxx *yyy

So a good solution is to have deformatters add extra information indicating when the previous and following lines must not be taken into account when translating the current line. I have sometimes seen [] after text deformatting. I don't know whether this blank zone has a special meaning, but it could be a nice tag to tell Apertium that the part to the left of [] and the part to the right of it must be translated separately.

--------------------------------
Bernard Chardonneau (France)
Phone : [33] 9 72 36 32 90
GSM phone : [33] 7 69 46 16 31

Multilingual websites for my free software:
http://libremail.free.fr and http://libremail.tuxfamily.org
http://cyloop.tuxfamily.org (mainly translated with Apertium)

My general website (in French only):
http://bech.free.fr
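
A minimal sketch of the trailing-comma workaround for mnemonic files described in the message above, written in Python. It is not the code of desmnemo or apertium-c-formatters. All function names are hypothetical, and `translate` is a stand-in for whatever runs the Apertium pipeline on a block of text. The sketch assumes that, with the separator in place, the translator returns exactly one output line per input line and keeps the comma at the end of each line.

```python
from typing import Callable, List, Tuple

# Assumption: a trailing comma is used as the line-boundary marker,
# as suggested in the message. Any token that stops a multiword or
# pattern from spanning two lines could be used instead.
SEPARATOR = ","


def split_mnemonic_line(line: str) -> Tuple[str, str]:
    """Split 'KEYWORD text to translate' into (keyword, text)."""
    keyword, _, text = line.partition(" ")
    return keyword, text.strip()


def protect_lines(lines: List[str]) -> Tuple[List[str], str]:
    """Return the keywords and a text block whose lines all end with SEPARATOR."""
    keywords, texts = [], []
    for line in lines:
        keyword, text = split_mnemonic_line(line)
        keywords.append(keyword)
        texts.append(text + SEPARATOR)
    return keywords, "\n".join(texts)


def restore_lines(keywords: List[str], translated: str) -> List[str]:
    """Strip the separator again and put each keyword back in the first column."""
    parts = translated.split("\n")
    if len(parts) != len(keywords):
        # This is the symptom discussed in the thread: lines were merged.
        raise ValueError("translator changed the number of lines")
    restored = []
    for keyword, text in zip(keywords, parts):
        text = text.rstrip()
        if text.endswith(SEPARATOR):
            text = text[: -len(SEPARATOR)]
        restored.append(f"{keyword} {text}".rstrip())
    return restored


def translate_mnemonics(lines: List[str],
                        translate: Callable[[str], str]) -> List[str]:
    """Translate only the right-hand part of each mnemonic line."""
    keywords, block = protect_lines(lines)
    return restore_lines(keywords, translate(block))
```

In use, `translate_mnemonics(["LA la", "FLEUR fleur", "ROUGE rouge"], run_apertium)` would hand the block `la,` / `fleur,` / `rouge,` to a hypothetical `run_apertium` function and reattach the keywords afterwards. Raising an error when the line count changes is a design choice of this sketch: it makes a gobbled newline visible instead of silently misaligning keywords and translations.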
