On 2014-10-11 14:22, Adrian Chaves Fernandez wrote:
> The first issue I found that I would like to fix is the capitalization
> of headers.
>
> For example, “A General Introduction” is translated as “Unha
> Introdución Xeral”, but I want it translated as “Unha introdución
> xeral”.
>
> At the point when I pass the source string to Apertium I know that the
> source string that I am passing Apertium is a header string, so I can
> actually work around the issue outside of Apertium. This is not a
> perfect approach, as I might end up lowercasing proper nouns, but the
> headers that I am translating do not usually have those.
>
> However, ideally I would like Apertium to detect that the text is
> capitalized as a header (all words, or combinations of nouns and
> other, are capitalized), and to uncapitalize words.
>
> But before I go that way, I would like to know if this can be done on
> the Apertium side instead somehow, and if so, whether that would be a
> good approach, or whether I should perform the changes on the
> translated string myself nonetheless.

For me, "A General Introduction" is bad style even in English; I would
prefer the header to be "A general introduction" there too.

So I think in this case it is probably better to handle this outside of
Apertium, in a pre-normalisation stage, but using parts of the Apertium
pipeline to aid the normalisation. For example, to retrieve the
dictionary form of each word you can run the morphological analyser
with the option -w. E.g.

  $ echo "A Tourist's Guide To Barcelona." | lt-proc -w ~/source/apertium/trunk/apertium-en-es/en-es.automorf.bin
  ^A/a<det><ind><sg>$ ^Tourist/tourist<adj>/tourist<n><sg>$ ^'s/'s<gen>/be<vbser><pri><p3><sg>$ ^Guide/guide<n><sg>/guide<vblex><inf>/guide<vblex><pres>$ ^To/to<pr>$ ^Barcelona/Barcelona<np><loc><sg>$^./.<sent>$

Then you could use a script like this: http://paste2.org/gYF4j4Wj

  $ echo "A Tourist's Guide To Barcelona." | lt-proc -w ~/source/apertium/trunk/apertium-en-es/en-es.automorf.bin | python3 /tmp/untitle-case.py
  A tourist 's guide to Barcelona.

  $ echo "A Tourist's Guide To Barcelona." | lt-proc -w ~/source/apertium/trunk/apertium-en-es/en-es.automorf.bin | python3 /tmp/untitle-case.py | apertium -d ~/source/apertium/trunk/apertium-en-es/ en-es
  La guía de un turista a Barcelona.

vs.

  $ echo "A Tourist's Guide To Barcelona." | apertium -d ~/source/apertium/trunk/apertium-en-es/ en-es
  La guía de un Turista A Barcelona.

The superfluous space before "'s" could be removed fairly easily, but I
leave that as an exercise to the reader :)

Regards,

Fran
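
For reference, here is a minimal sketch of what an untitle-casing filter
over the lt-proc -w output might look like. It is not the script behind
the paste link above (whose contents are not reproduced here); the names
LEXICAL_UNIT, lemma and untitle are made up for illustration. It assumes
that each lexical unit has the shape ^surface/analysis/...$, that
unknown words are marked with a leading "*", and it ignores escaped
characters in the stream. The rule it applies: keep the first word as
is, and lowercase any later word unless one of its lemmas contains an
uppercase letter (e.g. a proper noun) or the word is unknown.

```python
import re

# One lexical unit as printed by lt-proc: ^surface/analysis1/analysis2$
# (assumption: no escaped "/" or "$" inside the unit).
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def lemma(analysis):
    """Return the lemma part of an analysis such as 'guide<n><sg>'."""
    return analysis.split("<", 1)[0]


def untitle(stream):
    """Turn analysed title-case text back into plain sentence-case text."""
    seen_first = False

    def replace(match):
        nonlocal seen_first
        surface = match.group(1)
        analyses = match.group(2).split("/")
        if not seen_first:
            # Leave the first word of the header untouched.
            seen_first = True
            return surface
        for analysis in analyses:
            # Unknown word, or a lemma that is itself capitalised:
            # keep the surface form exactly as written.
            if analysis.startswith("*"):
                return surface
            if any(ch.isupper() for ch in lemma(analysis)):
                return surface
        return surface.lower()

    # Text between lexical units (spaces etc.) passes through unchanged.
    return LEXICAL_UNIT.sub(replace, stream)


SAMPLE = (
    "^A/a<det><ind><sg>$ ^Tourist/tourist<adj>/tourist<n><sg>$ "
    "^'s/'s<gen>/be<vbser><pri><p3><sg>$ "
    "^Guide/guide<n><sg>/guide<vblex><inf>/guide<vblex><pres>$ "
    "^To/to<pr>$ ^Barcelona/Barcelona<np><loc><sg>$^./.<sent>$"
)

print(untitle(SAMPLE))  # A tourist 's guide to Barcelona.
```

To use this as a filter in a pipeline like the ones above, the last
line would read from sys.stdin instead of using SAMPLE. Because text
between lexical units is copied through untouched, any extra space in
the result comes from the analyser output itself.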
