On 2014-10-11 14:22, Adrian Chaves Fernandez wrote:
> The first issue I found that I would like to fix is the capitalization of
> headers.
> 
> For example, “A General Introduction” is translated as “Unha Introdución
> Xeral”, but I want it translated as “Unha introdución xeral”.
> 
> At the point when I pass the source string to Apertium, I know that it is
> a header string, so I can actually work around the issue outside of
> Apertium. This is not a perfect approach, as I might end up lowercasing
> proper nouns, but the headers that I am translating do not usually have
> those.
> 
> However, ideally I would like Apertium to detect that the text is
> capitalized as a header (all words, or combinations of nouns and other
> words, are capitalized), and to uncapitalize those words.
> 
> But before I go that way, I would like to know if this can be done on the
> Apertium side instead somehow, and if so, whether that would be a good
> approach, or whether I should perform the changes on the translated
> string myself nonetheless.

For me, "A General Introduction" is bad style in English; I would prefer
the header to be "A general introduction".

So, I think in this case it is probably better to handle this outside of
Apertium in a pre-normalisation stage, but using parts of the Apertium
pipeline to aid in the normalisation. For example, to retrieve the
dictionary form of each word, you can use the morphological analyser with
the -w (dictionary case) option. E.g.

$ echo "A Tourist's Guide To Barcelona." \
    | lt-proc -w ~/source/apertium/trunk/apertium-en-es/en-es.automorf.bin
^A/a<det><ind><sg>$ ^Tourist/tourist<adj>/tourist<n><sg>$ 
^'s/'s<gen>/be<vbser><pri><p3><sg>$ 
^Guide/guide<n><sg>/guide<vblex><inf>/guide<vblex><pres>$ ^To/to<pr>$ 
^Barcelona/Barcelona<np><loc><sg>$^./.<sent>$
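The output above is lt-proc's stream format: each lexical unit looks like
^surface/analysis1/analysis2...$, and each analysis is a dictionary-case
lemma followed by its <tags>. A minimal Python sketch of pulling the
lemmas and tags out of such a stream follows. It is a simplification that
assumes no escaped characters (\^, \$, \/) and no superblanks ([...]),
which real pipelines can contain, and parse_units is a made-up name.

```python
import re

# One lexical unit of the stream: ^surface/analysis1/analysis2...$
# Simplification: escaped characters (\^, \$, \/) and superblanks ([...])
# are not handled.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")
TAG = re.compile(r"<([^>]+)>")


def parse_units(stream: str):
    """Yield (surface, [(lemma, [tags...]), ...]) for each lexical unit."""
    for match in LEXICAL_UNIT.finditer(stream):
        surface, *analyses = match.group(1).split("/")
        readings = [(a.split("<", 1)[0], TAG.findall(a)) for a in analyses]
        yield surface, readings


# The analyser output shown above as one string (blanks between units assumed).
stream = ("^A/a<det><ind><sg>$ ^Tourist/tourist<adj>/tourist<n><sg>$"
          "^'s/'s<gen>/be<vbser><pri><p3><sg>$ "
          "^Guide/guide<n><sg>/guide<vblex><inf>/guide<vblex><pres>$ "
          "^To/to<pr>$ ^Barcelona/Barcelona<np><loc><sg>$^./.<sent>$")

for surface, readings in parse_units(stream):
    # Prints each surface form next to its dictionary-case lemmas,
    # e.g. Tourist next to ['tourist', 'tourist'].
    print(surface, "->", [lemma for lemma, _tags in readings])
```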

Then you could use a script like this:

http://paste2.org/gYF4j4Wj
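The pasted script is not reproduced in this message, so the following is
only a hypothetical Python sketch of the same idea, not the script itself.
It reads the lt-proc -w stream, keeps the first word of each sentence, keeps
any word that is unknown or whose dictionary form is capitalised (such as
Barcelona<np>), and lowercases everything else. The blanks between lexical
units are copied through unchanged. The names untitle_case, keeps_capital
and main are made up for this sketch. The heuristic is rough: any word the
dictionary lists in lowercase will be lowercased, even where it should stay
capitalised.

```python
import re
import sys

# One lexical unit of the lt-proc stream: ^surface/analysis1/analysis2...$
# Simplification: escaped characters and superblanks are not handled.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def keeps_capital(analyses: list[str]) -> bool:
    """Heuristic: keep the surface capitalisation if the word is unknown
    (lt-proc marks it with '*') or if any dictionary form starts with an
    uppercase letter, as Barcelona<np> does."""
    if not analyses or analyses[0].startswith("*"):
        return True
    return any(a.split("<", 1)[0][:1].isupper() for a in analyses)


def untitle_case(stream: str) -> str:
    """Turn a title-cased lt-proc -w stream back into sentence-case text."""
    out = []
    pos = 0
    sentence_start = True
    for match in LEXICAL_UNIT.finditer(stream):
        out.append(stream[pos:match.start()])  # copy blanks between units verbatim
        surface, *analyses = match.group(1).split("/")
        if sentence_start or keeps_capital(analyses):
            out.append(surface)
        else:
            out.append(surface.lower())
        if any("<sent>" in a for a in analyses):
            sentence_start = True   # the next word starts a new sentence
        elif any(ch.isalpha() for ch in surface):
            sentence_start = False  # the sentence's first word has been seen
        pos = match.end()
    out.append(stream[pos:])  # trailing text, e.g. the final newline
    return "".join(out)


def main() -> None:
    """Filter mode: lt-proc -w stream on stdin, plain text on stdout."""
    sys.stdout.write(untitle_case(sys.stdin.read()))
```

To use it in place of the pasted script in the pipelines below, save it as
e.g. /tmp/untitle-case.py and end the file with
`if __name__ == "__main__": main()`.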

$ echo "A Tourist's Guide To Barcelona." \
    | lt-proc -w ~/source/apertium/trunk/apertium-en-es/en-es.automorf.bin \
    | python3 /tmp/untitle-case.py
A tourist 's guide to Barcelona.

$ echo "A Tourist's Guide To Barcelona." \
    | lt-proc -w ~/source/apertium/trunk/apertium-en-es/en-es.automorf.bin \
    | python3 /tmp/untitle-case.py \
    | apertium -d ~/source/apertium/trunk/apertium-en-es/ en-es
La guía  de un turista a Barcelona.

vs.

$ echo "A Tourist's Guide To Barcelona." \
    | apertium -d ~/source/apertium/trunk/apertium-en-es/ en-es
La guía de un Turista A Barcelona.

The superfluous space could be removed fairly easily, but I leave that as
an exercise to the reader :)
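One possible answer to the exercise is a minimal Python post-processing
sketch that collapses runs of spaces in the translated text. It only
patches the output. Tidying the blanks during pre-normalisation may avoid
the extra space in the first place. The function name collapse_spaces is
hypothetical.

```python
import re


def collapse_spaces(text: str) -> str:
    """Replace every run of two or more spaces with a single space.

    Newlines and tabs are left alone, so line structure is preserved.
    """
    return re.sub(r" {2,}", " ", text)


print(collapse_spaces("La guía  de un turista a Barcelona."))
```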

Regards,

Fran


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff