Wow, that solves everything. Thank you very much!

2014-10-11 15:40 GMT+02:00 Francis Tyers <[email protected]>:

> On 2014-10-11 14:22, Adrian Chaves Fernandez wrote:
> > The first issue I found that I would like to fix is the capitalization
> > of headers.
> >
> > For example, “A General Introduction” is translated as “Unha
> > Introdución Xeral”, but I want it translated as “Unha introdución
> > xeral”.
> >
> > When I pass the source string to Apertium, I already know that it is a
> > header string, so I can actually work around the issue outside of
> > Apertium. This is not a perfect approach, as I might end up lowercasing
> > proper nouns, but the headers that I am translating do not usually have
> > those.
> >
> > However, ideally I would like Apertium to detect that the text is
> > capitalized as a header (all words, or combinations of nouns and other
> > words, are capitalized), and to uncapitalize the words.
> >
> > But before I go that way, I would like to know if this can be done on
> > the Apertium side instead somehow, and if so, whether that would be a
> > good approach, or whether I should perform the changes on the
> > translated string myself nonetheless.
>
> For me, in English "A General Introduction" is bad style; I would prefer
> to have the header as "A general introduction" in English.
>
> So, I think in this case it is probably better to handle this outside of
> Apertium, in a pre-normalisation stage, but using parts of the Apertium
> pipeline to aid in the normalisation. For example, in order to retrieve
> the dictionary form of the word, you can use the morphological analyser,
> with the option -w. E.g.
>
> $ echo "A Tourist's Guide To Barcelona." | lt-proc -w \
>     ~/source/apertium/trunk/apertium-en-es/en-es.automorf.bin
> ^A/a<det><ind><sg>$ ^Tourist/tourist<adj>/tourist<n><sg>$
> ^'s/'s<gen>/be<vbser><pri><p3><sg>$
> ^Guide/guide<n><sg>/guide<vblex><inf>/guide<vblex><pres>$ ^To/to<pr>$
> ^Barcelona/Barcelona<np><loc><sg>$^./.<sent>$
>
> Then you could use a script like this:
>
> http://paste2.org/gYF4j4Wj
>
> $ echo "A Tourist's Guide To Barcelona." | lt-proc -w \
>     ~/source/apertium/trunk/apertium-en-es/en-es.automorf.bin \
>     | python3 /tmp/untitle-case.py
> A tourist 's guide to Barcelona.
>
> $ echo "A Tourist's Guide To Barcelona." | lt-proc -w \
>     ~/source/apertium/trunk/apertium-en-es/en-es.automorf.bin \
>     | python3 /tmp/untitle-case.py \
>     | apertium -d ~/source/apertium/trunk/apertium-en-es/ en-es
> La guía  de un turista a Barcelona.
>
> vs.
>
> $ echo "A Tourist's Guide To Barcelona." \
>     | apertium -d ~/source/apertium/trunk/apertium-en-es/ en-es
> La guía de un Turista A Barcelona.
>
> The superfluous space could be removed fairly easily. But I leave that
> as an exercise to the reader :)
>
> Regards,
>
> Fran
>
>
>
> ------------------------------------------------------------------------------
> Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
> Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
> Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
> Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
> http://p.sf.net/sfu/Zoho
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
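The script behind the paste2.org link is not reproduced in this thread, so what follows is only a rough sketch of what such a filter might look like, not Fran's actual code. It assumes the input is the output of lt-proc -w as shown in the quoted message (lexical units of the form ^surface/analysis/...$), keeps the first word of the header as it is, and lowercases a later word only when none of its dictionary lemmas contains an uppercase letter. The names untitle_case and LEXICAL_UNIT are made up for this illustration, and escaped characters in the stream format are not handled.

```python
import re
import sys

# One lexical unit in the Apertium stream format: ^surface/analysis1/analysis2$
# Simplification (assumption): backslash-escaped characters are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)((?:/[^/$^]*)*)\$")


def untitle_case(stream):
    """Turn `lt-proc -w` output back into plain text, undoing title case."""
    out = []
    pos = 0
    first = True
    for match in LEXICAL_UNIT.finditer(stream):
        # Copy whatever sits between lexical units (spaces etc.) unchanged.
        out.append(stream[pos:match.start()])
        surface = match.group(1)
        analyses = match.group(2).split("/")[1:]
        # The lemma is the part before the first tag; analyses starting
        # with '*' mark unknown words and tell us nothing about case.
        lemmas = [a.split("<", 1)[0] for a in analyses if not a.startswith("*")]
        # Lowercase only if every known lemma is free of capitals, so that
        # proper nouns such as "Barcelona" keep their capital letter.
        can_lower = bool(lemmas) and not any(
            ch.isupper() for lemma in lemmas for ch in lemma
        )
        if first or not can_lower:
            out.append(surface)
        else:
            out.append(surface.lower())
        first = False
        pos = match.end()
    out.append(stream[pos:])
    return "".join(out)


if __name__ == "__main__":
    # Act as a filter between lt-proc -w and apertium.
    for line in sys.stdin:
        sys.stdout.write(untitle_case(line))
```

Saved as, say, /tmp/untitle-case.py, it would slot into the pipeline from the quoted message in place of the original script. Because the text between lexical units is copied verbatim, this version does not add any spaces of its own.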
