Message from Kevin Brubeck Unhammer <[email protected]> on Tue, 2 Feb 2021
at 13:35:

> Flammie A Pirinen <[email protected]> wrote:
>
> > Hi all,
> >
> > I've written a handful of apertium-fin-* prototypes and I usually end up
> > spending way too much time with all the useless subclasses of proper
> > nouns we have (cogs, ants, als, tops, orgs, and to top all that,
> > sometimes ms and fs for some extra (mis)gendering). Could we just get
> > rid of those or does someone have a good use for them? Most of the time
> > it's very random anyways and we aren't really doing NERing or anything.
> > I think if these are used in e.g. cg or whatever we should probably have
> > a different way of introducing them that doesn't interfere with
> > analysis-generation stuff, like we talked about in passing in the last
> > apertium zoom meeting? Or is there some smart way to bypass them I
> > haven't thought of (probably)?
>
> Genders are useful when anaphora resolving / in transfer, though only on
> person names. There are some place/org names from swe that have genders
> (originally from SALDO) which bled into other scandipairs – I'd be happy
> to remove those since they seem quite useless for us.
>
> The <ant>, <cog> and <top> tags are used quite a bit in the nob
> disambiguator, but not in transfer.
>
> I tend to underspecify np's in bidix:
>
> <e> <p><l>Iran<s n="np"/></l><r>Iran<s n="np"/></r></p></e>
> <e> <p><l>Thiel<s n="np"/></l><r>Thiel<s n="np"/></r></p></e>
> <e> <p><l>Saruman<s n="np"/></l><r>Saruman<s n="np"/></r></p></e>
> <e> <p><l>Contras<s n="np"/></l><r>Contras<s n="np"/></r></p></e>
>
> so just the monodixen need to be synced. If there is an actual
> bidix-relevant difference, e.g. some place name gets translated but not
> if it's a person name, then one can specify the tags for just that
> entry.
>
> The remaining problem is when the analyser gives ^Saruman<np><al>$ and
> you try to send that into a generator that expects ^Saruman<np><ant>$.
>
> We could perhaps use the Giellatekno solution for that, where dixen have
> RL entries that just contain <np> (ie., no cog/ant/al), and some
> transfer step cleans off the tags. Should be a fairly simple change, and
> it's tried and tested in giella-pairs. Since lttoolbox is used mostly
> for languages where np pardefs are small, adding the RL's is like max
> 10 extra lines; for languages requiring hfst it's probably a fairly
> simple twol or xfregex rule?
>
>
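For illustration, the tag clean-up step Kevin describes could look roughly
like the sketch below. It is not how any existing Apertium transfer stage is
implemented; the function name `strip_np_subtags` and the tag list are
assumptions made for the example, and it ignores escaped characters in the
stream format. It only shows the idea: keep <np> (and things like gender),
drop the subclass tags, so that the generator only needs entries for bare
<np>.

```python
import re

# Hypothetical list of np subclass tags to drop; adjust per language pair.
NP_SUBTAGS = {"ant", "cog", "al", "top", "org", "hyd"}

# A lexical unit in the stream format looks like ^lemma<tag><tag>$.
# Simplification: escaped ^ and $ inside lemmas are not handled.
LU_RE = re.compile(r"\^([^$]*)\$")
TAG_RE = re.compile(r"<([^<>]+)>")


def strip_np_subtags(stream: str) -> str:
    """Remove np subclass tags from every lexical unit that contains <np>."""

    def clean(match: re.Match) -> str:
        lu = match.group(1)
        if "<np>" not in lu:
            # Leave non-proper-noun units untouched.
            return match.group(0)
        cleaned = TAG_RE.sub(
            lambda t: "" if t.group(1) in NP_SUBTAGS else t.group(0), lu
        )
        return "^" + cleaned + "$"

    return LU_RE.sub(clean, stream)


print(strip_np_subtags("^Saruman<np><al>$"))
```

With this, both ^Saruman<np><al>$ and ^Saruman<np><ant>$ end up as
^Saruman<np>$ before generation, while other tags such as gender and number
are left alone.
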
The question of np is complex, and it certainly needs to be thought
through. The problem is that for some pairs some differences are relevant,
but probably for most they are not.

As Kevin says, gender may be useful for anaphora resolution, but the
truth is that I have not dared to put Navratilová or Kurnikova as feminine
surnames in the dictionaries of Romance languages I have worked on.

On the other hand, the difference between given names and surnames is important,
as the former are sometimes translated, while the latter are rarely
translated (it's more of a transliteration problem, since in almost every
Romance language Russian surnames are spelt differently: it's deadly!)

The difference between people and places is important in the languages I
deal with: prepositions, for example, can have different translations.

I am more sceptical about the need to distinguish between toponyms and
hydronyms. In some languages one will have an article and the other will
not, but these are rare cases. On the other hand, we do not distinguish
between countries (or regions) and cities, which in French is quite
important both for generating the article and the preposition preceding it,
if you translate from Catalan or Spanish: for instance, "New-York" is the
city, but "le New-York" is the state, so we will have "à New-York" or "au
New-York" for "in New-York" (or "à Paris" but "en France"). The generation
of articles may also not be the same whether "Barcelona" stands for the
city or the (football or whatever) team, nor is the gender often the same.
So, are we then going to create more and more subtypes ad nauseam? Better
not!

In short, we can find special cases in certain pairs that may make us think
that some distinctions are appropriate, but adding them in monolingual
dictionaries and forcing them to be maintained for all languages seems
doubtful to me. I would remove the distinctions in some pairs between np.org
and np.al and np.hyd and np.top, for example. I agree that the gender of
place names in Romance languages is often a burden and not clear at all,
but this could be solved if we define them as "mf" instead of "m" or "f".

Hèctor




> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
