Hi again,

>> I do not see how not encoding the morphological information that does
>> not change makes the data less reusable. All the "relevant" information
>> is there and the "irrelevant" one can be easily added.
>
> Not easily added by a non-expert user. If there were tools that would
> automatically "introduce" this information it may be a different
> matter.
>
> But if you want to go from a bilingual dictionary to some kind of
> CSV/text-based dictionary, with all the grammatical information, then
> including it is useful.

I would like the linguistic information to be easily reusable, provided
that this does not make it harder to use the existing automatic tools,
which is what happens here. Our business is Apertium-based RBMT, so
let's make things easy for the development of Apertium-based RBMT
systems.


>
>>> In any case I think it is probably not a good idea to assume that the
>>> bilingual dictionary only encodes "different" information. If there is
>>> another way to find it out, it would be better.
>>
>> Not encoding the morphological information that does not change makes it
>> possible to automatically infer structural transfer rules with
>> apertium-transfer-tools. This tool has been around for more than 4 years.
>>
>> I think it is not a good idea to change the way we do things. When we
>> designed Apertium we took the decision of not encoding the morphological
>> information that does not change in the bilingual dictionary and I think
>> that we should stand by what we decided at that moment if there is not a
>> "good" reason for the change and "reusability" is not (see above).
>
> I've been doing it this way for as long as I can remember.

The way it is done in apertium-es-ca (and many others) is the way we
thought it should be done when we designed Apertium. Not doing it that
way means not being able to use automatic tools that can help in the
development of new language pairs.
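
To make the convention concrete, here is a rough sketch of what such
entries look like. The two entries are written for this message as an
illustration; they are not copied from the apertium-es-ca dictionary.

    <e><p><l>coche<s n="n"/></l><r>cotxe<s n="n"/></r></p></e>
    <e><p><l>valle<s n="n"/><s n="m"/></l><r>vall<s n="n"/><s n="f"/></r></p></e>

In the first entry the gender is the same on both sides, so only the
part of speech is given. In the second the gender changes (masculine
in Spanish, feminine in Catalan), so it is written out on both sides.
Under the other style, the first entry would also carry <s n="m"/> on
both sides even though it does not change.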

I think we should promote the way it was done in apertium-es-ca as the 
"canonical" way of doing things. Including all the morphological tags in 
the bilingual dictionary does not help machine translation if they do 
not change. What do the rest of the PMC members think?

> Also, what you really mean, is the "information apart from part of
> speech that does not change". Otherwise we should have entries like:

Yes, I mean that. Thanks for the clarification.

>    <e><p><l>coche</l><r>cotxe</r></p></e>
>
> Anyway, I'm sure some solution can be come up with, perhaps a prefix
> list of parts-of-speech, and then compare the remaining tags to see if
> they are equivalent on both sides.

We already have a solution.

> But in any case, if you need to test apertium-transfer-tools, then there
> are pairs which I think follow the old standard: es-ca, es-pt etc.

It is not a matter of testing apertium-transfer-tools. The way new
language pairs are being developed prevents the tool from being used,
even though in some cases it could be of help. That's it.

Cheers
--
Felipe

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
