Hi again,

>> I do not see how not encoding the morphological information that does
>> not change makes the data less reusable. All the "relevant" information
>> is there and the "irrelevant" one can be easily added.
>
> Not easily added by a non-expert user. If there were tools that would
> automatically "introduce" this information it may be a different
> matter.
>
> But if you want to go from a bilingual dictionary to some kind of
> CSV/text-based dictionary, with all the grammatical information, then
> including it is useful.

I like linguistic information to be easily reusable, as long as that does not make it harder to use existing automatic tools, which is exactly what happens here. Our business is Apertium-based RBMT, so let's make things easy for the development of Apertium-based RBMT systems.

>
>>> In any case I think it is probably not a good idea to assume that the
>>> bilingual dictionary only encodes "different" information. If there is
>>> another way to find it out, it would be better.
>>
>> Not encoding the morphological information that does not change makes it
>> possible to automatically infer structural transfer rules with
>> apertium-transfer-tools. This tool has been around for more than 4 years.
>>
>> I think it is not a good idea to change the way we do things. When we
>> designed Apertium we took the decision of not encoding the morphological
>> information that does not change in the bilingual dictionary, and I think
>> that we should stand by what we decided at that moment unless there is a
>> "good" reason for the change, and "reusability" is not one (see above).
>
> I've been doing it this way for as long as I can remember.

The way it is done in apertium-es-ca (and many other pairs) is the way we thought it should be done when we designed Apertium. Not doing it that way means not being able to use automatic tools that can help in the development of new language pairs. I think we should promote the way it was done in apertium-es-ca as the "canonical" way of doing things. Including all the morphological tags in the bilingual dictionary does not help machine translation if those tags do not change.

What do the rest of the PMC members think?

> Also, what you really mean is the "information apart from part of
> speech that does not change". Otherwise we should have entries like:

Yes, I mean that. Thanks for the clarification.

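To make this concrete, here is a sketch of the convention. These are hypothetical entries written for illustration, not copied from the apertium-es-ca bilingual dictionary: the part of speech is always written, while gender is written only when it differs between the two languages (Spanish "valle" is masculine, Catalan "vall" is feminine).

```xml
<!-- Hypothetical entry: gender and number are the same on both sides, so only the part of speech is encoded -->
<e><p><l>coche<s n="n"/></l><r>cotxe<s n="n"/></r></p></e>
<!-- Hypothetical entry: gender changes (masculine in Spanish, feminine in Catalan), so it is encoded -->
<e><p><l>valle<s n="n"/><s n="m"/></l><r>vall<s n="n"/><s n="f"/></r></p></e>
```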
> <e><p><l>coche</l><r>cotxe</r></p></e>
>
> Anyway, I'm sure a solution can be found, perhaps a prefix list of
> parts-of-speech, and then compare the remaining tags to see if they
> are equivalent on both sides.

We already have a solution.

> But in any case, if you need to test apertium-transfer-tools, then there
> are pairs which I think follow the old standard: es-ca, es-pt etc.

It is not a matter of testing apertium-transfer-tools. The way new language pairs are being developed rules out using it, even though in some cases it could help. That's all.

Cheers

-- 
Felipe

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
