Re: [Apertium-stuff] We now have markup handling and reordering in Apertium!

Tanmai Khanna Fri, 04 Sep 2020 01:08:16 -0700

In response to this issue, empty blanks are considered blanks again, so a
<b/> or a <b pos="1,2,etc."/> in the output will put an empty blank if the
input had an empty blank in that position.


Once again, a <b/> and a <b pos="1"/> now do the same thing. Earlier, <b
pos="1,2.."/> was used to print a blank from the input and <b/> was used to
print a literal space in the output. Now, a <b/> prints the next blank from
the input, in the order the blanks were read. Transfer rules written in the
future don't need a pos attribute on <b/>, and the ones that
exist already will act the same as a <b/>.
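
As a rough illustration of the behaviour described above, here is a small
Python sketch. It is not the actual transfer code: the function name
apply_rule, the "b" marker standing in for <b/> or <b pos="..."/>, and the
fallback to a single space when the queue runs out are all assumptions made
for the example.

```python
from collections import deque


def apply_rule(words, blanks, out_pattern):
    """Toy model of a transfer rule's output step (hypothetical helper).

    words       -- input lexical units matched by the rule
    blanks      -- blanks read between them, in input order ("" = empty blank)
    out_pattern -- ints are 1-based word positions, "b" stands for
                   <b/> or <b pos="..."/> (both treated identically)
    """
    queue = deque(blanks)
    out = []
    for item in out_pattern:
        if item == "b":
            # Any <b/> pops the next blank in input order; pos is ignored.
            # Assumption: print a single space if no input blank is left.
            out.append(queue.popleft() if queue else " ")
        else:
            out.append(words[item - 1])
    return "".join(out)


# Ordinary blank between numeral and noun is copied through.
print(repr(apply_rule(["3", "dogs"], [" "], [1, "b", 2])))
# An empty blank in the input stays empty in the output.
print(repr(apply_rule(["3", "dogs"], [""], [1, "b", 2])))
# Words can be reordered, but the blanks still come out in input order.
print(repr(apply_rule(["a", "b", "c"], [" ", "\n"], [3, "b", 1, "b", 2])))
```

The last call shows why blanks can't be reordered from a rule: the rule only
decides where a blank goes, not which one it is.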

This means that there's no way to reorder blanks from transfer rules now,
but that is by design. Hèctor, let me know if this solved your issue :)

Thanks and Regards,
*तन्मय खन्ना *
*Tanmai Khanna*


On Fri, Sep 4, 2020 at 12:15 PM Hèctor Alòs i Font <[email protected]>
wrote:

> Message from Tanmai Khanna <[email protected]> on Fri, 4 Sep 2020 at
> 9:22:
>
>> Hèctor,
>> Yes, the new improvements aren't backwards compatible but that's because
>> they're better than the system we had earlier. Here are the changes:
>>
>>> So, you are saying that the new stuff is not backwards compatible, aren't
>>> you? There aren't any <b/> in the rule, but <b pos="1,2..."/>, which is not
>>> the same. Until now, <b/> means explicitly putting a blank, while <b
>>> pos="1,2..."/> means copying to the output whatever is in the input in a
>>> given point.
>>>
>>
>> <b pos="1,2..."/> and <b/> now do exactly the same thing. You don't need
>> to replace all of the former with the latter but even if you do or don't it
>> won't change anything. Until now it meant what you said but now it means
>> that if you see a <b/> or a <b pos="1,2,etc."/> then print one blank from
>> the blank queue in the output.
>>
>>> Superblanks most of the time are blanks, but, as you now probably know
>>> better than anyone else, they can be lots of things; they can even contain
>>> no blanks at all. Even in some cases, like in Romance-language enclitics,
>>> we know there shouldn't be any blank at all before them, but we had to
>>> add <b pos="1,2..."/> so as not to lose information on italics, bold letters,
>>> etc.
>>>
>>
>> You're right, except now we have a completely different system to deal
>> with italics, bold letters, and all markup, i.e. wordbound blanks, which
>> aren't considered blanks. Now that there is no information to lose, we
>> didn't want to burden the people who write transfer rules with explicitly
>> defining positions of blanks. In cases where you don't want a space in the
>> output, you just don't put a <b pos="1,2"/> in the output rule.
>>
>>
>>> I'm not really ready to change all <b pos="1,2..."/> in the hundreds of
>>> rules I've been writing in several language pairs. Specifically for
>>> apertium-fra-frp, I hope to be able to publish it before the new
>>> version of the Apertium core you are preparing, so they are needed right
>>> now.
>>>
>>
>> You won't have to change all of them. Most of them will work as they are.
>> The new system prints blanks in the same order as they were input, so it
>> won't harm most of the rules. The *only thing* you'll have to change is
>> rules where you don't want a space in the output between LUs: you remove
>> the <b pos="1"/> from those rules. This is because now, an empty blank
>> isn't considered a blank anymore. We did this because we want users to
>> have control over whether they want a blank or not between their output
>> LUs, regardless of the input blanks. If we count empty blanks as blanks, your
>> problem will be solved, but other problems will come up, where empty blanks
>> will appear in the output regardless of <b/>s in the output.
>>
>> So to conclude, the only thing you need to remove is the <b pos="1"/>
>> from rules where you know you don't want a space in the output, like num_n,
>> and maybe some enclitics. Apart from that, everything will work as it is.
>> To improve the system, at some point we'll have to add a change that isn't
>> strictly backwards compatible, and several people agree that after
>> wordbound blanks, we should stop handling blank positions in transfer rules.
>>
>
> The problem is that in 99% of the cases I want a blank in num_n, that is,
> between the numeral and the noun. In most of the cases we have "two cows",
> "3 dogs", etc. In Romance languages, the rule is needed mostly for gender
> agreement. The problem is that sometimes, as we see, we get something else.
> So the question is not whether I want a blank there or not. I want whatever
> was there. So, let me try to formulate it in another way. If I want to
> preserve what was written between two words, I shouldn't write <b
> pos="1,2..."/>, but if I want to add a blank, I have to add <b/>. Am I
> right? If this is correct, it amounts to removing all <b pos="1,2..."/>. It
> seems it would be easier if they were simply ignored, thus
> avoiding any change in the language pairs. Am I missing something?
>
> Hèctor
>
>
>>
>> If this isn't acceptable, we can discuss other possible solutions :)
>>
>> *तन्मय खन्ना *
>> *Tanmai Khanna*
>> _______________________________________________
>> Apertium-stuff mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>
>

