Did you find any solution for the merging lines?

I ran into this problem while translating the Europarl corpus (spa>cat).
It did not happen before, but it does happen with the current nightly
version. It turns out that the sentences where the merging occurs (or
starts?) contain a 'soft hyphen' character (U+00AD). When I remove this
character (in fact, it should be replaced by an em dash), the merging no
longer occurs.
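
As a stopgap, the soft hyphens can be located and removed (or replaced)
before the text is passed to the translator. Here is a minimal Python
sketch, assuming the corpus is plain UTF-8 text with one sentence per
line. The function names are my own and are not part of Apertium:

```python
# Sketch only: helper names are hypothetical, not an Apertium API.
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def strip_soft_hyphens(line: str, replacement: str = "") -> str:
    """Return the line with every U+00AD replaced by `replacement`.

    The default simply deletes the character; pass another string
    (for example a dash) if a visible replacement is wanted.
    """
    return line.replace(SOFT_HYPHEN, replacement)


def lines_with_soft_hyphen(lines):
    """Yield (1-based line number, line) for lines containing U+00AD."""
    for number, line in enumerate(lines, start=1):
        if SOFT_HYPHEN in line:
            yield number, line
```

Running lines_with_soft_hyphen over the input first should show whether
the affected line numbers match the places where the merging starts.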

Another change in behavior I have noticed concerns characters that are
not in the language's alphabet.
Before it was: Kwaśniewski > *Kwaś*niewski
Now it is: Kwaśniewski > *Kwaśniewski
The new behavior is preferable.

Jaume Ortolà
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
