On 13 February 2013 11:51, Jimmy O'Regan <[email protected]> wrote:
> On 13 February 2013 11:27, Per Tunedal <[email protected]> wrote:
>> Hi,
>> I'm experimenting with the script on the page
>> http://wiki.apertium.org/wiki/Building_dictionaries .
>>
>> I'm repeatedly getting an error message:
>>
>> 'import sitecustomize' failed; use -v for traceback
>>
>> All the same, I get results:
>>
>> <e><p><l>plånbok<s n="n"/></l><r>Portemonnæ<s n="n"/></r></p></e>
>> <e><p><l>programkod<s n="n"/></l><r>Kildekode<s n="n"/></r></p></e>
>> <e><p><l>register<s n="n"/></l><r>Register<s n="n"/></r></p></e>
>> <e><p><l>replik<s n="n"/></l><r>Replik<s n="n"/></r></p></e>
>> <e><p><l>scanner<s n="n"/></l><r>Skanner<s n="n"/></r></p></e>
>>
>> The Danish national characters are distorted, though.
>>
>> Any suggestions?
>>
>
> cat [your file]|perl -MEncode -ane 'chomp;if(m!(<e><p><l>)([^<]*)(<s n="n"/></l><r>)([^<]*)(<s n="n"/></r></p></e>)!){print "$1$2$3".encode("iso-8859-1",decode("utf-8", $4))."$5\n";}'
>
Actually, I can save you a little more editing on the output, if it's
ok to assume that if the left is lowercase, then the right should be
too:

|perl -MEncode -ane 'chomp;if(m!(<e><p><l>)([^<]*)(<s n="n"/></l><r>)([^<]*)(<s n="n"/></r></p></e>)!){$rec=encode("iso-8859-1",decode("utf-8", $4));if($2 eq lc($2)){$rec=lc($rec);}; print "$1$2$3$rec$5\n";}'
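
For anyone who would rather read the same idea spelled out, here is a
rough Python sketch of what the one-liner does. It assumes the
distortion is UTF-8 text that was re-encoded as if it were ISO-8859-1
(which is what the decode/encode pair above undoes). The names ENTRY,
repair and fix_line are made up for illustration, and unlike the Perl
version it leaves a right-hand side alone when it does not look
double-encoded.

```python
import re

# One bidix entry in the shape shown earlier in the thread.
ENTRY = re.compile(
    r'(<e><p><l>)([^<]*)(<s n="n"/></l><r>)([^<]*)(<s n="n"/></r></p></e>)'
)


def repair(text):
    """Undo one round of UTF-8 misread as ISO-8859-1 (assumed cause)."""
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not double-encoded after all: return the text unchanged.
        return text


def fix_line(line):
    """Return the repaired entry, or None if the line is not an entry."""
    m = ENTRY.search(line)
    if m is None:
        return None
    head, left, mid, right, tail = m.groups()
    right = repair(right)
    # Same assumption as above: a lowercase left side means the
    # right side should be lowercase too.
    if left == left.lower():
        right = right.lower()
    return head + left + mid + right + tail
```

To use it on a file, read each line as UTF-8, pass it to fix_line and
print whatever is not None.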
--
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff