Hi Per,
Nice code. Most of us have written our own tools at some point.
On Linux such tools are less necessary, since there are powerful tools like
sed and grep that make text processing from the command line easy enough.
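
To illustrate the kind of search I mean (the same one Bernard describes
below): looking for lm="jouir" in a monodix, or >jouir< in a bidix, finds the
lemma itself and not longer words that merely contain it. A grep for those
patterns does the job; here is a rough sketch of the same idea in Python, in
case you want it inside a script. The function name and the sample entries
(paradigm names, translations) are made up for the example.

```python
def find_exact(lines, word, bidix=False):
    """Return 1-based numbers of the lines that contain `word` as a whole lemma."""
    # Monodix entries carry the lemma in lm="..."; in a bidix the word sits
    # directly between a closing '>' and an opening '<'.
    needle = f">{word}<" if bidix else f'lm="{word}"'
    return [n for n, line in enumerate(lines, start=1) if needle in line]


# Made-up sample entries, only for illustration.
MONODIX = [
    '<e lm="jouir"><i>jou</i><par n="fin/ir__vblex"/></e>',
    '<e lm="réjouir"><i>réjou</i><par n="fin/ir__vblex"/></e>',
]
BIDIX = [
    '<e><p><l>jouir<s n="vblex"/></l><r>gozar<s n="vblex"/></r></p></e>',
    '<e><p><l>réjouir<s n="vblex"/></l><r>alegrar<s n="vblex"/></r></p></e>',
]

# A plain substring search hits both entries ...
print([n for n, line in enumerate(MONODIX, start=1) if "jouir" in line])
# ... while the delimited search only hits the lemma itself.
print(find_exact(MONODIX, "jouir"))
print(find_exact(BIDIX, "jouir", bidix=True))
```

With a real file you would pass open("some.dix", encoding="utf-8") as the
lines argument.
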
WRT formatting and sorting, there is apertium-dixtools.
It's even mentioned in your dixes:
<?xml version="1.0" encoding="UTF-8"?>
<!--
Dictionary:
Sections: 3
Entries: 10140
Sdefs: 64
Paradigms: 341
* Last processed by: apertium-dixtools fix apertium-sv-da.da.dix x.dix
*
-->
You should try that out, and you might make an apertium-dixtools extension
that fits your needs.
dixtools is a monster program that reads (several) dixes into memory as
Java objects and lets you manipulate them and save them again. It remembers
formatting as well.
$ apertium-dixtools
Usage: apertium-dixtools [task] [generic options] [task parameters] ...

Tasks:
  autorestrict:     automatically adds restrictions on a bidix so no ambiguity exitst
  autoconcord:      automatically makes gender, number etc in bidix concord with the monodices
  cross:            cross 2 language pairs (using linguistic res. XML file)
  cross-param:      cross 2 language pairs (using command line parameters)
  merge-morph:      merges two morphological dictionaries (monodix)
  equiv-paradigms:  finds equivalent paradigms and updates references
  list:             lists entries in a dictionary
  reverse-bil:      reverses a bilingual dictionary
  sort:             sorts (and groups by category) a dictionary
  format:           formats a dictionary (according to Generic Options)
  fix:              fix a dictionary (remove duplicates, convert spaces)
For help on a task, invoke it without parameters

Generic options: (mostly for tasks that outputs dix files)
  -debug                  print extra debugging information
  -useTabs                use tabs (instead of default 2 spaces) when indenting
  -noProcComments         don't add processing comments (telling what was done)
  -noHeader               don't put header comment with a summary in the top
  -stripEmptyLines        removes empty lines (originating from original file)
  -alignBidix             align a bidix (<p> or <i> at col 10, <r> at col 55)
  -alignMonodix           align a monodix (pardef 10, 30, other entries 25, 45)
  -align [[E] P R]        custom align (default <p>/<i> at col 10, <r> at col 55)
  -alignpardef [[E] P R]  paradigm alignment (if differ from general align)
  -noalign                old, noncompact XML-ish output (one tag per line, lots of indents)
If no -align option is specified, the alignment is autodetected
Use - as file name for piping (read/write .dix files on standard input/output)

More info: http://wiki.apertium.org/wiki/Apertium-dixtools
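
The fix task removes duplicates, but judging from the header comment in your
dix it does so by writing out a new file. If all you want is to know whether
an entry is already there before adding a word, a few lines of script are
enough. A minimal sketch in Python, not dixtools code: it assumes a monodix
where every entry is an <e lm="..."> element with a <par n="..."/> child, and
it treats two entries as duplicates when lemma and paradigm are both the
same. The sample entries and paradigm names are invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample monodix fragment, only for illustration.
SAMPLE = """<dictionary>
  <section id="main" type="standard">
    <e lm="hus"><i>hus</i><par n="hus__n"/></e>
    <e lm="bil"><i>bil</i><par n="bil__n"/></e>
    <e lm="hus"><i>hus</i><par n="hus__n"/></e>
  </section>
</dictionary>"""


def duplicate_entries(dix_xml):
    """Return the sorted (lemma, paradigm) pairs that occur more than once."""
    root = ET.fromstring(dix_xml)
    seen = Counter()
    for entry in root.iter("e"):
        lemma = entry.get("lm")
        if lemma is None:
            continue  # entries without a lemma (e.g. inside pardefs) are skipped
        par = entry.find("par")
        paradigm = par.get("n", "") if par is not None else ""
        seen[(lemma, paradigm)] += 1
    return sorted(key for key, count in seen.items() if count > 1)


print(duplicate_entries(SAMPLE))
```

Since this only reads the file, the comments and the order of the entries in
the dix stay exactly as they are.
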
2013/1/7 Per Tunedal <[email protected]>
> Hi,
> yes, you're right. But ...
> The pair Swedish-Danish (sv-da) is full of comments about different
> groups of words (within the categories). Comments like: check these
> etc.
> Sorting would create a terrible mess! And when the dictionary isn't
> sorted, it's not obvious where to put a new entry.
>
> My simple tools make it possible to add new words without solving all
> the old problems. I prefer to tackle them one at a time, when I feel like
> it.
>
> BTW: isn't it a shame that we all have our own personal tools for adding
> words? It would be great if we could build a universal tool for adding
> words.
>
> Yours,
> Per Tunedal
>
> PS I've just updated my tools to version 0.2. I've added the possibility
> to create a monodix with the help of a bidix, and fixed a lot of bugs.
> Now you can work like this:
> A. Create a monodix in language 1.
> B. Create a bidix from the monodix (simply adding the translation)
> and finally,
> C. Create a monodix in language 2 from the bidix.
>
> http://www.tunedal.nu/download/AddToDix/
>
>
>
> On Sun, Jan 6, 2013, at 23:31, Bernard Chardonneau wrote:
> > > Date: Wed, 02 Jan 2013 17:32:44 +0100
> > > From: Per Tunedal <[email protected]>
> > > To: Apertium Stuff <[email protected]>
> > > Reply-To: [email protected]
> > > Subject: [Apertium-stuff] Tools for adding to dictionaries
> > >
> > > Hi again,
> > > (..........)
> > >
> > > A side effect is that it will be unnecessary to sort the dictionaries!
> > > You can be sure that you don't add some word already present and new
> > > words can be pasted anywhere in the dictionary files.
> > >
> > I will just answer this point.
> >
> > As I wrote previously, I also have my own personal tools for adding words,
> > and I think many of us work like this.
> >
> > But sometimes we just have to correct a mistake.
> >
> > For instance, last week I compared the analyses of French produced by the
> > eo-fr and fr-es pairs, and I found the surface form "joue" analysed as
> > the verb "jouir", which is wrong. The solution was to change the
> > paradigm for this verb in the French monodix.
> >
> > To do that, a text editor is enough, and it has a search function.
> > But if you just search for the string "jouir", you can also find longer
> > words that contain the same letters. So, if you did not think of typing
> > the word between "" for a monodix and between >< for a bidix, having
> > the words in alphabetical order makes it easier to see whether you have
> > gone too far in the file (when working on more than one word).
> >
> > So automatic sorting when adding words may save time and make
> > dictionaries more pleasant to edit on other occasions.
> >
> > > (..........)
> > >
> > > Yours,
> > > Per Tunedal
> > >
> --snip--
>
>
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>
--
Jacob Nordfalk <http://profiles.google.com/jacob.nordfalk>
javabog.dk
Android developer and instructor at
IHK <http://cv.ihk.dk/diplomuddannelser/itd/vf/MAU> and
Lund&Bendsen <https://www.lundogbendsen.dk/undervisning/beskrivelse/LB1809/>