Re: location of UnicodeData.txt

Jim Penny Tue, 10 Dec 2002 17:41:52 -0600

On Tue, Dec 10, 2002 at 05:18:38PM -0500, Branden Robinson wrote:
> On Thu, Dec 05, 2002 at 08:33:08PM -0600, John Hasler wrote:
> > However, if that data can only be usefully expressed in precisely that way
> > (that is, reverse-engineering those algorithms would regenerate the file)
> > then the copyright on the file is probably unenforceable.
> 
> Exactly.  If there is no possibility for original expression within the
> technical constraints imposed, one has no ability to generate the sort
> of work which copyright is designed to protect.


About 48 or more scripts are encoded.

ASCII was frozen.

That leaves 47! ways to order the scripts (and they did not choose
alphabetical order by English name).
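
To give a feel for how large 47! is, here is a rough sketch in Python.
The function name script_orderings is made up for illustration, and the
counts (48 scripts, one of them frozen) are just the approximate figures
from this message, not exact numbers from the standard.

```python
import math

def script_orderings(total_scripts: int = 48, frozen: int = 1) -> int:
    """Number of ways to order the scripts that were not already fixed.

    Assumption: 48 scripts with 1 frozen (ASCII), as estimated above.
    """
    return math.factorial(total_scripts - frozen)

n = script_orderings()
# Report the size of the number rather than all of its digits.
print(f"47! has {len(str(n))} decimal digits, roughly 2^{math.log2(n):.0f}")
```

Even before looking at any single character, the block ordering alone is
one choice out of an astronomically large set.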

Latin alone has 840 "code points" (characters).  Even given that there
is a traditional ordering in portions of this, there are other big
spans that have no natural order.  A bunch more choices were made here.

Then, each character has a potential of 22 binary "properties" (not
derived from UnicodeData.txt, but in a separate file, PropList.txt), and
14 "fields", most of which have 20 to 256 or more options.
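
As a back-of-the-envelope sketch of what that means per character, the
Python below multiplies out the choices.  The figure of 20 options for
every one of the 14 fields is an assumption made only for illustration
(the low end of the range above, applied uniformly), and the function
name combinations_per_character is hypothetical.

```python
import math

BINARY_PROPERTIES = 22  # figure from this message
FIELDS = 14             # figure from this message
OPTIONS_PER_FIELD = 20  # assumption: low end of "20 to 256", used for every field

def combinations_per_character(binary: int = BINARY_PROPERTIES,
                               fields: int = FIELDS,
                               options: int = OPTIONS_PER_FIELD) -> int:
    """Count possible property/field assignments for one character."""
    # Each binary property doubles the count; each field multiplies it
    # by the number of options assumed for that field.
    return 2 ** binary * options ** fields

combos = combinations_per_character()
print(f"about 2^{math.log2(combos):.1f} possible assignments per character")
```

This is only a crude estimate under the stated assumption; real fields
are not independent and do not all have the same number of options.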

I would venture to guess that even with a perfect oracle, it would be
essentially impossible to reverse engineer the Unicode data files, much
less the ancillary algorithms.  That is, a 32-bit search space with at
least 36 properties to be discovered per data point is whopping big.

Jim Penny
