https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119455

--- Comment #5 from Robert Dubner <rdubner at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> Possibly the place to write down something I noticed.  It seems to me that
> COBOL is a statically typed language but the GCC COBOL frontend seems to use
> something like a type/object descriptor for variables and constants and
> dispatching to the GCC COBOL runtime for all operations.  That means the
> static types are not exposed to GCCs optimizations and the middle-end has no
> idea what is done with all the data meaning optimization opportunities are
> basically non-existant unless you'd consider LTOing the GCC COBOL runtime
> and then having high hopes for GCC to untangle the thing.
> 
> So the optimization that's missing is probably finding appropriate
> middle-end types that match the COBOL static types and directly use those
> for variables and operations where possible.
> 
> I did not spot something like an intermediate COBOL AST being generated
> during parsing before handing off to GENERIC, so any such optimization that
> would require more than very local operation is probably difficult.

While looking through the PRs assigned to me, I ran across this one.

Let me expand on Jim's comments:

A commonly used elementary type is PIC 9(N) USAGE DISPLAY.  This is an N-digit
positive integer value, stored in memory as N characters, each '0' through '9'.

In our implementation, N can be 1 through 37, so, right off the bat, that would
be 37 static types, if we implemented it that way.

But wait!  More generally, that format can be used to specify fixed-point
variables, where PIC 99V999 is five characters of storage that represent 00.000
through 99.999.
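
To make the storage concrete, here is a minimal Python sketch of reading five
DISPLAY characters as a PIC 99V999 value.  The function name decode_display
and its scale parameter are made up for illustration, and ASCII digits are
assumed; this is not taken from the GCC COBOL runtime.

```python
from decimal import Decimal


def decode_display(storage: bytes, scale: int) -> Decimal:
    """Read unsigned USAGE DISPLAY digits with `scale` digits after the V.

    Hypothetical helper for illustration; assumes ASCII digit characters.
    """
    text = storage.decode("ascii")
    if not text.isdigit():
        raise ValueError("expected only the characters '0' through '9'")
    # The V occupies no storage; it only says where the decimal point sits.
    return Decimal(int(text)).scaleb(-scale)


# PIC 99V999 holding the characters "12345" represents 12.345.
print(decode_display(b"12345", 3))
```

The same five bytes would mean 12345 under PIC 9(5) and 1234.5 under
PIC 9(4)V9, which is why the picture has to travel with the data somehow.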

It is thus the case that we have PIC 9(M)V9(N), where M+N ranges from 1 to 37.
Expand that out, noticing that when M+N is, say, three, you get V999, 9V99,
99V9, and 999V (four constructions for three digits), and the total number of
constructions is SUM [J] where J ranges from 2 to 38.  That comes out to
38*39/2 minus 1, which is 740 different constructions for PIC 9(M)V9(N).
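
The count can be checked by brute force.  This is only a sketch: MAX_DIGITS is
a name chosen here for the 37-digit limit mentioned above.

```python
MAX_DIGITS = 37  # digit limit quoted in the text; the name is illustrative

# M digits before the V and N digits after it, with at least one digit in
# total and no more than MAX_DIGITS.
constructions = [
    (m, n)
    for m in range(MAX_DIGITS + 1)
    for n in range(MAX_DIGITS + 1)
    if 1 <= m + n <= MAX_DIGITS
]

count = len(constructions)
print(count)  # 740

# Closed form: the sum of J for J = 2..38.
assert count == 38 * 39 // 2 - 1
```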

But wait!  That's just for unsigned values.  The sign indicator is a leading
'S', so PIC S999 can range from -999 to +999.  In memory, the sign indicator
can be internal or SEPARATE, and it can be LEADING or TRAILING.  That gives us
an additional four times 740 constructions, giving us a total of 3,700
constructions.

But wait!  The 'V' virtual decimal place isn't the only way of specifying a
fixed-point value.  PIC 999PPP describes a value ranging from 000000 to
999000, and PIC PPP999 ranges from 0.000000 through 0.000999.  (The P
characters are metadata, and do not specify memory storage; only the '9'
characters do that.)  It belatedly occurs to me that we probably don't limit
P(L) to anything, which means that if Jim is using a 32-bit unsigned int to
parse the value of L, it means for each of the 37 PIC 9(N) possibilities, there
are an additional eight billion possibilities of PIC P(L)9(N) and PIC 9(N)P(L).
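
A small Python sketch of how the P positions scale the stored digits.  The
helper decode_scaled and its parameters are hypothetical names used only for
this illustration, and ASCII digits are assumed.

```python
from decimal import Decimal


def decode_scaled(storage: bytes, p_count: int, p_leading: bool) -> Decimal:
    """Apply P scaling to unsigned DISPLAY digits (illustrative helper only)."""
    digits = storage.decode("ascii")
    if not digits.isdigit():
        raise ValueError("expected only the characters '0' through '9'")
    value = Decimal(int(digits))
    if p_leading:
        # PIC P(L)9(N): the decimal point sits to the left of the P positions,
        # so the stored digits are the last N of L+N fractional places.
        return value.scaleb(-(p_count + len(digits)))
    # PIC 9(N)P(L): L assumed zeros follow the stored digits.
    return value.scaleb(p_count)


high = decode_scaled(b"999", 3, p_leading=False)  # PIC 999PPP
low = decode_scaled(b"999", 3, p_leading=True)    # PIC PPP999
assert high == Decimal("999000")
assert low == Decimal("0.000999")
```

In both cases only three bytes are stored; the P count lives entirely in the
description of the variable.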

Let's assume some kind of sanity, and limit L to 15.  (That lets us specify
exabytes and femtoseconds, I believe.)  That means for each of the 37 * 5
constructions of 9(N) that have no 'V', there are another 30 (15 values of L,
either leading or trailing), giving us roughly another 5,500 constructions
(37 * 5 * 30 = 5,550, to be exact) on top of the 3,700 we've already
identified.

So.  We are up to about 9,200 constructions just for unsigned and signed
versions of USAGE DISPLAY for
     PIC 9(M)P(L)
     PIC 9(M)
     PIC 9(M)V9(N)
     PIC P(L)9(N)
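
For anyone who wants to check the arithmetic, here is the whole tally as a
Python sketch.  The constant names are invented here, and the limit of 15 on
L is the assumption made above, not something the implementation enforces.
The exact product for the P forms is 5,550 rather than a round 5,500, so the
exact total is 9,250.

```python
MAX_DIGITS = 37  # digit limit quoted in the text
MAX_P = 15       # assumed sanity limit on L

# PIC 9(M)V9(N) with 1 <= M+N <= 37: a picture with T digits has T+1 places
# for the V.
pic_v = sum(total + 1 for total in range(1, MAX_DIGITS + 1))

# Unsigned, plus four signed layouts:
# {internal, SEPARATE} x {LEADING, TRAILING}.
sign_variants = 1 + 2 * 2

with_v = pic_v * sign_variants

# PIC P(L)9(N) and PIC 9(N)P(L): each of the 37 no-V pictures, in each sign
# variant, gets L = 1..15 either leading or trailing.
with_p = MAX_DIGITS * sign_variants * (2 * MAX_P)

total = with_v + with_p
print(pic_v, with_v, with_p, total)  # 740 3700 5550 9250
```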

I would be very unsurprised to learn that hardly anybody has gotten to this part
of this posting.  I mostly did it out loud because I've been wondering for a
while just how many constructions there are, and I took this opportunity.  Nor
would I be surprised to learn I messed up the calculation; I did everything
here in the message without checking my calculations on paper.

But whatever the exact value, there are too many to treat them as truly static
types, especially when you remember that USAGE DISPLAY is only one type that
has PICTURE strings.  So do USAGE BINARY, USAGE COMP-3, COMP-5, and COMP-6,
which each result in different memory storage of the COBOL variable.
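
As a rough illustration of how much the storage differs, here is a Python
sketch that lays out the value 12345 three ways.  The helpers are
hypothetical, and the details are assumptions rather than anything taken from
the GCC COBOL sources: ASCII digits for DISPLAY, the conventional packed
decimal layout for COMP-3 with 0xF as the unsigned sign nibble, and a
four-byte big-endian integer for BINARY.  COMP-5 and COMP-6 are not shown.

```python
def _digits(value: int, digits: int) -> str:
    """Zero-padded decimal text for a non-negative value (illustrative)."""
    text = str(value).rjust(digits, "0")
    if value < 0 or len(text) != digits:
        raise ValueError("value does not fit the picture")
    return text


def as_display(value: int, digits: int) -> bytes:
    # One character per digit (ASCII assumed).
    return _digits(value, digits).encode("ascii")


def as_packed(value: int, digits: int) -> bytes:
    # Two decimal digits per byte, with a trailing sign nibble.
    # 0xF is assumed here as the marker for an unsigned value.
    nibbles = [int(ch) for ch in _digits(value, digits)] + [0xF]
    if len(nibbles) % 2:
        nibbles.insert(0, 0)  # pad to a whole number of bytes
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def as_binary(value: int, size: int) -> bytes:
    # Plain binary integer; big-endian byte order is an assumption.
    return value.to_bytes(size, "big")


print(as_display(12345, 5).hex())  # 3132333435
print(as_packed(12345, 5).hex())   # 12345f
print(as_binary(12345, 4).hex())   # 00003039
```

Three representations of the same five-digit number, at five, three, and four
bytes, each needing its own arithmetic.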
