Re: Very Fast: Directly Coded Lexical Analyzer

J.C. Pizarro Thu, 31 May 2007 13:34:21 -0700

From "Frank Schaefer" <[EMAIL PROTECTED]> wrote:

Dear GCC Team,


Last weekend I finished the release of my directly coded
analyzer generator engine for Quex. First, I thought, it would
be just a nice idea to step away from table driven approach
of flex/lex. Directly coding also facilitates the step towards
analysis of different character encodings (iconv as a pre-filter).


Like e.g. the generated code
  IF match-char1 THEN ..
  ELSIF match-char2 THEN ..
  ELSIF match-char3 THEN ..
  ..
  END
?

Why to complicate the things? The determinist finite automaton (DFA)
table-driven helps us to express efficiently this with regular
expressions (RE).

Too, with the Ken Thompson & Kleene theories, it's not easy to be
hand-written the implementation of an analyzer or analyzer generator.

It's possible to be unstable your quex analyzer generator while the
lex/flex generator is stable.

I was really amazed about the performance of this approach and also
about the fact that probably not many people have tried to go
that way. First benchmarks have shown a boost of 200-250% in speed
gain over a flex generated engine! Still, there are some topics
of synchronisation with OS-buffering which I have not addressed yet. So
I think there can be even more of a speed gain.


To obtain 200-250% in speed gain won't be possible for this GCC
optimizing compiler because of http://en.wikipedia.org/wiki/Amdahl%27s_law

To understand the law's idea, to see first the red-A & blue-B graphic.

GCC throws more time optimizing than analyzing lexically unless it
deactivates the optimizing option -O.

Is there any interest in using such an engine in the GCC toolset?
It would be an honor for me to provide any adaptions you require.


Adaptions?
The C, C++, Fortran, ObjectiveC, ObjectiveC++, Java and Ada
grammars are BIG to do carefully these complex adaptions!!!

Anyway, quex's syntax is mostly conform to flex/lex, so there is
not much 'getting used to' with this generator. I am also positive,
that it is very hard to program a hand-written analyzer that is faster,
since the engine does not do any house-keeping and profits from Hopcroft-
Optimization and Binary-Search for code generation. These things are
hard to to by hand.


The basic transformations of the Ken Thomson & Kleene theories like
the NFA <-> DFA transformations are hard too.

http://en.wikipedia.org/wiki/Regular_expression
http://en.wikipedia.org/wiki/Finite_state_machine
http://en.wikipedia.org/wiki/State_diagram

There is a sourceforge project at http://quex.sf.net where
the generator can be downloaded. The documentation is still
'first draft' but shows what the thing it can do.

The core engine comes with a large set of unit tests, so I am feel
comfortable about its stability.

Best Regards

Frank Schäfer
--
// Dr.-Ing. Frank-René Schäfer, Bodelschwinghstr. 28
// D-50170 Kerpen
// Tel.: 49+176/22 02 58 59;


Good bye :)

Re: Very Fast: Directly Coded Lexical Analyzer

Reply via email to