Package: flex, bison
Version: 2.5.35-6, 1:2.3.dfsg-5

The parameter yylloc_param to yylex is specified in the bison documentation only as a way for yylex to return the location to bison. It is not documented as storage that yylex may keep using between calls to the lexer. In a reentrant bison parser the variable is allocated on the stack inside yyparse, which means its contents may be overwritten between separate calls to yyparse. And, indeed, a caller other than bison might reasonably pass an even more local stack variable for such an "out" parameter.
Furthermore, in the obvious calling pattern the contents would be uninitialised when the lexer first sees them. However, it seems that bison _does_ initialise the variable to {1,0,1,0} on entry to yyparse if it has the default type, but not otherwise. I can't find this documented anywhere either. This causes difficulties because a caller might want to call yyparse more than once on the same stream, since the application's actions can make yyparse return early by using YYACCEPT or YYABORT. Currently such an application would find that location counting is reset on each entry to yyparse.

With the attached input file, the following command:

  flex --header-file=libxlu_cfg_l.h --outfile=libxlu_cfg_l.c libxlu_cfg_l.l

produces a scanner which relies on the value of yylloc_param passed (probably by bison) to each yylex call actually being a pointer to the same structure, left untouched from one call to the next. This is not in accordance with the documentation, though it does work.

Relatedly, in a reentrant lexer with locations, we get this function:

  void xlu__cfg_yyset_lloc (YYLTYPE * yylloc_param , yyscan_t yyscanner)
  {
      struct yyguts_t * yyg = (struct yyguts_t*)yyscanner;
      yylloc = yylloc_param;
  }

This squirrels away the user's provided pointer! This is not even slightly documented; the manual doesn't mention the semantics of yyset_lloc at all. The natural interpretation of the prototype is that it copies *yylloc_param (i.e., the contents), not the pointer. Likewise, yyget_lloc returns the stored pointer.

There are two reasons to change the documentation of the calling convention for reentrant yylex with locations, rather than the code:

 1. This convention, with a persistent location in the parser, avoids unnecessarily copying the location for every token the lexer returns.
 2. Changing it would break old code, unless a new lexer option were introduced, which would add complexity.
On this view, it would seem that some means needs to be provided for the user to initialise the location explicitly on entry to yyparse, either because they want it to have a different type, or because they want the default type but need to preserve its value across calls. In any case, the reentrant versions of the yyset_lloc and yyget_lloc functions need to be fixed. It is difficult to imagine anyone using them in their current state.

Ian.
/* -*- fundamental -*- */

%{
#include "libxlu_cfg_i.h"

#define ctx ((CfgParseContext*)yyextra)
#define YY_NO_INPUT

#define GOT(x) do{                    \
    yylloc->first_line= yylineno;     \
    return (x);                       \
  }while(0)

/* Some versions of flex have a bug (Fedora bugzilla 612465) which causes
 * it to fail to declare these functions, which it defines.  So declare
 * them ourselves.  Hopefully we won't have to simultaneously support
 * a flex version which declares these differently somehow. */
int xlu__cfg_yyget_column(yyscan_t yyscanner);
void xlu__cfg_yyset_column(int column_no, yyscan_t yyscanner);

%}

%option warn
%option nodefault
%option batch
%option 8bit
%option yylineno
%option noyywrap
%option bison-bridge
%option bison-locations
%option reentrant
%option prefix="xlu__cfg_yy"
%option nounput

%x lexerr

%%

[a-z][_0-9a-z]*         {
                          yylval->string= xlu__cfgl_strdup(ctx,yytext);
                          GOT(IDENT);
                        }
[0-9][0-9a-fx]*         {
                          yylval->string= xlu__cfgl_strdup(ctx,yytext);
                          GOT(NUMBER);
                        }

[ \t]

,                       { GOT(','); }
\[                      { GOT('['); }
\]                      { GOT(']'); }
\=                      { GOT('='); }
\;                      { GOT(';'); }

\n|\#.*\n               { yylloc->first_line= yylineno-1; return NEWLINE; }

\'([^\'\\\n]|\\.)*\'    {
                          yylval->string= xlu__cfgl_dequote(ctx,yytext);
                          GOT(STRING);
                        }
\"([^\"\\\n]|\\.)*\"    {
                          yylval->string= xlu__cfgl_dequote(ctx,yytext);
                          GOT(STRING);
                        }

[+-.():]                {
                          ctx->likely_python= 1;
                          BEGIN(lexerr);
                          yymore();
                        }

.                       {
                          BEGIN(lexerr);
                          yymore();
                        }

<lexerr>[^ \t\n]*|[ \t] {
                          xlu__cfgl_lexicalerror(ctx,"lexical error");
                          BEGIN(0);
                        }

<lexerr>\n              {
                          xlu__cfgl_lexicalerror(ctx,"lexical error");
                          BEGIN(0);
                          GOT(NEWLINE);
                        }