Page last updated:
2001 Dec 30 17:04

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] another C question (lex/yacc for parsing input)

On Mon, Apr 09, 2001 at 12:11:27PM -0700, Peter Jay Salzman wrote:
> i use memmove to do things like:
>    1. convert double spaces to single spaces
>    2. get rid of superfluous tokens like "pretty please" and "Computer".

These things are traditionally handled by the lexical scanner...

> and then i ship off the resulting string
>     auto-destruct 520010
> to my parser which tokenizes the nouns and executes the relevant code for
> auto destruction.
> i know there are other ways of doing this (like searching the string for
> certain tokens and building up a list from the start).   however, i tend to
> like this method better.  because i can more easily see what's going on when my
> parser handles "auto-destruct 520010" rather than (12 52 00 10).   there are a
> lot of tokens, and i'd rather work with "auto-destruct" than "12".

LOL!  So would I - but that's why I wouldn't use 12, I'd use
AUTO_DESTRUCT, which I would've conveniently #defined to 12 (or some
other value).  Actually, I'd use a number >= 256, so it doesn't
conflict with single-character tokens, for which I'd usually just want
to use the character code as the token code.
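
A rough sketch of that numbering scheme, assuming 8-bit characters.  The
names TOK_AUTO_DESTRUCT, TOK_NUM and classify_char are made up for
illustration only; the point is just that named tokens start above 255,
so a punctuation character can be returned as its own token code:

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical token codes: they start above the unsigned char range
   (0..255), so they can never collide with a character code. */
enum {
  TOK_AUTO_DESTRUCT = 257,
  TOK_NUM           = 258
};

/* Illustrative helper: a punctuation character is its own token code,
   a digit maps to the named NUM token, anything else yields 0. */
int classify_char (int c)
{
  assert (c >= 0 && c <= 255);
  if (isdigit (c))
    return TOK_NUM;
  if (ispunct (c))
    return c;
  return 0;
}
```

With this layout a parser can compare against '+' or ';' directly and
still use readable names for the multi-character tokens.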

> it occurs to me that this is the sort of thing that's perfect for lex/yacc, but
> i've been dragging my feet because it looks fairly complicated.   i think maybe
> i'll take a look at it though.
> pete

It's really not too complicated - if I were anywhere near you, I could
come over and teach you in probably an hour.  The info pages are very
good, but they could be better, too.  I find, amazingly, that a lot of
what I'm learning in the Dragon book (I think I mistakenly said
"Wizard" before - must be playing too much nethack ;) ) is directly
applicable to using bison.

"Computer, pretty please initiate auto-destruct sequence 520010."

Here's a pseudo-code example of a top-down parser, which would be
recursive-descent like what Mark was talking about, were there any
recursion ;).  You could use flex or write your own lexical scanner
(flex is easy to learn, though - read the Texinfo manual - if you
still hate info just read the HTML version from www.gnu.org).

Some caveats about the following code - it doesn't use a look-ahead
token, so it is very inflexible about what grammars it can implement.
It is not a good general-purpose solution, but should work for simple
cases like this one.

/* All of these except for NUM represent actual textual tokens which
   are similar to their name.  NUM is used for numeric sequences, and
   is the only token for which VALUE is set */
#define COMPUTER      257
#define INITIATE      258
#define AUTO_DESTRUCT 259
#define SEQUENCE      260
#define PRETTY        261
#define PLEASE        262
#define NUM           263

int token;
long value;

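/* Forward declarations.  The first four routines are left undefined
   in this example. */
int get_next_token (void);
int is_valid_sequence_code (long code);
void destroy_ship (void);
void handle_error_and_quit (void);
void auto_destruct_command (void);
void politeness (void);
void sequence_construct (void);
void match (int mtoken);
void match_opt (int mtoken);

int main (void)
{
  token = get_next_token ();  /* This calls the lexical scanner,
                                 undefined here.  Should skip whitespace
                                 and punctuation characters.  If it
                                 matches a number, it should put the
                                 value into the global VALUE. */
  auto_destruct_command ();   /* Parse the command; quits on error. */

  /* If we've gotten this far, then we've successfully parsed an
     auto-destruct sequence.  The destruction sequence code is left in
     the global variable VALUE. */
  if (is_valid_sequence_code (value))
    destroy_ship ();  /* I don't define this here.  Its purpose should
                         be obvious ;)  */
  return 0;   /* Obviously, we'll never get to this either way ;) */
}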

void auto_destruct_command (void)
{
  match (COMPUTER);
  politeness ();
  match (INITIATE);
  match (AUTO_DESTRUCT);
  sequence_construct ();
}

void politeness (void)
{
  match_opt (PRETTY);
  match_opt (PLEASE);
}

void sequence_construct (void)
{
  match (SEQUENCE);
  match (NUM);
}

/* Generates error if it can't match a certain token; otherwise
   advances to the next token */
void match (int mtoken)
{
  if (mtoken != token)
    handle_error_and_quit ();
  token = get_next_token ();
}

/* Reads next token if it matches mtoken, otherwise leaves the token
   alone for next match attempt */
void match_opt (int mtoken)
{
  if (mtoken == token)
    token = get_next_token ();
}
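
For anyone who wants to try this without flex, here is one way the
undefined pieces could be filled in by hand.  It is only a sketch under
some assumptions of mine: the input comes from a string, keywords are
matched case-insensitively, a failed match sets a flag instead of
quitting, and the command must end after the number.  The names
parse_auto_destruct, same_word, keywords, failed, END and UNKNOWN are
invented for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum { END = 0, COMPUTER = 257, INITIATE, AUTO_DESTRUCT, SEQUENCE,
       PRETTY, PLEASE, NUM, UNKNOWN };

static const char *input;   /* next unread character */
static int token;           /* current token */
static long value;          /* set whenever the scanner returns NUM */
static int failed;          /* set by match() instead of quitting */

static const struct { const char *text; int code; } keywords[] = {
  { "computer", COMPUTER }, { "initiate", INITIATE },
  { "auto-destruct", AUTO_DESTRUCT }, { "sequence", SEQUENCE },
  { "pretty", PRETTY }, { "please", PLEASE }
};

/* Case-insensitive comparison of a word of length LEN with keyword KW */
static int same_word (const char *word, size_t len, const char *kw)
{
  size_t i;
  if (strlen (kw) != len)
    return 0;
  for (i = 0; i < len; i++)
    if (tolower ((unsigned char) word[i]) != kw[i])
      return 0;
  return 1;
}

/* Hand-written scanner: skips whitespace and punctuation, then
   returns NUM, a keyword code, UNKNOWN, or END at end of input */
static int get_next_token (void)
{
  const char *start;
  size_t i, len;

  while (*input != '\0' && !isalnum ((unsigned char) *input))
    input++;
  if (*input == '\0')
    return END;
  if (isdigit ((unsigned char) *input))
    {
      char *end;
      value = strtol (input, &end, 10);
      input = end;
      return NUM;
    }
  start = input;   /* a word: letters, digits and embedded hyphens */
  while (isalnum ((unsigned char) *input) || *input == '-')
    input++;
  len = (size_t) (input - start);
  for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
    if (same_word (start, len, keywords[i].text))
      return keywords[i].code;
  return UNKNOWN;
}

static void match (int mtoken)
{
  if (mtoken != token)
    failed = 1;
  else
    token = get_next_token ();
}

static void match_opt (int mtoken)
{
  if (mtoken == token)
    token = get_next_token ();
}

/* Returns the sequence code, or -1 if TEXT is not a valid command */
long parse_auto_destruct (const char *text)
{
  assert (text != NULL);
  input = text;
  failed = 0;
  token = get_next_token ();

  match (COMPUTER);
  match_opt (PRETTY);
  match_opt (PLEASE);
  match (INITIATE);
  match (AUTO_DESTRUCT);
  match (SEQUENCE);
  match (NUM);
  match (END);     /* my addition: reject trailing input */
  return failed ? -1 : value;
}
```

Calling parse_auto_destruct on the example sentence above should hand
back the code 520010, which the caller can then pass to whatever
validity check it likes.  Note that double spaces and the trailing
period never reach the parser at all, since the scanner eats them.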
