Linux Users' Group of Davis
Page last updated:
2005 Apr 24 15:25

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] simple parsing of multiple-line records...?

AWK was made for what you're trying to do.  I don't know AWK all that
well, but here's a very simple example that will produce a line-record
from the data you gave.  This works with GAWK, and may or may not work
with pure AWK:

  /easting/ { printf("rec=%d easting=%s ", rec, $2); }
  /northing/ { printf("northing=%s ", $2); }
  /elevation/ { printf("elevation=%s ", $2); }
  /distance along surface/ { printf("dist=%s\n", $4); rec++; }

After creating the above as a file (call it "parse.awk"), you can parse
your data (assuming the data is in a file called "data.txt") like this:

  $ gawk -f parse.awk data.txt

which will produce the following:

  rec=0 easting=661674.9375 northing=4035004.0000 elevation=968.8617 dist=15.9540
  rec=1 easting=661683.7500 northing=4034946.7500 elevation=961.4768 dist=58.4077

where "rec" is the record number, followed by the rest of the data.
This code makes several assumptions (for example, that the fields always
appear in the same order and that none are missing), SO YOU SHOULDN'T USE
THE ABOVE CODE in production; use it only as a reference for writing more
robust code.

Here's some code that's a little more robust, but you should still
customize it for your data:

  # Check to see whether we want to print the record, then print if we do.
  function rec_chk() {
     # If I have all the data for a record, print the record.
     # Compare against "" so that a legitimate value of 0 still counts.
     if (easting != "" && northing != "" && elevation != "" && dist != "") {
        # Print the record
        print_rec();

        # Reset all data and increment the record counter
        easting = northing = elevation = dist = "";
        rec++;
     }
  }

  # Print the record.
  function print_rec() {
     printf("%d %s %s %s %s\n", rec, easting, northing, elevation, dist);
  }

  # Data updates
  /easting/ { easting=$2; rec_chk(); }
  /northing/ { northing=$2; rec_chk(); }
  /elevation/ { elevation=$2; rec_chk(); }
  /distance along surface/ { dist=$4; rec_chk(); }
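
The same "collect until complete" idea as rec_chk() can also be sketched
with an associative array keyed on the label text, which avoids one
variable per field. This is only an illustration, not something from the
original data set: the helper name parse_records and the names field, key
and val are made up, and it assumes every line looks like "label: value"
with exactly the four labels shown in the question.

```shell
# parse_records is a hypothetical helper name; it reads the files named
# on the command line, or standard input if none are given.
parse_records() {
    awk -F':' '
        # Skip separators and anything else without a colon.
        NF < 2 { next }

        {
            # Trim blanks around the label and the value.
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            field[key] = val
        }

        # Once all four labels have been seen, print the record and start over.
        ("easting" in field) && ("northing" in field) &&
        ("elevation" in field) && ("distance along surface" in field) {
            printf("%d %s %s %s %s\n", rec++, field["easting"],
                   field["northing"], field["elevation"],
                   field["distance along surface"])
            split("", field)    # portable way to empty the array
        }
    ' "$@"
}
```

Called as "parse_records data.txt", it prints one line per record in the
same column order as print_rec(), regardless of the order in which the
four lines of a record appear.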

There are a lot of built-in functions if you want to do some math
operations or variable type conversions or whatever you can think of.
See `info gawk` under "functions" or "library functions" for more
information.
I hope that gets you started.


PS: I usually just use PERL rather than AWK for something like this -- PERL
has more support, is more flexible, and I know it better than AWK!  GAWK
sure does make simple parsing like this easier if you know how, though!

PPS: A friend of mine who went to Columbia tells me he took a class from
"A" (Prof. Alfred Aho) of "AWK"... =P  How funny!

On Thu, 21 Apr 2005, Dylan Beaudette wrote:

> Hi everyone,
> this might be a really simple question, but :
> i have a text file with multiple-line records in a format like this:
> ----------------------------
> easting:      661674.9375
>  northing:     4035004.0000
>  elevation:         968.8617
> distance along surface:               15.9540
>  easting:      661683.7500
>  northing:     4034946.7500
>  elevation:         961.4768
> distance along surface:               58.4077
> -----------------------------
> I am familiar with single-line records, and work with them in AWK quite
> frequently. However, i have yet to come upon an elegant solution to
> parsing something like the above file, such that i can assign each
> field in a multi-line record to a unique variable. a solution in AWK
> would be ideal, but i am open to tackle this problem with something
> like python.
> any ideas?
> thanks!
> Dylan
> _______________________________________________
> vox-tech mailing list
> vox-tech@lists.lugod.org
> http://lists.lugod.org/mailman/listinfo/vox-tech

Mark K. Kim
AIM: markus kimius
Homepage: http://www.cbreak.org/
Xanga: http://www.xanga.com/vindaci
Friendster: http://www.friendster.com/user.php?uid=13046
PGP key fingerprint: 7324 BACA 53AD E504 A76E  5167 6822 94F0 F298 5DCE
PGP key available on the homepage