l i n u x - u s e r s - g r o u p - o f - d a v i s
L U G O D
 
Page last updated:
2001 Dec 30 17:08

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

[vox-tech] unions


Date: Mon, 27 Aug 2001 20:48:12 -0700 (PDT)
From: Jeff Newmiller <jdnewmil@dcn.davis.ca.us>
To: vox@franz.mother.com
Subject: Re: [vox] [OT] unions

On Mon, 27 Aug 2001, Christine Scobee wrote:

> Unions:
> 
> 1) I read these posts, but I don't quite get what the "computer" union is.
> Does anyone want to have a shot at explaining it to a non-technical
> person?

The only difference between a non-technical person and a technical person
is that the technical person will pay attention to an explanation that
introduces new jargon.

Are you still listening? :)

In a programming language, we use names for groups of bits in
memory.  From the perspective of a userland computer program, memory is
just a big long line of bytes, each distinguished from the others by its
"position" in the line.  (This position is called an "address".)

So we have something like:

    |   |
    +---+
    | 4 | 20005
    +---+
    |255| 20004
    +---+
    | 48| 20003
    +---+
    | 48| 20002
    +---+
    | 48| 20001
    +---+
    | 50| 20000
    +---+
    |   |

On most current platforms, a C integer (an "int") occupies four bytes.
Addresses 20000 through 20003 could together hold one integer, which the
C programming language might let you refer to as x.  (In this case, a PC,
which stores the least significant byte at the lowest address, would
interpret the value as 48 * 256^3 + 48 * 256^2 + 48 * 256 + 50 =
808464434, ignoring potential issues of negative sign. A byte can hold at
most 255.)  So we could write,

  int x;   /* tell the computer to set aside some memory for an integer */

  x = 808464434;

to force the program to put those bits there.
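
The arithmetic above can be checked with a short C sketch.  The function
names value_from_bytes_le and bytes_of are made up for illustration, and
the sketch uses uint32_t rather than int so that the four-byte size is
guaranteed instead of assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: combine four bytes the way a little-endian
   machine (such as a PC) does, treating the byte at the lowest
   address as the least significant one. */
uint32_t value_from_bytes_le( const unsigned char b[ 4 ] )
{
    return (uint32_t) b[ 0 ]
         + (uint32_t) b[ 1 ] * 256u
         + (uint32_t) b[ 2 ] * 256u * 256u
         + (uint32_t) b[ 3 ] * 256u * 256u * 256u;
}

/* Hypothetical helper: copy out the bytes that actually sit in memory
   for a given value, lowest address first.  The order you get back
   depends on the processor. */
void bytes_of( uint32_t x, unsigned char out[ 4 ] )
{
    memcpy( out, &x, sizeof x );
}
```

Calling value_from_bytes_le with the bytes 50, 48, 48, 48 (addresses 20000
through 20003 in the picture) gives 808464434.  On a PC, bytes_of applied
to 808464434 should hand back those same four bytes in that order.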

But what if we wrote a different program, that regarded those four bytes
as characters? In ASCII, 48='0' and 50='2', so we might regard them as the
four characters '2000'.  In that program, we might regard those bytes as
an array of four bytes:

  unsigned char a[ 4 ];

  a[ 0 ] = '2';
  a[ 1 ] = '0';  /* one byte beyond the '2' */
  a[ 2 ] = '0';  /* two bytes beyond the '2' */
  a[ 3 ] = '0';  /* three bytes beyond the '2' */

and achieve the same resulting pattern of bits in memory.  Which approach
is better depends on what you are trying to accomplish.  If you want to
keep printable characters in mind, the latter approach is better, but if
numbers are all you (the programmer) are thinking about, then the first
approach makes more sense.  Writing "x = 808464434" when you want the
computer to remember '2000' is an extremely un-obvious thing to do (and a
nasty way to make sure you are the only person who understands the
program).  Nevertheless, if you are doing mathematical calculations in
which 808464434 plays a special part, using code that puts four characters
in memory is just as un-obvious.

A union lets us overlay these two meanings in the same program:

  union {
     int x;
     unsigned char a[ 4 ];
  } u;

  u.x = 808464434; /* treat u like an integer */
  /* now print it out byte-by-byte as characters;
     on a PC (little-endian, ASCII) this prints 2000 */
  printf( "%c%c%c%c\n", u.a[ 0 ], u.a[ 1 ], u.a[ 2 ], u.a[ 3 ] );

Whether you choose to actually view the same bytes as different data
types, or simply keep track of whether they represent an integer or a
group of four characters at any given time depends on the purpose you had
in mind for doing this at all.  I have shown how numbers can be
"re-interpreted" as desired, but various details of how this works depend
on which compiler and processor are being used, so changing your mind in
the middle is not "portable". (That does not make it unuseful.)
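
To make the portability caveat concrete, here is a minimal sketch that
goes the other way: it stores the characters and reads the integer back.
The union tag overlay and both function names are made up for
illustration, ASCII is assumed, and uint32_t stands in for int so the
integer is exactly four bytes wide.

```c
#include <stdint.h>

/* Same overlay as above, but with a fixed-width integer. */
union overlay {
    uint32_t x;
    unsigned char a[ 4 ];
};

/* Hypothetical helper: store the characters '2000' and read the same
   bytes back as an integer. */
uint32_t chars_as_integer( void )
{
    union overlay u;

    u.a[ 0 ] = '2';
    u.a[ 1 ] = '0';
    u.a[ 2 ] = '0';
    u.a[ 3 ] = '0';
    return u.x;   /* 808464434 if little-endian, 842018864 if big-endian */
}

/* Hypothetical helper: returns 1 when the lowest address holds the
   least significant byte (little-endian), 0 otherwise. */
int is_little_endian( void )
{
    union overlay u;

    u.x = 1;
    return u.a[ 0 ] == 1;
}
```

The same source code gives two different answers depending on the
processor's byte order, which is exactly what "not portable" means here.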

Scripting languages like Perl, which allow you to ignore whether a
variable contains a string or a number in most cases, use unions
internally to represent these "flexible" data types.  They manage to do
this "portably" by keeping track of which data type was last used to store
data in that memory, and automatically converting between types in a
"sensible" way as needed.  For example, a Perl variable might hold the
four characters '2000', but when you add one to it, it will automatically
recognize that this is a string of human-readable digits, and convert them
internally to an integer using the four bytes 208, 7, 0, and 0 (since
208 + 7 * 256 = 2000), with which the processor is designed to efficiently
do integer math.  (Similar logic handles floating point numbers, and later
the need to convert back to strings of characters.) It takes some
sophisticated programming under the covers, using unions, to let people
think in fairly high-level terms about solving their problems without
getting bogged down in keeping track of how the bits are organized, while
still keeping the computer working reasonably efficiently.
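
The bookkeeping described above can be sketched in C as a union plus a
tag that remembers which member was stored last.  This is only an
illustration of the idea, not Perl's actual internal layout; every name
in it (struct scalar, scalar_set_text, scalar_add, scalar_text) is
hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* The tag records which member of the union is currently valid. */
enum kind { KIND_STRING, KIND_INTEGER };

struct scalar {
    enum kind kind;
    union {
        char s[ 24 ];   /* human-readable characters, NUL-terminated */
        long i;         /* binary integer the processor can add */
    } u;
};

/* Store text (truncated if it does not fit in the buffer). */
void scalar_set_text( struct scalar *v, const char *text )
{
    snprintf( v->u.s, sizeof v->u.s, "%s", text );
    v->kind = KIND_STRING;
}

/* Convert to an integer if needed, then add n. */
void scalar_add( struct scalar *v, long n )
{
    if ( v->kind == KIND_STRING ) {
        long parsed = strtol( v->u.s, NULL, 10 );  /* read s before reusing its bytes */
        v->u.i = parsed;
        v->kind = KIND_INTEGER;
    }
    v->u.i += n;
}

/* Convert back to characters when a string is wanted. */
const char *scalar_text( struct scalar *v )
{
    if ( v->kind == KIND_INTEGER ) {
        long value = v->u.i;                       /* read i before reusing its bytes */
        snprintf( v->u.s, sizeof v->u.s, "%ld", value );
        v->kind = KIND_STRING;
    }
    return v->u.s;
}
```

Setting the text to "2000", calling scalar_add with 1, and then asking
for scalar_text yields "2001".  Because the tag always says which member
is valid, the program never has to guess how the bytes are laid out, and
that is what keeps this approach portable.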

