
The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

[vox-tech] Perl - reading fixed width formats



Hi all,

I need to read in ~25000 files whose lines are in a fixed-width format:

   field 1: line 1, chars 1-4
   field 2: line 1, chars 5-6
   field 3: line 1, chars 7-11
   field 4: line 1, chars 12 to EOL
   field 5: line 2, chars 1-30
   field 6: line 3, chars 1-10
   field 7: line 4, chars 1-2
   ...

The naive method is looping over each file, incrementally reading 4 chars, 2
chars, 5 chars, etc.  However, the slowest part of all this (I would think)
would be the constant disk access for each field for each file.
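
For illustration, the naive field-by-field approach might look like the
sketch below.  The sample record and the in-memory filehandle are
assumptions made only to keep the example self-contained; the widths follow
the line 1 layout above.  Note that Perl's read() and readline go through
buffered I/O, so a small read() does not necessarily mean a separate trip to
the disk.

```perl
use strict;
use warnings;

# Hypothetical sample record standing in for the first line of one file.
my $data = "ABCD12XYZ99rest of line\n";

# An in-memory filehandle keeps the sketch self-contained.
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

# Naive approach: one read() per field, using the line 1 widths 4, 2, 5.
my @fields;
for my $width (4, 2, 5) {
    read($fh, my $buf, $width) == $width or die "short read";
    push @fields, $buf;
}

# Field 4 runs to end of line, so readline picks up the remainder.
my $rest = <$fh>;
chomp $rest;
push @fields, $rest;
close $fh;

print join('|', @fields), "\n";
```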

There are ~25000 files.  Altogether they're 1.7GB.  The machine I'm on has
32GB of memory.  It only took 27s to read all 25k files into a Perl array on
a typically loaded machine.

I can imagine some fancy scenario where I slurp all the files into an array
and then process them, but there are many different ways of doing this.
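
One possible shape for the slurp-then-process idea is sketched below.  The
helper name slurp_file, the temporary directory, and the two demo files are
all hypothetical; with the real data the paths would come from wherever the
~25000 files live.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the whole file as one string.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!";
    local $/;                 # undef record separator: read to EOF
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Demo data: two small files in a temporary directory.
my $dir = tempdir(CLEANUP => 1);
for my $n (1, 2) {
    open my $out, '>', "$dir/rec$n.txt" or die "cannot write: $!";
    print {$out} "line one of $n\nline two of $n\n";
    close $out;
}

# Collect the file names, then slurp every file before processing.
opendir my $dh, $dir or die "cannot open $dir: $!";
my @paths = map { "$dir/$_" } grep { /\.txt\z/ } readdir $dh;
closedir $dh;

my %content_of = map { $_ => slurp_file($_) } @paths;

# Processing now works purely from memory.
for my $path (sort keys %content_of) {
    my @lines = split /\n/, $content_of{$path};
    printf "%s: %d lines\n", $path, scalar @lines;
}
```

Keying the hash by path keeps each file's text separate, so per-file line
numbers (line 1, line 2, ...) are still meaningful after the split.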

Am I correct that reading everything into memory in one fell swoop will make
a noticeable difference in run time?

And once everything is in an array, what's the most efficient way to take a
string representing one line of a file, and break it down into fields?  Is
there anything faster than substr()?
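
For comparison, here is a minimal sketch of the two usual ways to split a
line 1 record: repeated substr() calls versus a single unpack() with a
template.  The sample line is made up, and the sketch makes no claim about
which is faster; the core Benchmark module (cmpthese) could be used to time
both on real data.

```perl
use strict;
use warnings;

# Hypothetical line 1 record: widths 4, 2, 5, then the rest of the line.
my $line = "ABCD12XYZ99rest of line";

# substr(): one call per field, using zero-based offsets.
my @by_substr = (
    substr($line, 0, 4),    # field 1: chars 1-4
    substr($line, 4, 2),    # field 2: chars 5-6
    substr($line, 6, 5),    # field 3: chars 7-11
    substr($line, 11),      # field 4: char 12 to end of line
);

# unpack(): one call per line; 'a' keeps the bytes exactly as they are.
my @by_unpack = unpack 'a4 a2 a5 a*', $line;

print join('|', @by_substr), "\n";
print join('|', @by_unpack), "\n";
```

Using 'A' instead of 'a' in the template would also strip trailing
whitespace from each field, which is often convenient for padded
fixed-width data.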

Thanks!
Pete


