The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] reading a .gz .Z after offset

On Thu, 7 Mar 2002, Mark K. Kim wrote:

> On Thu, 7 Mar 2002, Jeff Newmiller wrote:
> 
> > I don't think you can do seeks in a compressed file... you have to read it
> > sequentially.
> >
> > If you have a plan for dividing up the uncompressed data, perhaps you
> > should do that first and store the split data as separate files
> > (recompressed or not) for purposes of computation.
> 
> The zlib library offers a seek function in its utility function API,
> "gzseek(gzFile, z_off_t, int)".  Since the zlib compression uses the
> deflation algorithm that compresses data in blocks of a known size, it can
> find the block you're seeking, inflate just that block, and return the
> data (I'm not sure if that's how gzseek works, but I'm just sayin' it can
> be done.)  I'm sure, with all Perl's ingenuity, it can be done in Perl, too.
> 
> Go Eric!  Keep looking! :)

Thanks for the sanity check, Mark...

I took a look at the source for gzseek
(http://www.cs.washington.edu/homes/suciu/XMLTK/xmill/www/XMILL/html/gzio_8c-source.html
line 675), and it just sequentially reads blocks out of the decompression
algorithm beginning from the current position (if the desired location is
after the current location) or from the beginning of the file (if the
desired location is before the current location). (There is an fseek in
there, but that only gets used if the file is not a compressed file.)

I looked at Compress::Zlib, and it seems to omit the interface to gzseek
(I don't know why).  However, you could re-implement it fairly easily
using gzread: if your destination offset is d = n*m + r, read and discard
n blocks of m bytes plus r more bytes, and then begin reading your data.
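
For instance, here's a rough (untested) sketch of that using Compress::Zlib;
the gz_skip_to helper, the file name, and the 4096-byte block size are just
placeholders I made up:

  use Compress::Zlib;

  # Emulate gzseek(SEEK_SET) with gzread: read and throw away bytes
  # until you reach the desired uncompressed offset d = n*m + r.
  sub gz_skip_to {
      my ($gz, $d) = @_;
      my $m = 4096;                              # block size
      my $buf;
      for (1 .. int($d / $m)) {                  # n full blocks...
          $gz->gzread($buf, $m) == $m or die "short read: " . $gz->gzerror;
      }
      my $r = $d % $m;                           # ...plus r leftover bytes
      $gz->gzread($buf, $r) == $r or die "short read: " . $gz->gzerror if $r;
  }

  my $gz = gzopen("data.gz", "rb") or die "gzopen: $gzerrno";
  gz_skip_to($gz, 1_000_000);    # skip the first 1MB of uncompressed data
  $gz->gzread(my $data, 512);    # then read the part you actually wanted
  $gz->gzclose();

Note that, like gzseek itself, this only goes forward; to seek backward
you'd have to reopen the file and skip from the beginning.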

Thus it does require a sequential read, but that shouldn't stop you from
extracting any given segment of the file you want.  As long as your actual
data processing is more computationally expensive than the decompression
process, the fact that multiple processors are sequentially decompressing
and skipping over earlier portions of the file shouldn't be a big deal.
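
To make that concrete, each worker could do something like this (building
on the gz_skip_to sketch above; the worker count and segment size are
numbers I made up):

  my $workers  = 4;
  my $seg_size = 10_000_000;      # uncompressed bytes per worker

  for my $i (0 .. $workers - 1) {
      defined(my $pid = fork()) or die "fork: $!";
      next if $pid;                             # parent keeps forking

      # Child: skip to this worker's segment, then process it.
      my $gz = gzopen("data.gz", "rb") or die "gzopen: $gzerrno";
      gz_skip_to($gz, $i * $seg_size);          # cheap next to the real work
      my $left = $seg_size;
      while ($left > 0) {
          my $n = $gz->gzread(my $buf, $left < 4096 ? $left : 4096);
          last if $n <= 0;                      # EOF or error
          # ... expensive computation on $buf goes here ...
          $left -= $n;
      }
      $gz->gzclose();
      exit 0;
  }
  wait() for 1 .. $workers;                     # reap the children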

The basic assumption remains that you know where you want to seek
to... that is, you need fixed-length records or a program that finds the
appropriate seek offsets and data lengths to give to each parallel
process.
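
If the records aren't fixed length, that offset-finding program can be a
single sequential pass; assuming newline-delimited records (my assumption,
not necessarily your format), something like:

  # One pass to record where each record starts in the uncompressed
  # stream; gzreadline returns the number of bytes it read.
  my $gz = gzopen("data.gz", "rb") or die "gzopen: $gzerrno";
  my $offset = 0;
  my @index  = (0);                # record 0 starts at offset 0
  while ((my $n = $gz->gzreadline(my $line)) > 0) {
      $offset += $n;
      push @index, $offset;        # the next record starts here
  }
  $gz->gzclose();

  # Each parallel process then gets ($index[$i], $index[$j] - $index[$i])
  # as its seek offset and data length.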

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...2k
---------------------------------------------------------------------------

_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


