Page last updated:
2003 Oct 20 20:19

The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

[vox-tech] Another Perl CSV question - long lines = segfault


Thanks to the folks who helped me 'fix' the CSV file the other day.

I'm now stumbling upon another problem, where Perl is segfaulting on long
lines.  If I skip CSV lines > 3500 characters, it goes on its merry way.
Lines nearing 4000 characters or more cause Perl itself to die (segfault)
when I try to parse using the following code (from "Perl Cookbook"):


sub parse_csv
{
  my $text = shift;
  my @fields = ();

  while ($text =~ m{
    # Either some non-quote/non-comma text:
    ([^"',]+)

    # Or...
    |

    # ...a double-quote field:
    "    # field's opening quote; don't save this
    (    # now a field is either:
      (?:   [^"]    # non-quotes
          |         # or
            ""      # adjacent quote pairs
      )*            # any number
    )
    "    # field's closing quote; don't save this either
  }gx)
  {
    my $field;

    if (defined $1) { $field = $1; }
    else { ($field = $2) =~ s/""/"/g; }

    push @fields, $field;
  }

  return @fields;
}



At first, I thought it was dying due to character combos (commas, quotes,
double-quotes, or what-have-you), but then I started examining line lengths
just before running the parse subroutine.
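
One possible cause, which nothing in this post confirms: the regular expression engine in Perl releases before 5.10 was recursive, and the quoted-field branch `(?: [^"] | "" )*` loops once per character, so a very long quoted field may exhaust the C stack. A minimal sketch of a workaround follows, assuming that is the problem. The name `parse_csv_unrolled` is made up for illustration; the pattern accepts the same fields as the original but consumes each run of non-quote characters in a single step.

```perl
use strict;
use warnings;

# Hypothetical variant of parse_csv: same field rules as the original,
# but the quoted-field branch is "unrolled" so that runs of non-quote
# characters are consumed in one step instead of one loop pass each.
sub parse_csv_unrolled
{
  my $text = shift;
  my @fields = ();

  while ($text =~ m{
    # Either some non-quote/non-comma text:
    ([^"',]+)

    # Or...
    |

    # ...a double-quote field:
    "
    (
      [^"]*             # a run of non-quotes
      (?: "" [^"]* )*   # then any number of: quote pair, run of non-quotes
    )
    "
  }gx)
  {
    my $field;

    if (defined $1) { $field = $1; }
    else { ($field = $2) =~ s/""/"/g; }

    push @fields, $field;
  }

  return @fields;
}

# Example: quoted commas and doubled quotes are handled as before.
my @demo = parse_csv_unrolled('abc,"d,e","say ""hi""",42');
print join('|', @demo), "\n";
```

Like the original, this sketch skips empty unquoted fields (`a,,b` yields two fields), so it is not a full CSV parser.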


Looking at an strace, when I do this:

  open(F, $somefile) or die "Can't open $somefile: $!";

  while (<F>)
  {
    ...
  }


...it's only reading 4096 bytes at a time.


Is there a way to increase that buffer, so that I can ensure it reads,
say, up to 20,000 bytes?  (The longest line in the file seems to be around
14,000 bytes; at least, that's what "wc -L" tells me.)
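
The 4096-byte reads in the strace are most likely just the I/O buffer being refilled; `<F>` should still return each complete line regardless of its length. A small sketch to check that, assuming a throwaway temporary file stands in for the real CSV (the helper name `longest_line_length` is hypothetical):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the length of the longest line in a file,
# not counting the trailing newline.
sub longest_line_length
{
  my $path = shift;
  my $max = 0;

  open(my $fh, '<', $path) or die "Cannot open $path: $!";
  while (my $line = <$fh>)
  {
    chomp $line;
    $max = length($line) if length($line) > $max;
  }
  close($fh);

  return $max;
}

# Demo: write one 14,000-character line and one short line to a
# temporary file, then read them back with the line-input operator.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print $tmp_fh 'x' x 14000, "\n", "short\n";
close($tmp_fh);

print longest_line_length($tmp_path), "\n";   # prints 14000
```

If the helper reports the full length for the real file as well, the read size is probably not what is truncating or breaking the long lines.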


In the meantime, I'll Google, since the books I'm looking at don't have
good examples of this. ;)


Thx!

-bill!

-- 
bill@newbreedsoftware.com                           Got kids?  Get Tux Paint! 
http://newbreedsoftware.com/bill/       http://newbreedsoftware.com/tuxpaint/

_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech



