
The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] ARE (Tcl / Postgresql) REGEX question



Dylan Beaudette wrote:
> Hi,
> 
> I have a rather complex (for me) regular expression that I am trying to figure 
> out.
> 
> Here is an example that works just fine:
> 
> -- I am trying to extract the two colors:
> -- 10YR 6/4 and 7.5YR 4/4 from the following block of text
> SELECT regexp_matches('B11t Light yellowish brown (10YR 6/4) gravelly clay 
> loam, brown to dark brown (7.5YR 4/4) moist; weak coarse subangular blocky; 
> hard, friable, sticky and plastic; few very fine and many fine and medium 
> roots; many very fine and fine interstital and tubular pores; few thin clay 
> films lining pores; pH 5.4; clear smooth boundary.' , E'([0-9]?[\\.]?[0-9][Y|
> y|R|r]+[ ]+?[0-9]/[0-9]).*?([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])') ;
> 
>       regexp_matches      
> --------------------------
>  {"10YR 6/4","7.5YR 4/4"}
> 
> 
> 
> However, this pattern does not work when there is only one color:
> 
> SELECT regexp_matches('B11t Light yellowish brown (10YR 6/4) gravelly clay 
> loam; weak coarse subangular blocky; hard, friable, sticky and plastic; few 
> very fine and many fine and medium roots; many very fine and fine interstital 
> and tubular pores; few thin clay films lining pores; pH 5.4; clear smooth 
> boundary.' , E'([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9]).*?([0-9]?[\\.]?
> [0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])') ;
> 
> 
> I have tried making the second capturing clause optional by appending the '?' 
> operator. This causes the single color example to be parsed correctly, but 
> now the double color example does not work:
> 
> SELECT regexp_matches('B11t Light yellowish brown (10YR 6/4) gravelly clay 
> loam, brown to dark brown (7.5YR 4/4) moist; weak coarse subangular blocky; 
> hard, friable, sticky and plastic; few very fine and many fine and medium 
> roots; many very fine and fine interstital and tubular pores; few thin clay 
> films lining pores; pH 5.4; clear smooth boundary.' , E'([0-9]?[\\.]?[0-9][Y|
> y|R|r]+[ ]+?[0-9]/[0-9]).*?([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])?') ;
> 
>   regexp_matches   
> -------------------
>  {"10YR 6/4",NULL}
> 
> 
> Any ideas on how to improve this regex?
> 
> Thanks!
> 
> Dylan
> 
> 

Not sure if this helps, but I ran into a similar problem with a regex
in Python, and the only solution was to use a different function: in my
case, findall on the regex object. Do you have another function that
finds all matches and not just the first one? If so, you would only
need the first half of your regex (the single-color part), and you
could iterate over your text until you have found all matches.
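
Here is a rough sketch of what I mean, in Python. The pattern is only the
first half of the regex from the question, lightly simplified: I dropped the
'|' characters from the bracket expression and the lazy quantifier on the
space. The names COLOR_RE and find_colors are made up for this example, and
the sample strings are shortened copies of the text in the question.

```python
import re

# Single-color pattern (assumption: hue such as "10YR" or "7.5YR",
# followed by spaces and a value/chroma pair such as "6/4").
COLOR_RE = re.compile(r"[0-9]?\.?[0-9][YyRr]+ +[0-9]/[0-9]")


def find_colors(text):
    """Return every non-overlapping color match in text, in order."""
    return COLOR_RE.findall(text)


# Shortened versions of the two horizon descriptions from the question.
two_colors = ("B11t Light yellowish brown (10YR 6/4) gravelly clay loam, "
              "brown to dark brown (7.5YR 4/4) moist; weak coarse "
              "subangular blocky")
one_color = ("B11t Light yellowish brown (10YR 6/4) gravelly clay loam; "
             "weak coarse subangular blocky")

print(find_colors(two_colors))  # ['10YR 6/4', '7.5YR 4/4']
print(find_colors(one_color))   # ['10YR 6/4']
```

Because the pattern describes one color only, the same call handles one,
two, or more colors without any optional groups. On the PostgreSQL side,
regexp_matches also takes a flags argument, and passing 'g' there returns
one row per match, so that may be the equivalent you are looking for.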

Alex
_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


