
The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] ARE (Tcl / Postgresql) REGEX question



On Mon, Dec 1, 2008 at 9:13 PM, Alex Mandel <tech_dev@wildintellect.com> wrote:
> Dylan Beaudette wrote:
>> Hi,
>>
>> I have a rather complex (for me) regular expression that I am trying to figure
>> out.
>>
>> Here is an example that works just fine:
>>
>> -- I am trying to extract the two colors:
>> -- 10YR 6/4 and 7.5YR 4/4 from the following block of text
>> SELECT regexp_matches('B11t Light yellowish brown (10YR 6/4) gravelly clay
>> loam, brown to dark brown (7.5YR 4/4) moist; weak coarse subangular blocky;
>> hard, friable, sticky and plastic; few very fine and many fine and medium
>> roots; many very fine and fine interstital and tubular pores; few thin clay
>> films lining pores; pH 5.4; clear smooth boundary.' ,
>> E'([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9]).*?([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])') ;
>>
>>       regexp_matches
>> --------------------------
>>  {"10YR 6/4","7.5YR 4/4"}
>>
>>
>>
>> However, this pattern does not work when there is only one color:
>>
>> SELECT regexp_matches('B11t Light yellowish brown (10YR 6/4) gravelly clay
>> loam; weak coarse subangular blocky; hard, friable, sticky and plastic; few
>> very fine and many fine and medium roots; many very fine and fine interstital
>> and tubular pores; few thin clay films lining pores; pH 5.4; clear smooth
>> boundary.' ,
>> E'([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9]).*?([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])') ;
>>
>>
>> I have tried making the second capturing clause optional by appending the '?'
>> operator. This causes the single color example to be parsed correctly, but
>> now the double color example does not work:
>>
>> SELECT regexp_matches('B11t Light yellowish brown (10YR 6/4) gravelly clay
>> loam, brown to dark brown (7.5YR 4/4) moist; weak coarse subangular blocky;
>> hard, friable, sticky and plastic; few very fine and many fine and medium
>> roots; many very fine and fine interstital and tubular pores; few thin clay
>> films lining pores; pH 5.4; clear smooth boundary.' ,
>> E'([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9]).*?([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])?') ;
>>
>>   regexp_matches
>> -------------------
>>  {"10YR 6/4",NULL}
>>
>>
>> Any ideas on how to improve this regex?
>>
>> Thanks!
>>
>> Dylan
>>
>>
>
> Not sure if it helps, but I ran into a similar problem running some regex
> in Python, and the only solution was to find another function, in my case
> findall on the regex object. Do you have another function that finds all
> matches and not just the first one? Then you would only run the first half
> of your regex and iterate over your text until you find all matches.
>

I agree: what you want is a global search for *just* the color pattern,
i.e.:

([0-9]?[\\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])


I can't tell what system you're using, but every one I've seen has a separate
function for iterating over matches, like Alex mentioned.
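
To make that concrete, here is a minimal sketch in Python (the language
Alex mentioned). It assumes the single-color pattern above, with the SQL
string escaping (\\.) reduced to a plain regex escape (\.). The sample
strings are shortened from Dylan's text, and find_colors is a hypothetical
helper name, not something from the original post.

```python
import re

# Single-color pattern from the post; the doubled backslash was only
# needed inside the SQL E'...' string, so a raw string uses one.
COLOR = re.compile(r"([0-9]?[\.]?[0-9][Y|y|R|r]+[ ]+?[0-9]/[0-9])")

# Shortened versions of the two soil descriptions from the post.
two_colors = ("B11t Light yellowish brown (10YR 6/4) gravelly clay loam, "
              "brown to dark brown (7.5YR 4/4) moist; weak coarse "
              "subangular blocky; pH 5.4; clear smooth boundary.")
one_color = ("B11t Light yellowish brown (10YR 6/4) gravelly clay loam; "
             "weak coarse subangular blocky; pH 5.4; clear smooth boundary.")


def find_colors(text):
    """Return every color found in text, in order of appearance."""
    # findall scans the whole string, so one or many colors both work.
    return COLOR.findall(text)
```

Both cases come back as a plain list (two entries or one), so there is no
need for an optional second capturing group. If the target is PostgreSQL,
as the subject line suggests, regexp_matches also accepts a 'g' flag as a
third argument that returns one row per match, which should give the same
effect with the single-color pattern.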

By the way, this site might be handy:

http://osteele.com/tools/rework/

I used it to test out your regex. (Which could still be cleaned up... but
regexes never actually get pretty. :) )
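
As one possible cleanup, here is a sketch in Python. It rests on my own
assumptions about the notation: one or two digits, an optional single
decimal digit, a hue of Y, R, or YR in either case, then value/chroma as
single digits. It is not equivalent to the original pattern (for one
thing, [Y|y|R|r] also matches a literal '|'), and TIDY_COLOR and
find_colors_tidy are hypothetical names.

```python
import re

# Tidier variant under the assumptions stated above:
#   \d{1,2}(?:\.\d)?   hue number such as 10, 7.5 or 2.5
#   (?:YR|Y|R)         hue letters, matched case-insensitively
#   \s+\d/\d           whitespace, then value/chroma such as 6/4
TIDY_COLOR = re.compile(r"(\d{1,2}(?:\.\d)?(?:YR|Y|R)\s+\d/\d)",
                        re.IGNORECASE)


def find_colors_tidy(text):
    """Return every color matched by the tidier pattern, in order."""
    return TIDY_COLOR.findall(text)


sample = ("B11t Light yellowish brown (10YR 6/4) gravelly clay loam, "
          "brown to dark brown (7.5YR 4/4) moist; pH 5.4")
```

Whether this is actually "prettier" is debatable, but it does drop the
stray '|' characters and the lazy quantifier that was not doing anything.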

-- 
Bryan Richter
_______________________________________________
vox-tech mailing list
vox-tech@lists.lugod.org
http://lists.lugod.org/mailman/listinfo/vox-tech


