Thread: Anyone good at regular expressions? (computing)
05-02-2007, 07:32 AM #1
Wmshyrga, Senior Member
Join Date: Oct 2005
Posts: 494
I have posted this on a specialist forum, but activity there is rather slow, so I was hoping some of you computer scientists might be able to help me with it.
I have spent a long time today trying to build a regular expression for screen scraping a website. The (example) text it will be scanning is:
"Glouster Museums: Abbey Home Museum - Kirkhall Road, Kirkhall, Glouster, GL4 5BY, England"
What I am trying to get from that is ONLY the museum name and the address, with none of the HTML. So I would like:
Abbey Home Museum - Kirkhall Road, Kirkhall, Glouster, GL4 5BY
My regex at the moment is -
[\w \\=\"]*\-[\w, ]*LS[\d ]*[\w, ]*land
Which returns -
Abbey Home Museum<span class=text ALIGN="justify"> - Kirkhall Road, Kirkhall, Glouster, GL4 5BY, England
This is close, but I need to omit the html tags, and ideally get rid of the trailing 'England' too. The latter is not so essential however, just getting rid of the HTML will do.
I am using this program, http://www.webscrape.com/, which you run from the command line. I know it might be a bit of a long shot, but does anyone have an idea of what I can try? Thanks.
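
For illustration, here is a minimal sketch of one approach that might work: strip the tags first, then capture the name and the address up to the postcode, which also drops the trailing 'England'. It is written in Python rather than in the scraping tool's own pattern syntax, since that syntax is not shown here. The sample markup (including the closing tag), the function name `extract_entry`, and the simplified UK-style postcode pattern are all assumptions, not something the tool or the site is known to use.

```python
import re

# Sample line. The exact markup is an assumption, pieced together from the
# output quoted above; the real page may contain other tags.
sample = ('Glouster Museums: Abbey Home Museum<span class=text ALIGN="justify">'
          ' - Kirkhall Road, Kirkhall, Glouster, GL4 5BY, England</span>')

# Matches any single HTML tag, e.g. <span ...> or </span>.
TAG = re.compile(r'<[^>]+>')

# After the "Museums:" prefix: the name, a " - " separator, then the address
# up to and including a UK-style postcode (simplified pattern, an assumption).
ENTRY = re.compile(
    r':\s*(?P<name>[^:]+?)\s+-\s+'
    r'(?P<address>.+?[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2})'
)

def extract_entry(html_line):
    """Return 'name - address' with tags removed, or None if nothing matches."""
    text = TAG.sub('', html_line)   # step 1: drop the HTML tags
    match = ENTRY.search(text)      # step 2: capture name and address
    if match is None:
        return None
    return f"{match.group('name')} - {match.group('address')}"

print(extract_entry(sample))
```

Splitting the job into two passes keeps each pattern simple; a single expression that skips over tags in the middle of a match tends to get unreadable quickly.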