Regular Expression help - only preserve contents of Anchor tags

29-04-2010, 01:59 PM
Hi all

I am currently trying to get information out of a former FrontPage site, which is, to put it bluntly, a mess.

Basically, all I want out of the pages is the anchor links, like the one below:

<a href="/files/sopwith/camel.htm">Biggles and Algie</a>

Finding them should be easy, but the sheer number of extraneous tags is making the going painful, and I can't get my regular expressions working.
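A minimal Python sketch of the extraction, assuming the links look like the example above (a double-quoted `href`). The pattern and the name `ANCHOR_RE` are illustrative and not taken from the thread:

```python
import re

# Illustrative pattern: an <a ...> tag with a double-quoted href, then the
# link text, then </a>. IGNORECASE because older markup often uses <A HREF=...>.
ANCHOR_RE = re.compile(r'<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>', re.IGNORECASE)

html = '<p><font size="2"><a href="/files/sopwith/camel.htm">Biggles and Algie</a></font></p>'

# findall returns one (href, text) tuple per match, because the pattern
# has two capture groups.
for href, text in ANCHOR_RE.findall(html):
    print(href, "->", text)
```

This prints `/files/sopwith/camel.htm -> Biggles and Algie`. The surrounding `<p>` and `<font>` tags are simply never matched.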

29-04-2010, 02:09 PM
This works to find the links, but what I want it to do is remove everything else, and I am blowed if I can figure it out. I also can't get the code below to work in Notepad++, although it works in an elderly version of Dreamweaver.
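Writing a substitution that deletes everything that is *not* a link is awkward with regular expressions. It is usually easier to invert the problem: collect the matches and throw the rest away. A sketch in Python, where the sample page and the helper name `keep_only_anchors` are made up for illustration:

```python
import re

# Illustrative pattern matching a whole anchor tag (no capture groups needed).
ANCHOR_RE = re.compile(r'<a\s[^>]*href="[^"]*"[^>]*>.*?</a>', re.IGNORECASE)

def keep_only_anchors(html):
    """Return the anchor tags found in html, one per line; all other text is dropped."""
    # m.group(0) is the entire matched tag, so each link keeps its markup
    # but nothing between the links survives.
    return "\n".join(m.group(0) for m in ANCHOR_RE.finditer(html))

page = (
    '<html><body><table><tr><td><font face="Arial">Intro text '
    '<a href="/files/sopwith/camel.htm">Biggles and Algie</a></font></td>'
    '<td><b><a href="/index.htm">Home</a></b></td></tr></table></body></html>'
)
print(keep_only_anchors(page))
```

Only the two `<a ...>...</a>` tags are printed, one per line; the table, font and body markup is discarded because it is never part of a match.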


29-04-2010, 02:40 PM
I take that back: the code above is only finding some of the links, not all of them, as it doesn't find any that have line breaks in them, like this one:
<a href="/files/sopwith/camel.htm">Biggles and Algie
Dammit, my brain is now officially hurting.
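In Python's `re` module (and many other regex engines) `.` does not match a newline by default, so a lazy `.*?` stops at a line break inside the link text and the whole match fails. The `re.DOTALL` flag makes `.` match newlines as well. A small demonstration, assuming the closing `</a>` of the broken link sits on the following line:

```python
import re

# Link text broken across two lines (closing tag assumed to be on the next line).
html = '<a href="/files/sopwith/camel.htm">Biggles and Algie\n</a>'

pattern = r'<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>'

# Without DOTALL, '.' cannot cross the newline, so the link is missed.
without_dotall = re.findall(pattern, html, re.IGNORECASE)

# With DOTALL, '.' also matches '\n', so the link is found.
with_dotall = re.findall(pattern, html, re.IGNORECASE | re.DOTALL)

print(without_dotall)  # []
print(with_dotall)     # [('/files/sopwith/camel.htm', 'Biggles and Algie\n')]
```

If the stray line breaks should not end up in the output, `" ".join(text.split())` collapses any run of whitespace in the link text into a single space.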

29-04-2010, 03:17 PM


Probably not the most elegant, and I still can't work out how to get rid of all the other text on the page, or how to pipe the result into a new file on Windows.
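One way to do both steps at once is a small script that reads a saved page and writes only the anchors into a new file. This is a sketch under some assumptions: the file names and `extract_file` are hypothetical, and the `cp1252` input encoding is a guess for FrontPage-era pages (hence `errors="replace"`):

```python
import re
import sys

# Illustrative pattern; DOTALL so links whose text spans line breaks are found too.
ANCHOR_RE = re.compile(r'<a\s[^>]*href="[^"]*"[^>]*>.*?</a>', re.IGNORECASE | re.DOTALL)

def extract_file(src_path, dst_path):
    """Write only the anchor tags from src_path into a new file at dst_path.

    Returns the number of anchors written.
    """
    # cp1252 is an assumption about the old pages; errors="replace" avoids
    # crashing on bytes that are not valid in that encoding.
    with open(src_path, encoding="cp1252", errors="replace") as src:
        html = src.read()
    anchors = [m.group(0) for m in ANCHOR_RE.finditer(html)]
    with open(dst_path, "w", encoding="utf-8") as dst:
        for tag in anchors:
            dst.write(tag + "\n")
    return len(anchors)

if __name__ == "__main__":
    # Hypothetical usage: python extract_anchors.py page.htm links.txt
    if len(sys.argv) == 3:
        count = extract_file(sys.argv[1], sys.argv[2])
        print(f"wrote {count} links to {sys.argv[2]}")
    else:
        print("usage: python extract_anchors.py INPUT.htm OUTPUT.txt")
```

From a Windows Command Prompt this would be run as `python extract_anchors.py camel.htm links.txt`. Alternatively, anything a program prints can be sent to a new file with cmd's `>` redirection, e.g. `python some_script.py > links.txt`.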