
How to remove or extract all hyperlinks from a web page using regular expression

October 6th, 2010
This trick helps you remove or extract all hyperlinks, along with their link text, from a web page using a regular expression.

Example Text:

<a href="http://article-stack.com">article-stack</a>
<a href="http://article-stack.com">article-stack</a>
<a href="http://article-stack.com">article-stack</a><a href="http://article-stack.com">article-stack</a>
In the sample text above, the last two hyperlinks are on the same line.

Regular expression

        <a [a-zA-Z0-9\=\"\:\.\,\/\- ]*>.*<\/a>
or
        <a.*>.*<\/a>

Output

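    [0] => Array
        (
            [0] => <a href="http://article-stack.com">article-stack</a>
            [1] => <a href="http://article-stack.com">article-stack</a>
            [2] => <a href="http://article-stack.com">article-stack</a><a href="http://article-stack.com">article-stack</a>
        )

Because .* is greedy, it runs on to the last </a> on a line, so the two links on the third line come back as a single match.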
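To see the greedy behaviour outside an editor, here is a minimal sketch that runs both expressions above over the sample text with Python's re.findall. Python is used only for illustration, and the name SAMPLE is a hypothetical holder for the article's sample text.

```python
import re

# The article's sample text; the last two links share the third line.
SAMPLE = (
    '<a href="http://article-stack.com">article-stack</a>\n'
    '<a href="http://article-stack.com">article-stack</a>\n'
    '<a href="http://article-stack.com">article-stack</a>'
    '<a href="http://article-stack.com">article-stack</a>\n'
)

# Both expressions from the "Regular expression" step. Without re.DOTALL,
# "." never matches a newline, so every match stays on its own line.
STRICT = r'<a [a-zA-Z0-9\=\"\:\.\,\/\- ]*>.*<\/a>'
LOOSE = r'<a.*>.*<\/a>'

for pattern in (STRICT, LOOSE):
    matches = re.findall(pattern, SAMPLE)
    # Greedy ".*" runs to the last "</a>" on a line, so only 3 matches are found.
    print(len(matches))
    for link in matches:
        print(link)
```

Both patterns return the same three matches, the third being the whole last line.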

Improved Regular Expression

        <a [a-zA-Z0-9\=\"\:\.\,\/\- ]*>(.[^(<\/a>)])*.<\/a>

Output

    [0] => Array
        (
            [0] => <a href="http://article-stack.com">article-stack</a>
            [1] => <a href="http://article-stack.com">article-stack</a>
            [2] => <a href="http://article-stack.com">article-stack</a>
            [3] => <a href="http://article-stack.com">article-stack</a>
        )

    [1] => Array
        (
            [0] => ac
            [1] => ac
            [2] => ac
            [3] => ac
        )
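Both arrays can be reproduced with a short sketch using Python's re.finditer (Python is used only for illustration), which exposes the whole match and the capture group separately. The second array holds "ac" because a repeated group keeps only its last repetition: "article-stack" is consumed as six two-character pairs (ar, ti, cl, e-, st, ac) plus a final "k". The names SAMPLE, whole_links and last_pairs are hypothetical.

```python
import re

# The article's sample text; the last two links share the third line.
SAMPLE = (
    '<a href="http://article-stack.com">article-stack</a>\n'
    '<a href="http://article-stack.com">article-stack</a>\n'
    '<a href="http://article-stack.com">article-stack</a>'
    '<a href="http://article-stack.com">article-stack</a>\n'
)

# The improved expression, unchanged. Group 1 is (.[^(<\/a>)]): any character
# followed by one character that is not "(", "<", "/", "a", ">" or ")".
IMPROVED = r'<a [a-zA-Z0-9\=\"\:\.\,\/\- ]*>(.[^(<\/a>)])*.<\/a>'

whole_links = []  # corresponds to [0] in the output above
last_pairs = []   # corresponds to [1]: a repeated group keeps its last repetition
for match in re.finditer(IMPROVED, SAMPLE):
    whole_links.append(match.group(0))
    last_pairs.append(match.group(1))

print(whole_links)
print(last_pairs)  # ['ac', 'ac', 'ac', 'ac']
```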

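Note that [^(<\/a>)] in the improved expression is a character class: it excludes the single characters (, <, /, a, > and ), not the string </a>. The expression works on this sample only because "article-stack" has an odd number of characters and none of its even-position characters (r, i, l, -, t, c) is in that class; link text with an even number of characters, such as "home", is not matched at all. In engines that support lazy quantifiers, such as PHP's preg functions and Java, a pattern like <a [^>]*>.*?<\/a> stops at the first </a> instead.

You can run these expressions from a programming language such as Java, awk or PHP, or in any text editor with regular-expression search and replace. To extract the links, collect the matches; to remove them, replace every match with an empty string.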
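Here is a minimal Python sketch of both tasks. It assumes a lazy variant of the pattern, <a [^>]*>.*?<\/a>, rather than the article's own expressions, and extract_links and remove_links are hypothetical helper names.

```python
import re
from typing import List

# Hypothetical helpers built on a lazy pattern (an alternative to the article's
# expressions): "[^>]*" skips the attributes and ".*?" stops at the first "</a>".
LINK = re.compile(r'<a [^>]*>.*?<\/a>')


def extract_links(html: str) -> List[str]:
    """Return every <a ...>...</a> element in html, in document order."""
    return LINK.findall(html)


def remove_links(html: str) -> str:
    """Return html with every <a ...>...</a> element (tag and text) deleted."""
    return LINK.sub('', html)


page = ('<p>Visit <a href="http://article-stack.com">article-stack</a>'
        '<a href="http://article-stack.com">article-stack</a> today.</p>')
print(extract_links(page))
print(remove_links(page))  # <p>Visit  today.</p>
```

If link text can span several lines or tags may be written in upper case, compile the pattern with flags=re.DOTALL | re.IGNORECASE.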

Amit Gupta

Hey! This is Amit Gupta (amty). I am a software engineer by profession, and teaching is my passion. Sometimes I am a teacher, as the many technical tutorials on my site show; sometimes a poet; and sometimes just a friend of friends...
