
How to extract a website URL using a regular expression

I am using Java syntax for this example. You can do the same in AWK, PHP or any other language that supports regular expressions.



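import java.util.regex.Matcher;
import java.util.regex.Pattern;

// "http:" followed by any characters, a dot, and a domain type of 2-4 alphanumeric characters
String RE = "http:.*\\.[a-zA-Z0-9]{2,4}";
Pattern r = Pattern.compile(RE);

Test string 1:

Matcher m = r.matcher("I am maintaining http://article-stack.com. This will help you to learn.");
if (m.find()) System.out.println(m.group());

Output:

http://article-stack.com

Test string 2:

m = r.matcher("<a href='http://article-stack.com' alt='nothing'>article-stack</a>");
if (m.find()) System.out.println(m.group());

Output:

http://article-stack.com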

Consideration:
The domain type (such as com) is assumed to be 2-4 characters long and to contain only alphanumeric characters.
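
Here is a self-contained sketch of the search above. It assumes the standard java.util.regex classes and URLs written with forward slashes; the class name FirstUrlDemo and the helper firstUrl are made up for illustration. The last call shows what the consideration means in practice: a longer domain type such as .museum gets cut off after four characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstUrlDemo {
    // Same RE as above: "http:", any characters, a dot, then 2-4 alphanumeric characters.
    static final Pattern URL_RE = Pattern.compile("http:.*\\.[a-zA-Z0-9]{2,4}");

    // Returns the first match in the text, or null if there is none.
    static String firstUrl(String text) {
        Matcher m = URL_RE.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstUrl("I am maintaining http://article-stack.com. This will help you to learn."));
        System.out.println(firstUrl("<a href='http://article-stack.com' alt='nothing'>article-stack</a>"));
        // The domain type is limited to 4 characters, so ".museum" is truncated to ".muse".
        System.out.println(firstUrl("http://example.museum"));
    }
}
```

Because ".*" is greedy, this RE is only safe when no other dotted word follows the URL on the same line.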

Improving the previous RE

A valid website name should contain only alphanumeric characters and hyphens, and a hyphen must not come at the start of the website name.

// The first character of the name must be a letter or digit; the rest may also contain hyphens
String RE = "http://[a-zA-Z0-9][a-zA-Z0-9-]*\\.[a-z]{2,4}";

Sample text

I am maintaining http://article-stack.com. This will help you to learn.
I am maintaining http://article-stack.com. This will help you to learn.http://-article-stack.com
I am maintaining http://article-stack.com. This will help you to learn.

Output:

        (
            [0] => http://article-stack.com
            [1] => http://article-stack.com
            [2] => http://article-stack.com
        )
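
The three matches above can be collected with a short sketch like the following. It assumes java.util.regex, URLs written with forward slashes, and a first character class of [a-zA-Z0-9] so that a name starting with a hyphen is rejected. The class name ExtractUrls and the method findAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractUrls {
    // Website name: starts with a letter or digit, then letters, digits or hyphens.
    static final Pattern URL_RE =
            Pattern.compile("http://[a-zA-Z0-9][a-zA-Z0-9-]*\\.[a-z]{2,4}");

    // Collects every match in the text, in order of appearance.
    static List<String> findAll(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_RE.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String text =
                "I am maintaining http://article-stack.com. This will help you to learn.\n"
              + "I am maintaining http://article-stack.com. This will help you to learn.http://-article-stack.com\n"
              + "I am maintaining http://article-stack.com. This will help you to learn.";
        // The hyphen-prefixed name on the second line is skipped.
        for (String url : findAll(text)) {
            System.out.println(url);
        }
    }
}
```

Matcher.find() is called in a loop because each call returns only the next match; the loop ends when no further match exists.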

In addition:

You can extend the RE above to handle the domain part, since a domain may take a form such as “.co.in”.
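
One possible way to do this is sketched below. It assumes that every domain part is a dot followed by 2-4 lowercase letters and that such parts may repeat, and that URLs are written with forward slashes. The class name MultiPartDomain and the helper firstUrl are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiPartDomain {
    // "(\\.[a-z]{2,4})+" allows one or more domain parts, e.g. ".com" or ".co.in".
    static final Pattern URL_RE =
            Pattern.compile("http://[a-zA-Z0-9][a-zA-Z0-9-]*(\\.[a-z]{2,4})+");

    // Returns the first match in the text, or null if there is none.
    static String firstUrl(String text) {
        Matcher m = URL_RE.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstUrl("Visit http://article-stack.co.in. Thanks."));
        System.out.println(firstUrl("Visit http://article-stack.com. Thanks."));
    }
}
```

The sentence-ending dot is left out here only because a space follows it. If a lowercase word of 2-4 letters came directly after that dot, it would be treated as one more domain part.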

Amit Gupta

Hey! This is Amit Gupta (amty). By profession, I am a software engineer, and teaching is my passion. Sometimes I am a teacher, as you can see from the many technical tutorials on my site; sometimes I am a poet; and sometimes just a friend of friends...





  • Hi,

    I just came across your post. It is very impressive, and this is a very nice blog. Thanks a lot for sharing this.
