by Amit Gupta

If regular expressions are a weak spot for you, work through this article patiently. But do not forget to read Regular Expression, an introduction full of examples. Otherwise this article will surely scare you.


Regular expression

Example content:

< amty > 1st block < / amty>
article-stack .com
< amty src=""> Tag with attributes < / amty> 
I am running article-stack.com

Elements

^ (Shift + 6) beginning of line
$ (Shift + 4) end of line

Example RE

‘/amty/’ returns every line that contains the word amty, so it returns the 1st and 3rd lines.
‘/^article-stack/’ returns the 2nd line, which starts with article-stack. Note that a line with a space or tab before article-stack will not appear in the result.
‘/article-stack$/’ returns no line, because the 2nd line ends with com, not with article-stack.
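
To try these three patterns yourself, here is a small Python sketch that imitates the line-by-line behaviour described above. The name matching_lines is only for illustration, and the sample lines are copied from the example content as printed (spaces inside the tags included).

```python
import re

# Example content from this article, one entry per line.
LINES = [
    "< amty > 1st block < / amty>",
    "article-stack .com",
    '< amty src=""> Tag with attributes < / amty>',
    "I am running article-stack.com",
]

def matching_lines(pattern):
    """Return the 1-based numbers of the lines in which the pattern is found."""
    return [n for n, line in enumerate(LINES, start=1) if re.search(pattern, line)]

print(matching_lines(r"amty"))            # [1, 3]
print(matching_lines(r"^article-stack"))  # [2]
print(matching_lines(r"article-stack$"))  # []
```

Python does not need the / delimiters around the pattern, so they are left out here.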

Note that AWK returns the complete line wherever the pattern is found, while the match functions of most other languages return only the matched text. So there:

/amty/ returns 4 occurrences of “amty”.
/^article-stack/ returns “article-stack” only once, where it appears at the start of a line.
/article-stack$/ returns nothing, since it does not appear at the end of any line.
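
The "matched text only" behaviour can be sketched in Python with re.findall. This is just one way to check the counts, assuming the whole example content is held in a single string; the variable names are made up for this sketch.

```python
import re

# Example content from this article as one multi-line string.
SAMPLE = "\n".join([
    "< amty > 1st block < / amty>",
    "article-stack .com",
    '< amty src=""> Tag with attributes < / amty>',
    "I am running article-stack.com",
])

# re.MULTILINE lets ^ and $ work at every line break,
# not only at the two ends of the whole string.
amty_hits = re.findall(r"amty", SAMPLE)
start_hits = re.findall(r"^article-stack", SAMPLE, flags=re.MULTILINE)
end_hits = re.findall(r"article-stack$", SAMPLE, flags=re.MULTILINE)

print(len(amty_hits))  # 4
print(start_hits)      # ['article-stack']
print(end_hits)        # []
```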

Move ahead

. (decimal point) Any single character

‘/.amty/’ Will return 4 occurrences. (Or 2 lines in awk)

Result

        (
            [0] => < amty
            [1] => / amty
            [2] => < amty
            [3] => / amty
        )

Dot (.) is a regular expression element. So if you are searching for a literal dot, you have to put ‘\’ before it. E.g.

‘/article-stack\.com/’ returns “article-stack.com” from the 4th line only, because the 2nd line contains a space between “stack” and “.com”.
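
Here is a rough Python sketch of the difference between a plain dot and an escaped one. The string "article-stack-com" is a made-up test value, not part of the example content, and the sample is used with the spacing printed in this article.

```python
import re

# Example content from this article as one multi-line string.
SAMPLE = "\n".join([
    "< amty > 1st block < / amty>",
    "article-stack .com",
    '< amty src=""> Tag with attributes < / amty>',
    "I am running article-stack.com",
])

# Plain dot: "amty" plus whatever single character sits just before it.
dot_hits = re.findall(r".amty", SAMPLE)
print(len(dot_hits))  # 4

# Escaped dot: only a real "." is accepted between "stack" and "com".
escaped_hits = re.findall(r"article-stack\.com", SAMPLE)
print(escaped_hits)  # ['article-stack.com']

# Hypothetical string to show what the escape changes.
loose = bool(re.search(r"article-stack.com", "article-stack-com"))
strict = bool(re.search(r"article-stack\.com", "article-stack-com"))
print(loose, strict)  # True False
```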

Elements to define occurrence

r* zero or more occurrences of the regular expression r to its left
r+ one or more occurrences of the regular expression r to its left
r? zero or one occurrence of the regular expression r to its left
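
If the three definitions feel abstract, a tiny Python sketch may help. The strings ac, abc and abbbc are made up for illustration, and re.fullmatch is used so that the whole string has to fit the pattern.

```python
import re

# Hypothetical test strings: "a", then zero, one or three "b", then "c".
WORDS = ["ac", "abc", "abbbc"]

def accepted(pattern):
    """Return the test strings that fit the pattern from start to end."""
    return [w for w in WORDS if re.fullmatch(pattern, w)]

print(accepted(r"ab*c"))  # ['ac', 'abc', 'abbbc']  zero or more b
print(accepted(r"ab+c"))  # ['abc', 'abbbc']        one or more b
print(accepted(r"ab?c"))  # ['ac', 'abc']           zero or one b
```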

Sounds difficult? See examples

‘/^< amty >.*<\/amty >/’

Explanation:

  • ^ anchors the match to the start of the line, so only lines starting with < amty > are searched.
  • \/amty: the “/” is the delimiter around the pattern, so it needs a “\” before it to be read as plain text. The same “\” is required before any regular expression element that you want to treat as simple text.
  • * means zero or more occurrences of the dot (.), and the dot (.) means any single character.
  • Finally, the expression above extracts all lines that start with “< amty >” and contain any number of characters between “< amty >” and “< / amty >”. The line does not have to end with “< / amty >”.
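
A sketch of the same idea in Python follows. The spacing inside the tags is not the same everywhere in this article, so the sketch assumes the spacing of the example content; PATTERN and matching_lines are illustrative names only.

```python
import re

# Example content from this article, one entry per line.
LINES = [
    "< amty > 1st block < / amty>",
    "article-stack .com",
    '< amty src=""> Tag with attributes < / amty>',
    "I am running article-stack.com",
]

# Assumed spacing: copied from the example content.
# Python has no / delimiters around the pattern, so "/" needs no backslash.
PATTERN = r"^< amty >.*< / amty>"

def matching_lines(lines):
    """Return the 1-based numbers of the lines that fit PATTERN."""
    return [n for n, line in enumerate(lines, start=1) if re.search(PATTERN, line)]

print(matching_lines(LINES))  # [1]

# There is no $ in the pattern, so text after the closing tag is allowed.
with_tail = bool(re.search(PATTERN, LINES[0] + " and more text"))
# .* also accepts zero characters between the two tags.
empty_body = bool(re.search(PATTERN, "< amty >< / amty>"))
print(with_tail, empty_body)  # True True
```

The 3rd line is left out because it starts with the tag that carries an attribute, so the literal text “< amty >” is not at its beginning.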

Another example

.+@.+\.com is a simplified pattern for filtering email IDs.
Please note this
If you have a good knowledge of regular expressions, you will find that many REs in this article are not efficient. They are built just for understanding. I have tuned them in later articles, so keep reading.
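
To see what this pattern accepts and rejects, here is a small Python sketch. All the candidate strings are invented for the test (example.com is a reserved demo domain), and email_like is a hypothetical helper name.

```python
import re

EMAIL_RE = re.compile(r".+@.+\.com")

# Made-up candidates, not real addresses.
CANDIDATES = [
    "user@example.com",        # text, @, text, .com
    "user@example.org",        # does not contain .com
    "@example.com",            # nothing before the @
    "user@.com",               # nothing between @ and .com
    "no-at-sign.com",          # no @ at all
    "not an email @ all.com",  # accepted: . also matches spaces
]

def email_like(candidates):
    """Keep the strings in which the simplified pattern is found."""
    return [c for c in candidates if EMAIL_RE.search(c)]

print(email_like(CANDIDATES))
# ['user@example.com', 'not an email @ all.com']
```

The last candidate shows why this is only a simplified version: the dot accepts any character, including spaces.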