Regular Expressions: Common elements part 1
If regular expressions are a weak spot for you, work through this article patiently. But first read "Regular Expression, an introduction with full of examples"; otherwise this article will surely scare you.

Example content:
< amty > 1st block < / amty>
article-stack .com
< amty src=""> Tag with attributes < / amty>
I am running article-stack.com
Elements
| ^ (Shift + 6) | beginning of line |
| $ (Shift + 4) | end of line |
Example RE
| ‘/amty/’ | Returns every line that contains the word amty, so it returns the 1st and 3rd lines. |
| ‘/^article-stack/’ | Returns the 2nd line, which starts with article-stack. Note that a line starting with a space or tab will not appear in the result. |
| ‘/article-stack$/’ | Returns no line, because no line ends with article-stack (the 2nd line ends with com). |
Note that AWK returns the complete line wherever the pattern is found, while the match functions of most other languages return only the matched text. So in those languages:
| /amty/ | Will return 4 occurrences of “amty”. |
| /^article-stack/ | Will return “article-stack” once, because it appears at the start of only one line. |
| /article-stack$/ | Will return nothing, since it does not appear at the end of any line. |
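The difference between returning whole lines and returning only the matched text can be sketched in Python. This is only an illustration: it assumes the example content sits on four separate lines, and the variable names are made up for this sketch.

```python
import re

# The sample content, assuming it is laid out as four separate lines.
text = "\n".join([
    "< amty > 1st block < / amty>",
    "article-stack .com",
    '< amty src=""> Tag with attributes < / amty>',
    "I am running article-stack.com",
])

# AWK-style: keep every whole line on which the pattern is found.
amty_lines = [line for line in text.splitlines() if re.search(r"amty", line)]

# Match-style: collect only the matched text.
# re.MULTILINE makes ^ and $ apply at every line, not only at the ends of the string.
amty_matches = re.findall(r"amty", text)
start_matches = re.findall(r"^article-stack", text, flags=re.MULTILINE)
end_matches = re.findall(r"article-stack$", text, flags=re.MULTILINE)

print(len(amty_lines))    # 2 (1st and 3rd lines)
print(len(amty_matches))  # 4
print(start_matches)      # ['article-stack']
print(end_matches)        # []
```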
Move ahead
. (decimal point) Any single character
| ‘/.amty/’ | Will return 4 occurrences. (Or 2 lines in awk) |
Result
(
[0] => < amty
[1] => / amty
[2] => < amty
[3] => / amty
)
The dot (.) is a regular expression element. So if you are searching for a literal dot, you have to put ‘\’ before it. E.g.
| ‘/article-stack\.com/’ | Will return “article-stack.com” from the 4th line only, because the 2nd line contains a space between “stack” and “.com”. |
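A small Python sketch of the dot and the escaped dot may help here. The `compact` string and the `article-stack_com` candidate are hypothetical inputs added for illustration; they are not part of the example content. Note that the character captured in front of "amty" depends on whether the tags are written with padding spaces.

```python
import re

padded = "< amty > 1st block < / amty>"  # tags as printed in the example content
compact = "<amty>1st block</amty>"       # hypothetical: same line without the padding

# The dot consumes exactly one character in front of "amty".
print(re.findall(r".amty", padded))   # [' amty', ' amty']
print(re.findall(r".amty", compact))  # ['<amty', '/amty']

# Unescaped dot: any character may sit between "stack" and "com".
# Escaped dot: only a literal "." is accepted.
candidates = ["article-stack.com", "article-stack .com", "article-stack_com"]
loose_hits = [s for s in candidates if re.search(r"article-stack.com", s)]
strict_hits = [s for s in candidates if re.search(r"article-stack\.com", s)]

print(loose_hits)   # ['article-stack.com', 'article-stack_com']
print(strict_hits)  # ['article-stack.com']
```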
Elements to define occurrence
| r* | zero or more occurrences of the regular expression r to its left |
| r+ | one or more occurrences of the regular expression r to its left |
| r? | zero or one occurrence of the regular expression r to its left |
Sounds difficult? See examples
‘/^< amty >.*<\/amty >/’
Explanation:
- ^ anchors the match to the beginning of the line, so only lines starting with < amty > are considered.
- \/amty: the “\” is needed here because “/” is the delimiter that encloses the pattern. In the same way, “\” is required before any regular expression element that you want treated as plain text.
- * means zero or more occurrences of the dot (.), and the dot (.) means any single character.
- Finally, the expression above extracts all lines that start with < amty > and contain any number of characters between “< amty >” and “< / amty >”. The line does not have to end with “< / amty >”.
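The explanation above, together with the three occurrence elements, can be tried out in Python as a rough sketch. It assumes the tags are written without padding spaces, and the sample lines and the `matching` helper are hypothetical names made up for this illustration. Python has no “/” delimiter around the pattern, so the slash needs no backslash here.

```python
import re

# Hypothetical sample lines, written without padding spaces inside the tags.
lines = [
    "<amty></amty>",
    "<amty>x</amty>",
    "<amty>1st block</amty> trailing text",
    '<amty src="">Tag with attributes</amty>',
]

def matching(pattern):
    """Return the lines on which the pattern is found (AWK-style)."""
    return [line for line in lines if re.search(pattern, line)]

star = matching(r"^<amty>.*</amty>")  # zero or more characters between the tags
plus = matching(r"^<amty>.+</amty>")  # at least one character between the tags
opt = matching(r"^<amty>.?</amty>")   # at most one character between the tags

print(star)  # first three lines; the third matches although it does not end with </amty>
print(plus)  # second and third lines
print(opt)   # first and second lines
```

The last sample line never matches, because it starts with `<amty src="">` and not with `<amty>`.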
Another example
| .+@.+\.com | A simple pattern to filter email IDs ending in .com. |
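To see how loose this pattern is, here is a short Python sketch. The addresses are invented for illustration (using the reserved example.com domain), and `looks_like_email` is a hypothetical helper name.

```python
import re

EMAIL_RE = re.compile(r".+@.+\.com")

def looks_like_email(s):
    """True if the simple pattern is found anywhere in s."""
    return EMAIL_RE.search(s) is not None

print(looks_like_email("user@example.com"))      # True
print(looks_like_email("user@example.org"))      # False: does not contain ".com"
print(looks_like_email("@example.com"))          # False: .+ needs a character before "@"
print(looks_like_email("contact: a b@c d.com"))  # True: the dot also accepts spaces
```

The last case shows why this is only a simple filter: because the dot accepts any character, text that is clearly not a valid address still passes.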
Please note this
If you have a good knowledge of regular expressions, you will notice that many of the REs in this article are not efficient. They are built just for understanding. I have tuned them in further articles, so keep reading.