Demystifying regular expression syntax

By: iSee, Date: Mon, 20 Apr 2009, 3:01 AM

Regular expressions (REs) are often mistakenly regarded as a mysterious language that only a few people understand. On the surface they do look chaotic: if you do not know the syntax, the code is just a pile of garbage text to your eyes. In fact, regular expressions are quite simple and easy to understand. After reading this article, you will be familiar with the common regular expression syntax.

Support for multiple platforms

Regular expressions were first proposed by the mathematician Stephen Kleene in 1956, as a result of his research on natural language. As a complete syntax for matching patterns of characters, they were later applied in the field of information technology. Since then, regular expressions have gone through several periods of development, and the current standard is approved by ISO (the International Organization for Standardization) and defined by The Open Group.

Regular expressions are not a dedicated language, but they provide a standard way to find and replace text in a file or a string. The standard defines two forms: basic regular expressions (BRE) and extended regular expressions (ERE). ERE includes the functionality of BRE and adds further concepts.

Many programs use regular expressions, including xsh, egrep, sed, vi, and other programs on the Unix platform. They have also been adopted by many languages, such as HTML and XML, which usually implement only a subset of the standard.

More common than you might imagine
As regular expressions were ported to cross-platform programming languages, their functionality became more complete and their use gradually more widespread. Search engines on the web use them, e-mail programs use them, and even if you are not a Unix programmer, you can use regular expressions to simplify your programs and shorten your development time.

Regular expressions 101
Much of the regular expression syntax will look familiar, because you have probably come across parts of it before. Wildcards are one type of RE construct, and repetition operators are another. Let us first look at the most common basic syntax types of the ERE standard. To provide concrete examples, I will use several different programs.

Character matching

The key to regular expressions is specifying what you want to search for; without that concept, REs would be useless.

Each expression contains instructions on what to look for, as shown in Table A.

Table A: Character-matching regular expressions

| Operator | Explanation | Example | Result |
| --- | --- | --- | --- |
| . | Match any one character | grep ".ord" sample.txt | Matches "ford", "lord", "2ord", etc. in the file sample.txt |
| [] | Match any one character listed between the brackets | grep "[cng]ord" sample.txt | Matches only "cord", "nord", and "gord" |
| [^] | Match any one character not listed between the brackets | grep "[^cn]ord" sample.txt | Matches "lord", "2ord", etc., but not "cord" or "nord" |
| | | grep "[a-zA-Z]ord" sample.txt | Matches "aord", "bord", "Aord", "Bord", etc. |
| | | grep "[^0-9]ord" sample.txt | Matches "Aord", "aord", etc., but not "2ord", etc. |
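
As a rough illustration of Table A, the sketch below runs the same patterns with Python's re module instead of grep. The word list and the helper name matching_words are made up for this example; re.search is used because, like grep, it reports a match anywhere in the string.

```python
import re

# Hypothetical word list standing in for the contents of sample.txt.
WORDS = ["ford", "lord", "2ord", "cord", "nord", "gord", "Aord"]


def matching_words(pattern, words=WORDS):
    """Return the words that contain a match for pattern, in input order."""
    return [w for w in words if re.search(pattern, w)]


if __name__ == "__main__":
    for pattern in (r".ord", r"[cng]ord", r"[^cn]ord", r"[a-zA-Z]ord", r"[^0-9]ord"):
        print(pattern, "->", matching_words(pattern))
```

Bracket expressions behave the same way in grep and in Python for these simple cases, so the output should line up with the Results column of the table.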



Repetition operators
Repetition operators, or quantifiers, describe how many times a particular element should be found. They are often used together with the character-matching syntax to find sequences of several characters, as shown in Table B.

Table B: Regular expression repetition operators

| Operator | Explanation | Example | Result |
| --- | --- | --- | --- |
| ? | Match the preceding element zero or one time | egrep ".?erd" sample.txt | Matches "berd", "herd", etc., and "erd" |
| * | Match the preceding element zero or more times | egrep "n.*rd" sample.txt | Matches "nerd", "nrd", "neard", etc. |
| + | Match the preceding element one or more times | egrep "[n]+erd" sample.txt | Matches "nerd", "nnerd", etc., but not "erd" |
| {n} | Match the preceding element exactly n times | egrep "[a-z]{2}erd" sample.txt | Matches "cherd", "blerd", etc., but not "nerd", "erd", "buzzerd", etc. |
| {n,} | Match the preceding element at least n times | egrep ".{2,}erd" sample.txt | Matches "cherd" and "buzzerd", but not "nerd" |
| {n,N} | Match the preceding element at least n times, but not more than N times | egrep "n[e]{1,2}rd" sample.txt | Matches "nerd" and "neerd" |

Note: egrep prints every line that contains a match anywhere in it, so the "but not" cases above assume the pattern is compared against a whole word (for example, with egrep -w).
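
The quantifiers in Table B can be tried out with a small sketch like the one below. It uses Python's re module rather than egrep, and re.fullmatch so that each pattern has to cover the whole word; the word list and the helper name whole_word_matches are assumptions made for this example.

```python
import re

# Hypothetical test words; each is matched as a whole, not as a substring.
WORDS = ["erd", "herd", "nrd", "nerd", "neerd", "nnerd",
         "neard", "cherd", "blerd", "buzzerd"]


def whole_word_matches(pattern, words=WORDS):
    """Return the words that the pattern matches from start to end."""
    return [w for w in words if re.fullmatch(pattern, w)]


if __name__ == "__main__":
    patterns = [
        r".?erd",         # zero or one character before "erd"
        r"n.*rd",         # "n", any run of characters, then "rd"
        r"[n]+erd",       # one or more "n" before "erd"
        r"[a-z]{2}erd",   # exactly two lowercase letters before "erd"
        r".{2,}erd",      # at least two characters before "erd"
        r"n[e]{1,2}rd",   # one or two "e" between "n" and "rd"
    ]
    for pattern in patterns:
        print(pattern, "->", whole_word_matches(pattern))
```

Matching the whole word is a deliberate choice here: with a substring search, "[a-z]{2}erd" would also find "zzerd" inside "buzzerd".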


Anchors
Anchors describe where in the text the pattern has to match, as shown in Table C. They are handy when you are searching for common combinations of characters. For the examples, I use the vi line-editor command :s, which stands for substitute. The basic syntax of this command is:

s/pattern_to_match/pattern_to_substitute/


Table C: Regular expression anchors

| Operator | Explanation | Example | Result |
| --- | --- | --- | --- |
| ^ | Match at the beginning of a line | s/^/blah/ | Inserts "blah" at the beginning of the line |
| $ | Match at the end of a line | s/$/blah/ | Inserts "blah" at the end of the line |
| \< | Match at the beginning of a word | s/\</blah/ | Inserts "blah" at the beginning of the word |
| | | egrep "\<blah" sample.txt | Matches "blahfield", etc. |
| \> | Match at the end of a word | s/\>/blah/ | Inserts "blah" at the end of the word |
| | | egrep "blah\>" sample.txt | Matches "soupblah", etc. |
| \b | Match at the beginning or end of a word | egrep "\bblah" sample.txt | Matches "blahcake"; use "blah\b" to match "countblah" |
| \B | Match in the middle of a word | egrep "\Bblah" sample.txt | Matches "sublahper", etc. |
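
The anchors in Table C can be sketched in Python as follows. This is only an approximation of the vi and egrep examples: Python's re module supports ^, $, \b and \B, but not the \< and \> word anchors, so \b stands in for both. The function names are invented for this example.

```python
import re


def insert_at_line_start(text, inserted):
    """Rough equivalent of vi's :s/^/inserted/ applied to one line."""
    return re.sub(r"^", inserted, text, count=1)


def insert_at_line_end(text, inserted):
    """Rough equivalent of :s/$/inserted/ (assumes no trailing newline)."""
    return re.sub(r"$", inserted, text, count=1)


def has_match(pattern, text):
    """True if the pattern matches anywhere in text, as egrep would report."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    print(insert_at_line_start("some text", "blah "))
    print(insert_at_line_end("some text", " blah"))
    print(has_match(r"\bblah", "blahcake"))    # "blah" at the start of a word
    print(has_match(r"blah\b", "countblah"))   # "blah" at the end of a word
    print(has_match(r"\Bblah", "sublahper"))   # "blah" inside a word
```

Note that \b only marks the position of a word boundary; whether it acts as a start-of-word or end-of-word anchor depends on which side of the literal text it is written.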




Alternation

Another feature of REs is the alternation (or interposition) operator, written with the | symbol. In effect, it is equivalent to an OR statement. The following statement returns the occurrences of "nerd" and "merd" in the file sample.txt:

egrep "(n|m)erd" sample.txt


Alternation is very powerful, especially when you are searching for different spellings of a word, although in this simple case the following example gives the same result:

egrep "[nm]erd" sample.txt

Alternation shows its real usefulness when you combine it with the more advanced features of REs.
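
To make the comparison concrete, here is a small sketch using Python's re module in place of egrep. The sample lines and the helper name lines_matching are assumptions for this example. It shows that the two patterns above select the same lines, and that alternation can also choose between alternatives of different lengths, which a bracket expression cannot do.

```python
import re

# Hypothetical contents of sample.txt, one word per line.
LINES = ["nerd", "merd", "herd", "blerd"]


def lines_matching(pattern, lines=LINES):
    """Return the lines that contain a match for pattern, like egrep."""
    return [line for line in lines if re.search(pattern, line)]


if __name__ == "__main__":
    print(lines_matching(r"(n|m)erd"))   # alternation between single characters
    print(lines_matching(r"[nm]erd"))    # the equivalent bracket expression
    print(lines_matching(r"(n|bl)erd"))  # alternatives of different lengths
```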

Reserved characters
The last important feature of REs is reserved characters (also called special characters). For example, if you want to find the literal strings "ne*rd" and "ni*rd", the pattern "n[ei]*rd" matches "neeeeerd" and "nieieierd", but not the strings you are looking for. Because '*' (asterisk) is a reserved character, you must escape it with a backslash, that is: "n[ei]\*rd". Other reserved characters include:

^ (Caret)
. (Period)
[ (Left bracket)
$ (Dollar sign)
( (Left parenthesis)
) (Right parenthesis)
| (Pipe)
* (Asterisk)
+ (Plus symbol)
? (Question mark)
{ (Left curly bracket, or left brace)
\ (Backslash)
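
The effect of escaping can be checked with a short sketch. It uses Python's re module, where the backslash escape works the same way for these characters; the candidate strings come from the example above, while the helper name full_matches is made up. re.escape is a standard-library function that adds the backslashes for you.

```python
import re

CANDIDATES = ["ne*rd", "ni*rd", "neeeeerd", "nieieierd"]


def full_matches(pattern, candidates=CANDIDATES):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]


if __name__ == "__main__":
    # Unescaped: '*' is a quantifier applied to [ei].
    print(full_matches(r"n[ei]*rd"))
    # Escaped: '\*' stands for a literal asterisk.
    print(full_matches(r"n[ei]\*rd"))
    # re.escape builds a pattern that matches its argument literally.
    print(full_matches(re.escape("ne*rd")))
```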
Once you include these characters in your search patterns, REs undoubtedly become very hard to read. For example, the pattern in the following PHP eregi call is hard to read.

eregi("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$", $sendto)

As you can see, the intent of the code is very hard to grasp. But if you overlook the reserved characters, you will often misunderstand what the code means.
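
One way to make such a pattern easier to follow is to spread it over several commented lines. The sketch below does this with Python's re.VERBOSE flag; it is an illustration of the same pattern, not the original PHP code, and the names ADDRESS_RE and looks_like_address are invented here. re.IGNORECASE is assumed to mirror the case-insensitive behavior of eregi.

```python
import re

# The same pattern as in the eregi call, split into commented pieces.
ADDRESS_RE = re.compile(
    r"""
    ^[_a-z0-9-]+        # first part of the local name
    (\.[_a-z0-9-]+)*    # further parts, each introduced by a literal dot
    @                   # the at sign
    [a-z0-9-]+          # first label of the domain
    (\.[a-z0-9-]+)*     # further labels, each introduced by a literal dot
    $
    """,
    re.VERBOSE | re.IGNORECASE,
)


def looks_like_address(text):
    """True if text has the shape described by the pattern above."""
    return ADDRESS_RE.match(text) is not None


if __name__ == "__main__":
    for candidate in ("john.doe@example.com", "john..doe@example.com", "no-at-sign"):
        print(candidate, looks_like_address(candidate))
```

The escaped periods are what keep "john..doe" from matching: an unescaped period would accept any character in that position.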

Summary
In this article, we have demystified regular expressions and listed the common syntax of the ERE standard. If you want to read the complete description of the rules from The Open Group, see: Regular Expressions. You are welcome to post questions or share your views in the discussion area.

