
PHP Tutorial :: Regex (II)

PHP Example #122

Ambition

The quantifiers in PHP's regex engine are ambitious (the usual term is greedy), that is, they match as many characters as they can. For example, the pattern <b>.*</b> means "the string <b>, then zero or more characters, then the string </b>". Because "zero or more characters" matches as many characters as possible, the ambitious quantifier takes everything from the first <b> it finds up to the last </b> in the string.

To turn a quantifier from ambitious to non-ambitious (usually called lazy), we place a question mark after it. The pattern <b>.*?</b> describes the same sequence as the previous one, but now "zero or more characters" matches as few characters as possible. The code example shows the difference between a non-ambitious match and an ambitious one, using the function preg_match_all(). In the first pattern, the non-ambitious quantifier stops at each occurrence of </b>, while in the second one the ambitious quantifier runs through the entire string up to the last occurrence of </b>.

<?php
$meats = "<b>Chicken</b>, <b>Beef</b>, <b>Duck</b>";
// With a non-ambitious quantifier, each type of meat matches separately
preg_match_all('@<b>.*?</b>@', $meats, $matches);
foreach ($matches[0] as $meat) {
print "Meat A: $meat<br/>";
}
// With an ambitious quantifier, all the string matches at once
preg_match_all('@<b>.*</b>@', $meats, $matches);
foreach ($matches[0] as $meat) {
print "Meat B: $meat<br/>";
}
?>
Meat A: Chicken
Meat A: Beef
Meat A: Duck
Meat B: Chicken, Beef, Duck

PHP Example #123

The PCRE functions of PHP

The functions of the PCRE extension match a string against a pattern and alter a string based on how it matches a pattern. When a pattern is passed to a PCRE function, it has to be enclosed within delimiters. Traditionally, the delimiters are slashes, but they can be any character that is not a letter, a digit, a backslash or whitespace. If the character chosen as delimiter appears in the pattern, it has to be escaped with a backslash, so a delimiter other than the slash is mainly worth using when the pattern itself contains slashes.
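
As a minimal sketch of the delimiter rule, the following code runs the same pattern twice, once with slashes as delimiters and once with @. The string in $path and the pattern itself are made-up values for illustration only.

```php
<?php
// Hypothetical sample string; any text containing slashes would do
$path = 'docs/php/regex';

// Slash as delimiter: every slash inside the pattern must be escaped
$with_slash = preg_match('/^docs\/php\/(\w+)$/', $path, $a);

// @ as delimiter: the slashes in the pattern need no escaping
$with_at = preg_match('@^docs/php/(\w+)$@', $path, $b);

print "Slash delimiter matched: $with_slash, captured: $a[1]\n";
print "@ delimiter matched: $with_at, captured: $b[1]\n";
```

Both calls should match and capture the same text; the only difference is how readable the pattern is.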

After the closing delimiter, we can add one or more pattern modifiers to alter the way the pattern is interpreted. A useful modifier is i, which makes the pattern match letters regardless of case. For example, the patterns (with delimiters) /[a-zA-Z]+/ and /[a-z]+/i are equivalent.
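
The effect of the i modifier can be checked with a short sketch like the one below. The word in $word is a made-up sample, and the ^ and $ anchors are added here so that the whole word has to match.

```php
<?php
// Hypothetical sample word mixing uppercase and lowercase letters
$word = 'RegEx';

// Explicit character class covering both cases
$explicit = preg_match('/^[a-zA-Z]+$/', $word);

// Lowercase-only class: fails because of the uppercase letters
$lower_only = preg_match('/^[a-z]+$/', $word);

// Same lowercase class with the i modifier: matches again
$with_i = preg_match('/^[a-z]+$/i', $word);

print "explicit: $explicit, lowercase only: $lower_only, with i: $with_i\n";
```

This should print "explicit: 1, lowercase only: 0, with i: 1".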

Another useful modifier is s, which makes the metacharacter . match line breaks as well. For example, the pattern (with delimiters) @<b>.*?</b>@ matches a pair of <b></b> tags and the text within them, but only if the text is all on the same line. To match text that can span several lines, we add the modifier s like this: @<b>.*?</b>@s
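
A small sketch of the difference, assuming a made-up $html string whose bold text spans two lines:

```php
<?php
// Hypothetical sample: the bold text spans two lines
$html = "<b>first line\nsecond line</b>";

// Without s, the dot does not match the newline, so nothing matches
$without_s = preg_match('@<b>.*?</b>@', $html);

// With s, the dot also matches the newline
$with_s = preg_match('@<b>.*?</b>@s', $html, $matches);

print "without s: $without_s, with s: $with_s\n";
```

Here the first call should return 0 and the second 1, with $matches[0] holding the whole two-line string.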

PHP Example #124

Matching with preg_match()

The function preg_match() takes as arguments a pattern and a string and checks whether they match, returning 1 if they match and 0 if they don't (and false if an error occurs).

<?php
$_POST['zip'] = '10840';
// Checks the value of $_POST['zip'] against the pattern ^\d{5}(-\d{4})?$
if (preg_match('/^\d{5}(-\d{4})?$/', $_POST['zip'])) {
print $_POST['zip'] . ' is a valid US ZIP Code.';
}
print '<br/><br/>';
// Checks the value of $html against the pattern <b>[^<]+</b>
// The delimiter is @ because / appears in the pattern
$html = '<b>This is a bold text indeed!</b>';
$is_bold = preg_match('@<b>[^<]+</b>@', $html);
if ($is_bold) { print 'The text is bold.'; }
?>
10840 is a valid US ZIP Code.

The text is bold.

PHP Example #125

Matching with preg_match()

A set of parentheses in a pattern captures whatever matches the part of the pattern enclosed by the parentheses. To access the captured strings, we pass an array as the third argument to preg_match(), and the strings are stored there. The first element of the array contains the string that matches the entire pattern, and the subsequent elements contain the strings that match the parts of the pattern enclosed by each set of parentheses.

<?php
$_POST['zip'] = '99577-0727';
// Checks the value of $_POST['zip'] against the pattern ^(\d{5})(-\d{4})?$
if (preg_match('/^(\d{5})(-\d{4})?$/', $_POST['zip'], $matches)) {
// $matches[0] contains the entire ZIP Code
print "$matches[0] is a valid US ZIP Code.\n";
// $matches[1] contains the five-digit part within the first
// set of parentheses
print "$matches[1] is the five-digit part of the ZIP Code.\n";
// If they were present in the string, the hyphen and the four-digit code
// would be in $matches[2]
if (isset($matches[2])) {
print "$matches[2] is the four-digit part of the ZIP Code.\n";
} else {
print "There is no four-digit part of the ZIP Code.";
}
}
print '<br/><br/>';
// Checks the value of $html against the pattern <b>([^<]+)</b>
// The delimiter is @ because / appears in the pattern
$html = '<b>This is a bold text indeed!</b>';
$is_bold = preg_match('@<b>([^<]+)</b>@', $html, $matches);
if ($is_bold) {
// $matches[1] contains the text within the bold tags
print "The bold text is: $matches[1]";
}
?>
99577-0727 is a valid US ZIP Code. 99577 is the five-digit part of the ZIP Code. -0727 is the four-digit part of the ZIP Code.

The bold text is: This is a bold text indeed!

PHP Example #126

Matching with preg_match()

Each part of the string that matches the part of the pattern inside a set of parentheses goes into its own element of the array. This example uses preg_match() with nested parentheses to illustrate how the captured strings are added to the array. The groups are numbered in the order in which their opening parentheses appear, so a nested set of parentheses is stored right after the set that encloses it.

<?php
$_POST['zip'] = '19096-2321';
if (preg_match('/^(\d{5})(-(\d{4}))?$/', $_POST['zip'], $matches)) {
print "The beginning of the ZIP Code is: $matches[1].\n";
// The optional groups are only set when the ZIP+4 part is present
if (isset($matches[3])) {
// $matches[2] contains what is inside the second set of parentheses:
// the hyphen and the four last digits
print "The second part of the ZIP Code is: $matches[2].\n";
// $matches[3] contains only the four last digits
print "The ZIP+4 part is: $matches[3].";
}
}
?>
The beginning of the ZIP Code is: 19096. The second part of the ZIP Code is: -2321. The ZIP+4 part is: 2321.
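
What happens to the optional groups when the ZIP+4 part is missing can be checked with a sketch like this one. It reuses the pattern of the example with a made-up five-digit input in $zip.

```php
<?php
// Hypothetical input without the ZIP+4 part
$zip = '19096';

if (preg_match('/^(\d{5})(-(\d{4}))?$/', $zip, $matches)) {
    // Only the entire match and the first group are present
    print "Elements in the array: " . count($matches) . "\n";
    // The optional groups did not take part in the match
    print isset($matches[2]) ? "Group 2 is set\n" : "Group 2 is not set\n";
    print isset($matches[3]) ? "Group 3 is set\n" : "Group 3 is not set\n";
}
```

With default flags, trailing groups that did not take part in the match are left out of the array, which is why they are worth testing with isset() before being printed.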

PHP Example #127

Matching with preg_match_all()

While preg_match() matches a pattern against a string a single time, preg_match_all() matches the pattern against the string as many times as the pattern allows and returns the number of matches found. This example illustrates the differences between the two functions.

<?php
$html = "<ul>";
$html .= "<li>Beef Chow-Fun</li>";
$html .= "<li>Sauteed Pea Shoots</li>";
$html .= "<li>Soy Sauce Noodles</li>";
$html .= "</ul>";
preg_match('@<li>(.*?)</li>@', $html, $matches);
$match_count = preg_match_all('@<li>(.*?)</li>@',$html, $matches_all);
print "preg_match_all() matched $match_count times.";
print '<br/><br/>';
print "preg_match() array: ";
print '<pre>';
var_dump($matches);
print '</pre>';
print "preg_match_all() array: ";
print '<pre>';
var_dump($matches_all);
print '</pre>';
?>
preg_match_all() matched 3 times.

preg_match() array:
array(2) {
  [0]=>
  string(22) "<li>Beef Chow-Fun</li>"
  [1]=>
  string(13) "Beef Chow-Fun"
}
preg_match_all() array:
array(2) {
  [0]=>
  array(3) {
    [0]=>
    string(22) "<li>Beef Chow-Fun</li>"
    [1]=>
    string(27) "<li>Sauteed Pea Shoots</li>"
    [2]=>
    string(26) "<li>Soy Sauce Noodles</li>"
  }
  [1]=>
  array(3) {
    [0]=>
    string(13) "Beef Chow-Fun"
    [1]=>
    string(18) "Sauteed Pea Shoots"
    [2]=>
    string(17) "Soy Sauce Noodles"
  }
}
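
The array filled by preg_match_all() is grouped by pattern part by default: element 0 lists the entire matches and element 1 lists what the first set of parentheses captured. The sketch below reads the captured texts from element 1 and then, going one step beyond the example above, passes the PREG_SET_ORDER flag to get one sub-array per match instead. The shortened $html string and the variable names are made up for illustration.

```php
<?php
$html = "<ul><li>Beef Chow-Fun</li><li>Sauteed Pea Shoots</li></ul>";

// Default order: $by_pattern[1] lists what the first group captured in every match
preg_match_all('@<li>(.*?)</li>@', $html, $by_pattern);
$dishes = $by_pattern[1];

// PREG_SET_ORDER: one sub-array per match, each holding [entire match, group 1]
preg_match_all('@<li>(.*?)</li>@', $html, $by_match, PREG_SET_ORDER);

foreach ($by_match as $match) {
    print "Dish: $match[1]\n";
}
```

Which layout is more convenient depends on the loop: the default order suits reading a single group across all matches, while PREG_SET_ORDER suits handling each match as a unit.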