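Using Regular Expressions in VC
Saturday, September 11, 2010
Shao Shengsong (邵盛松)

A regular expression is a formula for pattern ("fuzzy") matching of characters.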
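Regular expressions can be used for data validation and for searching and replacing text.
This article describes how to use two ATL template classes, CAtlRegExp and CAtlREMatchContext.
Before using CAtlRegExp, include its header: #include <atlrx.h>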
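RegExp is short for Regular Expression.
The following example uses both classes to match an email address string. It is adapted from /en-us/library/k3zs4axe(VS.80).aspx

    // Braces { } mark a match group in CAtlRegExp syntax.
    CString strRegex = L"({[0-9_]+@[a-zA-Z0-9]+[.][a-zA-Z0-9]+[.]?[a-zA-Z0-9]+})";
    CString strInput;
    strInput = L"admin@";

    // Assumes a Unicode build, so LPCTSTR is const wchar_t*.
    CAtlRegExp<CAtlRECharTraitsW> reRule;
    wchar_t *wt = (wchar_t *)(LPCTSTR)strRegex;
    REParseError status = reRule.Parse((const ATL::CAtlRegExp<CAtlRECharTraitsW>::RECHAR *)wt);
    if (REPARSE_ERROR_OK != status)
    {
        return 0;   // the pattern could not be parsed
    }

    CAtlREMatchContext<CAtlRECharTraitsW> mcRule;
    wt = (wchar_t *)(LPCTSTR)strInput;
    if (!reRule.Match((const ATL::CAtlRegExp<CAtlRECharTraitsW>::RECHAR *)wt, &mcRule))
    {
        AfxMessageBox(L"The email address you entered is invalid!");
    }
    else
    {
        for (UINT nGroupIndex = 0; nGroupIndex < mcRule.m_uNumGroups; ++nGroupIndex)
        {
            const CAtlREMatchContext<>::RECHAR* szStart = 0;
            const CAtlREMatchContext<>::RECHAR* szEnd = 0;
            mcRule.GetMatch(nGroupIndex, &szStart, &szEnd);

            // Copy the matched text delimited by szStart and szEnd.
            ptrdiff_t nLength = szEnd - szStart;
            CString strEmailAddress(szStart, static_cast<int>(nLength));

            // If only part of the input matched, suggest the matched part.
            if (strEmailAddress.Compare(strInput) != 0)
            {
                CString strPrompt;
                strPrompt.Format(L"The email address you entered is invalid. Did you mean %s?", strEmailAddress);
                AfxMessageBox(strPrompt);
            }
            else
            {
                AfxMessageBox(L"The email address is valid!");
            }
        }
    }

Both template classes are parameterized by another class that describes the character set traits, which can be ASCII, WCHAR, or multibyte.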
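You can usually ignore this parameter, because the template selects the concrete traits class automatically from the project's character set setting.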
atlrx.h provides three traits classes to choose from:

    CAtlRECharTraitsA     for ASCII
    CAtlRECharTraitsW     for UNICODE
    CAtlRECharTraitsMB    for multibyte

In VC2005 the default character set is Unicode. The regular expression source code contains:

    #ifndef _UNICODE
    typedef CAtlRECharTraitsA CAtlRECharTraits;
    #else // _UNICODE
    typedef CAtlRECharTraitsW CAtlRECharTraits;
    #endif // !_UNICODE

So in a Unicode build the CAtlRegExp object can be declared as

    CAtlRegExp<> reRule;
    REParseError status = reRule.Parse((const ATL::CAtlRegExp<CAtlRECharTraitsW>::RECHAR *)wt);

or as

    CAtlRegExp<CAtlRECharTraitsW> reRule;
    REParseError status = reRule.Parse((const ATL::CAtlRegExp<CAtlRECharTraitsW>::RECHAR *)wt);

Calling CAtlRegExp's Parse() method with the regular expression string as its argument prepares the object we need for matching.
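The default-traits selection described above can be mimicked without ATL. The following is a minimal, portable sketch, not ATL code: MockRegExp, NarrowTraits, WideTraits, DefaultTraits, and the CharType member are hypothetical names invented for illustration, and Parse() here only stores the pattern. It shows why an empty template argument list and an explicit traits class can name the same type.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for CAtlRECharTraitsA and CAtlRECharTraitsW.
struct NarrowTraits { typedef char CharType; };
struct WideTraits   { typedef wchar_t CharType; };

// Same selection pattern as the atlrx.h excerpt: the macro picks the default.
#ifndef _UNICODE
typedef NarrowTraits DefaultTraits;
#else
typedef WideTraits DefaultTraits;
#endif

// Hypothetical stand-in for CAtlRegExp: the traits class is a defaulted
// template parameter, so MockRegExp<> means MockRegExp<DefaultTraits>.
template <class CharTraits = DefaultTraits>
class MockRegExp {
public:
    typedef typename CharTraits::CharType RECHAR;

    // Only stores the pattern (assumed non-null); real parsing is omitted.
    bool Parse(const RECHAR* pattern) {
        pattern_ = pattern;
        return !pattern_.empty();
    }

    const std::basic_string<RECHAR>& Pattern() const { return pattern_; }

private:
    std::basic_string<RECHAR> pattern_;
};

// True when the build's default traits are the wide ones, that is, when
// MockRegExp<> and MockRegExp<WideTraits> are the same type.
inline bool DefaultIsWide() {
    return std::is_same<MockRegExp<>, MockRegExp<WideTraits> >::value;
}
```

Spelling out the traits class, as the article's example does, keeps the character type fixed even if the project's character set setting changes later; the empty argument list follows the setting.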
Next, call CAtlRegExp's Match() function.
Match() parameters: the first argument is the string to match against; the second receives the match result.
The CAtlREMatchContext member m_uNumGroups holds the number of match groups.
CAtlREMatchContext's GetMatch() function returns the pStart and pEnd pointers that delimit the matched text.

The regular expression syntax below is excerpted from MSDN; the original is at /en-us/library/k3zs4axe(VS.80).aspx

Regular Expression Syntax
This table lists the metacharacters understood by CAtlRegExp.

Metacharacter    Meaning
.                Matches any single character.
[ ]              Indicates a character class. Matches any character inside the brackets (for example, [abc] matches "a", "b", and "c").
^                If this metacharacter occurs at the start of a character class, it negates the character class. A negated character class matches any character except those inside the brackets (for example, [^abc] matches all characters except "a", "b", and "c"). If ^ is at the beginning of the regular expression, it matches the beginning of the input (for example, ^[abc] will only match input that begins with "a", "b", or "c").
-                In a character class, indicates a range of characters (for example, [0-9] matches any of the digits "0" through "9").
?                Indicates that the preceding expression is optional: it matches once or not at all (for example, [0-9][0-9]? matches "2" and "12").
+                Indicates that the preceding expression matches one or more times (for example, [0-9]+ matches "1", "13", "456", and so on).
*                Indicates that the preceding expression matches zero or more times.
??, +?, *?       Non-greedy versions of ?, +, and *. These match as little as possible, unlike the greedy versions that match as much as possible (for example, given the input "<abc><def>", <.*?> matches "<abc>" while <.*> matches "<abc><def>").
( )              Grouping operator. Example: (\d+,)*\d+ matches a list of numbers separated by commas (for example, "1" or "1,23,456").
{ }              Indicates a match group. The actual text in the input that matches the expression inside the braces can be retrieved through the CAtlREMatchContext object.
\                Escape character: interpret the next character literally (for example, [0-9]+ matches one or more digits, but [0-9]\+ matches a digit followed by a plus character). Also used for abbreviations (such as \a for any alphanumeric character; see the following table). If \ is followed by a number n, it matches the nth match group (starting from 0). Example: <{.*?}>.*?</\0> matches "<head>Contents</head>". Note that, in C++ string literals, two backslashes must be used: "\\+", "\\a", "<{.*?}>.*?</\\0>".
$                At the end of a regular expression, this character matches the end of the input (for example, [0-9]$ matches a digit at the end of the input).
|                Alternation operator: separates two expressions, exactly one of which matches (for example, T|the matches "The" or "the").
!                Negation operator: the expression following ! does not match the input (for example, a!b matches "a" not followed by "b").

Abbreviations

Metacharacter    Meaning
.                Matches a single character.
[ ]              Specifies a character class; matches any character inside the brackets.