
Regular Expression Sample (VB Version)

VS SDK Regular Expression Language Service Example Deep Dive (VB)

István Novák (DiveDeeper), Grepton Ltd.
May, 2008

Introduction

This example implements a small language service for demonstration purposes. It is called the Regular Expression Language Service because it tokenizes text by RegEx patterns (lower case letters, capital letters, digits) and uses its own syntax coloring scheme for each token. The functionality of this sample is quite far from a full language service, but it illustrates the basics. The source files belonging to this sample contain only about three hundred lines of essential code.

When reading through this deep dive you are going to get familiar with the following concepts:

- How should language services be registered with Visual Studio?
- What kind of lifecycle management tasks does a simple language service have?
- How to create a very simple language service?
- How to implement a scanner supporting syntax coloring?

To understand the concepts treated here, it is assumed that you are familiar with the idea of VSPackages and that you know how to build and register very simple (even non-functional) packages. To get more information about packages, please have a look at the Package Reference Sample (VisualBasic Reference.Package sample). Very basic knowledge of regular expressions is also expected.

Regular Expression Language Service

Open the Microsoft Visual Studio 2008 SDK Browser and select the Samples tab. In the top middle list you can search for the “VisualBasic Example.RegExLangServ” sample. Use the “Open this sample in Visual Studio” link in the top right panel of the browser application to prepare the sample. The application opens in Visual Studio 2008.

Running the sample

Rebuild the package and start it with the Experimental Hive. Without creating a new solution, add a new text file with the File|New|File... menu function. Use the File|Save As menu function to store the text file with the RegexFile.rgx name.
To avoid attaching the .txt extension to the end of the file name, set the “Save as type” to “All files (*.*)” as illustrated in Figure 1:

Figure 1: Save the file with .rgx extension

Type a few words, numbers and punctuation characters into the editor and see how they are colored. You can see an illustration in Figure 2:

Figure 2: Our language service has effect on syntax coloring

Now, try to save the file again with the File|Save As menu function. This time the Save As dialog contains “RegEx File (*.rgx)” in its “Save as type” field, indicating that it recognizes this file type with the .rgx extension.

The structure of the sample

The solution contains a VSPackage project named RegExLangServ that uses a few reference assemblies for VS interop whose names start with “Microsoft.VisualStudio”. The project’s source files are the following:

The essential code of this sample is in the RegExLangServ.vb, RegExScanner.vb and VsPkg.vb files; in the next scenarios I focus on them. In the code extracts used in this deep dive I omit or change comments for better readability and remove Imports statements, namespace declarations and other non-relevant elements.

Scenario: Registering the Language Service with an associated file extension

The language service this sample implements is intended to be used by the Visual Studio Shell and by any third party packages that want to use the functionality of the service. For example, the code window of Visual Studio uses this service for syntax coloring. Just like any other service, a language service has to be registered with Visual Studio.
The registration information is provided by attributes decorating the package class (VsPkg.vb):

    ' --- Other attributes omitted
    <ProvideLanguageExtension(GetType(RegularExpressionLanguageService), ".rgx")> _
    <ProvideService(GetType(RegularExpressionLanguageService))> _
    Public NotInheritable Class RegularExpressionLanguageServicePackage
        Inherits Shell.Package
        Implements IDisposable
        ' ...
    End Class

Please note that a few attributes are not shown in the code extract above. If you are not familiar with them, take a look at the Package Reference Sample Deep Dive. Language service registration uses the following two attributes:

- ProvideLanguageExtension: associates the language service with a file extension (.rgx in this sample).
- ProvideService: declares that the package provides the language service.

With these attributes we have registered the regular expression language service. However, to use the service we also have to take care of service instantiation.

Scenario: Lifecycle management of a language service

Just as in the case of other local or proffered services, our package must manage the lifecycle of the regular expression language service. For most services created with the Managed Package Framework, lifecycle management is about creating the service instance. For language services we must also take care of the cleanup process, since behind the scenes language services use unmanaged code and unmanaged resources.
Our package class uses the standard pattern for managing the lifecycle of the language service instance:

    Public NotInheritable Class RegularExpressionLanguageServicePackage
        Inherits Shell.Package
        Implements IDisposable

        Private langService As RegularExpressionLanguageService

        Protected Overrides Sub Initialize()
            MyBase.Initialize()
            ' --- Create and site the service, then proffer it (True = promote to parent)
            langService = New RegularExpressionLanguageService()
            langService.SetSite(Me)
            Dim sc As IServiceContainer = CType(Me, IServiceContainer)
            sc.AddService(GetType(RegularExpressionLanguageService), langService, True)
        End Sub

        Protected Overrides Overloads Sub Dispose(ByVal disposing As Boolean)
            Try
                If disposing Then
                    If langService IsNot Nothing Then
                        langService.Dispose()
                    End If
                End If
            Finally
                MyBase.Dispose(disposing)
            End Try
        End Sub

        Public Overloads Sub Dispose() Implements IDisposable.Dispose
            Dispose(True)
            GC.SuppressFinalize(Me)
        End Sub
    End Class

Since our package’s goal is to provide the regular expression language service, we create the service instance as soon as our package gets loaded into memory and sited (this is when the overridden Initialize method is called). The language service gets sited in our package, is then added to the package’s service container and is also promoted to the parent container.

In the overridden Dispose method we release the resources held by the language service and then clean up the other resources held by the package. The overridden Dispose is called from the public Dispose method that implements the IDisposable interface. Since our package is cleaned up here, we call GC.SuppressFinalize to avoid a second cleanup of the package instance by the finalizer. The lifecycle management pattern used here should be applied to your own language services.

Scenario: Implementing a small language service

The code editor built into Visual Studio can be customized by language services. These customization features include brace matching, syntax coloring, IntelliSense and many others.
For the code editor to leverage a language service, it must access a few of the service's functions.

One such function is access to the so-called scanner and parser of the language service. The scanner is responsible for retrieving tokens such as keywords, identifiers, double precision numbers, strings, comments and many others from the source text. The parser is responsible for understanding what the sequence of tokens means, whether it matches the expected language syntax, and so on.

Syntax coloring basically uses only the scanner, but it can also use the parser, for example to use different colors for value and reference types. Brace matching generally uses the parser to find the matching pairs of opening and closing braces.

In this example we use a small language service based on regular expressions that uses only the scanner for syntax coloring and has no parser for more complex tasks.

To be a language service, we must create a COM object implementing a few interfaces with the IVsLanguage prefix in their names. Instead of implementing these interfaces from scratch, the best starting point is the LanguageService class provided by the Managed Package Framework.
To create a language service of our own, we must create a class derived from LanguageService and override a few methods, as the following code extract shows:

    <ComVisible(True)> _
    <Guid("C674518A-3127-4f00-9C4D-BE0EAAB8C761")> _
    Friend Class RegularExpressionLanguageService
        Inherits LanguageService

        Private scanner As RegularExpressionScanner
        Private preference As LanguagePreferences

        Public Overrides Function ParseSource(ByVal req As ParseRequest) As AuthoringScope
            Throw New NotImplementedException()
        End Function

        Public Overrides ReadOnly Property Name() As String
            Get
                Return "Regular Expression Language Service"
            End Get
        End Property

        Public Overrides Function GetFormatFilterList() As String
            Return VSPackage.RegExFormatFilter
        End Function

        Public Overrides Function GetScanner(ByVal buffer As _
                Microsoft.VisualStudio.TextManager.Interop.IVsTextLines) As IScanner
            If scanner Is Nothing Then
                scanner = New RegularExpressionScanner()
            End If
            Return scanner
        End Function

        Public Overrides Function GetLanguagePreferences() As LanguagePreferences
            If preference Is Nothing Then
                preference = New LanguagePreferences(Me.Site, _
                    GetType(RegularExpressionLanguageService).GUID, _
                    "Regular Expression Language Service")
            End If
            Return preference
        End Function
    End Class

(This is the full code of the class; I have only changed indenting and omitted comments.)

Our RegularExpressionLanguageService must be visible to COM and so must have an explicit GUID. The overridden Name property is used to obtain the name of our language service. The GetFormatFilterList method retrieves the file filter expression used by the Save As dialog (“RegEx File (*.rgx)”).

The overridden ParseSource method is meant to parse the specified source code according to a ParseRequest instance. Since our language service does not implement a parser, we throw a NotImplementedException here.

Visual Studio supports language preference settings.
Examples of such preferences are IntelliSense support (supported or not), line numbers (displayed or not), the tab size used by the language and so on. By overriding the GetLanguagePreferences method we can tell which preferences are used by our service. In this implementation we use the default settings.

The GetScanner method is the most important one in our language service. This method retrieves an object implementing the IScanner interface. As its name suggests, the returned object represents the scanner used to tokenize the source code text. The responsibility of the scanner is delegated to a RegularExpressionScanner instance, which I treat in the next scenario.

Scenario: Creating the scanner to support syntax coloring

The scanner object is crucial for our regular expression language service. It implements the IScanner interface that has only two methods:

    Public Interface IScanner
        Sub SetSource(source As String, offset As Integer)
        Function ScanTokenAndProvideInfoAboutIt(tokenInfo As TokenInfo, _
            ByRef state As Integer) As Boolean
    End Interface

The SetSource method sets the line to be scanned and also provides the offset to start scanning from. The ScanTokenAndProvideInfoAboutIt method obtains the next token from the line currently being scanned. The TokenInfo parameter passed in is a structure to be filled in by the method; it represents the token scanned.
The state parameter is an integer value representing the scanner state (it is used for so-called context-dependent scanning).

The RegularExpressionScanner class implements this interface:

    Friend Class RegularExpressionScanner
        Implements IScanner

        Private sourceString As String
        Private currentPos As Integer

        Private Shared patternTable As RegularExpressionTableEntry() = _
            New RegularExpressionTableEntry(3) _
            { _
                New RegularExpressionTableEntry("[A-Z]?", TokenColor.Comment), _
                New RegularExpressionTableEntry("[a-z]?", TokenColor.Keyword), _
                New RegularExpressionTableEntry("[0-9]?", TokenColor.Number), _
                New RegularExpressionTableEntry(".", TokenColor.Text) _
            }

        Private Shared Sub MatchRegEx(ByVal source As String, ByRef charsMatched As Integer, _
                ByRef color As TokenColor)
            ' --- Implementation omitted from this code extract
        End Sub

        Public Function ScanTokenAndProvideInfoAboutIt(ByVal tokenInfo As TokenInfo, _
                ByRef state As Integer) As Boolean _
                Implements IScanner.ScanTokenAndProvideInfoAboutIt
            ' --- No more tokens on this line
            If sourceString.Length = 0 Then
                Return False
            End If
            Dim color As TokenColor = TokenColor.Text
            Dim charsMatched As Integer = 0
            MatchRegEx(sourceString, charsMatched, color)
            If tokenInfo IsNot Nothing Then
                tokenInfo.Color = color
                tokenInfo.Type = TokenType.Text
                tokenInfo.StartIndex = currentPos
                tokenInfo.EndIndex = Math.Max(currentPos, currentPos + charsMatched - 1)
            End If
            ' --- Move past the token just scanned
            currentPos += charsMatched
            sourceString = sourceString.Substring(charsMatched)
            Return True
        End Function

        Public Sub SetSource(ByVal source As String, ByVal offset As Integer) _
                Implements IScanner.SetSource
            sourceString = source
            currentPos = offset
        End Sub
    End Class

The implementation of the SetSource method is trivial. The ScanTokenAndProvideInfoAboutIt method uses MatchRegEx to obtain the next token.
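
The extract leaves out the body of MatchRegEx. The following is a minimal sketch of how such a method could walk the pattern table, not the sample's actual implementation. To compile without the VS SDK it declares its own TokenColor enumeration as a stand-in for the Managed Package Framework type, and the TableEntry class, the MatchRegExSketch module and the Main driver are hypothetical names introduced only for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MatchRegExSketch

    ' Stand-in for the Managed Package Framework's TokenColor enumeration,
    ' declared here only so that the sketch compiles without the VS SDK.
    Enum TokenColor
        Text
        Keyword
        Comment
        Number
    End Enum

    ' Hypothetical table entry: a RegEx pattern plus the color assigned to it.
    Class TableEntry
        Public ReadOnly Pattern As Regex
        Public ReadOnly Color As TokenColor

        Public Sub New(ByVal regexText As String, ByVal entryColor As TokenColor)
            ' Anchor the pattern so that it can only match at the start of the text.
            Pattern = New Regex("^(?:" & regexText & ")")
            Color = entryColor
        End Sub
    End Class

    ' The same patterns and colors as in the extract above.
    Private ReadOnly patternTable As TableEntry() = New TableEntry() { _
        New TableEntry("[A-Z]?", TokenColor.Comment), _
        New TableEntry("[a-z]?", TokenColor.Keyword), _
        New TableEntry("[0-9]?", TokenColor.Number), _
        New TableEntry(".", TokenColor.Text) _
    }

    ' Tries the patterns in table order; the first non-empty match wins.
    Sub MatchRegEx(ByVal source As String, ByRef charsMatched As Integer, _
                   ByRef color As TokenColor)
        For Each entry As TableEntry In patternTable
            Dim m As Match = entry.Pattern.Match(source)
            ' Skip empty matches: "[A-Z]?" also succeeds on zero characters.
            If m.Success AndAlso m.Length > 0 Then
                charsMatched = m.Length
                color = entry.Color
                Return
            End If
        Next
        ' Nothing matched (for example a line feed, which "." does not match):
        ' consume one character as plain text so the caller always advances.
        charsMatched = 1
        color = TokenColor.Text
    End Sub

    ' Small driver: tokenizes one line and prints each token with its color.
    Sub Main()
        Dim line As String = "Ab1 ?"
        Dim pos As Integer = 0
        While pos < line.Length
            Dim length As Integer = 0
            Dim color As TokenColor = TokenColor.Text
            MatchRegEx(line.Substring(pos), length, color)
            Console.WriteLine("{0,-2} {1}", line.Substring(pos, length), color)
            pos += length
        End While
    End Sub

End Module
```

Because every pattern in the table matches at most one character, this sketch turns each character into its own token. Skipping zero-length matches and falling back to a single plain-text character both matter here: ScanTokenAndProvideInfoAboutIt advances by charsMatched, so a result of zero would keep the scanner on the same position forever.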
Based on the token retrieved, the TokenInfo structure is filled in and the position where the next token starts is set.

The scanner defines a nested class called RegularExpressionTableEntry that describes a token represented by a RegEx and also assigns a token color to it. The shared (static) patternTable array demonstrates how this structure is set up. The MatchRegEx method uses this array to obtain the next token.

Summary

The Regular Expression Language Service sample demonstrates how easy it is to create a very simple language service. The service created here implements a scanner that tokenizes the source text according to regular expression patterns. The language service also supports syntax coloring: each token accepted has a distinguishing color.

Language services must be registered in order to be accessible by the VS Shell and third party packages. With a simple decorating attribute on the package owning a language service, the service can be associated with a file extension. When a file with the specified extension is opened in the code editor, the corresponding language service is used to edit it.
