Incrementally finding regular expression matches in streaming data in Python

I have data streaming into a number of TCP sockets continuously. For each, I have a different regular expression that I need to pull out matches for. For example, one might match numbers of the format ##.# followed by the letter f:

    r = re.compile(rb'([0-9][0-9]\.[0-9])f')

Another might match numbers of the format ### preceded by the letter Q:

    r = re.compile(rb'Q([0-9][0-9][0-9])')

In reality, the expressions may be of arbitrary length and complexity, and are pulled from configuration files and not known in advance. They are not hard-coded.
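
To make the "pulled from configuration files" part concrete, here is a minimal sketch of compiling such patterns at runtime. The dict `PATTERNS_FROM_CONFIG` and the helper `compile_patterns` are hypothetical stand-ins for whatever the real configuration provides. The sketch assumes the patterns arrive as ASCII text and are encoded to bytes so they can be applied to raw socket data.

```python
import re

# Hypothetical stand-in for patterns read from configuration files.
PATTERNS_FROM_CONFIG = {
    "decimal_f": r"([0-9][0-9]\.[0-9])f",
    "q_number": r"Q([0-9][0-9][0-9])",
}


def compile_patterns(patterns):
    """Compile text patterns as bytes patterns, keyed by name."""
    return {
        name: re.compile(text.encode("ascii"))
        for name, text in patterns.items()
    }


compiled = compile_patterns(PATTERNS_FROM_CONFIG)

# Try both example patterns against some made-up sample bytes.
decimal_match = compiled["decimal_f"].search(b"noise 72.5f more")
q_match = compiled["q_number"].search(b"xxQ123yy")
```

Nothing in the code depends on what the patterns look like, which is the situation described above.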
When new data comes in, I append it to a buffer of type bytearray() (here called self.buffer). Then I call a function like this (with self.r being the compiled regular expression):

    def advance(self):
        m = self.r.search(self.buffer)
        # No match. Return.
        if m is None:
            return None
        # Match. Advance the buffer and return the matched groups.
        self.buffer = self.buffer[m.end():]
        return m.groups()

If there is no match yet, it returns None. If there is a match, it returns the matched groups and discards the buffer up to the end of the match, making itself ready to be called again.
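
For reference, the approach described above can be put into a small self-contained sketch. The class name `StreamMatcher` and the `feed` method are hypothetical, introduced only for illustration. The chunks are made-up sample data standing in for what would arrive from a socket, split so that a match straddles two chunks.

```python
import re


class StreamMatcher:
    """Hypothetical wrapper around the buffer-and-search approach."""

    def __init__(self, pattern):
        self.r = re.compile(pattern)
        self.buffer = bytearray()

    def feed(self, data):
        # Append newly received bytes to the buffer.
        self.buffer += data

    def advance(self):
        m = self.r.search(self.buffer)
        # No match yet: keep the buffer and wait for more data.
        if m is None:
            return None
        groups = m.groups()
        # Discard everything up to the end of the match.
        self.buffer = self.buffer[m.end():]
        return groups


matcher = StreamMatcher(rb"([0-9][0-9]\.[0-9])f")
results = []
for chunk in (b"abc12", b".3f45.6", b"f zz"):
    matcher.feed(chunk)
    # Drain every match that is currently complete in the buffer.
    while True:
        groups = matcher.advance()
        if groups is None:
            break
        results.append(groups)
```

Each call to `advance` searches the whole remaining buffer from the start, exactly as in the function above.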