In the examples above, the lexer rules were all specified within a single Python module. If you want to keep the token rules in a separate module, use the module keyword argument. For example, you might have a dedicated module that contains only the token rules:
# module: tokrules.py
# This module just contains the lexing rules

# List of token names.  This is always required
tokens = (
    'NUMBER',
    'PLUS',
    'MINUS',
    'TIMES',
    'DIVIDE',
    'LPAREN',
    'RPAREN',
)

# Regular expression rules for simple tokens
t_PLUS    = r'\+'
t_MINUS   = r'-'
t_TIMES   = r'\*'
t_DIVIDE  = r'/'
t_LPAREN  = r'\('
t_RPAREN  = r'\)'

# A regular expression rule with some action code
def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

# Define a rule so we can track line numbers
def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)

# A string containing ignored characters (spaces and tabs)
t_ignore = ' \t'

# Error handling rule
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)
Now, to build a lexer from this separate module (shown here in interactive mode):
>>> import tokrules
>>> lexer = lex.lex(module=tokrules)
>>> lexer.input("3 + 4")
>>> lexer.token()
LexToken(NUMBER,3,1,0)
>>> lexer.token()
LexToken(PLUS,'+',1,2)
>>> lexer.token()
LexToken(NUMBER,4,1,4)
>>> lexer.token()
None
The module option can also be used to specify an instance of a class. For example:
import ply.lex as lex

class MyLexer:
    # List of token names.  This is always required
    tokens = (
        'NUMBER',
        'PLUS',
        'MINUS',
        'TIMES',
        'DIVIDE',
        'LPAREN',
        'RPAREN',
    )

    # Regular expression rules for simple tokens
    t_PLUS    = r'\+'
    t_MINUS   = r'-'
    t_TIMES   = r'\*'
    t_DIVIDE  = r'/'
    t_LPAREN  = r'\('
    t_RPAREN  = r'\)'

    # A regular expression rule with some action code
    # Note addition of self parameter since we're in a class
    def t_NUMBER(self, t):
        r'\d+'
        t.value = int(t.value)
        return t

    # Define a rule so we can track line numbers
    def t_newline(self, t):
        r'\n+'
        t.lexer.lineno += len(t.value)

    # A string containing ignored characters (spaces and tabs)
    t_ignore = ' \t'

    # Error handling rule
    def t_error(self, t):
        print("Illegal character '%s'" % t.value[0])
        t.lexer.skip(1)

    # Build the lexer
    def build(self, **kwargs):
        self.lexer = lex.lex(module=self, **kwargs)

    # Test its output
    def test(self, data):
        self.lexer.input(data)
        while True:
            tok = self.lexer.token()
            if not tok:
                break
            print(tok)

# Build the lexer and try it out
m = MyLexer()
m.build()           # Build the lexer
m.test("3 + 4")     # Test it
When defining a lexer from a class, you must construct the lexer from an instance of the class, not from the class object itself. This is because PLY only works properly if the lexer actions are bound methods of an object.

When the module option is used with lex(), PLY collects symbols from the object using dir(); there is no direct access to the object's __dict__ attribute.
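To make the bound-method point concrete, here is a minimal sketch (plain Python, no PLY required) of how dir()-based rule discovery behaves on an instance versus the class object itself. The helper name collect_rules is hypothetical and only illustrates the general technique:

```python
import inspect

class Rules:
    t_PLUS = r'\+'          # simple string rule

    def t_NUMBER(self, t):  # function rule: needs self to be bound
        r'\d+'
        return t

def collect_rules(obj):
    # Mimic the dir()-style scan: pick up every t_-prefixed symbol.
    return {name: getattr(obj, name) for name in dir(obj)
            if name.startswith('t_')}

inst_rules = collect_rules(Rules())  # from an instance
cls_rules = collect_rules(Rules)     # from the class object

# On the instance, t_NUMBER is a bound method that already carries
# self, so it can be called as rule(t); on the bare class it is a
# plain function that would still demand an explicit instance.
print(inspect.ismethod(inst_rules['t_NUMBER']))  # True
print(inspect.ismethod(cls_rules['t_NUMBER']))   # False
```

This is why passing an instance to lex(module=...) works while passing the class does not.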
Finally, if you want good encapsulation but don't want to put everything inside a class, lexers can also be defined using a closure. For example:
import ply.lex as lex

# List of token names.  This is always required
tokens = (
    'NUMBER',
    'PLUS',
    'MINUS',
    'TIMES',
    'DIVIDE',
    'LPAREN',
    'RPAREN',
)

def MyLexer():
    # Regular expression rules for simple tokens
    t_PLUS    = r'\+'
    t_MINUS   = r'-'
    t_TIMES   = r'\*'
    t_DIVIDE  = r'/'
    t_LPAREN  = r'\('
    t_RPAREN  = r'\)'

    # A regular expression rule with some action code
    def t_NUMBER(t):
        r'\d+'
        t.value = int(t.value)
        return t

    # Define a rule so we can track line numbers
    def t_newline(t):
        r'\n+'
        t.lexer.lineno += len(t.value)

    # A string containing ignored characters (spaces and tabs)
    t_ignore = ' \t'

    # Error handling rule
    def t_error(t):
        print("Illegal character '%s'" % t.value[0])
        t.lexer.skip(1)

    # Build the lexer from my environment and return it
    return lex.lex()
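When lex.lex() is called with no module argument like this, it falls back to inspecting the caller's stack frame and picks up the t_ rules defined in the enclosing function's local scope. A minimal sketch of that frame-inspection technique (pure Python; find_rules and my_lexer are hypothetical names, for illustration only):

```python
import sys

def find_rules():
    # Mimic the fallback: read the *caller's* local variables through
    # its stack frame and collect every t_-prefixed definition.
    caller_locals = sys._getframe(1).f_locals
    return {name: val for name, val in caller_locals.items()
            if name.startswith('t_')}

def my_lexer():
    t_PLUS = r'\+'

    def t_NUMBER(t):
        r'\d+'
        t.value = int(t.value)
        return t

    # Equivalent in spirit to `return lex.lex()`: the rules are found
    # in this function's local scope, not in a module or class.
    return find_rules()

rules = my_lexer()
print(sorted(rules))  # ['t_NUMBER', 't_PLUS']
```

Because the rules live in the closure's local scope, nothing leaks into the global namespace, which is the encapsulation benefit this pattern is after.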