4.14 Alternative specification of lexers

In the examples shown so far, the lexer rules were all specified in a single Python module. If you want to put the token rules in a different module, use the module keyword argument. For example, you might have a dedicated module that contains nothing but the token rules:

# module: tokrules.py
# This module just contains the lexing rules

# List of token names.   This is always required
tokens = (
   "NUMBER",
   "PLUS",
   "MINUS",
   "TIMES",
   "DIVIDE",
   "LPAREN",
   "RPAREN",
)

# Regular expression rules for simple tokens
t_PLUS    = r"\+"
t_MINUS   = r"-"
t_TIMES   = r"\*"
t_DIVIDE  = r"/"
t_LPAREN  = r"\("
t_RPAREN  = r"\)"

# A regular expression rule with some action code
def t_NUMBER(t):
    r"\d+"
    t.value = int(t.value)
    return t

# Define a rule so we can track line numbers
def t_newline(t):
    r"\n+"
    t.lexer.lineno += len(t.value)

# A string containing ignored characters (spaces and tabs)
t_ignore  = " \t"

# Error handling rule
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

Now, to build a lexer from the rules in that separate module, do the following (shown here in interactive mode):

>>> import tokrules
>>> import ply.lex as lex
>>> lexer = lex.lex(module=tokrules)
>>> lexer.input("3 + 4")
>>> lexer.token()
LexToken(NUMBER,3,1,0)
>>> lexer.token()
LexToken(PLUS,'+',1,2)
>>> lexer.token()
LexToken(NUMBER,4,1,4)
>>> lexer.token()
None
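To make the behavior of the generated lexer concrete, here is a minimal stand-in written with only the standard-library re module. This is not PLY itself, just a sketch of what the lexer built from the rules above effectively does: a master pattern with one named group per token, with ignored characters producing no tokens.

```python
import re

# Token rules mirroring tokrules.py; SKIP plays the role of t_ignore
token_specs = [
    ("NUMBER",  r"\d+"),
    ("PLUS",    r"\+"),
    ("MINUS",   r"-"),
    ("TIMES",   r"\*"),
    ("DIVIDE",  r"/"),
    ("LPAREN",  r"\("),
    ("RPAREN",  r"\)"),
    ("SKIP",    r"[ \t]+"),
]
master = re.compile("|".join("(?P<%s>%s)" % spec for spec in token_specs))

def tokenize(data):
    for m in master.finditer(data):
        if m.lastgroup == "SKIP":
            continue               # ignored characters produce no token
        # Convert NUMBER values to int, as t_NUMBER does
        value = int(m.group()) if m.lastgroup == "NUMBER" else m.group()
        yield (m.lastgroup, value)

print(list(tokenize("3 + 4")))     # [('NUMBER', 3), ('PLUS', '+'), ('NUMBER', 4)]
```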

The module option can also be used to specify an instance of a class. For example:

import ply.lex as lex

class MyLexer:
    # List of token names.   This is always required
    tokens = (
       "NUMBER",
       "PLUS",
       "MINUS",
       "TIMES",
       "DIVIDE",
       "LPAREN",
       "RPAREN",
    )

    # Regular expression rules for simple tokens
    t_PLUS    = r"\+"
    t_MINUS   = r"-"
    t_TIMES   = r"\*"
    t_DIVIDE  = r"/"
    t_LPAREN  = r"\("
    t_RPAREN  = r"\)"

    # A regular expression rule with some action code
    # Note addition of self parameter since we're in a class
    def t_NUMBER(self,t):
        r"\d+"
        t.value = int(t.value)
        return t

    # Define a rule so we can track line numbers
    def t_newline(self,t):
        r"\n+"
        t.lexer.lineno += len(t.value)

    # A string containing ignored characters (spaces and tabs)
    t_ignore  = " \t"

    # Error handling rule
    def t_error(self,t):
        print("Illegal character '%s'" % t.value[0])
        t.lexer.skip(1)

    # Build the lexer
    def build(self,**kwargs):
        self.lexer = lex.lex(module=self, **kwargs)

    # Test its output
    def test(self,data):
        self.lexer.input(data)
        while True:
             tok = self.lexer.token()
             if not tok: break
             print(tok)

# Build the lexer and try it out
m = MyLexer()
m.build()           # Build the lexer
m.test("3 + 4")     # Test it

When defining a lexer in a class, you must create an instance of the class, not use the class itself. This is because PLY only works properly when the lexer rules are defined as bound methods of an object.

When the module option is given to lex(), PLY collects symbols from the supplied object using the dir() function; it does not access the object's __dict__ attribute directly.
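The following sketch (illustrative only, not PLY's actual code) shows why this works: dir() on an instance lists every attribute name, and attributes defined as methods in the class come back as bound methods once there is an instance, so they can be called without an explicit self.

```python
# A tiny class holding rules in PLY's naming convention (hypothetical example)
class RuleHolder:
    t_PLUS = r"\+"                  # rule defined as a plain string

    def t_NUMBER(self, t):          # rule defined as a method
        r"\d+"
        return t

instance = RuleHolder()

# Collect every attribute whose name starts with "t_", roughly as PLY does
rules = {name: getattr(instance, name)
         for name in dir(instance) if name.startswith("t_")}

print(sorted(rules))                # ['t_NUMBER', 't_PLUS']
# t_NUMBER came back as a *bound* method of the instance
print(callable(rules["t_NUMBER"]))  # True
```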

Finally, if you want good encapsulation but would rather not put everything inside a class, lexers can also be defined within a closure. For example:

import ply.lex as lex

# List of token names.   This is always required
tokens = (
  "NUMBER",
  "PLUS",
  "MINUS",
  "TIMES",
  "DIVIDE",
  "LPAREN",
  "RPAREN",
)

def MyLexer():
    # Regular expression rules for simple tokens
    t_PLUS    = r"\+"
    t_MINUS   = r"-"
    t_TIMES   = r"\*"
    t_DIVIDE  = r"/"
    t_LPAREN  = r"\("
    t_RPAREN  = r"\)"

    # A regular expression rule with some action code
    def t_NUMBER(t):
        r"\d+"
        t.value = int(t.value)
        return t

    # Define a rule so we can track line numbers
    def t_newline(t):
        r"\n+"
        t.lexer.lineno += len(t.value)

    # A string containing ignored characters (spaces and tabs)
    t_ignore  = " \t"

    # Error handling rule
    def t_error(t):
        print("Illegal character '%s'" % t.value[0])
        t.lexer.skip(1)

    # Build the lexer from my environment and return it    
    return lex.lex()
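The closure form works because lex() can inspect the local variables of the function that calls it. A rough sketch of that mechanism using only the standard library (the function names here are illustrative, not PLY's actual internals):

```python
import sys

def collect_rules():
    """Gather t_-prefixed names from the caller's local scope,
    roughly the way a rule collector could discover closure-defined rules."""
    caller_locals = sys._getframe(1).f_locals
    return {name: value for name, value in caller_locals.items()
            if name.startswith("t_")}

def make_lexer_rules():
    # The locals of this function play the role of the closure's rule set
    t_PLUS = r"\+"
    t_MINUS = r"-"

    def t_NUMBER(t):
        r"\d+"
        return t

    # Called from here, collect_rules() sees the three names above
    return collect_rules()

rules = make_lexer_rules()
print(sorted(rules))   # ['t_MINUS', 't_NUMBER', 't_PLUS']
```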