netease-modsdk-wiki/docs/mcguide/20-玩法开发/18-性能优化/代码优化.md

---
front: https://nie.res.netease.com/r/pic/20210728/2dc2a94f-71f6-4cc5-8700-3c3696f79a0c.jpg
hard: 进阶
time: 30分钟
---

# 代码优化

## 前言

本文介绍了基于Python 2的一些常用技巧，能够优化代码，提升程序运行效率。

## 使用缓存(内存换CPU)

对象的重复创建与销毁会有一定性能消耗，对于需要频繁使用的数据，建议保存起来，下次从内存取出来直接使用，是一种常用的空间换时间(内存换CPU)的优化手段，对于减少游戏卡顿有较好效果。

### 避免在tick函数内使用import

import模块的消耗并没有小到可以忽略的地步，建议挪到文件的顶部进行import。如果这样会导致循环引用，则可以将模块缓存为类的成员变量

- 错误写法:

```python
class DemoClientSystem(ClientSystem):
    def Update(self):
        # 在每帧执行的逻辑内import模块
        import mod.client.extraClientApi as clientApi
        clientApi.xxx
```

- 正确写法:

```python
# 在文件顶部import模块
import mod.client.extraClientApi as clientApi
class DemoClientSystem(ClientSystem):
    def Update(self):
        clientApi.xxx
```

如果两个模块需要相互引用，那么同时在文件顶部import对方，会导致循环引用报错，则可以用下面的方法处理：

```python
class DemoClientSystem(ClientSystem):
    def __init__(self, namespace, systemName):
        ClientSystem.__init__(self, namespace, systemName)
        # 假设当前模块与另一个otherModule模块需要相互引用
        import demoScripts.client.otherModule as otherModule
        self.otherModule = otherModule

    def Update(self):
        self.otherModule.xxx
```

### 避免多次初始化常量

- 错误写法:

在频繁调用的函数中进行声明，例如每次Update的时候

```python
class DemoClientSystem(ClientSystem):
    def Update(self):
        # 常量，每帧创建，实际中可能这里会是比较多的数据
        bigDict = {
            (-1, -1): 1,
            (-1, 0): 2,
            (-1, 1): 3,
            (0, -1): 4,
            (0, 0): 5,
            (0, 1): 6,
            (1, -1): 7,
            (1, 0): 8,
            (1, 1): 9,
        }
        # 读取常量做一些逻辑
        do_something(bigDict)
```

- 正确写法:

包含数据比较多的一些常量，特别是List或者Dict类型的，可以放到类的__init__函数当中

```python
class DemoClientSystem(ClientSystem):
    # 构造函数
    def __init__(self, namespace, systemName):
        ClientSystem.__init__(self, namespace, systemName)
        # 在初始化时创建
        self.bigDict = {
            (-1, -1): 1,
            (-1, 0): 2,
            (-1, 1): 3,
            (0, -1): 4,
            (0, 0): 5,
            (0, 1): 6,
            (1, -1): 7,
            (1, 0): 8,
            (1, 1): 9,
        }

    def Update(self):
        do_something(self.bigDict)
```

### 缓存多次用到的中间数据

一些方法多次调用的返回值是一样，可以使用临时变量缓存，不需要重复调用

- 错误写法:
```python
class DemoServerSystem(ServerSystem):
    # 监听的ServerItemUseOnEvent事件回调
    def ServerItemUseOnEvent(self, args):
        # 设置多个方块
        self.SetBlock(args['dimensionId'], (args['x']-1, args['y'], args['z']), 'minecraft:air')
        self.SetBlock(args['dimensionId'], (args['x']-1, args['y'], args['z']), 'minecraft:air')
        self.SetBlock(args['dimensionId'], (args['x'], args['y'], args['z']), 'minecraft:air')
        self.SetBlock(args['dimensionId'], (args['x'], args['y'], args['z']-1), 'minecraft:air')
        self.SetBlock(args['dimensionId'], (args['x'], args['y'], args['z']+1), 'minecraft:air')

    def SetBlock(self, dimensionId, pos, blockName):
        serverApi.GetEngineCompFactory().CreateBlockInfo(levelId).SetBlockNew(pos, {'name': blockName}, 0, dimensionId)
```

- 正确写法:

```python
# compFactory使用缓存
serverCompFactory = serverApi.GetEngineCompFactory()
class DemoServerSystem(ServerSystem):
    # 监听的ServerItemUseOnEvent事件回调
    def ServerItemUseOnEvent(self, args):
        # 对字典内的值做缓存
        dimensionId = args['dimensionId']
        x = args['x']
        y = args['y']
        z = args['z']
        self.SetBlock(dimensionId, (x-1, y, z), 'minecraft:air')
        self.SetBlock(dimensionId, (x-1, y, z), 'minecraft:air')
        self.SetBlock(dimensionId, (x, y, z), 'minecraft:air')
        self.SetBlock(dimensionId, (x, y, z-1), 'minecraft:air')
        self.SetBlock(dimensionId, (x, y, z+1), 'minecraft:air')

    def SetBlock(self, dimensionId, pos, blockName):
        serverCompFactory.CreateBlockInfo(levelId).SetBlockNew(pos, {'name': blockName}, 0, dimensionId)
```

### 使用dict代替多个else if

当条件判断的分支很多时，dict跳转的性能会比一连串的else高很多。如果一定要用if，推荐把命中概率较高的判断放前面。

- 错误写法:

```python
serverCompFactory = serverApi.GetEngineCompFactory()
class DemoServerSystem(ServerSystem):
    def HandleBlocks(self, pos, dimensionId):
        # 获取方块信息
        blockIdentifier = serverCompFactory.CreateBlockInfo(levelId).GetBlockNew(pos, dimensionId)[0]
        # 根据方块类型做出不同的处理
        if blockIdentifier == "minecraft:iron_ore":
            self.handleIronBlock()
        elif blockIdentifier == "minecraft:gold_ore":
            self.handleGoldBlock()
        elif blockIdentifier == "minecraft:diamond_ore":
            self.handleDiamondBlock()
        ...
```

- 正确写法:

```python
serverCompFactory = serverApi.GetEngineCompFactory()
class DemoServerSystem(ServerSystem):
    def __init__(self):
        # 注册处理函数
        self.blockHandlers = {
            "minecraft:iron_ore": self.handleIronBlock,
            "minecraft:gold_ore": self.handleGoldBlock,
            "minecraft:diamond_ore": self.handleDiamondBlock,
        }

    def HandleBlocks(self, data):
        blockIdentifier = serverCompFactory.CreateBlockInfo(levelId).GetBlockNew(pos, dimensionId)[0]
        # 从dict中选取处理函数
        handler = self.blockHandlers.get(blockIdentifier)
        if handler:
            handler()
```

## 使用分帧（实时性换CPU）

同一时刻内处理大量的逻辑，容易造成卡顿。这时候需要把逻辑执行的时间错开到多帧去执行，让每一帧的任务量不要太重。

### 大批量修改数据分多帧处理

这里以方块为例：

- 错误写法: (同一时刻全部处理，需要处理 100 * 100 * 100 即一百万个方块，必然会卡)

```python
# 修改某个区域 100 * 100 * 100范围内的方块为空气
def SetBlocksToAir(self, fromPos):
    blockcomp = serverApi.CreateComponent(id, "Minecraft", "blockInfo")
    for x in range(1, 100):
        for y in range(1, 100):
            for z in range(1, 100):
                blockcomp.SetBlockNew((fromPos[0] + x, fromPos[1] + y, fromPos[2] + z), {'name':'minecraft:air'})
```

- 正确写法: (分开每帧只处理5个)

```python
# 修改某个区域 100 * 100 * 100范围内的方块为空气
def SetBlocksToAir(self, fromPos):
    # 命令队列
    self.posList = []
    self.posIndex = 0

    for x in range(1, 100):
        for y in range(1, 100):
            for z in range(1, 100):
                self.posList.append((fromPos[0] + x, fromPos[1] + y, fromPos[2] + z))

# 被引擎直接执行的父类的重写函数，引擎会执行该Update回调，1秒钟30帧
def Update(self):
    if self.posList:
        posListLen = len(self.posList)
        blockcomp = serverApi.CreateComponent(id, "Minecraft", "blockInfo")
        #每帧处理5个
        handleNum = 5
        while(handleNum > 0 and self.posIndex < posListLen):
            blockcomp.SetBlockNew(self.posList[self.posIndex], {'name':'minecraft:air'})
            self.posIndex = self.posIndex + 1
            handleNum = handleNum - 1

        # 全部处理完成
        if self.posIndex >= posListLen:
            self.posList = None
```

### 非重要逻辑降帧处理

不要每帧执行所有逻辑更新，不同的逻辑实际中根据实时性要求进行间隔更新

- 错误写法:
(每帧执行所有更新逻辑)
```python
def Update(self):
    self.do_something1()
    self.do_something2()
    self.do_something3()
```

- 正确写法:
(分开每帧只处理5个)
```python
class DemoClientSystem(ClientSystem):
    # 构造函数
    def __init__(self, namespace, systemName):
        ClientSystem.__init__(self, namespace, systemName)
        self.tick = 0

    def Update(self):
        self.tick = self.tick + 1
        # 重要逻辑每帧执行
        self.do_something1()

        if self.tick % 5 == 0:
            # 次要逻辑降帧执行
            self.do_something2()

        if self.tick % 10 == 0:
            # 更次要的逻辑，使用更低的帧率执行
            self.do_something3()
```

### 少用轮询逻辑

使用事件或一些适用的接口来代替每帧尝试的操作。

假想有一个需求：我想删除一个实体，但是当前这个实体没有被加载

- 错误写法：

  每帧尝试删除该实体，直到成功为止

- 推荐写法：

  1. 监听AddEntityServerEvent，在该实体的回调中删除。
  2. 如果该实体是手动创建的，可以使用SetPersistence接口将其设置为不存盘，那就不再需要处理该实体被卸载而无法删除的情况。

## 优化字节码

Python 是解释型语言，代码在运行时会先编译为字节码（Bytecode），再由解释器逐行执行字节码，优化字节码可以直接提升执行效率。

### 使用推导式

如果要对容器进行操作，使用推导式是最快的办法。在可以使用列表/字典/集合推导式时，尽量使用推导式，而不是使用for循环。

**列表添加元素：**

```python
a = []
for i in xrange(1000):
    if i % 2 == 0:
        a.append(i*i)
```

**缓存append方法：**

```python
a = []
l = a.append
for i in xrange(1000):
    if i % 2 == 0:
        l(i*i)
```

**列表推导式：**

```python
a = [i*i for i in xrange(1000) if i % 2 == 0]
```

**测试样例：**

```python
from timeit import timeit
print "loop + append:", timeit("for i in xrange(1000):\n if i % 2 == 0:\n  a.append(i*i)", "a=[]", number=10000)
print "loop + append(cache):", timeit("for i in xrange(1000):\n if i % 2 == 0:\n  l(i*i)", "a=[];l=a.append", number=10000)
print "list comprehenshion:", timeit("a = [i*i for i in xrange(1000) if i % 2 == 0]", number=10000)
```

**测试结果：**

```python
loop + append: 0.6161811
loop + append(cache): 0.5132234
list comprehenshion: 0.4063318
```

**结论：**

列表推导式，能获得明显的性能提升，元素越多差距越明显。

还有**字典推导式：**

```python
g =  (('a',1),('b',2),('c',3),('d',4),('e',5),('f',6))
d = {k:v for k, v in g if v % 2 == 0}
```

**集合推导式：**

```python
g = (1,2,3,4,5,6)
s = {v for v in g if v % 2 == 0}
```

### 字符串拼接

我们有很多办法拼接字符串，比如直接相加、使用format、使用%、使用join，那么到底哪种办法最快呢？

**常见写法：**

```python
s = s1 + s2 + s3

s = s1; s += s2; s += s3

s = '%s%s%s' % (s1,s2,s3)

s = ''.join((s1,s2,s3))
```

**测试样例1：**

```python
from timeit import timeit
N = 10000000
setup = 's1="hello"*35; s2="world"*25; s3="!"*30; s4=s3*2; s5=s3*2'
print(timeit("s = s1 + s2 + s3", setup, number=N))
print(timeit("s = s1; s+=s2; s+=s3", setup, number=N))
print(timeit("s = '%s%s%s' % (s1,s2,s3)", setup, number=N))
print(timeit("s = '{}{}{}'.format(s1,s2,s3)", setup, number=N))
print(timeit("s = ''.join((s1,s2,s3))", setup, number=N))
```

**测试结果1：**

```python
0.7396258
0.8553558
1.5691264
3.8130296
1.0085892
```

**测试样例2：**

```python
from timeit import timeit
N = 10000000
setup = 's1="hello"*35; s2="world"*25; s3="!"*30; s4=s3*2; s5=s3*2'
print(timeit("s = s1 + s2 + s3 + s4 + s5", setup, number=N))
print(timeit("s = s1; s+=s2; s+= s3; s+= s4; s+= s5", setup, number=N))
print(timeit("s = '%s%s%s%s%s' % (s1,s2,s3,s4,s5)", setup, number=N))
print(timeit("s = '{}{}{}{}{}'.format(s1,s2,s3,s4,s5)", setup, number=N))
print(timeit("s = ''.join((s1,s2,s3,s4,s5))", setup, number=N))
```

**测试结果2：**

```python
1.4091635
1.6201083
3.4721674
4.6679361
1.2252783
```

**结论：**

- 要拼接的子串数量较少时（如不多于3个），直接相加是最快的
- 当拼接的子串数量较多时，`join`方法是最快的
- 如果只是想纯粹拼接一下字符串，不要使用格式化方法

### 变量访问

局部变量访问速度最快，其次是全局变量。如果要访问对象的属性，比如self.client.aaa.bbb中出现了三个点，而每一个点代表一次访问，就会多消耗一次性能。建议在频繁使用时缓存为局部变量。

```python
# 缓存为全局变量CF，减少了一次访问
CF = serverApi.GetEngineCompFactory()
def OnCustomCommandTrigger(self, args):
    # 在循环前，将api方法缓存为局部变量
    createExplosion = CF.CreateExplosion(levelId).CreateExplosion
    for _ in xrange(1000):
        createExplosion(...)# 直接调用

    # 将自己的方法/属性缓存为局部变量
    func = self.xxxsystem.aaa.bbb
    for _ in xrange(1000):
        func(...)
```

### 字典查询

字典的查询属于属性访问中的一个特例。取字典中特定key的值，如取不到返回None，可有下列写法：

```python
def get1(d, key):
    if key in d:
        return d[key]
    return None

def get2(d, key):
    if d.has_key(key):
        return d[key]
    return None

def get3(d, key):
    return d.get(key)

def get4(d, key):
    return d.get(key, None)

def get5(d, key):
    try:
        return d[key]
    except KeyError:
        pass
```

**测试样例：**

```python
g_d = {"a": 23, "b": 11, "c": 88, "d": 2, "e": 3, "f": 4, "g": 11, "h": 25, "i": 46}
from timeit import timeit
print(timeit('get1(g_d, "b")', 'from __main__ import get1, g_d', number=100000))
print(timeit('get2(g_d, "b")', 'from __main__ import get2, g_d', number=100000))
print(timeit('get3(g_d, "b")', 'from __main__ import get3, g_d', number=100000))
print(timeit('get4(g_d, "b")', 'from __main__ import get4, g_d', number=100000))
print(timeit('get5(g_d, "b")', 'from __main__ import get5, g_d', number=100000))

print(timeit('get1(g_d, "z")', 'from __main__ import get1, g_d', number=100000))
print(timeit('get2(g_d, "z")', 'from __main__ import get2, g_d', number=100000))
print(timeit('get3(g_d, "z")', 'from __main__ import get3, g_d', number=100000))
print(timeit('get4(g_d, "z")', 'from __main__ import get4, g_d', number=100000))
print(timeit('get5(g_d, "z")', 'from __main__ import get5, g_d', number=100000))
```

结果分命中、不命中两种情况汇总：

| 单位：ms/1w次 | 命中       | 不命中      |
| --------- | -------- | -------- |
| get1      | 1.17     | **1.05** |
| get2      | 1.59     | 1.43     |
| get3      | 1.62     | 1.59     |
| get4      | **1.75** | 1.80     |
| get5      | **1.04** | **9.01** |

从这个表可以看到，get1用in来判断，平均表现是最好的，是否命中，都是1ms多一点。而最后这个try except，命中的时候是最佳的，不命中的时候性能就大幅恶化。

**结论：**

- 对于key是否存在，直接用in来做判断即可，has_key接口比in慢。当然in方法不止可以对字典用，也可以对任何iterable的对象用，python是动态语言，要清楚你in的对象到底是什么。
- get的default参数不必填None，因为它本来就是None，填进去反而更慢。

### 函数调用

函数调用是有额外开销的，效率敏感场合不容忽略。

**测试样例：**

```python
log = lambda msg: None

def foo(msg):
    log(msg)

from timeit import timeit
print(timeit('foo("hello")', 'from __main__ import foo', number=100000))
print(timeit('log("hello")', 'from __main__ import log', number=100000))
```

**测试结果：**

```python
0.0104322
0.0051873
```

**结论：**

python里1万次的函数调用的消耗，约1毫秒的量级。在效率敏感场合，尽量省去不必要的几行代码的函数包装，减少调用层级，以及减少默认参数个数。

### 方法调用

类与实例方法的调用和函数调用类似，封装太多也会有明显的效率下降，而且情况可能更严重。

**测试样例：**

```python
# -*- coding: gbk -*-
import time
# 定义时间测量装饰器
def time_it(func):
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        print "函数 {} 耗时: {:.0f} 毫秒".format(func.__name__, (end_time - start_time) * 1000)
        return result
    return wrapper
def show_warn(message):
    pass
HP_TH = 10
class Player(object):
    def __init__(self):
        self.hp = 0
        self.hp_th = HP_TH
    def tick(self):
        if self.hp < self.hp_th:
            self.perform_warn()
    def perform_warn(self):
        show_warn("warn")
class Player2(object):
    def __init__(self):
        self.hp = 0
        self.hp_th = HP_TH
    def tick(self):
        if self.hp < self.hp_th:
            show_warn("warn")
# 性能测试
if __name__ == "__main__":
    N = 10000
    # 测试 Player 类
    players = []
    for _ in xrange(N):
        players.append(Player())
    @time_it
    def run(n):
        for _ in xrange(n):
            for p in players:
                p.tick()
    run(100)
    # 测试 Player2 类
    players = []
    for _ in xrange(N):
        players.append(Player2())
    @time_it
    def run2(n):
        for _ in xrange(n):
            for p in players:
                p.tick()
    run2(100)
```

**测试结果：**

```python
函数 run 耗时: 274 毫秒
函数 run2 耗时: 168 毫秒
```

可见，减少一层方法调用后，耗时274ms能降到168ms。

**结论：**

为了效率的话，请尽量避免过多的类方法封装；同一实例方法的频繁调用，请先缓存下来（如第一个例子中的l=a.append）

### 模块导入

关于import写在什么地方，我们都知道，写在模块开头，有这么一些弊端：

- 首次加载卡顿
- 内存过多
- 带来冗余
- 循环引用

但写在函数内就一定是最好的办法吗？

**测试样例：**

```python
def tick():
    from packageA.subpackageA import math
    math.fabs(100)

from packageA.subpackageA import math
def tick2():
    math.fabs(100)

from timeit import timeit
print timeit("tick()", "from __main__ import tick", number=100000)
print timeit("tick2()", "from __main__ import tick2", number=100000)

# 假设把tick函数移到另一个package下（packageB/test.py）：
print timeit("tick()", "from packageB.test import tick", number=100000)
```

**测试结果：**

```python
0.1006268
0.0177434
0.1125192
```

可见，函数内import明显要慢很多，尤其是在另外一个package里面import。

**结论：**

基础性/通用性模块的导入，import写在模块头，当然前提是这些基础模块要做好规划，不要过于臃肿，不要互相耦合严重。

对于频繁调用的函数，函数开头不适宜有太多import，package结构也不宜搞得过于复杂。