
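This post was last edited by 吾爱开源 Ganlv on 2018-2-8 04:56

Previous article: [Original] The "VMProtect" of PHP encryption: analyzing and decompiling 魔方加密 (Mofang encryption) https://www.cgzz8.cn/t-32248-1-1.html

I have never taken formal courses in compiler theory, data structures, assembly language and the like. Everything here is something I worked out on my own, so it is not necessarily the best solution. Replies and discussion are welcome.

Last time, we worked out what parts of the code do and got some semi-automatic decompilation results.

In short, the goal is feasible, so we can keep going.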

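Automatic disassembly

First, based on the previous analysis, we manually compiled an instruction set. Nobody can help you with this step. It took me 4 days to crack this encryption scheme; the person who wrote it may well have spent more than a month, so 4 days is not too bad.

Image: [调试逆向] 【原创】PHP魔方加密手动反编译基于栈的指令-1.jpg

Look at the pile of .1 .2 .3 files on the left and you can tell how many approaches I tried while writing this.

Disassembler analysis

You can refer to the attached files. I only cover part of the code here; please analyze the rest yourself.

This function is what I use to record every disassembled line. Note the $eip - 6: as analyzed last time, each instruction has 1 key byte, 4 instruction bytes and 1 instruction-length byte, so every instruction is 6 bytes long. We want references to use the start of the instruction as the base, so we always subtract 6.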

protected function dasmLine($eip, $asm, $args = [])
{
    $this->asm[$eip - 6] = [
        'asm' => $asm,
        'args' => $args,
    ];
}

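Most instructions in this set take no operand, some take the next 1 byte as an immediate, some take the next 2 bytes as an immediate, and a few take other kinds of immediates. I use PHP's magic method __call to handle these groups of functions uniformly.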

public function __call($name, $arguments)
{
    if (in_array($name, ['func7', 'func8', 'func10', 'func20', 'func33', 'func35', 'func41', 'func44', 'func56'])) {
        // 1-byte immediate
        $this->dasmLine($arguments[0], $this->asmMap[$name], [$this->getInt($arguments[0], 1)]);
        return 1;
    } elseif (in_array($name, ['func1', 'func16'])) {
        // 2-byte immediate
        $this->dasmLine($arguments[0], $this->asmMap[$name], [$this->getInt($arguments[0], 2)]);
        return 2;
    } elseif (method_exists($this, '_' . $name)) {
        // Special instruction
        $result = $this->{'_' . $name}($arguments[0], $this->asmMap[$name]);
        $this->dasmLine($arguments[0], $this->asmMap[$name], $result[1]);
        return $result[0];
    } elseif (isset($this->asmMap[$name])) {
        // Instruction without operands
        $this->dasmLine($arguments[0], $this->asmMap[$name]);
        return 0;
    } else {
        throw new \Exception('Call undefined function ' . $name);
    }
}

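Next come some general-purpose functions. Note that getFunc is the function that fetches the function name (in other words, fetches the next instruction). I copied the algorithm from the original file and added the function-name mapping.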

protected function getFunc($eip)
{
    $key = $this->getMemory($eip, 1);
    $func = $this->getMemory($eip + 1, 2) . $this->getMemory($eip + 4, 2);
    $func = str_repeat($key, 4) ^ $func;
    $func = base64_decode('zb+8') . $func;
    if (isset($this->fnm[$func])) { // function name map
        $func = $this->fnm[$func];
    } else {
        throw new \Exception('Function not exists: $eip=' . $eip . ' func=' . $func);
    }
    if (ord($key ^ $this->getMemory($eip + 3, 1)) != 6) {
        throw new \Exception('Instruction length is not 6');
    }
    return $func;
}
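
To make the 6-byte layout read by getFunc more concrete, here is a minimal standalone sketch that encodes one instruction header and decodes it again: byte 0 is the key, bytes 1-2 and 4-5 hold the 4-byte name XORed with the key, and byte 3 holds the length (6) XORed with the key. The helper names encodeHeader and decodeHeader and the sample name 'abcd' are made up for illustration; the fixed name prefix and the function-name map from the real code are left out.

```php
<?php
// Hypothetical helpers illustrating the header layout that getFunc() reads.

/** Build a 6-byte header from a 1-byte key and a 4-byte name. */
function encodeHeader(string $key, string $name): string
{
    if (strlen($key) !== 1 || strlen($name) !== 4) {
        throw new InvalidArgumentException('key must be 1 byte, name 4 bytes');
    }
    $masked = $name ^ str_repeat($key, 4); // XOR every name byte with the key
    $length = chr(6) ^ $key;               // the length byte is XORed as well
    return $key . substr($masked, 0, 2) . $length . substr($masked, 2, 2);
}

/** Reverse of encodeHeader(): returns the 4-byte name or throws. */
function decodeHeader(string $memory, int $eip): string
{
    $key = substr($memory, $eip, 1);
    $masked = substr($memory, $eip + 1, 2) . substr($memory, $eip + 4, 2);
    if (ord($key ^ substr($memory, $eip + 3, 1)) !== 6) {
        throw new RuntimeException('Instruction length is not 6');
    }
    return str_repeat($key, 4) ^ $masked;
}

$header = encodeHeader("\x5A", 'abcd');
echo bin2hex($header), "\n";
echo decodeHeader($header, 0), "\n";
```

PHP applies ^ byte by byte when both operands are strings, which is why a single str_repeat of the key is enough to unmask the name.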

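The rest is handling each special instruction individually, which is also a long process.

After that we get the disassembled file.

Image: [调试逆向] 【原创】PHP魔方加密手动反编译基于栈的指令-2.jpg

To make it easier to read, I also wrote a function that outputs a text file, which makes later analysis more convenient.

Image: [调试逆向] 【原创】PHP魔方加密手动反编译基于栈的指令-3.jpg

Then, after some thought, I added one more feature: a jump list. It matters a lot during analysis, because programs are never purely linear. Conditional branches make decompilation harder, irregular jumps make it harder still, and later we will have to find a way to eliminate these jumps.

Image: [调试逆向] 【原创】PHP魔方加密手动反编译基于栈的指令-4.jpg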
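
A jump list like the one described can be built in a single pass over the listing. The sketch below is not the original code: it assumes the listing has the shape that dasmLine produces (address => ['asm' => ..., 'args' => ...]), that the jump mnemonics are literally 'jmp' and 'jnz', and that args[0] already holds the absolute target address. The function name buildJumpList and the sample addresses are invented.

```php
<?php
/**
 * Collect jump targets from a disassembly listing.
 * Assumed listing shape: [address => ['asm' => string, 'args' => array]],
 * where 'jmp' and 'jnz' carry the absolute target address in args[0].
 *
 * @return array target address => list of source addresses
 */
function buildJumpList(array $listing): array
{
    $jumps = [];
    foreach ($listing as $address => $line) {
        if (in_array($line['asm'], ['jmp', 'jnz'], true)) {
            $jumps[$line['args'][0]][] = $address;
        }
    }
    ksort($jumps); // present targets in address order
    return $jumps;
}

// Made-up sample listing; the addresses are arbitrary.
$listing = [
    0  => ['asm' => 'push', 'args' => []],
    6  => ['asm' => 'jnz',  'args' => [24]],
    12 => ['asm' => 'pop',  'args' => []],
    18 => ['asm' => 'jmp',  'args' => [30]],
    24 => ['asm' => 'pop',  'args' => []],
    30 => ['asm' => 'ret',  'args' => []],
];

foreach (buildJumpList($listing) as $target => $sources) {
    echo $target, ' <- ', implode(', ', $sources), "\n";
}
```

Grouping by target rather than by source makes it easy to spot addresses that several jumps converge on, which are exactly the places where blocks have to be merged later.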

[indent]

Once you have implemented a decompiler yourself, you will understand why goto statements are discouraged.

[/indent]

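At this point, our disassembler is complete. We only used the exported bin and are completely independent of the original virtual machine.

In the disassembled instructions there is no longer any notion of 1/2/4/6/12-wide immediates. The numbers are fully decoded, so operand width no longer matters, which effectively simplifies the instruction set.

Splitting the code into blocks

As mentioned, jumps are annoying and we need a way to eliminate them. That way is splitting the code into blocks.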

[indent]

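IDA has this feature too, and it seems to be very powerful, but I had never thought about cracking anything before and have not learned IDA yet.

Originally I did not have a separate block-splitting step and decompiled directly, but tracking jumps turned out to be too much trouble, so I split it out: first split into blocks, then decompile.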

[/indent]

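Blocks are split according to jumps. This instruction set has only two jump instructions, jmp and jnz. In an if statement, jnz is the conditional test and jmp is the jump from the end of the stmts block to the end of the else block. In a while loop, jnz is the exit condition and jmp is the jump back that continues the loop.

The stmts and else mentioned here refer to:

if (cond)
    stmts-block
else
    else-block

There are two approaches: sequential search and tracing search.

My first idea was sequential search: on encountering jnz or jmp, do not jump, only record the position and related information, handle them all together later, analyze the code in order, and then separate the stmts blocks from the else blocks. I then found some parts hard to implement. For example, when two places jump to the same target, on reaching it I have to assemble two if blocks at the same time, and sometimes the code blocks of the two ifs are not easy to tell apart.

So I switched to tracing search. Start with jnz: on a jnz, execution splits into two branches. On another jnz, split again; on a jmp, just follow the jump. Continue until the two branches meet (or jump backwards). This unifies the originally messy jnz and jmp.

Meeting sounds simple, but doing it takes some technique. We cannot split the work into two threads here, so what do we do?

In short, keep two instruction pointers: one points to the next instruction in normal execution, the other to the instruction at the jump target. Always advance the smaller pointer, until it hits a jump instruction or the two pointers meet.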

Step 1:

jnz a
...... <---- pointer 1
jmp b
a:
...... <---- pointer 2
jmp c
b:
......
jmp d
c:
......
d:
......

Step 2:

jnz a
......
jmp b
a:
...... <---- pointer 2
jmp c
b:
...... <---- pointer 1
jmp d
c:
......
d:
......

Step 3:

jnz a
......
jmp b
a:
......
jmp c
b:
...... <---- pointer 1
jmp d
c:
...... <---- pointer 2
d:
......

Step 4:

jnz a
......
jmp b
a:
......
jmp c
b:
......
jmp d
c:
......
d:
...... <---- pointer 1 & pointer 2

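That is the rough process. My implementation uses a JumpException: when a jump instruction is encountered, it immediately stops the pointer from moving any further, so that we can decide again which pointer to move.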
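
The two-pointer procedure can be tried out on a toy program. This is a simplified sketch, not the real implementation: instructions are plain arrays (['op' => 'nop'] or ['op' => 'jmp', 'target' => index]), indexes stand in for addresses, only forward jumps are handled, and nested jnz is ignored. The names walkBlock and findMergePoint are invented, and a return value plays the role that JumpException plays in the original.

```php
<?php
// Toy model: each instruction is ['op' => 'nop'], ['op' => 'jnz', ...] or
// ['op' => 'jmp', 'target' => int]. Only forward jumps are supported here.

/** Walk linearly from $pc; stop at $limit or follow the first jmp encountered. */
function walkBlock(array $program, int $pc, int $limit): int
{
    while ($pc < $limit && $pc < count($program)) {
        $ins = $program[$pc];
        if ($ins['op'] === 'jmp') {
            return $ins['target']; // the pointer moves to the jump target
        }
        ++$pc;
    }
    return $pc;
}

/** Advance whichever pointer is smaller until both branches meet. */
function findMergePoint(array $program, int $next, int $jump): int
{
    while ($next !== $jump) {
        if ($next < $jump) {
            $next = walkBlock($program, $next, $jump);
        } else {
            $jump = walkBlock($program, $jump, $next);
        }
    }
    return $next;
}

// Same shape as the diagram: a = 3, b = 5, c = 7, d = 8.
$program = [
    ['op' => 'jnz', 'target' => 3], // 0: jnz a
    ['op' => 'nop'],                // 1: pointer 1 starts here
    ['op' => 'jmp', 'target' => 5], // 2: jmp b
    ['op' => 'nop'],                // 3: a, pointer 2 starts here
    ['op' => 'jmp', 'target' => 7], // 4: jmp c
    ['op' => 'nop'],                // 5: b
    ['op' => 'jmp', 'target' => 8], // 6: jmp d
    ['op' => 'nop'],                // 7: c
    ['op' => 'nop'],                // 8: d
];

echo findMergePoint($program, 1, 3), "\n";
```

On this program the pointers meet at index 8, which is label d in the diagram. Because every walk moves a pointer strictly forward, the loop terminates as long as there are no backward jumps.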

Key code

/**
 * Conditional branch statement
 * @param $jump_pointer
 * @param $next_pointer
 * @return mixed
 * @throws \Exception
 */
protected function _jnz($jump_pointer, $next_pointer)
{
    if ($jump_pointer < $next_pointer) {
        throw new \Exception('jump pointer < next pointer');
    }

    // Back up $asmTree
    $asmTreeElse = $asmTreeStmts = $asmTree = $this->asmTree;
    $asmTreePointer = count($this->asmTree);

    // Walk the stmts block and the else block separately
    ++$this->jnzStack;
    while ($jump_pointer != $next_pointer) {
        if (($jump_pointer > $next_pointer && $next_pointer > 0) || $jump_pointer < 0) {
            $this->asmTree = $asmTreeElse;
            try {
                $next_pointer = $this->dissect($next_pointer, $jump_pointer);
            } catch (JumpException $exception) {
                $next_pointer = $exception->jump_pointer;
            }
            $asmTreeElse = $this->asmTree;
        } else {
            $this->asmTree = $asmTreeStmts;
            try {
                $jump_pointer = $this->dissect($jump_pointer, $next_pointer);
            } catch (JumpException $exception) {
                $jump_pointer = $exception->jump_pointer;
            }
            $asmTreeStmts = $this->asmTree;
        }
    }
    --$this->jnzStack;

    // Detect loops
    $asmTreeStmtsLastOne = $asmTreeStmts[count($asmTreeStmts) - 1];
    $loop_begin = false;
    if ($asmTreeStmtsLastOne['asm'] == 'loop_end') {
        $loop_begin = $asmTreeStmtsLastOne['args']['begin'];
        array_pop($asmTreeStmts);
        $asmTreeElse[] = [
            'asm' => 'iter_break',
            'args' => [],
        ];
    }

    // Restore $asmTree
    $this->asmTree = $asmTree;

    // Build the if instruction
    $this->asmTree[] = [
        'asm' => 'if [esp]',
        'args' => [
            'stmts' => array_slice($asmTreeStmts, $asmTreePointer),
            'else' => array_slice($asmTreeElse, $asmTreePointer),
        ],
    ];

    // Build the loop instruction
    if ($loop_begin !== false) {
        $loop_stmts = array_slice($this->asmTree, $loop_begin);
        $this->asmTree = array_slice($this->asmTree, 0, $loop_begin);
        $this->asmTree[] = [
            'asm' => 'loop',
            'args' => [
                'stmts' => $loop_stmts,
            ],
        ];
    }

    return $next_pointer;
}

The code also detects loops: if a pointer jumps to a location that has already been executed, it is a loop.
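
The "jump to something already executed" test can be illustrated in isolation. The sketch below is an assumption-laden toy, not the code that emits loop_end in the original: instructions are ['op' => 'nop'] or ['op' => 'jmp', 'target' => index], and the hypothetical function findBackJump just remembers every index it has visited.

```php
<?php
/**
 * Walk instructions from $pc and report the first jump into visited code.
 * Toy instruction format (an assumption): ['op' => 'nop'] or
 * ['op' => 'jmp', 'target' => int].
 *
 * @return array|null ['begin' => loop start index, 'end' => index of the jmp]
 */
function findBackJump(array $program, int $pc): ?array
{
    $visited = [];
    while ($pc < count($program)) {
        $visited[$pc] = true;
        $ins = $program[$pc];
        if ($ins['op'] === 'jmp') {
            if (isset($visited[$ins['target']])) {
                // Jumping to something already executed: this closes a loop.
                return ['begin' => $ins['target'], 'end' => $pc];
            }
            $pc = $ins['target'];
            continue;
        }
        ++$pc;
    }
    return null;
}

$loopProgram = [
    ['op' => 'nop'],                // 0
    ['op' => 'nop'],                // 1: loop body starts here
    ['op' => 'nop'],                // 2
    ['op' => 'jmp', 'target' => 1], // 3: jump back to 1
    ['op' => 'nop'],                // 4
];

var_dump(findBackJump($loopProgram, 0));
```

The 'begin' value corresponds to the information that the loop_end entry carries in _jnz, where it is used to cut the loop body out of the tree.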

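As before, let's print our results nicely.

Image: [调试逆向] 【原创】PHP魔方加密手动反编译基于栈的指令-5.jpg

Now our code no longer needs an instruction pointer; it can simply be executed in order.

Decompilation

In this step we take the code that has been split into blocks and, from the stack operation each instruction performs, infer what the original code looked like.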

Linear code

add esp, 16
push (null)
db [esp], 'is_admin'
call (0) [esp]
not [esp]
if [esp]
   pop
else
   pop
   push (null)
   link [esp], [ebp+1]
   push (null)
   db [esp], 'Grace'
   ......

add esp, 16 allocates local variables, so we generate the names of 16 local variables in our program.

/**
 * Allocate local variables
 * @param int $count
 */
protected function _add($count)
{
    for ($i = 0; $i < $count; ++$i) {
        $this->v[] = new Variable('v' . $i);
    }
}

push (null) pushes a null onto the stack; we simply push a null into the syntax tree.

protected function _push()
{
    ++$this->astp;
    $this->ast[$this->astp] = new ConstFetch(new Name('null'));
}

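Why push null into the syntax tree rather than onto a stack?

Because we are interpreting the program code in order to decompile it: we analyze what each instruction does and convert its stack operation into an operation that builds the syntax tree. So we cannot push onto a real stack, although the way we build the syntax tree also works by pushing and popping.

The db instruction just writes the data in directly. Note that we construct a syntax tree node of the corresponding data type rather than storing the raw data.

/**
 * Read data
 * @param string|int $data
 * @throws \Exception
 */
protected function _db($data)
{
    if (is_string($data)) {
        $this->ast[$this->astp] = new String_($data);
    } elseif (is_numeric($data)) {
        $this->ast[$this->astp] = new LNumber($data);
    } elseif (is_array($data)) {
        $this->ast[$this->astp] = new Array_($data);
    } else {
        throw new \Exception('Move invalid data.');
    }
}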

For call, replace the expression at the callee's position with a FuncCall built from that expression.

protected function _call($argCount)
{
    $args = [];
    for ($i = $argCount - 1; $i >= 0; --$i) {
        $args[] = new \PhpParser\Node\Arg($this->ast[$this->astp - $i]);
    }
    $this->ast[$this->astp - $argCount] = new FuncCall(
        new Name($this->ast[$this->astp - $argCount]->value),
        $args
    );
}

not just wraps one more layer of Not around the expression.

protected function _not()
{
    $this->ast[$this->astp] = new BooleanNot($this->ast[$this->astp]);
}

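By now you should have a basic idea of how to build a syntax tree from the instructions' stack operations. Next, let's look at the conditional branch structure.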
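
The idea of turning stack operations into tree-building operations can be tried without PhpParser. The sketch below is a deliberately simplified stand-in: each stack slot holds PHP source text instead of an AST node, the class name SymbolicStack is invented, and call consumes its arguments itself, which may differ from how the original VM cleans up the stack.

```php
<?php
/**
 * Toy symbolic stack: each slot holds PHP source text instead of a runtime value.
 * The original builds PhpParser nodes; plain strings are used here only to keep
 * the sketch dependency-free.
 */
final class SymbolicStack
{
    /** @var string[] */
    private array $slots = [];

    public function push(): void
    {
        $this->slots[] = 'null'; // push (null)
    }

    public function db($data): void
    {
        // db [esp], data: overwrite the top slot with a literal
        $this->slots[count($this->slots) - 1] = var_export($data, true);
    }

    public function call(int $argCount): void
    {
        // The callee name sits below its arguments on the stack.
        $args = array_splice($this->slots, count($this->slots) - $argCount, $argCount);
        $name = trim(array_pop($this->slots), "'");
        $this->slots[] = $name . '(' . implode(', ', $args) . ')';
    }

    public function not(): void
    {
        $top = count($this->slots) - 1;
        $this->slots[$top] = '!' . $this->slots[$top];
    }

    public function top(): string
    {
        return $this->slots[count($this->slots) - 1];
    }
}

// push (null); db [esp], 'is_admin'; call (0) [esp]; not [esp]
$stack = new SymbolicStack();
$stack->push();
$stack->db('is_admin');
$stack->call(0);
$stack->not();
echo $stack->top(), "\n";
```

Running the four instructions from the linear-code listing leaves the text !is_admin() on top of the stack, which is the condition of the if that follows.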

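if [esp]
    pop
else
    pop

Testing the expression at the current position is simple:

$this->ast[$this->astp] = new If_($this->ast[$this->astp]);

Note that both branches start with a pop: the stack slot holding the expression that was just tested is no longer needed. The expression is used only once.

link [esp], [ebp+1]: link is a name I made up. In PHP it corresponds to =&, which sets a reference. We do not need to actually set the reference; we only need to fill in the variable's name.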

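/**
 * Reference
 * @param int $offset the $offset in link [esp], [ebp+{$offset}]
 */
protected function _link($offset)
{
    $this->ast[$this->astp] = $this->v[$offset - 1];
}

Here v is an array that stores Variable objects. Everyone knows what ebp means: ebp+1 is the 1st local variable, written as $v0 (I have no objection if you prefer to call it $v1).

[indent]

[ebp] = 0, [ebp-1] = -1, and [ebp-2] is the number of input variables; this was defined at the very beginning.

[/indent]

In the same way, you can handle all of the linear code.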

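Conditional branch structure

After analyzing every jnz in the whole program, I found 4 kinds.

These forms are distinguished mainly by whether stmts is empty, whether a pop follows immediately, and whether the stack is balanced after leaving the if.

Ternary operator

Image: [调试逆向] 【原创】PHP魔方加密手动反编译基于栈的指令-6.jpg

Logical OR short-circuit

Image: [调试逆向] 【原创】PHP魔方加密手动反编译基于栈的指令-7.jpg

Here is part of the code: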

// Ordinary if statement
$this->_pop();
$cond = $this->ast[$this->astp];
// Back up the AST
$ast = $this->ast;
$astp = $this->astp;
$astbp = $this->astbp;
$stackMap = $this->stackMap;
// Parse the stmts block
$this->decompile(array_slice($item['args']['stmts'], 1));
$stmts = array_slice($this->ast, $astp + 1, $this->astp - $astp);
// Restore the AST
$this->ast = $ast;
$this->astp = $astp;
$this->astbp = $astbp;
$this->stackMap = $stackMap;
// Parse the else block
$this->decompile(array_slice($item['args']['else'], 1));
$else = array_slice($this->ast, $astp + 1, $this->astp - $astp);
// If the stack differs by 1 and each branch is a single expression, use a ternary operator instead
$is_ternary = ($this->astp - $this->astbp == 1 && count($stmts) == 1 && count($else) == 1
    && $stmts[0] instanceof Expr && $else[0] instanceof Expr);
$this->ast = $ast;
$this->astp = $astp;
$this->astbp = $astbp;
$this->stackMap = $stackMap;
// Build the AST
if ($is_ternary) {
    $this->ast[$this->astp] = new Ternary($cond, $stmts[0], $else[0]);
} else {
    $this->ast[$this->astp] = new If_($cond);
    if ($stmts) {
        $this->ast[$this->astp]->stmts = $stmts;
    }
    if ($else) {
        $this->ast[$this->astp]->else = new Else_($else);
    }
}

Loop structure

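Loops are actually not much trouble, because the virtual machine implements them as a while (true) infinite loop plus an if (loop finished) break;. We can simply put the loop's code inside while (true) {}, so I will not go into detail here.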
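
As a rough illustration of that while (true) plus break shape, the following sketch prints a block tree as indented pseudo-PHP. The node shapes ('loop', 'if [esp]', 'iter_break' with 'stmts' and 'else' arguments) follow the arrays built in _jnz; the renderer renderBlocks itself, the $cond placeholder, and the sample tree are hypothetical.

```php
<?php
/**
 * Render a block tree as indented pseudo-PHP.
 * Node shapes follow the arrays built in _jnz ('loop', 'if [esp]', 'iter_break');
 * every other instruction is printed as a comment line.
 */
function renderBlocks(array $nodes, int $depth = 0): string
{
    $pad = str_repeat('    ', $depth);
    $out = '';
    foreach ($nodes as $node) {
        switch ($node['asm']) {
            case 'loop':
                $out .= $pad . "while (true) {\n"
                    . renderBlocks($node['args']['stmts'], $depth + 1)
                    . $pad . "}\n";
                break;
            case 'if [esp]':
                // The real condition lives on the stack; $cond is a placeholder.
                $out .= $pad . "if (\$cond) {\n"
                    . renderBlocks($node['args']['stmts'], $depth + 1)
                    . $pad . "} else {\n"
                    . renderBlocks($node['args']['else'], $depth + 1)
                    . $pad . "}\n";
                break;
            case 'iter_break':
                $out .= $pad . "break;\n";
                break;
            default:
                $out .= $pad . '// ' . $node['asm'] . "\n";
        }
    }
    return $out;
}

$tree = [
    ['asm' => 'loop', 'args' => ['stmts' => [
        ['asm' => 'if [esp]', 'args' => [
            'stmts' => [['asm' => 'pop', 'args' => []]],
            'else'  => [
                ['asm' => 'pop', 'args' => []],
                ['asm' => 'iter_break', 'args' => []],
            ],
        ]],
    ]]],
];

echo renderBlocks($tree);
```

The iter_break sits in the else branch here because _jnz appends it to the else tree when it detects a loop.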

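Let's look at the result (there still seem to be a few errors).

Image: [调试逆向] 【原创】PHP魔方加密手动反编译基于栈的指令-8.jpg

The code is now basically readable by a human. Looking back at the original gibberish, it feels like quite an achievement.

Appendix

If you want the code, I can send it. It is not of much general use, since it targets this one file and has to be modified for other files. You might not be able to follow it just by reading it anyway; you are better off understanding the principle and writing your own.

If you want the code, feel free to give the post a free rating; after all, I did spend four days on this.

This article will probably not be updated further. One step is still missing, code cleanup, which is just grunt work: constant folding, cancelling double negation, swapping the else and stmts of an if statement, and so on. I have studied this enough and do not really feel like doing it.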
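
Two of those cleanup passes are small enough to sketch. The code below is not part of the attached decompiler: it uses a tiny array-based AST invented for this example (the original works on PhpParser nodes), and the function names simplifyNot and normalizeIf are hypothetical. Constant folding is not covered. Note that dropping a double negation is only safe where the value is used as a boolean, such as an if condition.

```php
<?php
// Tiny array-based AST used only for this sketch:
//   ['type' => 'var', 'name' => string]
//   ['type' => 'not', 'expr' => node]
//   ['type' => 'if',  'cond' => node, 'stmts' => array, 'else' => array]

/** Remove double negation: !!x becomes x (valid in boolean contexts). */
function simplifyNot(array $expr): array
{
    if ($expr['type'] !== 'not') {
        return $expr;
    }
    $inner = simplifyNot($expr['expr']);
    if ($inner['type'] === 'not') {
        return $inner['expr']; // !(!x) => x
    }
    return ['type' => 'not', 'expr' => $inner];
}

/** If the stmts branch is empty, negate the condition and swap the branches. */
function normalizeIf(array $node): array
{
    if ($node['stmts'] === [] && $node['else'] !== []) {
        $node['cond'] = simplifyNot(['type' => 'not', 'expr' => $node['cond']]);
        $node['stmts'] = $node['else'];
        $node['else'] = [];
    }
    return $node;
}

// if (!is_admin) {} else { ... }  becomes  if (is_admin) { ... }
$if = [
    'type'  => 'if',
    'cond'  => ['type' => 'not', 'expr' => ['type' => 'var', 'name' => 'is_admin']],
    'stmts' => [],
    'else'  => ['echo "hello";'],
];
$clean = normalizeIf($if);
var_dump($clean['cond']);
```

This mirrors the shape seen in the linear-code listing, where a not is followed by an if whose stmts branch contains nothing but the pop.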

mfenc-decompiler.zip


[Original] PHP 魔方加密 (Mofang encryption) decryption: manually decompiling stack-based instructions, with automated decompilation
