<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
><channel><title>GCCFeli.cn &#187; GCC</title> <atom:link href="http://gccfeli.cn/tag/gcc/feed" rel="self" type="application/rss+xml" /><link>http://gccfeli.cn</link> <description></description> <lastBuildDate>Thu, 14 Jul 2011 08:18:00 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.1</generator> <atom:link rel='hub' href='http://gccfeli.cn/?pushpress=hub'/> <item><title>How to Use SIMD Instructions in GCC</title><link>http://gccfeli.cn/2009/04/gcc-simd.html</link> <comments>http://gccfeli.cn/2009/04/gcc-simd.html#comments</comments> <pubDate>Tue, 14 Apr 2009 14:44:17 +0000</pubDate> <dc:creator>Felicia</dc:creator> <category><![CDATA[精华]]></category> <category><![CDATA[编译原理]]></category> <category><![CDATA[转载]]></category> <category><![CDATA[GCC]]></category> <category><![CDATA[MMX]]></category> <category><![CDATA[SIMD]]></category> <category><![CDATA[汇编]]></category><guid
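isPermaLink="false">http://gccfeli.cn/?p=696</guid> <description><![CDATA[<p><strong>I have been doing graphics programming recently and became interested in SIMD instructions, so I am reposting this article. I tidied up its formatting slightly.</strong></p><h3>SIMD Instructions in X86</h3><p>Instructions in the IA-32 Intel architecture fall into the following main groups <a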
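href="#resources">[1]</a>:</p> <span
class="readmore"><a
href="http://gccfeli.cn/2009/04/gcc-simd.html" title="How to Use SIMD Instructions in GCC">Read the full article (5046 characters)</a></span>]]></description> <content:encoded><![CDATA[<p><strong>I have been doing graphics programming recently and became interested in SIMD instructions, so I am reposting this article. I tidied up its formatting slightly.</strong></p><h3>SIMD Instructions in X86</h3><p>Instructions in the IA-32 Intel architecture fall into the following main groups <a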
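href="#resources">[1]</a>:</p><ul><li>General-purpose</li><li>x87 FPU</li><li>MMX technology</li><li>SSE/SSE2/SSE3 extensions</li></ul><p>The MMX/SSE family of extensions introduced the SIMD (single instruction, multiple data) execution model, which can be used to accelerate multimedia applications. The execution environment and characteristics of these instructions are summarized below.<br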
/> <span
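id="more-696"></span></p><ul><li>The eight 32-bit general-purpose registers are available to all SIMD extensions;</li><li>MMX: eight 64-bit MMX registers (mm0 &#8211; mm7), which the SSE extensions can also use;<ul><li>Data are packed integers, up to two 32-bit values per register</li><li>No register reports overflow during arithmetic</li></ul></li><li>SSE: eight 128-bit xmm registers, the MXCSR register, and the EFLAGS register<ul><li>Supports single-precision floating point</li><li>MXCSR holds the rounding-control and overflow flags</li><li>Supports 64-bit SIMD integers</li></ul></li><li>SSE2: same execution environment as SSE<ul><li>Double-precision floating point</li><li>128-bit integers</li><li>Conversion between double and single precision</li></ul></li><li>SSE3: introduced with the Intel Prescott processor; 13 instructions in total<ul><li>Mainly improves video decoding, 3D graphics optimization, and Hyper-Threading performance</li></ul></li></ul><p>MMX appeared first and is supported by almost every X86 processor today, including embedded X86 parts, so the discussion below is based mainly on MMX. The methods apply equally to SSEn and to other SIMD extensions such as AMD's 3DNow!.</p><p>MMX instructions are further divided into the following groups:</p><ul><li>Data transfer: movd, movq</li><li>Data conversion: packsswb, packssdw, packuswb, punpckhbw, punpckhwd, punpckhdq, punpcklbw, punpcklwd, punpckldq</li><li>Packed arithmetic: paddb, paddw, paddd, paddsb, paddsw, paddusb, paddusw, psubb, psubw, psubd, psubsb, psubsw, psubusb, psubusw, pmulhw, pmullw, pmaddwd</li><li>Packed comparison: pcmpeqb, pcmpeqw, pcmpeqd, pcmpgtb, pcmpgtw, pcmpgtd</li><li>Packed logical: pand, pandn, por, pxor</li><li>Shift and rotate: psllw, pslld, psllq, psrlw, psrld, psrlq, psraw, psrad</li><li>State management: emms</li></ul><p>With these instructions, pay attention not only to what each one does but also to the data type it operates on. The above is background only; see the manuals for details.</p><h3>Performance Optimization</h3><p>Once all the functionality of an embedded application has been implemented in C/C++, performance is often the next problem. A profiling tool such as gprof can identify the bottleneck functions, which can then be rewritten entirely in assembly. The MPEG-4 codec project xvid [4], for example, takes this approach and provides separate optimizations for different processors and instruction sets, which is why the project is first-rate in both features and performance. This is clearly what deep optimization aims for.</p><p>On architectures that use pipelining, VLIW, and SIMD (some DSPs, for example), hand-optimizing an entire function can improve performance by a factor of several to several tens. When the performance budget allows, however, using a specialized implementation only for the critical parts of a function improves performance where it matters while still exploiting as many high-level C/C++ features as possible, which shortens the development cycle. The following are several mixed-programming methods for using MMX instructions with GCC:</p><ul><li>Intel C/C++ compiler intrinsics</li><li>GCC built-in operations</li><li>Inline assembly (the asm construct)</li></ul><h3>Intel C/C++ Compiler Intrinsics</h3><p>In the IA-32 Intel instruction set manual <a
href="#resources">[2]</a>, the description of some instructions includes an “Intel C/C++ Compiler Intrinsic Equivalent” entry that names the intrinsic corresponding to that instruction. An intrinsic looks like a function call in C/C++ source and is translated at compile time directly into a single MMX instruction (or, for composite intrinsics, the few most direct instructions). Put differently, without the intrinsic the same work might take several C/C++ statements, and the compiler cannot guarantee that it will turn those statements into that single, most efficient MMX instruction. Not every MMX instruction has an equivalent intrinsic; the manual's appendix lists all of them. They come in two kinds, simple and composite: each simple intrinsic corresponds to one instruction, while a composite one corresponds to several.</p><p>GCC supports the Intel C/C++ Compiler Intrinsics. The following example shows how they are used:</p><div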
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-prepro">#include </span><span
class="hl-quotes">&lt;</span><span
class="hl-string">stdio.h</span><span
class="hl-quotes">&gt;</span><span
class="hl-prepro"></span><span
class="hl-code"><br
/></span><span
class="hl-prepro">#include </span><span
class="hl-quotes">&lt;</span><span
class="hl-string">xmmintrin.h</span><span
class="hl-quotes">&gt;</span><span
class="hl-prepro"> /* this header must be included */</span><span
class="hl-code"><br
/></span><span
class="hl-mlcomment">/*gcc -Wall -march=pentium4 -mmmx -o ins&nbsp; mmx_ins.c*/</span><span
class="hl-code"><br
/></span><span
class="hl-types">int</span><span
class="hl-code"> </span><span
class="hl-identifier">main</span><span
class="hl-brackets">(</span><span
class="hl-types">int</span><span
class="hl-code"> </span><span
class="hl-identifier">argc</span><span
class="hl-code">,</span><span
class="hl-types">char</span><span
class="hl-code"> *</span><span
class="hl-identifier">argv</span><span
class="hl-brackets">[])</span><span
class="hl-code"> </span><span
class="hl-brackets">{</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* Use MMX to compute the dot product of the following vectors */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-types">short</span><span
class="hl-code"> </span><span
class="hl-identifier">in1</span><span
class="hl-brackets">[]</span><span
class="hl-code"> = </span><span
class="hl-brackets">{</span><span
class="hl-number">1</span><span
class="hl-code">, </span><span
class="hl-number">2</span><span
class="hl-code">, </span><span
class="hl-number">3</span><span
class="hl-code">, </span><span
class="hl-number">4</span><span
class="hl-brackets">}</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-types">short</span><span
class="hl-code"> </span><span
class="hl-identifier">in2</span><span
class="hl-brackets">[]</span><span
class="hl-code"> = </span><span
class="hl-brackets">{</span><span
class="hl-number">2</span><span
class="hl-code">, </span><span
class="hl-number">3</span><span
class="hl-code">, </span><span
class="hl-number">4</span><span
class="hl-code">, </span><span
class="hl-number">5</span><span
class="hl-brackets">}</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-types">int</span><span
class="hl-code"> </span><span
class="hl-identifier">out1</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-types">int</span><span
class="hl-code"> </span><span
class="hl-identifier">out2</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">__m64</span><span
class="hl-code"> </span><span
class="hl-identifier">m1</span><span
class="hl-code">;&nbsp; &nbsp; </span><span
class="hl-mlcomment">/* MMX provides 64-bit integer mm registers */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">__m64</span><span
class="hl-code"> </span><span
class="hl-identifier">m2</span><span
class="hl-code">;&nbsp; &nbsp; </span><span
class="hl-mlcomment">/* MMX operations work on mm registers */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">__m128</span><span
class="hl-code"> </span><span
class="hl-identifier">m128</span><span
class="hl-code">; </span><span
class="hl-mlcomment">/* for SSEn only*/</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* Each load puts two shorts into an mm register; note: only two */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">m1</span><span
class="hl-code"> = </span><span
class="hl-identifier">_mm_cvtsi32_si64</span><span
class="hl-brackets">(((</span><span
class="hl-types">int</span><span
class="hl-code">*</span><span
class="hl-brackets">)</span><span
class="hl-identifier">in1</span><span
class="hl-brackets">)[</span><span
class="hl-number">0</span><span
class="hl-brackets">])</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">m2</span><span
class="hl-code"> = </span><span
class="hl-identifier">_mm_cvtsi32_si64</span><span
class="hl-brackets">(((</span><span
class="hl-types">int</span><span
class="hl-code">*</span><span
class="hl-brackets">)</span><span
class="hl-identifier">in2</span><span
class="hl-brackets">)[</span><span
class="hl-number">0</span><span
class="hl-brackets">])</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* One instruction multiply-adds four pairs of 16-bit integers */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* producing two 32-bit integers */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">m2</span><span
class="hl-code">&nbsp; = </span><span
class="hl-identifier">_mm_madd_pi16</span><span
class="hl-brackets">(</span><span
class="hl-identifier">m1</span><span
class="hl-code">, </span><span
class="hl-identifier">m2</span><span
class="hl-brackets">)</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* Move the low 32-bit integer into a general-purpose register */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">out1</span><span
class="hl-code"> =&nbsp; </span><span
class="hl-identifier">_mm_cvtsi64_si32</span><span
class="hl-brackets">(</span><span
class="hl-identifier">m2</span><span
class="hl-brackets">)</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* Shift the high 32-bit integer right, then move it into a general-purpose register */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">m2</span><span
class="hl-code">&nbsp; = </span><span
class="hl-identifier">_mm_srli_si64</span><span
class="hl-brackets">(</span><span
class="hl-identifier">m2</span><span
class="hl-code">, </span><span
class="hl-number">32</span><span
class="hl-brackets">)</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">out2</span><span
class="hl-code"> =&nbsp; </span><span
class="hl-identifier">_mm_cvtsi64_si32</span><span
class="hl-brackets">(</span><span
class="hl-identifier">m2</span><span
class="hl-brackets">)</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* Clear the MMX state */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">_mm_empty</span><span
class="hl-brackets">()</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* Add the two 32-bit values; the result is 8 */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">out1</span><span
class="hl-code"> += </span><span
class="hl-identifier">out2</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">printf</span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">a: %d</span><span
class="hl-special">\</span><span
class="hl-string">n</span><span
class="hl-quotes">&quot;</span><span
class="hl-code">, </span><span
class="hl-identifier">out1</span><span
class="hl-brackets">)</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-reserved">return</span><span
class="hl-code"> </span><span
class="hl-number">0</span><span
class="hl-code">;<br
/></span><span
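class="hl-brackets">}</span></div></div></div><p>Notes:</p><ul><li>Even if you are not on a Pentium 4, compile with the following options (xmmintrin.h requires SSE to be enabled, which -march=pentium4 does):<div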
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-identifier">gcc</span><span
class="hl-code"> -</span><span
class="hl-identifier">Wall</span><span
class="hl-code"> -</span><span
class="hl-identifier">march</span><span
class="hl-code">=</span><span
class="hl-identifier">pentium4</span><span
class="hl-code"> -</span><span
class="hl-identifier">mmmx</span><span
class="hl-code"> -</span><span
class="hl-identifier">o</span><span
class="hl-code"> </span><span
class="hl-identifier">ins</span><span
class="hl-code">&nbsp; </span><span
class="hl-identifier">mmx_ins</span><span
class="hl-code">.</span><span
class="hl-identifier">c</span></div></div></div><p>Otherwise a message like the following appears:</p><div
class="hl-wrapper"><div
class="hl-surround"><div
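class="hl-main">...xmmintrin.h:34:3: #error &quot;SSE instruction set not enabled&quot;</div></div></div></li><li>The final result is not actually the sum of all four products, only of the first two pairs: the intrinsic _mm_cvtsi32_si64 loads only the low 32 bits of the mm register and leaves the high 32 bits zero. MMX does have the movq instruction for 64-bit transfers, but no intrinsic corresponds to it, which again shows that not every instruction has an equivalent intrinsic.</li><li>When the vectors are two pairs of 0x8000, 0x8000, i.e. (-2^15)*(-2^15) + (-2^15)*(-2^15), the result should be 2^31, but the computed value is<br
/> -2^31, because an overflow occurred that the program has no way of detecting. This needs particular care when using MMX: no flag bit signals arithmetic overflow, so a very large value silently becomes a very small one. SSE improves on this.</li><li>When the program is finished with MMX, remember to clear the MMX state with the emms instruction.</li></ul><h3>Using GCC Built-in Operations</h3><p>What is a built-in operation? MMX operands are treated like basic data types such as int and float, with correspondingly defined operations such as addition (+) and subtraction (-) and conversions between data types. For details, see the GNU GCC Manual<br
/> <a
href="#resources">[5]</a>, section “Extensions to the C Language Family &#8594; Built-in Functions &#8594; X86 Built-in Functions”.</p><p>Some MMX instructions have corresponding built-in operations. The following code is an example:</p><div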
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-prepro">#include </span><span
class="hl-quotes">&lt;</span><span
class="hl-string">stdio.h</span><span
class="hl-quotes">&gt;</span><span
class="hl-prepro"></span><span
class="hl-code"><br
/></span><span
class="hl-mlcomment">/* No special header is needed; these are built-ins */</span><span
class="hl-code"><br
/></span><span
class="hl-mlcomment">/* gcc -Wall&nbsp; -o bins&nbsp; builtinmmx.c*/</span><span
class="hl-code"><br
/></span><span
class="hl-mlcomment">/* Define a vector data type: hi means 16-bit elements, 4 means four of them */</span><span
class="hl-code"><br
/></span><span
class="hl-mlcomment">/*typedef int v4hi __attribute__ ((mode(V4HI)));*/</span><span
class="hl-code"><br
/></span><span
class="hl-mlcomment">/* Newer GCC prefers this form: vector_size(8) means an 8-byte vector, and short means the elements are stored as shorts */</span><span
class="hl-code"><br
/></span><span
class="hl-types">typedef</span><span
class="hl-code"> </span><span
class="hl-types">short</span><span
class="hl-code"> </span><span
class="hl-identifier">v4hi</span><span
class="hl-code"> </span><span
class="hl-identifier">__attribute__</span><span
class="hl-code"> </span><span
class="hl-brackets">((</span><span
class="hl-identifier">vector_size</span><span
class="hl-brackets">(</span><span
class="hl-number">8</span><span
class="hl-brackets">)))</span><span
class="hl-code">;<br
/></span><span
class="hl-mlcomment">/* Define a vector type of two 32-bit elements; si means 32-bit */</span><span
class="hl-code"><br
/></span><span
class="hl-types">typedef</span><span
class="hl-code"> </span><span
class="hl-types">int</span><span
class="hl-code"> </span><span
class="hl-identifier">v2si</span><span
class="hl-code"> </span><span
class="hl-identifier">__attribute__</span><span
class="hl-code"> </span><span
class="hl-brackets">((</span><span
class="hl-identifier">vector_size</span><span
class="hl-brackets">(</span><span
class="hl-number">8</span><span
class="hl-brackets">)))</span><span
class="hl-code">;<br
/>&nbsp;<br
/></span><span
class="hl-types">int</span><span
class="hl-code"> </span><span
class="hl-identifier">main</span><span
class="hl-brackets">(</span><span
class="hl-types">int</span><span
class="hl-code"> </span><span
class="hl-identifier">argc</span><span
class="hl-code">,</span><span
class="hl-types">char</span><span
class="hl-code"> *</span><span
class="hl-identifier">argv</span><span
class="hl-brackets">[])</span><span
class="hl-code"> </span><span
class="hl-brackets">{</span><span
class="hl-code"> <br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-types">short</span><span
class="hl-code"> </span><span
class="hl-identifier">pa</span><span
class="hl-brackets">[</span><span
class="hl-number">4</span><span
class="hl-brackets">]</span><span
class="hl-code"> = </span><span
class="hl-brackets">{</span><span
class="hl-number">0x8000</span><span
class="hl-code">, </span><span
class="hl-number">0x8000</span><span
class="hl-code">, </span><span
class="hl-number">1</span><span
class="hl-code">, -</span><span
class="hl-number">1</span><span
class="hl-brackets">}</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-types">short</span><span
class="hl-code"> </span><span
class="hl-identifier">pb</span><span
class="hl-brackets">[</span><span
class="hl-number">4</span><span
class="hl-brackets">]</span><span
class="hl-code"> = </span><span
class="hl-brackets">{</span><span
class="hl-number">0x8000</span><span
class="hl-code">, </span><span
class="hl-number">0x7FFF</span><span
class="hl-code">, -</span><span
class="hl-number">1</span><span
class="hl-code">, -</span><span
class="hl-number">2</span><span
class="hl-brackets">}</span><span
class="hl-code">;<br
/>&nbsp;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">v4hi</span><span
class="hl-code"> </span><span
class="hl-identifier">va</span><span
class="hl-code">, </span><span
class="hl-identifier">vb</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">v4hi</span><span
class="hl-code"> </span><span
class="hl-identifier">vsum</span><span
class="hl-code">;<br
/>&nbsp;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">va</span><span
class="hl-code"> = </span><span
class="hl-brackets">((</span><span
class="hl-identifier">v4hi</span><span
class="hl-code">*</span><span
class="hl-brackets">)</span><span
class="hl-identifier">pa</span><span
class="hl-brackets">)[</span><span
class="hl-number">0</span><span
class="hl-brackets">]</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">vb</span><span
class="hl-code"> = </span><span
class="hl-brackets">((</span><span
class="hl-identifier">v4hi</span><span
class="hl-code">*</span><span
class="hl-brackets">)</span><span
class="hl-identifier">pb</span><span
class="hl-brackets">)[</span><span
class="hl-number">0</span><span
class="hl-brackets">]</span><span
class="hl-code">;<br
/>&nbsp;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* Saturating add of four 16-bit values */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-comment">//vsum = __builtin_ia32_paddsw(va, vb);</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* The four 16-bit lanes can also be added directly with +; this differs from adding two long longs */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">vsum</span><span
class="hl-code"> =&nbsp; </span><span
class="hl-identifier">va</span><span
class="hl-code"> + </span><span
class="hl-identifier">vb</span><span
class="hl-code">;<br
/>&nbsp;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* To print the vector, it must be cast to long long */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">printf</span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">...with MMX instructions...to compute vec_add: %llx </span><span
class="hl-special">\</span><span
class="hl-string">n</span><span
class="hl-quotes">&quot;</span><span
class="hl-code">, </span><span
class="hl-brackets">(</span><span
class="hl-types">long</span><span
class="hl-code"> </span><span
class="hl-types">long</span><span
class="hl-brackets">)</span><span
class="hl-identifier">vsum</span><span
class="hl-brackets">)</span><span
class="hl-code">;<br
/>&nbsp;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-comment">// Result 1 (saturating paddsw): 0xfffd0000ffff8000</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-comment">// Result 2 (plain +, wraps around): 0xfffd0000ffff0000</span><span
class="hl-code"><br
/>&nbsp;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-reserved">return</span><span
class="hl-code"> </span><span
class="hl-number">0</span><span
class="hl-code">;<br
/></span><span
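class="hl-brackets">}</span></div></div></div><p>Notes:</p><ul><li>Built-in vectors and their operations are being strengthened as GCC evolves. To use the example above, use GCC 3.4 or later;</li><li>Using a builtin function looks similar to using an intrinsic, but the two differ in nature. Applying the ‘+’ operator to two vectors here shows that the compiler supports vectors directly, like any other data type. The addition simply adds the four elements lane by lane, and a carry out of a lower element does not affect the adjacent higher element;</li><li>A vector can also be cast to an ordinary scalar type.</li></ul><h3>Inline asm</h3><p>GCC has allowed asm instructions to be embedded in C code from early on. This is not specific to MMX, but it is clearly a good way to use MMX as well. For the detailed syntax, see the GNU GCC manual <a
href="#resources">[5]</a> or the “Inline Assembly” section of GCC: The Complete Reference <a
href="#resources">[6]</a>.<br
/> The following is a dot-product example:</p><div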
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-prepro">#include </span><span
class="hl-quotes">&lt;</span><span
class="hl-string">stdio.h</span><span
class="hl-quotes">&gt;</span><span
class="hl-prepro"></span><span
class="hl-code"><br
/></span><span
class="hl-mlcomment">/** gcc -o ins&nbsp; inlinemmx.c **/</span><span
class="hl-code"><br
/></span><span
class="hl-types">int</span><span
class="hl-code"> </span><span
class="hl-identifier">main</span><span
class="hl-brackets">(</span><span
class="hl-types">int</span><span
class="hl-code"> </span><span
class="hl-identifier">argc</span><span
class="hl-code">,</span><span
class="hl-types">char</span><span
class="hl-code"> *</span><span
class="hl-identifier">argv</span><span
class="hl-brackets">[])</span><span
class="hl-code"> </span><span
class="hl-brackets">{</span><span
class="hl-code"> <br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-types">int</span><span
class="hl-code"> </span><span
class="hl-identifier">i</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-types">int</span><span
class="hl-code"> </span><span
class="hl-identifier">result</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-types">short</span><span
class="hl-code"> </span><span
class="hl-identifier">a</span><span
class="hl-brackets">[]</span><span
class="hl-code"> = </span><span
class="hl-brackets">{</span><span
class="hl-number">1</span><span
class="hl-code">, </span><span
class="hl-number">2</span><span
class="hl-code">, </span><span
class="hl-number">3</span><span
class="hl-code">, </span><span
class="hl-number">4</span><span
class="hl-code">, </span><span
class="hl-number">5</span><span
class="hl-code">, </span><span
class="hl-number">6</span><span
class="hl-code">, </span><span
class="hl-number">7</span><span
class="hl-code">, </span><span
class="hl-number">8</span><span
class="hl-brackets">}</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-types">short</span><span
class="hl-code"> </span><span
class="hl-identifier">b</span><span
class="hl-brackets">[]</span><span
class="hl-code"> = </span><span
class="hl-brackets">{</span><span
class="hl-number">1</span><span
class="hl-code">, </span><span
class="hl-number">1</span><span
class="hl-code">, </span><span
class="hl-number">1</span><span
class="hl-code">, </span><span
class="hl-number">1</span><span
class="hl-code">, </span><span
class="hl-number">1</span><span
class="hl-code">, </span><span
class="hl-number">1</span><span
class="hl-code">, </span><span
class="hl-number">1</span><span
class="hl-code">, </span><span
class="hl-number">1</span><span
class="hl-brackets">}</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">printf</span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">...with MMX instructions...</span><span
class="hl-special">\</span><span
class="hl-string">n</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">)</span><span
class="hl-code">;<br
/>&nbsp;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* First zero the dot-product accumulator register; its initial contents are not guaranteed to be 0 */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-reserved">asm</span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">pandn %%mm5,%%mm5;</span><span
class="hl-quotes">&quot;</span><span
class="hl-code">::</span><span
class="hl-brackets">)</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* Load a and b; each group of four products is summed in two pairs, giving two partial sums */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* Loop control is done in C */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-reserved">for</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">i</span><span
class="hl-code"> = </span><span
class="hl-number">0</span><span
class="hl-code">; </span><span
class="hl-identifier">i</span><span
class="hl-code"> &lt; </span><span
class="hl-reserved">sizeof</span><span
class="hl-brackets">(</span><span
class="hl-identifier">a</span><span
class="hl-brackets">)</span><span
class="hl-code">/</span><span
class="hl-reserved">sizeof</span><span
class="hl-brackets">(</span><span
class="hl-types">short</span><span
class="hl-brackets">)</span><span
class="hl-code">; </span><span
class="hl-identifier">i</span><span
class="hl-code"> += </span><span
class="hl-number">4</span><span
class="hl-brackets">)</span><span
class="hl-code"> </span><span
class="hl-brackets">{</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp;</span><span
class="hl-reserved">asm</span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">movq %0,%%mm0;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; movq %1,%%mm1;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; pmaddwd %%mm1,%%mm0;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; paddd %%mm0,%%mm5; # multiply, then accumulate </span><span
class="hl-quotes">&quot;</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; :<br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; : </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">m</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">a</span><span
class="hl-brackets">[</span><span
class="hl-identifier">i</span><span
class="hl-brackets">])</span><span
class="hl-code">, </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">m</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">b</span><span
class="hl-brackets">[</span><span
class="hl-identifier">i</span><span
class="hl-brackets">]))</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-brackets">}</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* Separate the two partial sums and add them */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-reserved">asm</span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">movq %%mm5, %%mm0;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; psrlq $32,%%mm5;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; paddd %%mm0, %%mm5;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; movd %%mm5,%0;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; emms</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; :</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">=r</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">result</span><span
class="hl-brackets">)</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; :</span><span
class="hl-brackets">)</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">printf</span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">result: 0x%x</span><span
class="hl-special">\</span><span
class="hl-string">n</span><span
class="hl-quotes">&quot;</span><span
class="hl-code">, </span><span
class="hl-identifier">result</span><span
class="hl-brackets">)</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-comment">// the result here is 0x24</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-reserved">return</span><span
class="hl-code"> </span><span
class="hl-number">0</span><span
class="hl-code">;<br
/></span><span
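class="hl-brackets">}</span></div></div></div><p>Notes:</p><ul><li>This is typical mixed C and assembly programming within a function;</li><li>Mind the operand order in the assembly instructions (GCC's AT&amp;T syntax puts the source first and the destination last);</li><li>Instructions such as movq that have no intrinsic or built-in equivalent can be used directly here;</li><li>Do not put comments in the middle of an asm instruction sequence, as this may cause incorrect code to be generated.</li></ul><h3>A Practical MMX Example: Synthesis Filter in X86 SIMD Instructions</h3><p>The following walks through optimizing a synthesis filter. Synthesis filters are widely used in speech codecs and account for a relatively large share of the algorithm's total run time.</p><div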
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-reserved">for</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">i</span><span
class="hl-code"> = </span><span
class="hl-number">0</span><span
class="hl-code">; </span><span
class="hl-identifier">i</span><span
class="hl-code"> &lt; </span><span
class="hl-identifier">lg</span><span
class="hl-code">; </span><span
class="hl-identifier">i</span><span
class="hl-code">++</span><span
class="hl-brackets">)</span><span
class="hl-code"> </span><span
class="hl-brackets">{</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">s</span><span
class="hl-code"> = </span><span
class="hl-identifier">L_mult</span><span
class="hl-brackets">(</span><span
class="hl-identifier">x</span><span
class="hl-brackets">[</span><span
class="hl-identifier">i</span><span
class="hl-brackets">]</span><span
class="hl-code">, </span><span
class="hl-identifier">a</span><span
class="hl-brackets">[</span><span
class="hl-number">0</span><span
class="hl-brackets">])</span><span
class="hl-code">; </span><span
class="hl-mlcomment">/* L_mult: multiply, then shift left */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-reserved">for</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">j</span><span
class="hl-code"> = </span><span
class="hl-number">1</span><span
class="hl-code">; </span><span
class="hl-identifier">j</span><span
class="hl-code"> &lt;= </span><span
class="hl-identifier">M</span><span
class="hl-code">; </span><span
class="hl-identifier">j</span><span
class="hl-code">++</span><span
class="hl-brackets">)</span><span
class="hl-code"> </span><span
class="hl-brackets">{</span><span
class="hl-code"> </span><span
class="hl-mlcomment">/* M is fixed at 10 here */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp;</span><span
class="hl-identifier">s</span><span
class="hl-code"> = </span><span
class="hl-identifier">L_msu</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-code">, </span><span
class="hl-identifier">a</span><span
class="hl-brackets">[</span><span
class="hl-identifier">j</span><span
class="hl-brackets">]</span><span
class="hl-code">, </span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-code">-</span><span
class="hl-identifier">j</span><span
class="hl-brackets">])</span><span
class="hl-code">; </span><span
class="hl-mlcomment">/* L_msu: multiply-subtract, then shift left */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-brackets">}</span><span
class="hl-code"><br
/>&nbsp;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">s</span><span
class="hl-code"> = </span><span
class="hl-identifier">L_shl</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-code">, </span><span
class="hl-number">3</span><span
class="hl-brackets">)</span><span
class="hl-code">; </span><span
class="hl-mlcomment">/* shift left by 3 bits */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;*</span><span
class="hl-identifier">yy</span><span
class="hl-code">++ = </span><span
class="hl-identifier">g729round</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-brackets">)</span><span
class="hl-code">;<br
/></span><span
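class="hl-brackets">}</span></div></div></div><p>Because the inner loop above always runs 10 iterations, it can be unrolled, and all of its operations can be unified into multiply-accumulate instructions.</p><div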
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-mlcomment">/* To use multiply-accumulate, the 10 coefficients must be reordered */</span><span
class="hl-code"><br
/></span><span
class="hl-reserved">for</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">i</span><span
class="hl-code"> = </span><span
class="hl-number">0</span><span
class="hl-code">; </span><span
class="hl-identifier">i</span><span
class="hl-code"> &lt; </span><span
class="hl-identifier">M</span><span
class="hl-code">; </span><span
class="hl-identifier">i</span><span
class="hl-code">++</span><span
class="hl-brackets">)</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-identifier">i</span><span
class="hl-brackets">]</span><span
class="hl-code"> = -</span><span
class="hl-identifier">a</span><span
class="hl-brackets">[</span><span
class="hl-identifier">M</span><span
class="hl-code"> - </span><span
class="hl-identifier">i</span><span
class="hl-brackets">]</span><span
class="hl-code">;<br
/>&nbsp;<br
/></span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">11</span><span
class="hl-brackets">]</span><span
class="hl-code"> = </span><span
class="hl-number">0</span><span
class="hl-code">;<br
/></span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">10</span><span
class="hl-brackets">]</span><span
class="hl-code"> = </span><span
class="hl-identifier">a</span><span
class="hl-brackets">[</span><span
class="hl-number">0</span><span
class="hl-brackets">]</span><span
class="hl-code">;<br
/>&nbsp;<br
/></span><span
class="hl-reserved">for</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">i</span><span
class="hl-code"> = </span><span
class="hl-number">0</span><span
class="hl-code">; </span><span
class="hl-identifier">i</span><span
class="hl-code"> &lt; </span><span
class="hl-identifier">lg</span><span
class="hl-code">; </span><span
class="hl-identifier">i</span><span
class="hl-code">++</span><span
class="hl-brackets">)</span><span
class="hl-code"> </span><span
class="hl-brackets">{</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;*</span><span
class="hl-identifier">yy</span><span
class="hl-code"> = </span><span
class="hl-identifier">x</span><span
class="hl-brackets">[</span><span
class="hl-identifier">i</span><span
class="hl-brackets">]</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-number">1</span><span
class="hl-brackets">]</span><span
class="hl-code"> = </span><span
class="hl-number">0</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">s</span><span
class="hl-code"> = </span><span
class="hl-identifier">L_mac</span><span
class="hl-brackets">(</span><span
class="hl-number">0</span><span
class="hl-code">, </span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">11</span><span
class="hl-brackets">]</span><span
class="hl-code">, </span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-number">1</span><span
class="hl-brackets">])</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">s</span><span
class="hl-code"> = </span><span
class="hl-identifier">L_mac</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-code">, </span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">10</span><span
class="hl-brackets">]</span><span
class="hl-code">, </span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-number">0</span><span
class="hl-brackets">])</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">s</span><span
class="hl-code"> = </span><span
class="hl-identifier">L_mac</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-code">, </span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">9</span><span
class="hl-brackets">]</span><span
class="hl-code">, </span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-code">-</span><span
class="hl-number">1</span><span
class="hl-brackets">])</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">s</span><span
class="hl-code"> = </span><span
class="hl-identifier">L_mac</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-code">, </span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">8</span><span
class="hl-brackets">]</span><span
class="hl-code">, </span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-code">-</span><span
class="hl-number">2</span><span
class="hl-brackets">])</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">s</span><span
class="hl-code"> = </span><span
class="hl-identifier">L_mac</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-code">, </span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">7</span><span
class="hl-brackets">]</span><span
class="hl-code">, </span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-code">-</span><span
class="hl-number">3</span><span
class="hl-brackets">])</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">s</span><span
class="hl-code"> = </span><span
class="hl-identifier">L_mac</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-code">, </span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">6</span><span
class="hl-brackets">]</span><span
class="hl-code">, </span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-code">-</span><span
class="hl-number">4</span><span
class="hl-brackets">])</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">s</span><span
class="hl-code"> = </span><span
class="hl-identifier">L_mac</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-code">, </span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">5</span><span
class="hl-brackets">]</span><span
class="hl-code">, </span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-code">-</span><span
class="hl-number">5</span><span
class="hl-brackets">])</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">s</span><span
class="hl-code"> = </span><span
class="hl-identifier">L_mac</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-code">, </span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">4</span><span
class="hl-brackets">]</span><span
class="hl-code">, </span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-code">-</span><span
class="hl-number">6</span><span
class="hl-brackets">])</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">s</span><span
class="hl-code"> = </span><span
class="hl-identifier">L_mac</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-code">, </span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">3</span><span
class="hl-brackets">]</span><span
class="hl-code">, </span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-code">-</span><span
class="hl-number">7</span><span
class="hl-brackets">])</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">s</span><span
class="hl-code"> = </span><span
class="hl-identifier">L_mac</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-code">, </span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">2</span><span
class="hl-brackets">]</span><span
class="hl-code">, </span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-code">-</span><span
class="hl-number">8</span><span
class="hl-brackets">])</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">s</span><span
class="hl-code"> = </span><span
class="hl-identifier">L_mac</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-code">, </span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">1</span><span
class="hl-brackets">]</span><span
class="hl-code">, </span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-code">-</span><span
class="hl-number">9</span><span
class="hl-brackets">])</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">s</span><span
class="hl-code"> = </span><span
class="hl-identifier">L_mac</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-code">, </span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">0</span><span
class="hl-brackets">]</span><span
class="hl-code">, </span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-code">-</span><span
class="hl-number">10</span><span
class="hl-brackets">])</span><span
class="hl-code">;<br
/>&nbsp;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">s</span><span
class="hl-code"> = </span><span
class="hl-identifier">L_shl</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-code">, </span><span
class="hl-number">3</span><span
class="hl-brackets">)</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;*</span><span
class="hl-identifier">yy</span><span
class="hl-code">++ = </span><span
class="hl-identifier">g729round</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-brackets">)</span><span
class="hl-code">;<br
/></span><span
class="hl-brackets">}</span></div></div></div><p>The loop kernel above is just the right size to use all eight MMX registers.</p><div
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-mlcomment">/* To use multiply-accumulate, the 10 coefficients must be reordered */</span><span
class="hl-code"><br
/></span><span
class="hl-reserved">for</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">i</span><span
class="hl-code"> = </span><span
class="hl-number">0</span><span
class="hl-code">; </span><span
class="hl-identifier">i</span><span
class="hl-code"> &lt; </span><span
class="hl-identifier">M</span><span
class="hl-code">; </span><span
class="hl-identifier">i</span><span
class="hl-code">++</span><span
class="hl-brackets">)</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-identifier">i</span><span
class="hl-brackets">]</span><span
class="hl-code"> = -</span><span
class="hl-identifier">a</span><span
class="hl-brackets">[</span><span
class="hl-identifier">M</span><span
class="hl-code"> - </span><span
class="hl-identifier">i</span><span
class="hl-brackets">]</span><span
class="hl-code">;<br
/>&nbsp;<br
/></span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">11</span><span
class="hl-brackets">]</span><span
class="hl-code"> = </span><span
class="hl-number">0</span><span
class="hl-code">;<br
/></span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">10</span><span
class="hl-brackets">]</span><span
class="hl-code"> = </span><span
class="hl-identifier">a</span><span
class="hl-brackets">[</span><span
class="hl-number">0</span><span
class="hl-brackets">]</span><span
class="hl-code">;<br
/>&nbsp;<br
/></span><span
class="hl-mlcomment">/* Load the 11 coefficients into 3 MMX registers, padding with 0 */</span><span
class="hl-code"><br
/></span><span
class="hl-reserved">asm</span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">movq %0,%%mm0;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; movq %1,%%mm1;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; movq %2,%%mm2</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; :<br
/>&nbsp;&nbsp; &nbsp; : </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">m</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">0</span><span
class="hl-brackets">])</span><span
class="hl-code">, </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">m</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">4</span><span
class="hl-brackets">])</span><span
class="hl-code">, </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">m</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">ta</span><span
class="hl-brackets">[</span><span
class="hl-number">8</span><span
class="hl-brackets">]))</span><span
class="hl-code">;<br
/>&nbsp;<br
/></span><span
class="hl-mlcomment">/* Filter kernel implemented with MMX */</span><span
class="hl-code"><br
/></span><span
class="hl-reserved">for</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">i</span><span
class="hl-code"> = </span><span
class="hl-number">0</span><span
class="hl-code">; </span><span
class="hl-identifier">i</span><span
class="hl-code"> &lt; </span><span
class="hl-identifier">lg</span><span
class="hl-code">; </span><span
class="hl-identifier">i</span><span
class="hl-code">++</span><span
class="hl-brackets">)</span><span
class="hl-code"> </span><span
class="hl-brackets">{</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;*</span><span
class="hl-identifier">yy</span><span
class="hl-code"> = </span><span
class="hl-identifier">x</span><span
class="hl-brackets">[</span><span
class="hl-identifier">i</span><span
class="hl-brackets">]</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-number">1</span><span
class="hl-brackets">]</span><span
class="hl-code"> = </span><span
class="hl-number">0</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-reserved">asm</span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">pandn %%mm6,%%mm6;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; movq %1,%%mm3;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; movq %2,%%mm4;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; movq %3,%%mm5;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; pmaddwd %%mm0,%%mm3;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; pmaddwd %%mm1,%%mm4;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; pmaddwd %%mm2,%%mm5;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; paddd %%mm3, %%mm6;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; paddd %%mm4, %%mm6;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; paddd %%mm5, %%mm6;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; movq&nbsp; %%mm6, %%mm7;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; psrlq $32, %%mm6;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; paddd %%mm7, %%mm6;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; movd %%mm6,%0;</span><span
class="hl-special">\</span><span
class="hl-string"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; emms</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; :<br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; :</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">r</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-brackets">)</span><span
class="hl-code">, </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">m</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-code">-</span><span
class="hl-number">10</span><span
class="hl-brackets">])</span><span
class="hl-code">, </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">m</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-code">-</span><span
class="hl-number">6</span><span
class="hl-brackets">])</span><span
class="hl-code">, </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">m</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">yy</span><span
class="hl-brackets">[</span><span
class="hl-code">-</span><span
class="hl-number">2</span><span
class="hl-brackets">]))</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
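class="hl-mlcomment">/* Because the MMX instructions used here do not saturate, s has not been shifted left yet, so the saturating left shift below takes one extra bit */</span><span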
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">s</span><span
class="hl-code"> = </span><span
class="hl-identifier">L_shl</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-code">, </span><span
class="hl-number">4</span><span
class="hl-brackets">)</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;*</span><span
class="hl-identifier">yy</span><span
class="hl-code">++ = </span><span
class="hl-identifier">g729round</span><span
class="hl-brackets">(</span><span
class="hl-identifier">s</span><span
class="hl-brackets">)</span><span
class="hl-code">;<br
/></span><span
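class="hl-brackets">}</span></div></div></div><p>A few notes:</p><ul><li>Note: the inline assembly above lists its result s among the input operands. That happened to work in this particular case, but it is not correct in general; an operand that the assembly writes should be declared as an output operand (for example with the &#8220;=r&#8221; constraint).</li><li>MMX has no DSP-style instructions such as multiply-and-shift-left, and it does not even offer a saturating add for 32-bit results (its saturating adds cover only bytes and words); SSE improves on this somewhat.</li><li>In theory the operations above can overflow, so the final step keeps the original saturating left shift, which reduces that risk to some extent.</li><li>Some of the operations above can clearly run in parallel, which is very useful on VLIW systems.</li><li>This already forms the core of a fully optimized version of the filter.</li></ul><h3>Conclusion</h3><p>To get the most out of SIMD you will probably need more assembly-level coding, but several techniques for mixing a high-level language with assembly can help. Some deliver larger performance gains; others are more elegant in form while still being reasonably efficient. All of them are good approaches and worth trying.</p><p>CPUs keep adding SIMD instruction set extensions, and GCC is working hard to make those extensions easy to use. That work is still in progress, so when you run into a problem, look for a workaround first. GCC 3.4.1 was used here, and in my experience the results were quite good.</p><h3>About this document</h3><h3>Using SIMD Instructions in GCC</h3><p>This document was generated using the LaTeX2HTML translator Version 2002 (1.62)</p><p>Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.<br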
/> Copyright © 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.</p><p>The command line arguments were:<br
/> latex2html -iso_language CN -html_version 4.0,unicode -address &#8216;®2004 CoreUp Designs&#8217; -local_icons -split 0 -nonavigation gccsimd</p><p>The translation was initiated by on 2004-12-13</p><h3>References</h3><ol><li>Intel: IA-32 Intel Architecture Software Developer&#8217;s Manual, Volume 1: Basic Architecture (2002)</li><li>Intel: IA-32 Intel Architecture Software Developer&#8217;s Manual, Volume 2: Instruction Set Reference (2003)</li><li>Intel: IA-32 Intel Architecture Software Developer&#8217;s Manual, Volume 3: System Programming Guide (2003)</li><li>XviD.org, http://www.xvid.org/ (up-to-date)</li><li>GNU, GCC online documentation, http://www.gnu.org/software/GCC/onlinedocs/ (up-to-date)</li><li>Arthur Griffith, GCC: The Complete Reference, McGraw Hill (2002)</li></ol> ]]></content:encoded> <wfw:commentRss>http://gccfeli.cn/2009/04/gcc-simd.html/feed</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>An Overview of GCC Inline Assembly</title><link>http://gccfeli.cn/2009/03/gcc-embed-asm.html</link> <comments>http://gccfeli.cn/2009/03/gcc-embed-asm.html#comments</comments> <pubDate>Thu, 19 Mar 2009 01:42:23 +0000</pubDate> <dc:creator>Felicia</dc:creator> <category><![CDATA[精华]]></category> <category><![CDATA[编译原理]]></category> <category><![CDATA[转载]]></category> <category><![CDATA[GCC]]></category> <category><![CDATA[汇编]]></category><guid
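isPermaLink="false">http://www.gccfeli.cn/?p=321</guid> <description><![CDATA[<p>If you are a Linux kernel developer, you will often find yourself coding highly architecture-dependent functions or optimizing code paths. You most likely do this by inserting assembly language instructions in the middle of C statements, a technique known as inline assembly. Let's look at how inline assembly is used in Linux. (Note: inline assembly is also called embedded assembly.)<br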
/></p><h3>A brief look at the GNU assembler</h3> <span
class="readmore"><a
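href="http://gccfeli.cn/2009/03/gcc-embed-asm.html" title="An Overview of GCC Inline Assembly">Read more (7231 characters)</a></span>]]></description> <content:encoded><![CDATA[<p>If you are a Linux kernel developer, you will often find yourself coding highly architecture-dependent functions or optimizing code paths. You most likely do this by inserting assembly language instructions in the middle of C statements, a technique known as inline assembly. Let's look at how inline assembly is used in Linux. (Note: inline assembly is also called embedded assembly.)<br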
/> <span
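id="more-321"></span></p><h3>A brief look at the GNU assembler</h3><p>Let's first look at the basic assembler syntax used in Linux. GCC (the GNU C compiler for Linux) uses AT&#038;T assembly syntax. Some basic rules of this syntax are listed below. (The list is by no means complete; it covers only the rules relevant to inline assembly.)</p><h4>Register naming</h4><p>Register names carry a % prefix. That is, to use eax, write %eax.</p><h4>Order of source and destination operands</h4><p>In every instruction the source operand comes first and the destination operand second. This differs from Intel syntax, where the source operand comes after the destination operand.</p><div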
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-identifier">mov</span><span
class="hl-code"> %</span><span
class="hl-identifier">eax</span><span
class="hl-code">, %</span><span
class="hl-identifier">ebx</span><span
class="hl-code">, </span><span
class="hl-identifier">transfers</span><span
class="hl-code"> </span><span
class="hl-identifier">the</span><span
class="hl-code"> </span><span
class="hl-identifier">contents</span><span
class="hl-code"> </span><span
class="hl-identifier">of</span><span
class="hl-code"> </span><span
class="hl-identifier">eax</span><span
class="hl-code"> </span><span
class="hl-identifier">to</span><span
class="hl-code"> </span><span
class="hl-identifier">ebx</span><span
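class="hl-code">.</span></div></div></div><h4>Operand size</h4><p>Depending on whether the operand is a byte, a word, or a long, the instruction takes the suffix b, w, or l. The suffix is not mandatory; the GNU assembler tries to infer the right one from the operands. Specifying it by hand, however, makes the code more readable and removes any chance of a wrong guess.</p><div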
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-identifier">movb</span><span
class="hl-code"> %</span><span
class="hl-identifier">al</span><span
class="hl-code">, %</span><span
class="hl-identifier">bl</span><span
class="hl-code"> -- </span><span
class="hl-identifier">Byte</span><span
class="hl-code"> </span><span
class="hl-identifier">move</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">movw</span><span
class="hl-code"> %</span><span
class="hl-identifier">ax</span><span
class="hl-code">, %</span><span
class="hl-identifier">bx</span><span
class="hl-code"> -- </span><span
class="hl-identifier">Word</span><span
class="hl-code"> </span><span
class="hl-identifier">move</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">movl</span><span
class="hl-code"> %</span><span
class="hl-identifier">eax</span><span
class="hl-code">, %</span><span
class="hl-identifier">ebx</span><span
class="hl-code"> -- </span><span
class="hl-identifier">Longword</span><span
class="hl-code"> </span><span
class="hl-identifier">move</span></div></div></div><h4>Immediate operands</h4><p>An immediate operand is written with a $ prefix.</p><div
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-identifier">movl</span><span
class="hl-code"> $</span><span
class="hl-number">0</span><span
class="hl-identifier">xffff</span><span
class="hl-code">, %</span><span
class="hl-identifier">eax</span><span
class="hl-code"> -- </span><span
class="hl-identifier">will</span><span
class="hl-code"> </span><span
class="hl-identifier">move</span><span
class="hl-code"> </span><span
class="hl-identifier">the</span><span
class="hl-code"> </span><span
class="hl-identifier">value</span><span
class="hl-code"> </span><span
class="hl-identifier">of</span><span
class="hl-code"> </span><span
class="hl-number">0</span><span
class="hl-identifier">xffff</span><span
class="hl-code"> </span><span
class="hl-identifier">into</span><span
class="hl-code"> </span><span
class="hl-identifier">eax</span><span
class="hl-code"> </span><span
class="hl-types">register</span><span
class="hl-code">.</span></div></div></div><h4>Indirect memory references</h4><p>Any indirect reference to memory is written with parentheses ( ).</p><div
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-identifier">movb</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-code">%</span><span
class="hl-identifier">esi</span><span
class="hl-brackets">)</span><span
class="hl-code">, %</span><span
class="hl-identifier">al</span><span
class="hl-code"> -- </span><span
class="hl-identifier">will</span><span
class="hl-code"> </span><span
class="hl-identifier">transfer</span><span
class="hl-code"> </span><span
class="hl-identifier">the</span><span
class="hl-code"> </span><span
class="hl-identifier">byte</span><span
class="hl-code"> </span><span
class="hl-identifier">in</span><span
class="hl-code"> </span><span
class="hl-identifier">the</span><span
class="hl-code"> </span><span
class="hl-identifier">memory</span><span
class="hl-code">&nbsp; </span><span
class="hl-identifier">pointed</span><span
class="hl-code"> </span><span
class="hl-identifier">by</span><span
class="hl-code"> </span><span
class="hl-identifier">esi</span><span
class="hl-code"> </span><span
class="hl-identifier">into</span><span
class="hl-code"> </span><span
class="hl-identifier">al</span><span
class="hl-code"> </span><span
class="hl-types">register</span></div></div></div><h3>Inline assembly</h3><p>GCC provides a special construct for inline assembly, which has the following format:</p><h4>GCC&#8217;s &#8220;asm&#8221; construct</h4><div
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-reserved">asm</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-code"> </span><span
class="hl-identifier">assembler</span><span
class="hl-code"> </span><span
class="hl-types">template</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; : </span><span
class="hl-identifier">output</span><span
class="hl-code"> </span><span
class="hl-identifier">operands</span><span
class="hl-code">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">optional</span><span
class="hl-brackets">)</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; : </span><span
class="hl-identifier">input</span><span
class="hl-code"> </span><span
class="hl-identifier">operands</span><span
class="hl-code">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; </span><span
class="hl-brackets">(</span><span
class="hl-identifier">optional</span><span
class="hl-brackets">)</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; : </span><span
class="hl-identifier">list</span><span
class="hl-code"> </span><span
class="hl-identifier">of</span><span
class="hl-code"> </span><span
class="hl-identifier">clobbered</span><span
class="hl-code"> </span><span
class="hl-identifier">registers</span><span
class="hl-code">&nbsp; &nbsp; </span><span
class="hl-brackets">(</span><span
class="hl-identifier">optional</span><span
class="hl-brackets">)</span><span
class="hl-code"><br
/></span><span
class="hl-brackets">)</span><span
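class="hl-code">;</span></div></div></div><p>Here, the assembler template consists of assembly instructions. The input operands are C expressions that serve as inputs to the instructions. The output operands are C expressions that receive the results of the instructions.</p><p>Inline assembly matters because it is flexible and because its output can be made visible through C variables. That ability lets &#8220;asm&#8221; act as an interface between the assembly instructions and the C program that contains them.</p><p>One basic but important distinction is that basic inline assembly contains only instructions, whereas extended inline assembly also has operands. To illustrate, consider the following example:</p><h4>The basic elements of inline assembly</h4><div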
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-brackets">{</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-types">int</span><span
class="hl-code"> </span><span
class="hl-identifier">a</span><span
class="hl-code">=</span><span
class="hl-number">10</span><span
class="hl-code">, </span><span
class="hl-identifier">b</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-reserved">asm</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">movl %1, %%eax;</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">movl %%eax, %0;</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp;:</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">=r</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">b</span><span
class="hl-brackets">)</span><span
class="hl-code">&nbsp; </span><span
class="hl-mlcomment">/* output */</span><span
class="hl-code">&nbsp; &nbsp; <br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp;:</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">r</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">a</span><span
class="hl-brackets">)</span><span
class="hl-code">&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* input */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp;:</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">%eax</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">)</span><span
class="hl-code">; </span><span
class="hl-mlcomment">/* clobbered register */</span><span
class="hl-code"><br
/></span><span
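class="hl-brackets">}</span></div></div></div><p>In the example above, we use assembly instructions to make the value of &#8220;b&#8221; equal to &#8220;a&#8221;. Note the following points:</p><ul><li>&#8220;b&#8221; is the output operand, referred to by %0, and &#8220;a&#8221; is the input operand, referred to by %1.</li><li>&#8220;r&#8221; is an operand constraint; it says that the variables &#8220;a&#8221; and &#8220;b&#8221; are to be held in registers. Note that an output operand constraint must carry the constraint modifier &#8220;=&#8221;, which marks it as an output operand.</li><li>To use the register %eax inside &#8220;asm&#8221;, put one more % in front of it, that is, write %%eax, because &#8220;asm&#8221; uses %0, %1, and so on to identify variables. Any number with a single % prefix is treated as an input/output operand, not as a register.</li><li>The clobbered register %eax after the third colon tells GCC that the value of %eax is modified inside &#8220;asm&#8221;, so GCC will not use this register to hold any other value.</li><li>movl %1, %%eax moves the value of &#8220;a&#8221; into %eax, and movl %%eax, %0 moves the contents of %eax into &#8220;b&#8221;.</li><li>Because &#8220;b&#8221; is specified as an output operand, it reflects the updated value once &#8220;asm&#8221; has finished executing. In other words, changes made to &#8220;b&#8221; inside &#8220;asm&#8221; are visible outside it.</li></ul><p>Now let's look at each item in more detail.</p><h3>The assembler template</h3><p>The assembler template is the set of assembly instructions (a single instruction or a group of them) inserted into the C program. Either each instruction is enclosed in double quotes, or the whole group is. Each instruction must also end with a delimiter. The valid delimiters are newline (\n) and semicolon (;). A &#8216;\n&#8217; may be followed by a tab (\t) as formatting, which makes the instructions GCC emits into the assembly file more readable. Instructions refer to the C expressions (specified as operands) by the numbers %0, %1, and so on.</p><p>If you want to make sure the compiler does not optimize the &#8220;asm&#8221; statement away (for example by deleting it or hoisting it out of a loop), use the keyword &#8220;volatile&#8221; after &#8220;asm&#8221;. If the program must be ANSI C compatible, use __asm__ and __volatile__ instead of asm and volatile.</p><h3>Operands</h3><p>C expressions serve as the operands of the assembly instructions inside &#8220;asm&#8221;. Operands are the main feature of inline assembly whenever the assembly instructions do meaningful work by operating on C expressions from the C program.</p><p>Each operand is specified by an operand constraint string followed by the C expression in parentheses, for example: &#8220;constraint&#8221; (C expression). The main job of the operand constraint is to determine the addressing mode of the operand.</p><p>Both the input and the output sections may contain multiple operands, separated by commas.</p><p>Inside the assembler template, operands are referred to by number. If there are n operands in total (inputs and outputs together), the first output operand is numbered 0, the numbers increase from there, and the last input operand is numbered n-1. The total number of operands is limited to 10, or to the maximum number of operands in any instruction pattern in the machine description, whichever is greater (this was the limit in the GCC versions of the time; current GCC documentation gives a limit of 30).</p><h3>The clobber list</h3><p>If the instructions in &#8220;asm&#8221; refer to hardware registers, we can tell GCC that we will use and modify them ourselves. GCC then no longer assumes that the values it loaded into those registers are still valid. Input and output registers normally do not need to be listed as clobbered, because GCC already knows that &#8220;asm&#8221; uses them (they are named explicitly as constraints). However, if the instructions use any other register, explicitly or implicitly (a register that appears in neither the input nor the output constraint list), that register must be named in the clobber list. Clobbered registers are listed after the third colon, with their names given as strings.</p><p>As for keywords: if the instructions modify memory in some unpredictable, unspecified way, add the &#8220;memory&#8221; keyword to the clobber list. This tells GCC not to keep memory values cached in registers across the &#8220;asm&#8221;.</p><h3>Operand constraints</h3><p>As mentioned earlier, every operand in &#8220;asm&#8221; is described by an operand constraint string followed by the C expression in parentheses. The constraint mainly determines the addressing mode of the operand in the instruction. A constraint can also specify:</p><ul><li>whether the operand may be in a register, and which kinds of register it may be in</li><li>whether the operand may be a memory reference, and which kinds of address to use in that case</li><li>whether the operand may be an immediate constant</li></ul><p>A constraint can also require that two operands match.</p><h3>Commonly used constraints</h3><p>Only a small subset of the available operand constraints is in common use; those constraints are listed below with brief descriptions. For the complete list of operand constraints, see the GCC and GAS manuals.</p><h4>Register operand constraint (r)</h4><p>Operands specified with this constraint are held in general-purpose registers. Consider the following example:</p><div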
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-reserved">asm</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">movl %%cr3, %0\n</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> :</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">=r</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">cr3val</span><span
class="hl-brackets">))</span><span
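class="hl-code">;</span></div></div></div><p>Here the variable cr3val is kept in a register: the value of %cr3 is copied into that register, and cr3val is then updated in memory from the register. With the &#8220;r&#8221; constraint, GCC may keep cr3val in any available general-purpose register (GPR). To pin an operand to one particular register, name that register directly with its register-specific constraint letter:</p><div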
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-identifier">a</span><span
class="hl-code">&nbsp; &nbsp;%</span><span
class="hl-identifier">eax</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">b</span><span
class="hl-code">&nbsp; &nbsp;%</span><span
class="hl-identifier">ebx</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">c</span><span
class="hl-code">&nbsp; &nbsp;%</span><span
class="hl-identifier">ecx</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">d</span><span
class="hl-code">&nbsp; &nbsp;%</span><span
class="hl-identifier">edx</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">S</span><span
class="hl-code">&nbsp; &nbsp;%</span><span
class="hl-identifier">esi</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">D</span><span
class="hl-code">&nbsp; &nbsp;%</span><span
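class="hl-identifier">edi</span></div></div></div><h4>Memory operand constraint (m)</h4><p>When an operand is in memory, every operation on it happens directly in the memory location. A register constraint works the other way round: the value is first loaded into a register, modified there, and then written back to memory. Register constraints are usually used only when an instruction strictly requires them or when they noticeably speed up the code. A memory constraint is the most efficient choice when a C variable has to be updated inside &#8220;asm&#8221; and you really do not want a register to hold its value. For example, the value of idtr is stored in the memory location loc:</p><div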
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-reserved">asm</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">sidt %0\n</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> : :</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">m</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">loc</span><span
class="hl-brackets">))</span><span
class="hl-code">;</span></div></div></div><h4>Matching (digit) constraints</h4><p>In some cases a single variable serves as both an input and an output operand. A matching constraint expresses this in &#8220;asm&#8221;.</p><div
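class="hl-wrapper"><p>As a compilable illustration of a matching constraint, the sketch below wraps the increment in a function. The function names increment and increment_self_check are hypothetical, as are the &#8220;cc&#8221; clobber and the plain C fallback for non-x86 targets; none of them come from the article, and a GCC-compatible compiler is assumed.</p>

```c
#include <assert.h>

/* Hypothetical helper (not from the article): returns v + 1.
 * On x86 it uses "incl" with a matching constraint; on other
 * targets it falls back to plain C so the sketch still builds. */
int increment(int v)
{
#if defined(__i386__) || defined(__x86_64__)
    int out;
    __asm__ ("incl %0"
             : "=a" (out)  /* operand 0: output, placed in %eax */
             : "0" (v)     /* input shares the location of operand 0 */
             : "cc");      /* incl changes the flags */
    return out;
#else
    return v + 1;
#endif
}

/* Small self-check that a caller may run from main(). */
void increment_self_check(void)
{
    assert(increment(41) == 42);
    assert(increment(-1) == 0);
}
```

<p>Because the input is tied to operand 0, GCC loads v into %eax before the instruction and reads the result from the same register afterwards. The article's own one-line form of the statement follows.</p></div><div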
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-reserved">asm</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">incl %0</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> :</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">=a</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">var</span><span
class="hl-brackets">)</span><span
class="hl-code">:</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">0</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">var</span><span
class="hl-brackets">))</span><span
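class="hl-code">;</span></div></div></div><p>In the matching-constraint example, the register %eax serves as both the input and the output variable. The input var is read into %eax, incremented, and the updated %eax is stored back into var. The &#8220;0&#8221; here specifies the same constraint as output operand 0; that is, the input instance of var must occupy the same location as that output, namely %eax. This constraint can be used in the following cases:</p><ul><li>the input is read from a variable, or the variable is modified and the modification is written back to the same variable</li><li>separate instances of the input and output operands are not needed</li></ul><p>The most important benefit of matching constraints is that they lead to efficient use of the available registers.</p><h3>General inline assembly usage examples</h3><p>The following examples illustrate usage with various operand constraints. There are too many constraints to give an example of each, so only the most frequently used constraint types are covered.</p><h4>&#8220;asm&#8221; and the register constraint &#8220;r&#8221;</h4><p>Let us first look at an &#8220;asm&#8221; that uses the register constraint r. The example shows how GCC allocates registers and how it updates the value of the output variable.</p><div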
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-types">int</span><span
class="hl-code"> </span><span
class="hl-identifier">main</span><span
class="hl-brackets">(</span><span
class="hl-types">void</span><span
class="hl-brackets">)</span><span
class="hl-code"> </span><span
class="hl-brackets">{</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-types">int</span><span
class="hl-code"> </span><span
class="hl-identifier">x</span><span
class="hl-code"> = </span><span
class="hl-number">10</span><span
class="hl-code">, </span><span
class="hl-identifier">y</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-reserved">asm</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">movl %1, %%eax;</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">movl %%eax, %0;</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp;:</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">=r</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">y</span><span
class="hl-brackets">)</span><span
class="hl-code">&nbsp; </span><span
class="hl-mlcomment">/* y is output operand */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp;:</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">r</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">x</span><span
class="hl-brackets">)</span><span
class="hl-code">&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* x is input operand */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp;:</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">%eax</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">)</span><span
class="hl-code">; </span><span
class="hl-mlcomment">/* %eax is clobbered register */</span><span
class="hl-code"><br
/></span><span
class="hl-brackets">}</span></div></div></div><p>In this example, the value of x is copied to y inside &#8220;asm&#8221;. Both x and y are passed to &#8220;asm&#8221; in registers. The assembly code generated for this example is:</p><div
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-identifier">main</span><span
class="hl-code">:<br
/></span><span
class="hl-identifier">pushl</span><span
class="hl-code"> %</span><span
class="hl-identifier">ebp</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">movl</span><span
class="hl-code"> %</span><span
class="hl-identifier">esp</span><span
class="hl-code">,%</span><span
class="hl-identifier">ebp</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">subl</span><span
class="hl-code"> $</span><span
class="hl-number">8</span><span
class="hl-code">,%</span><span
class="hl-identifier">esp</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">movl</span><span
class="hl-code"> $</span><span
class="hl-number">10</span><span
class="hl-code">,-</span><span
class="hl-number">4</span><span
class="hl-brackets">(</span><span
class="hl-code">%</span><span
class="hl-identifier">ebp</span><span
class="hl-brackets">)</span><span
class="hl-code">&nbsp; &nbsp; <br
/></span><span
class="hl-identifier">movl</span><span
class="hl-code"> -</span><span
class="hl-number">4</span><span
class="hl-brackets">(</span><span
class="hl-code">%</span><span
class="hl-identifier">ebp</span><span
class="hl-brackets">)</span><span
class="hl-code">,%</span><span
class="hl-identifier">edx</span><span
class="hl-code">&nbsp; </span><span
class="hl-mlcomment">/* x=10 is stored in %edx */</span><span
class="hl-code"><br
/></span><span
class="hl-prepro">#APP</span><span
class="hl-code">&nbsp; &nbsp; </span><span
class="hl-mlcomment">/* asm starts here */</span><span
class="hl-code">&nbsp; &nbsp;</span><span
class="hl-prepro"></span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">movl</span><span
class="hl-code"> %</span><span
class="hl-identifier">edx</span><span
class="hl-code">, %</span><span
class="hl-identifier">eax</span><span
class="hl-code">&nbsp; &nbsp; &nbsp;</span><span
class="hl-mlcomment">/* x is moved to %eax */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">movl</span><span
class="hl-code"> %</span><span
class="hl-identifier">eax</span><span
class="hl-code">, %</span><span
class="hl-identifier">edx</span><span
class="hl-code">&nbsp; &nbsp; &nbsp;</span><span
class="hl-mlcomment">/* y is allocated in edx and updated */</span><span
class="hl-code"><br
/></span><span
class="hl-prepro">#NO</span><span
class="hl-identifier">_APP</span><span
class="hl-code"> </span><span
class="hl-mlcomment">/* asm ends here */</span><span
class="hl-prepro"></span><span
class="hl-code"><br
/></span><span
class="hl-identifier">movl</span><span
class="hl-code"> %</span><span
class="hl-identifier">edx</span><span
class="hl-code">,-</span><span
class="hl-number">8</span><span
class="hl-brackets">(</span><span
class="hl-code">%</span><span
class="hl-identifier">ebp</span><span
class="hl-brackets">)</span><span
class="hl-code">&nbsp; </span><span
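class="hl-mlcomment">/* value of y in stack is updated with the value in %edx */</span></div></div></div><p>With the &#8220;r&#8221; constraint, GCC is free to allocate any register here. In our example it chose %edx to hold x. After reading the value of x from %edx, it allocated the same register for y as well.</p><p>Because y is specified in the output operand section, the updated value in %edx is stored to -8(%ebp), the location of y on the stack. Had y been specified in the input section, the value of y on the stack would not be updated, even though its temporary register copy (%edx) was.</p><p>Because %eax is named in the clobber list, GCC does not use it anywhere else to store data.</p><p>The input x and the output y are both allocated to the same %edx register, on the assumption that inputs are consumed before outputs are produced. Note that this assumption no longer holds once there are many instructions. To make sure the input and the output are allocated to different registers, specify the &#038; constraint modifier. Below is the example with the constraint modifier added.</p><div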
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-types">int</span><span
class="hl-code"> </span><span
class="hl-identifier">main</span><span
class="hl-brackets">(</span><span
class="hl-types">void</span><span
class="hl-brackets">)</span><span
class="hl-code"> </span><span
class="hl-brackets">{</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-types">int</span><span
class="hl-code"> </span><span
class="hl-identifier">x</span><span
class="hl-code"> = </span><span
class="hl-number">10</span><span
class="hl-code">, </span><span
class="hl-identifier">y</span><span
class="hl-code">;<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-reserved">asm</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">movl %1, %%eax;</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">movl %%eax, %0;</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp;:</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">=&amp;r</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">y</span><span
class="hl-brackets">)</span><span
class="hl-code"> </span><span
class="hl-mlcomment">/* y is output operand, note the &amp; constraint modifier. */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp;:</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">r</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">x</span><span
class="hl-brackets">)</span><span
class="hl-code">&nbsp; &nbsp;</span><span
class="hl-mlcomment">/* x is input operand */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; &nbsp; &nbsp;:</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">%eax</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">)</span><span
class="hl-code">; </span><span
class="hl-mlcomment">/* %eax is clobbered register */</span><span
class="hl-code"><br
/></span><span
class="hl-brackets">}</span></div></div></div><p>The assembly code generated for this example follows; it clearly shows that x and y are stored in different registers inside &#8220;asm&#8221;.</p><div
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-identifier">main</span><span
class="hl-code">:<br
/></span><span
class="hl-identifier">pushl</span><span
class="hl-code"> %</span><span
class="hl-identifier">ebp</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">movl</span><span
class="hl-code"> %</span><span
class="hl-identifier">esp</span><span
class="hl-code">,%</span><span
class="hl-identifier">ebp</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">subl</span><span
class="hl-code"> $</span><span
class="hl-number">8</span><span
class="hl-code">,%</span><span
class="hl-identifier">esp</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">movl</span><span
class="hl-code"> $</span><span
class="hl-number">10</span><span
class="hl-code">,-</span><span
class="hl-number">4</span><span
class="hl-brackets">(</span><span
class="hl-code">%</span><span
class="hl-identifier">ebp</span><span
class="hl-brackets">)</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">movl</span><span
class="hl-code"> -</span><span
class="hl-number">4</span><span
class="hl-brackets">(</span><span
class="hl-code">%</span><span
class="hl-identifier">ebp</span><span
class="hl-brackets">)</span><span
class="hl-code">,%</span><span
class="hl-identifier">ecx</span><span
class="hl-code">&nbsp; </span><span
class="hl-mlcomment">/* x, the input is in %ecx */</span><span
class="hl-code"><br
/></span><span
class="hl-prepro">#APP</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">movl</span><span
class="hl-code"> %</span><span
class="hl-identifier">ecx</span><span
class="hl-code">, %</span><span
class="hl-identifier">eax</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">movl</span><span
class="hl-code"> %</span><span
class="hl-identifier">eax</span><span
class="hl-code">, %</span><span
class="hl-identifier">edx</span><span
class="hl-code">&nbsp; &nbsp; &nbsp;</span><span
class="hl-mlcomment">/* y, the output is in %edx */</span><span
class="hl-code"><br
/></span><span
class="hl-prepro">#NO</span><span
class="hl-identifier">_APP</span><span
class="hl-prepro"></span><span
class="hl-code"><br
/></span><span
class="hl-identifier">movl</span><span
class="hl-code"> %</span><span
class="hl-identifier">edx</span><span
class="hl-code">,-</span><span
class="hl-number">8</span><span
class="hl-brackets">(</span><span
class="hl-code">%</span><span
class="hl-identifier">ebp</span><span
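class="hl-brackets">)</span></div></div></div><h3>Using register-specific constraints</h3><p>Now let us see how to specify individual registers as operand constraints. In the following example, the cpuid instruction takes its input in the %eax register and returns its output in four registers: %eax, %ebx, %ecx and %edx. The input to cpuid (the variable &#8220;op&#8221;) is passed to &#8220;asm&#8221; in the eax register, because that is where cpuid expects it. The a, b, c and d constraints are used in the output section to collect the values of the four registers.</p><div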
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-reserved">asm</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">cpuid</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;:</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">=a</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">_eax</span><span
class="hl-brackets">)</span><span
class="hl-code">,<br
/>&nbsp;&nbsp; &nbsp; </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">=b</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">_ebx</span><span
class="hl-brackets">)</span><span
class="hl-code">,<br
/>&nbsp;&nbsp; &nbsp; </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">=c</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">_ecx</span><span
class="hl-brackets">)</span><span
class="hl-code">,<br
/>&nbsp;&nbsp; &nbsp; </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">=d</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">_edx</span><span
class="hl-brackets">)</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;:</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">a</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">op</span><span
class="hl-brackets">))</span><span
class="hl-code">;</span></div></div></div><p>The assembly code generated for it is shown below (assuming the variables _eax, _ebx and so on are stored on the stack):</p><div
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-identifier">movl</span><span
class="hl-code"> -</span><span
class="hl-number">20</span><span
class="hl-brackets">(</span><span
class="hl-code">%</span><span
class="hl-identifier">ebp</span><span
class="hl-brackets">)</span><span
class="hl-code">,%</span><span
class="hl-identifier">eax</span><span
class="hl-code"> </span><span
class="hl-mlcomment">/* store 'op' in %eax -- input */</span><span
class="hl-code"><br
/></span><span
class="hl-prepro">#APP</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">cpuid</span><span
class="hl-code"><br
/></span><span
class="hl-prepro">#NO</span><span
class="hl-identifier">_APP</span><span
class="hl-prepro"></span><span
class="hl-code"><br
/></span><span
class="hl-identifier">movl</span><span
class="hl-code"> %</span><span
class="hl-identifier">eax</span><span
class="hl-code">,-</span><span
class="hl-number">4</span><span
class="hl-brackets">(</span><span
class="hl-code">%</span><span
class="hl-identifier">ebp</span><span
class="hl-brackets">)</span><span
class="hl-code">&nbsp; </span><span
class="hl-mlcomment">/* store %eax in _eax -- output */</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">movl</span><span
class="hl-code"> %</span><span
class="hl-identifier">ebx</span><span
class="hl-code">,-</span><span
class="hl-number">8</span><span
class="hl-brackets">(</span><span
class="hl-code">%</span><span
class="hl-identifier">ebp</span><span
class="hl-brackets">)</span><span
class="hl-code">&nbsp; </span><span
class="hl-mlcomment">/* store other registers in respective output variables */</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">movl</span><span
class="hl-code"> %</span><span
class="hl-identifier">ecx</span><span
class="hl-code">,-</span><span
class="hl-number">12</span><span
class="hl-brackets">(</span><span
class="hl-code">%</span><span
class="hl-identifier">ebp</span><span
class="hl-brackets">)</span><span
class="hl-code"> <br
/></span><span
class="hl-identifier">movl</span><span
class="hl-code"> %</span><span
class="hl-identifier">edx</span><span
class="hl-code">,-</span><span
class="hl-number">16</span><span
class="hl-brackets">(</span><span
class="hl-code">%</span><span
class="hl-identifier">ebp</span><span
class="hl-brackets">)</span></div></div></div><p>The strcpy function can be implemented with the &#8220;S&#8221; and &#8220;D&#8221; constraints as follows:</p><div
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-reserved">asm</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">cld\n\trep\n\tmovsb</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;: </span><span
class="hl-mlcomment">/* no output */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;:</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">S</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">src</span><span
class="hl-brackets">)</span><span
class="hl-code">, </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">D</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">dst</span><span
class="hl-brackets">)</span><span
class="hl-code">, </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">c</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">count</span><span
class="hl-brackets">))</span><span
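class="hl-code">;</span></div></div></div><p>The &#8220;S&#8221; constraint places the source pointer src in %esi, and the &#8220;D&#8221; constraint places the destination pointer dst in %edi. Because the rep prefix needs the count value, it is placed in %ecx.</p><p>Below is another constraint, which uses the two registers %eax and %edx to combine two 32-bit values into one 64-bit value:</p><div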
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-prepro">#define</span><span
class="hl-code"> </span><span
class="hl-identifier">rdtscll</span><span
class="hl-brackets">(</span><span
class="hl-identifier">val</span><span
class="hl-brackets">)</span><span
class="hl-code"> </span><span
class="hl-prepro">\</span><span
class="hl-code"><br
/>&nbsp;</span><span
class="hl-identifier">__asm__</span><span
class="hl-code"> </span><span
class="hl-identifier">__volatile__</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">rdtsc</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> : </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">=A</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">val</span><span
class="hl-brackets">))</span></div></div></div><p>The generated assembly looks like this (assuming val occupies 64 bits of memory):</p><div
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-prepro">#APP</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">rdtsc</span><span
class="hl-code"><br
/></span><span
class="hl-prepro">#NO</span><span
class="hl-identifier">_APP</span><span
class="hl-prepro"></span><span
class="hl-code"><br
/></span><span
class="hl-identifier">movl</span><span
class="hl-code"> %</span><span
class="hl-identifier">eax</span><span
class="hl-code">,-</span><span
class="hl-number">8</span><span
class="hl-brackets">(</span><span
class="hl-code">%</span><span
class="hl-identifier">ebp</span><span
class="hl-brackets">)</span><span
class="hl-code">&nbsp; </span><span
class="hl-mlcomment">/* As a result of A constraint %eax and %edx serve as outputs */</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">movl</span><span
class="hl-code"> %</span><span
class="hl-identifier">edx</span><span
class="hl-code">,-</span><span
class="hl-number">4</span><span
class="hl-brackets">(</span><span
class="hl-code">%</span><span
class="hl-identifier">ebp</span><span
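class="hl-brackets">)</span></div></div></div><p>Note that the values in %edx:%eax together serve as the 64-bit output. This pairing applies to 32-bit x86; on x86-64, GCC treats &#8220;A&#8221; as either %rax or %rdx rather than the pair, so the two halves are usually read through separate &#8220;=a&#8221; and &#8220;=d&#8221; outputs.</p><h3>Using matching constraints</h3><p>The code for a system call that takes four parameters is shown below:</p><div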
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-prepro">#define</span><span
class="hl-code"> </span><span
class="hl-identifier">_syscall4</span><span
class="hl-brackets">(</span><span
class="hl-identifier">type</span><span
class="hl-code">,</span><span
class="hl-identifier">name</span><span
class="hl-code">,</span><span
class="hl-identifier">type1</span><span
class="hl-code">,</span><span
class="hl-identifier">arg1</span><span
class="hl-code">,</span><span
class="hl-identifier">type2</span><span
class="hl-code">,</span><span
class="hl-identifier">arg2</span><span
class="hl-code">,</span><span
class="hl-identifier">type3</span><span
class="hl-code">,</span><span
class="hl-identifier">arg3</span><span
class="hl-code">,</span><span
class="hl-identifier">type4</span><span
class="hl-code">,</span><span
class="hl-identifier">arg4</span><span
class="hl-brackets">)</span><span
class="hl-code"> </span><span
class="hl-prepro">\</span><span
class="hl-code"><br
/></span><span
class="hl-identifier">type</span><span
class="hl-code"> </span><span
class="hl-identifier">name</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">type1</span><span
class="hl-code"> </span><span
class="hl-identifier">arg1</span><span
class="hl-code">, </span><span
class="hl-identifier">type2</span><span
class="hl-code"> </span><span
class="hl-identifier">arg2</span><span
class="hl-code">, </span><span
class="hl-identifier">type3</span><span
class="hl-code"> </span><span
class="hl-identifier">arg3</span><span
class="hl-code">, </span><span
class="hl-identifier">type4</span><span
class="hl-code"> </span><span
class="hl-identifier">arg4</span><span
class="hl-brackets">)</span><span
class="hl-code"> \<br
/></span><span
class="hl-brackets">{</span><span
class="hl-code"> \<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-types">long</span><span
class="hl-code"> </span><span
class="hl-identifier">__res</span><span
class="hl-code">; \<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">__asm__</span><span
class="hl-code"> </span><span
class="hl-types">volatile</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">int $0x80</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> \<br
/>&nbsp;&nbsp; &nbsp;: </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">=a</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">__res</span><span
class="hl-brackets">)</span><span
class="hl-code"> \<br
/>&nbsp;&nbsp; &nbsp;: </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">0</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">__NR_</span><span
class="hl-code">##</span><span
class="hl-identifier">name</span><span
class="hl-brackets">)</span><span
class="hl-code">,</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">b</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">((</span><span
class="hl-types">long</span><span
class="hl-brackets">)(</span><span
class="hl-identifier">arg1</span><span
class="hl-brackets">))</span><span
class="hl-code">,</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">c</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">((</span><span
class="hl-types">long</span><span
class="hl-brackets">)(</span><span
class="hl-identifier">arg2</span><span
class="hl-brackets">))</span><span
class="hl-code">, \<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">d</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">((</span><span
class="hl-types">long</span><span
class="hl-brackets">)(</span><span
class="hl-identifier">arg3</span><span
class="hl-brackets">))</span><span
class="hl-code">,</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">S</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">((</span><span
class="hl-types">long</span><span
class="hl-brackets">)(</span><span
class="hl-identifier">arg4</span><span
class="hl-brackets">)))</span><span
class="hl-code">; \<br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">__syscall_return</span><span
class="hl-brackets">(</span><span
class="hl-identifier">type</span><span
class="hl-code">,</span><span
class="hl-identifier">__res</span><span
class="hl-brackets">)</span><span
class="hl-code">; \<br
/></span><span
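class="hl-brackets">}</span></div></div></div><p>In the example above, the b, c, d and S constraints place the four arguments of the system call in %ebx, %ecx, %edx and %esi. Note that the &#8220;=a&#8221; constraint is used in the output section, so the return value of the system call, which arrives in %eax, ends up in the variable __res. Because the matching constraint &#8220;0&#8221; is used as the first operand constraint of the input section, the syscall number __NR_##name is placed in %eax and serves as input to the system call. %eax thus acts as both an input and an output register here, and no separate register is needed for that purpose. Note also that the input (the syscall number) is consumed before the output (the return value of the syscall) is produced.</p><h3>Using memory operand constraints</h3><p>Consider the following atomic decrement operation:</p><div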
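class="hl-wrapper"><p>One way to write such a decrement as a complete function is sketched here, ahead of the article's own statement. The function names atomic_dec and atomic_dec_self_check are hypothetical, and the sketch swaps the article's separate &#8220;=m&#8221; output and &#8220;m&#8221; input for a single read-write &#8220;+m&#8221; operand. A GCC-compatible compiler is assumed; on non-x86 targets the code falls back to the __atomic_fetch_sub builtin.</p>

```c
#include <assert.h>

/* Hypothetical helper (not from the article): atomically decrements *counter. */
void atomic_dec(int *counter)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("lock; decl %0"
                          : "+m" (*counter) /* read and written in memory, no register copy */
                          :                 /* no separate inputs */
                          : "cc");          /* decl changes the flags */
#else
    /* Fallback for other targets: GCC/Clang atomic builtin. */
    __atomic_fetch_sub(counter, 1, __ATOMIC_SEQ_CST);
#endif
}

/* Small self-check that a caller may run from main(). */
void atomic_dec_self_check(void)
{
    int c = 3;
    atomic_dec(&c);
    assert(c == 2);
}
```

<p>The memory operand lets the locked instruction act on the variable itself, which is the point the surrounding text makes. The article's original form of the statement follows.</p></div><div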
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-identifier">__asm__</span><span
class="hl-code"> </span><span
class="hl-identifier">__volatile__</span><span
class="hl-brackets">(</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">lock; decl %0</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;:</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">=m</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">counter</span><span
class="hl-brackets">)</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;:</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">m</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-identifier">counter</span><span
class="hl-brackets">))</span><span
class="hl-code">;</span></div></div></div><p>为它生成的汇编类似于：</p><div
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-prepro">#APP</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">lock</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;</span><span
class="hl-identifier">decl</span><span
class="hl-code"> -</span><span
class="hl-number">24</span><span
class="hl-brackets">(</span><span
class="hl-code">%</span><span
class="hl-identifier">ebp</span><span
class="hl-brackets">)</span><span
class="hl-code"> </span><span
class="hl-mlcomment">/* counter is modified on its memory location */</span><span
class="hl-code"><br
/></span><span
class="hl-prepro">#NO</span><span
class="hl-identifier">_APP</span><span
class="hl-code"></span><span
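class="hl-prepro"></span></div></div></div><p>You might consider using a register constraint for counter here. In that case the value of counter would first have to be copied into a register, decremented, and then written back to memory. That would defeat the whole purpose of the lock prefix and of atomicity, which shows clearly why a memory constraint is needed.</p><h3>Using clobbered registers</h3><p>Consider a basic implementation of a memory copy.</p><div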
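class="hl-wrapper"><p>Before the article's loop, here is a hedged sketch of a byte copy that declares everything the instructions touch. It uses rep movsb rather than the article's lodsl/stosl loop, and the function names copy_bytes and copy_bytes_self_check are hypothetical. The sketch assumes a GCC-compatible compiler and falls back to memcpy on non-x86 targets.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the article): copies n bytes from src to dst. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
#if defined(__i386__) || defined(__x86_64__)
    void *d = dst;
    const void *s = src;
    size_t count = n;
    __asm__ __volatile__ ("cld\n\t"
                          "rep movsb"
                          : "+D" (d), "+S" (s), "+c" (count) /* all three are read and modified */
                          :                                  /* no pure inputs */
                          : "memory", "cc");                 /* bytes at dst change; flags are touched */
#else
    memcpy(dst, src, n);
#endif
    return dst;
}

/* Small self-check that a caller may run from main(). */
void copy_bytes_self_check(void)
{
    char from[] = "inline asm";
    char to[sizeof from] = {0};
    copy_bytes(to, from, sizeof from);
    assert(memcmp(to, from, sizeof from) == 0);
}
```

<p>The pointer and count registers appear as read-write operands because rep movsb advances them, and the &#8220;memory&#8221; clobber tells GCC that the destination bytes changed. The article's original loop follows.</p></div><div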
class="hl-wrapper"><div
class="hl-surround"><div
class="hl-main"><span
class="hl-reserved">asm</span><span
class="hl-code"> </span><span
class="hl-brackets">(</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">movl $count, %%ecx;</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">up: lodsl;</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">stosl;</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp; </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">loop up;</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;</span><span
class="hl-mlcomment">/* no output */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;:</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">S</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">src</span><span
class="hl-brackets">)</span><span
class="hl-code">, </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">D</span><span
class="hl-quotes">&quot;</span><span
class="hl-brackets">(</span><span
class="hl-identifier">dst</span><span
class="hl-brackets">)</span><span
class="hl-code"> </span><span
class="hl-mlcomment">/* input */</span><span
class="hl-code"><br
/>&nbsp;&nbsp; &nbsp;:</span><span
class="hl-quotes">&quot;</span><span
class="hl-string">%ecx</span><span
class="hl-quotes">&quot;</span><span
class="hl-code">, </span><span
class="hl-quotes">&quot;</span><span
class="hl-string">%eax</span><span
class="hl-quotes">&quot;</span><span
class="hl-code"> </span><span
class="hl-brackets">)</span><span
class="hl-code">;&nbsp; </span><span
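class="hl-mlcomment">/* clobbered list */</span></div></div></div><p>The lodsl and stosl instructions use %eax implicitly, and lodsl modifies it. The %ecx register is loaded explicitly with count. GCC does not know any of this until we tell it, which we do by including %eax and %ecx in the clobber list. Until then, GCC assumes that %eax and %ecx are free and may decide to use them to hold other data. Note that %esi and %edi are used by &#8220;asm&#8221; but are not in the clobber list. That is because the input operand list already declares that &#8220;asm&#8221; uses them. The bottom line is this: if a register is used inside &#8220;asm&#8221; (explicitly or implicitly) and appears in neither the input nor the output operand list, it must be listed as clobbered.</p><h3>Conclusion</h3><p>On the whole, inline assembly is a vast subject, and it offers many features that we have not even touched on here. With the basic material described in this article, however, you should be able to start writing inline assembly of your own.</p><p>This article was collected from the IBM website; I did not write it.</p> ]]></content:encoded> <wfw:commentRss>http://gccfeli.cn/2009/03/gcc-embed-asm.html/feed</wfw:commentRss> <slash:comments>0</slash:comments> </item> </channel> </rss>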
