Acknowledgments | xi |
Part I What is Intel® Wireless MMX™ Technology? | 1 | (86)
| 3 | (10)
The Growing Need for Mobile Multimedia | 4 | (4)
| 5 | (2)
Building the Application Base | 7 | (1)
Introducing Intel® Wireless MMX™ Technology | 8 | (1)
| 9 | (1)
| 9 | (4)
Part 1: Introduction to Intel® Wireless MMX™ Technology | 9 | (1)
Part 2: Optimization Techniques | 10 | (1)
| 11 | (1)
| 12 | (1)
Understanding SIMD Processing | 13 | (12)
| 13 | (4)
| 14 | (1)
| 15 | (1)
| 16 | (1)
| 17 | (1)
When Should SIMD Processing Be Used? | 17 | (7)
| 17 | (3)
| 20 | (4)
| 24 | (1)
| 24 | (1)
Intel® Wireless MMX™ Technology: The Big Picture | 25 | (42)
| 26 | (1)
Understanding Instruction Syntax | 27 | (3)
| 28 | (1)
| 28 | (1)
| 28 | (2)
| 30 | (6)
Retrieving and Saving Data | 30 | (2)
| 32 | (1)
Transferring a Whole Register to the Core | 33 | (1)
Transferring Elements to the Core | 34 | (2)
Preparing the Data for Processing | 36 | (5)
| 36 | (1)
| 37 | (1)
| 38 | (3)
| 41 | (5)
| 43 | (1)
| 44 | (1)
| 45 | (1)
| 46 | (1)
| 47 | (9)
Multiply Accumulate Instructions | 49 | (4)
| 53 | (3)
| 56 | (2)
Sum of Absolute Difference | 56 | (1)
| 57 | (1)
Miscellaneous Instructions | 58 | (3)
| 61 | (4)
| 61 | (1)
Group Conditional Execution | 62 | (2)
| 64 | (1)
| 65 | (2)
Understanding the Hardware | 67 | (20)
| 68 | (3)
| 69 | (1)
The Intel XScale® Microarchitecture | 70 | (1)
| 71 | (4)
Pipeline Structure Overview | 75 | (6)
| 75 | (2)
| 77 | (1)
| 78 | (1)
| 79 | (2)
| 81 | (5)
Understanding How Instructions Execute | 81 | (5)
| 86 | (1)
| 86 | (1)
Part II Enabling Applications with Intel® Wireless MMX™ Technology | 87 | (168)
| 89 | (12)
| 89 | (1)
| 90 | (1)
Pipelined Microprocessors | 91 | (1)
Understanding the Impact of Stalls | 91 | (5)
The Supply of Instructions | 91 | (4)
| 95 | (1)
Identifying Algorithm Techniques | 96 | (2)
| 97 | (1)
Multi-Sample Calculations | 97 | (1)
| 98 | (1)
| 98 | (1)
| 98 | (2)
| 100 | (1)
| 100 | (1)
| 101 | (20)
Intel® VTune™ Performance Analyzer | 103 | (2)
| 103 | (2)
Before You Begin Sampling | 105 | (2)
| 105 | (1)
| 106 | (1)
| 107 | (2)
Determining the Time Profile | 109 | (3)
Using Events to Analyze Hotspots | 112 | (7)
What Are the Important Events? | 112 | (5)
Deriving Performance Metrics | 117 | (2)
| 119 | (1)
More About VTune Performance Analyzer | 120 | (1)
| 120 | (1)
Intel® Integrated Performance Primitives | 121 | (22)
What Are the Intel Integrated Performance Primitives (Intel IPP)? | 121 | (3)
Digital Signal Processing Functions | 122 | (1)
| 123 | (1)
| 123 | (1)
| 123 | (1)
Image and Video Compression | 124 | (1)
| 124 | (1)
Horizontal and Vertical Functions | 124 | (2)
Where Does Intel IPP Fit with the OS? | 126 | (1)
| 127 | (3)
| 127 | (1)
| 128 | (1)
| 128 | (2)
When Not to Use Intel IPP | 130 | (1)
Understanding Intel IPP Names | 130 | (1)
| 131 | (1)
| 132 | (2)
| 134 | (4)
| 134 | (2)
| 136 | (1)
| 136 | (1)
Intel GPP Naming Conventions | 137 | (1)
| 138 | (3)
| 138 | (1)
| 139 | (2)
What Do You Need to Get Started with Intel IPP? | 141 | (1)
| 141 | (2)
Embedded Software Development Tool Chain | 143 | (36)
Embedded System Software Design Flow | 143 | (2)
Integrated Design Environment | 145 | (2)
| 147 | (2)
| 149 | (3)
| 152 | (1)
| 153 | (1)
| 154 | (1)
| 155 | (2)
| 157 | (2)
Vectorization Programming Guidelines | 159 | (16)
Making Kernels of the Loop Vectorizer-Friendly | 160 | (4)
Vectorizer-Friendly Addressing | 164 | (2)
Vectorizer-Friendly Loop Organization | 166 | (2)
| 168 | (4)
Compiler Directives to Help Vectorization | 172 | (3)
| 175 | (3)
| 178 | (1)
Optimizing for Memory Subsystems | 179 | (28)
Characteristics of Memory Subsystems | 180 | (1)
Memory Subsystems in Intel Application Processors | 180 | (1)
Memory Subsystems-based Optimization | 181 | (1)
Configuring Memory with the Correct Properties | 182 | (4)
Optimizing Instruction Accesses | 184 | (1)
| 184 | (2)
Ensuring Correct Data Objects Placement | 186 | (3)
Reducing the Impact of Data Cache Misses | 189 | (7)
| 189 | (1)
Preloading Data into the Cache | 190 | (6)
Increasing Memory Throughput | 196 | (4)
Utilizing the Nonblocking Multiple Loads | 197 | (2)
Utilizing Write Combining | 199 | (1)
Programming Techniques for Memory and Cache Efficiency | 200 | (6)
Choosing Data Types and Alignment | 201 | (1)
Increasing Cache Locality | 201 | (3)
Intelligent Addressing for Reducing Cache Eviction | 204 | (1)
| 205 | (1)
| 206 | (1)
| 207 | (26)
Microarchitectural Optimization Philosophy | 208 | (1)
Choosing the Right Instruction | 208 | (1)
Choosing the Right Sequence | 209 | (1)
Stall-Directed Instruction Scheduling | 209 | (4)
| 210 | (1)
| 210 | (2)
Decomposition of an Application | 212 | (1)
Optimization for Data-Processing Operations | 213 | (10)
| 213 | (2)
Fast Multiply and Accumulation | 215 | (2)
Scheduling in the Addition and Logical Pipeline | 217 | (1)
Getting Data from Cache to Register and Back Efficiently | 218 | (4)
Optimizing Align and Shift | 222 | (1)
Optimization for Control-Oriented Operations | 223 | (9)
| 223 | (1)
Use Conditional Instruction | 224 | (5)
Use Addressing Modes Efficiently | 229 | (1)
| 230 | (2)
| 232 | (1)
| 233 | (22)
| 233 | (2)
| 234 | (1)
Intel® MMX™ and SSE Technology Mapping | 235 | (2)
| 235 | (2)
Understanding the Differences | 237 | (10)
| 237 | (2)
| 239 | (1)
| 239 | (2)
| 241 | (1)
New Instruction Capability | 242 | (5)
Case Studies of Code Porting | 247 | (4)
Complex Multiply by a Constant | 247 | (1)
Absolute Difference of Unsigned Numbers | 248 | (1)
Absolute Difference of Signed Numbers | 248 | (1)
Reducing Precision with an Interleaved Pack | 249 | (1)
| 250 | (1)
Porting Code Using C-Intrinsics | 251 | (3)
| 254 | (1)
Part III Intel® Wireless MMX™ Technology Case Studies | 255 | (170)
Accelerating Graphics Applications | 257 | (40)
| 257 | (2)
Indicators of Performance | 259 | (1)
| 259 | (1)
| 260 | (1)
| 260 | (3)
| 260 | (3)
Performance Optimization Methods | 263 | (10)
| 264 | (1)
Computational Optimization | 264 | (4)
Memory-Related Optimization | 268 | (5)
Optimizing Graphics Application Kernels | 273 | (22)
| 273 | (22)
| 295 | (2)
Digital Signal Processing | 297 | (36)
| 297 | (5)
| 298 | (3)
| 301 | (1)
| 302 | (16)
Memory Organization for the FIR Filter | 304 | (2)
| 306 | (12)
| 318 | (5)
Computing the Biquad IIR Filter | 320 | (3)
| 323 | (8)
| 324 | (2)
Computing the Single Sample FIR-LMS Filter | 326 | (5)
| 331 | (2)
| 333 | (38)
Image Processing Fundamentals | 334 | (8)
| 335 | (2)
| 337 | (3)
| 340 | (2)
| 342 | (13)
| 343 | (4)
Filtering with Non-separable Kernels | 347 | (5)
Filtering with Separable Kernels | 352 | (3)
| 355 | (8)
Nearest Neighbor Replication | 357 | (2)
Color Synthesis Using NNR | 359 | (4)
| 363 | (5)
Color Correcting the RGB Image | 364 | (4)
| 368 | (3)
H.264 and MPEG-4 Video Compression | 371 | (54)
| 371 | (2)
| 372 | (1)
| 373 | (14)
| 374 | (4)
Motion Estimation and Compensation | 378 | (6)
2D Discrete Cosine Transforms (DCT) | 384 | (2)
| 386 | (1)
| 386 | (1)
Intel® Wireless MMX™ in Motion Estimation and Compensation | 387 | (36)
| 387 | (6)
Fixed Pattern Motion Estimation | 393 | (15)
Motion Compensation with Intel Wireless MMX Technology | 408 | (1)
| 408 | (7)
| 415 | (8)
| 423 | (2)
References | 425 | (2)
Index | 427 |