Matrix Multiplication Optimization

August 21, 2012 admin

Abstract:

The example shows two ways of performing matrix multiplication: a simple one that gives moderate performance, and a slightly more complex version that achieves optimal performance using the e-gcc compiler. The optimized code was partially unrolled to allow the compiler to take advantage of the architecture's double-word load/store instructions and to avoid unnecessary pipeline stalls.
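To make the idea concrete, here is a minimal sketch of the two approaches in plain C. It is not the code from this post: the matrix size N, the function names matmul_naive_sketch and matmul_unrolled_sketch, and the choice to unroll by two across adjacent output columns are all assumptions made for illustration. The unrolled version touches adjacent elements of b and c in pairs, which gives a compiler the opportunity to use double-word loads and stores, and it keeps two independent accumulators so consecutive multiply-adds do not wait on each other.

```c
#include <assert.h>

#define N 8 /* assumed matrix dimension (hypothetical, must be even) */

/* Naive sketch: one multiply-add per inner-loop iteration. */
void matmul_naive_sketch(const float *restrict a, const float *restrict b,
                         float *restrict c)
{
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            float sum = 0.0f;
            for (int k = 0; k < N; k++)
                sum += a[i * N + k] * b[k * N + j];
            c[i * N + j] = sum;
        }
    }
}

/* Partially unrolled sketch: two adjacent outputs of a row per pass.
 * b[k*N + j] and b[k*N + j + 1] are neighbours in memory, as are the
 * two results written to c, so paired accesses become possible.
 * sum0 and sum1 are independent, so their updates can overlap. */
void matmul_unrolled_sketch(const float *restrict a, const float *restrict b,
                            float *restrict c)
{
    assert(N % 2 == 0); /* the 2-way unrolling assumes an even N */

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j += 2) {
            float sum0 = 0.0f;
            float sum1 = 0.0f;
            for (int k = 0; k < N; k++) {
                float aik = a[i * N + k]; /* loaded once, used twice */
                sum0 += aik * b[k * N + j];
                sum1 += aik * b[k * N + j + 1];
            }
            c[i * N + j] = sum0;
            c[i * N + j + 1] = sum1;
        }
    }
}
```

Both functions accumulate each output element in the same order, so they should produce the same results for the same inputs. Whether the compiler actually emits double-word accesses depends on alignment, the target, and the optimization flags in use.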

Naive Code:

unsigned matmul_naive(float * restrict a, float * restrict b, float * restrict c)
{
	int i, j, k;

	for (i=0; i

Optimized Code:

unsigned matmul(float * restrict aa, float * restrict bb, float * restrict cc)
{
    int i = 0;

	for (i=0; i

Compile Switches:

{-Wall -O3 -std=c99 -mlong-calls -mfp-mode=round-nearest -ffp-contract=fast -funroll-loops}