3. Matrix-Matrix operations: C = A + B, D = AB, E = A^{-1}.
A number of problems can be solved once one has these basic operations
(especially in physical simulations). This is one of the most studied problems
on the GPU.
2 The Inner Product
Consider the inner product c = a · b, which we rewrite as
    c = \sum_{i=1}^{n} a_i b_i
2.1 Technique 1: Small memory
Each vector is stored in a 1D texture. In the ith rendering pass, we render a
single point at coordinates (0,0), having a single texture coordinate i. The
fragment program uses i to index into the two textures and returns the value
s + a_i b_i, where s is the running sum maintained over the previous i − 1
passes. Note that since we cannot read and write the location where s is
stored in a single pass, we use a ping-pong trick to maintain s: two buffers
alternate roles, one being read while the other is written.
This procedure takes n passes, and requires only a fixed number of texture
locations (excluding the storage for a and b).
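The pass structure can be simulated on the CPU. The following is a minimal Python sketch, not GPU code: the two-element list `buffers` is an assumed stand-in for the pair of ping-pong texels, and the function name is hypothetical.

```python
def inner_product_ping_pong(a, b):
    """Simulate the n-pass inner product with two one-texel buffers."""
    assert len(a) == len(b)
    buffers = [0.0, 0.0]    # stand-ins for the two ping-pong texels
    src, dst = 0, 1         # read from src, write to dst
    for i in range(len(a)):             # one rendering pass per element
        s = buffers[src]                # running sum from earlier passes
        buffers[dst] = s + a[i] * b[i]  # what the fragment program returns
        src, dst = dst, src             # swap roles for the next pass
    return buffers[src]                 # the buffer written most recently


print(inner_product_ping_pong([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

Only the two buffer entries are used as working storage, which matches the fixed memory cost noted above, while the loop runs once per element, giving n passes.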
2.2 Technique 2: Fewer passes
The second technique uses more working memory (n units), but requires
fewer passes. We write a and b as 2D textures (2D textures allow for more
storage, since the dimension of a texture is typically bounded, and are better
optimized by the rasterizer).
We now multiply the contents of the textures, storing the result in a
third texture c. This can be done with a simple fragment program that takes
the fragment coordinates and looks up the a and b textures, returning their
product. We render a single quad in order to activate the fragment program.
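The multiplication step can be sketched on the CPU as follows. This is an illustrative Python simulation under stated assumptions: the row-by-row packing, the zero padding, and the helper names `to_texture` and `multiply_textures` are hypothetical choices, not taken from the notes.

```python
import math


def to_texture(v, width):
    """Pack a vector row by row into a 2D grid, padding with zeros."""
    height = math.ceil(len(v) / width)
    padded = list(v) + [0.0] * (width * height - len(v))
    return [padded[r * width:(r + 1) * width] for r in range(height)]


def multiply_textures(tex_a, tex_b):
    """One pass over the quad: each fragment returns the product of its two texels."""
    return [[pa * pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(tex_a, tex_b)]


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 2.0, 2.0, 2.0, 2.0]
c = multiply_textures(to_texture(a, 3), to_texture(b, 3))
print(c)                          # [[2.0, 4.0, 6.0], [8.0, 10.0, 0.0]]
print(sum(sum(row) for row in c))  # 30.0, which equals a · b
```

The texels of c sum to the inner product, and the zero padding does not change that sum. The summation here is done on the CPU only to check the result.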
We express the multiplication of two matrices in terms of dot products of
matrix rows and columns. That is, to compute a cell c_{ij} of matrix C, we take
the dot product of row i of matrix A with column j of matrix B:
    c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}
1st program used multitexturing and blending, each plane
each place in the answer. In 1st pass: AB :
We can use inner quad idea to do this:
if at location (x, y)
1) uses n passes
2) space N = n^2
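A scheme with these costs can be simulated on the CPU. The sketch below is in Python and rests on an assumption not spelled out in the notes: that pass k adds the single term a_{xk} b_{ky} to the value stored at location (x, y), with ping-pong buffers holding the partial sums. The function name is hypothetical and square matrices are assumed.

```python
def matmul_n_passes(A, B):
    """Simulate an n-pass product of two n x n matrices (lists of rows)."""
    n = len(A)
    # Two n x n buffers: read the partial sums from one, write to the other.
    buffers = [[[0.0] * n for _ in range(n)] for _ in range(2)]
    src, dst = 0, 1
    for k in range(n):                      # one rendering pass per k
        for x in range(n):                  # every location (x, y) of the quad
            for y in range(n):
                buffers[dst][x][y] = buffers[src][x][y] + A[x][k] * B[k][y]
        src, dst = dst, src                 # ping-pong swap
    return buffers[src]


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_n_passes(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

Each pass touches all n^2 locations once, and there are n passes in total.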
3.3 Technique 2: A Speedup
"Dense Matrix Multiplication" by
To make it faster, instead of making one computation per pass, compute
multiple additions per pass in the fragment program:
Pass 1 becomes: output
Note that there is a tradeoff between the length of the fragment program and
the number of passes.
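The tradeoff can be made concrete with a small CPU simulation. In the Python sketch below, `terms_per_pass` is a hypothetical parameter standing for how many products the fragment program accumulates in one pass; the function name and the ping-pong buffers are likewise assumptions made for illustration.

```python
def matmul_unrolled(A, B, terms_per_pass):
    """Simulate an n x n product that accumulates several terms per pass.

    Returns the product and the number of passes used.
    """
    assert terms_per_pass >= 1
    n = len(A)
    buffers = [[[0.0] * n for _ in range(n)] for _ in range(2)]
    src, dst = 0, 1
    passes = 0
    for k0 in range(0, n, terms_per_pass):
        ks = range(k0, min(k0 + terms_per_pass, n))  # terms handled this pass
        for x in range(n):
            for y in range(n):
                partial = sum(A[x][k] * B[k][y] for k in ks)
                buffers[dst][x][y] = buffers[src][x][y] + partial
        src, dst = dst, src                          # ping-pong swap
        passes += 1
    return buffers[src], passes


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(matmul_unrolled(A, I, 1)[1])  # 3 passes, one term each
print(matmul_unrolled(A, I, 2)[1])  # 2 passes, up to two terms each
```

With m terms per pass the pass count drops to ceil(n / m), while the inner sum (the fragment program's work) grows to m multiply-adds.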
3.4 Technique 3: Using All Channels
"Cache and Bandwidth Aware Matrix Multiplication on the GPU",
by Hall, Carr and Hart.
We have been using only the red component, propose storing