Adaptive-Weighted Lucas-Kanade Optical Flow on the GPU¶
An optical flow algorithm estimates motion between consecutive video frames. Optical flow is crucial in object detection, object recognition, motion estimation, video compression, and video effects.
This post covers an HLSL implementation of the Lucas-Kanade optical flow algorithm.
The Brightness Constancy Assumption¶
When we use our eyes to track an object, we make assumptions to determine if an object has moved. For example, we can infer that a red dot has moved if we observe it maintaining its red color but appearing in a different location than it did a moment ago.
Accurate motion estimation in video relies on fundamental assumptions:
The intensity (brightness and color) of an object’s movement in two consecutive images remains approximately constant.
The movement of objects between two images is small.
These assumptions form the basis of the Brightness Constancy Assumption.
Note
The Brightness Constancy Assumption has a limitation: This assumption holds best for objects whose appearance does not significantly change between frames. For instance, it would struggle with a ball that constantly changes color or an object moving into shadow or direct light.
The Optical Flow Equation¶
Let’s revisit the Brightness Constancy Assumption:
From this direct equality, it’s not obvious how to create a formula for optical flow, as it states that the intensity at a point \((x, y)\) in the previous image \(I\) at time \(t\) is equal to the intensity of the same point at a new position \((x + u, y + v)\) in the current image at time \(t + 1\). Our goal is to find \(u\) and \(v\).
To achieve this, we need a mathematical way to approximate the rate of change of image intensity from \(I(x, y, t)\) to \(I(x + u, y + v, t + 1)\). This is where derivatives and the Taylor series expansion become crucial.
We apply a first-order Taylor series expansion to the right-hand side of the Brightness Constancy Assumption, around the point \((x, y, t)\):
Substituting this approximation back into the Brightness Constancy Assumption and simplifying:
This is the Optical Flow Equation. Rearranging it to isolate the temporal change:
This is the spatial gradient (how brightness changes horizontally and vertically):
This is the temporal gradient (how brightness changes over time at a fixed location):
Our objective is to solve for \(u\) and \(v\), the horizontal and vertical components of the optical flow vector.
Note
We use a first-order Taylor series expansion because the “small movement” assumption means that the changes regarding \(x\), \(y\), \(z\) are small. This allows us to ignore higher-order terms in the expansion, which simplifies the math significantly while still providing a good approximation.
The Aperture Problem - In Practice¶
Here’s a practical demonstration of the Aperture Problem.
Get a string long enough that you cannot see its ends when viewing it through a small, fixed opening (an “aperture”).
Position the string behind the opening.
Angle the string at 45-degrees.
Now, slide the string through the opening in the following ways, ensuring its ends remain outside your view through the hole:
Horizontally slide the string across the opening.
Vertically slide the string across the opening.
Diagonally slide the string across the opening.
Did you see a difference in motion when sliding the string horizontally, vertically, or diagonally? Probably not, unless you can see the entire string within the opening.
The Problem: Your limited perception through the small aperture causes you to observe the string appearing to “move the same way” (only perpendicular to its orientation), regardless of its actual global movement direction. You cannot disambiguate its true 2D motion.
Let’s examine the mathematical version of this problem.
The Aperture Problem - In Mathematics¶
Consider the Optical Flow Equation:
Imagine you’re back in your underfunded school’s math class, and your teacher asks the class to solve the following single linear equation for unknowns \(u\) and \(v\):
Possible solutions the class might propose include:
This demonstrates that for a single pixel (which acts as a tiny aperture), the optical flow equation provides only one equation on two unknowns \(u\) and \(v\). Consequently, there are infinitely many pairs of \((u, v)\) that satisfy the equation. If you plot these solutions on a graph, they all lie on a single line, meaning the true direction of motion is ambiguous - only the component of motion perpendicular to the image gradient can be determined.
The Lucas-Kanade Approach to The Aperture Problem¶
The Lucas-Kanade method is a local technique designed to overcome the aperture problem by solving a system of optical flow equations within a small spatial window or neighborhood.
To estimate the local image flow at a given point, the Lucas-Kanade method employs a least-squares approach. This method solves an overdetermined system of linear equations, where each pixel within the chosen window contributes an optical flow equation.
The standard Lucas-Kanade algorithm typically solves these systems of equations within a 3x3 window, as this size often provides a good balance, effectively considering motion components in various directions.
Least-Squares Derivation¶
This is the initial system of linear equations in the form \(A \mathbf{x} = \mathbf{b}\).
To find the least-squares solution, we multiply both sides by the transpose of the matrix, \(A^T\).
The result of the matrix multiplication is expressed in summation form.
We now multiply both sides by the inverse of the matrix on the left, \((A^T A)^{-1}\), to isolate the \(\begin{bmatrix} u \\ v \end{bmatrix}\) vector.
The final step is the solution for the vector \(\begin{bmatrix} u \\ v \end{bmatrix}\).
Using Bilateral Weights¶
The standard Lucas-Kanade method treats all pixels in the neighborhood equally. However, this can lead to inaccuracies near edges or in the presence of noise, where some pixels in the window may not belong to the same moving object.
To improve robustness, bilateral weighting assigns a weight to each pixel’s contribution based on its spatial and intensity distance from the center pixel.
The weight \(W\) is the product of a spatial weight and a range weight:
The spatial weight ensures that pixels closer to the center have more influence:
The range weight reduces the influence of pixels with intensity differences, which indicate an edge or a different object:
These weights are then incorporated into the least-squares summation, performing a weighted least-squares estimation.
Using Pyramids¶
The Lucas-Kanade method, while effective for small displacements, becomes less accurate for large motions. This is because large movements violate the “small movement” assumption inherent in the first-order Taylor expansion and the brightness constancy assumption. To handle larger motions while maintaining efficiency and adherence to assumptions, a hierarchical, or “pyramid,” approach is used:
This approach ensures:
It does not break the brightness constancy assumption, as motion is incrementally estimated at different scales.
It handles cases where the actual movement between two images is significant.
It facilitates fast computation by starting with coarse motion estimates at lower resolutions.
It covers motion in areas larger than a 3x3 window by propagating estimates across pyramid levels.
The pyramid Lucas-Kanade algorithm consists of the following general steps:
Create an image pyramid for the current frame and previous frame.
Initialize the motion vector at the smallest pyramid level to 0.0 or a previous estimate.
Compute optical flow iteratively from the smallest pyramid level to the largest level. At each level, the flow from the smaller level is used to “warp” the image, reducing the remaining displacement, and then a refinement is calculated.
Cache the current frame (or its pyramid) for use as the “previous frame” in the next optical flow calculation.
Optionally, filter the computed optical flow vectors to remove noise or outliers.
Source Code¶
Note
The code contains generic functions, so you may need to change some parts of the code so it is compatible with your setup.
/*
Function to convert 2D row and column (0-indexed) to a 1D index.
GridPos.x: The 0-indexed row number.
GridPos.y: The 0-indexed column number.
GridWidth: The total width of the grid (number of columns).
Returns a 1D index.
*/
int Get1DIndexFrom2D(int2 GridPos, int GridWidth)
{
return GridPos.x + (GridPos.y * GridWidth);
}
float4 Pyramid(float3 Color)
{
float Sum = dot(Color, 1.0);
float3 Ratio = abs(Sum) > 0.0 ? Color / Sum : 1.0 / 3.0;
float MaxRatio = max(Ratio.r, max(Ratio.g, Ratio.b));
float MaxColor = max(Color.r, max(Color.g, Color.b));
return float4(Ratio / MaxRatio, sqrt(MaxColor));
}
/*
Lucas-Kanade optical flow with bilinear fetches. The algorithm is motified to not output in pixels, but normalized displacements.
---
Gauss-Newton Steepest Descent Inverse Additive Algorithm
Baker, S., & Matthews, I. (2004). Lucas-kanade 20 years on: A unifying framework. International journal of computer vision, 56, 221-255.
https://www.researchgate.net/publication/248602429_Lucas-Kanade_20_Years_On_A_Unifying_Framework_Part_1_The_Quantity_Approximated_the_Warp_Update_Rule_and_the_Gradient_Descent_Approximation
---
Application of Lucas-Kanade algorithm with weight coefficient bilateral filtration for the digital image correlation method
Titkov, V. V., Panin, S. V., Lyubutin, P. S., Chemezov, V. O., & Eremin, A. V. (2017). Application of Lucas-Kanade algorithm with weight coefficient bilateral filtration for the digital image correlation method. IOP Conference Series: Materials Science and Engineering, 177, 012039. https://doi.org/10.1088/1757-899X/177/1/012039
*/
float2 GetLucasKanade(
bool IsCoarse,
float2 MainTex,
float2 Vectors,
sampler2D SampleT,
sampler2D SampleI
)
{
// Get gradient information
float2 PixelSize = fwidth(MainTex);
/*
* = Indecies for calculating the temporal gradient (IT)
- = Unused indecies
Template indecies:
00- 01 02 03 04-
05 06* 07* 08* 09
10 11* 12* 13* 14
15 16* 17* 18* 19
20- 21 22 23 24-
Template (Row, Column):
(4, 0) (4, 1) (4, 2) (4, 3) (4, 4)
(3, 0) (3, 1) (3, 2) (3, 3) (3, 4)
(2, 0) (2, 1) (2, 2) (2, 3) (2, 4)
(1, 0) (1, 1) (1, 2) (1, 3) (1, 4)
(0, 0) (0, 1) (0, 2) (0, 3) (0, 4)
*/
// Initiate Cache
const int CacheWidth = 5;
const int CacheIndexSize = CacheWidth * CacheWidth;
float4 Cache[CacheIndexSize];
// Create Cache
// This unrolled version samples and assigns to the Cache array.
// The four corners of the 5x5 grid are skipped in the original code,
// so they are not included in this rewrite.
Cache[1] = tex2D(SampleT, MainTex + (float2(-1, 2) * PixelSize));
Cache[2] = tex2D(SampleT, MainTex + (float2(0, 2) * PixelSize));
Cache[3] = tex2D(SampleT, MainTex + (float2(1, 2) * PixelSize));
Cache[5] = tex2D(SampleT, MainTex + (float2(-2, 1) * PixelSize));
Cache[6] = tex2D(SampleT, MainTex + (float2(-1, 1) * PixelSize));
Cache[7] = tex2D(SampleT, MainTex + (float2(0, 1) * PixelSize));
Cache[8] = tex2D(SampleT, MainTex + (float2(1, 1) * PixelSize));
Cache[9] = tex2D(SampleT, MainTex + (float2(2, 1) * PixelSize));
Cache[10] = tex2D(SampleT, MainTex + (float2(-2, 0) * PixelSize));
Cache[11] = tex2D(SampleT, MainTex + (float2(-1, 0) * PixelSize));
Cache[12] = tex2D(SampleT, MainTex + (float2(0, 0) * PixelSize));
Cache[13] = tex2D(SampleT, MainTex + (float2(1, 0) * PixelSize));
Cache[14] = tex2D(SampleT, MainTex + (float2(2, 0) * PixelSize));
Cache[15] = tex2D(SampleT, MainTex + (float2(-2, -1) * PixelSize));
Cache[16] = tex2D(SampleT, MainTex + (float2(-1, -1) * PixelSize));
Cache[17] = tex2D(SampleT, MainTex + (float2(0, -1) * PixelSize));
Cache[18] = tex2D(SampleT, MainTex + (float2(1, -1) * PixelSize));
Cache[19] = tex2D(SampleT, MainTex + (float2(2, -1) * PixelSize));
Cache[21] = tex2D(SampleT, MainTex + (float2(-1, -2) * PixelSize));
Cache[22] = tex2D(SampleT, MainTex + (float2(0, -2) * PixelSize));
Cache[23] = tex2D(SampleT, MainTex + (float2(1, -2) * PixelSize));
// Loop over the starred template areas
const int FetchGridWidth = 3;
const int FetchGridSize = FetchGridWidth * FetchGridWidth;
// .xy = TemplateGridPos; .zw = FetchPos
const int4 P[FetchGridSize] =
{
int4(int2(-1, 1), int2(1, 1)),
int4(int2(0, 1), int2(2, 1)),
int4(int2(1, 1), int2(3, 1)),
int4(int2(-1, 0), int2(1, 2)),
int4(int2(0, 0), int2(2, 2)),
int4(int2(1, 0), int2(3, 2)),
int4(int2(-1, -1), int2(1, 3)),
int4(int2(0, -1), int2(2, 3)),
int4(int2(1, -1), int2(3, 3))
};
// Initialize variables
float IxIx = 0.0;
float IyIy = 0.0;
float IxIy = 0.0;
float IxIt = 0.0;
float IyIt = 0.0;
float SumW = 0.0;
Vectors = clamp(Vectors, -1.0, 1.0);
// Calculate warped texture coordinates
float2 WarpTex = MainTex;
WarpTex -= 0.5; // Pull into [-0.5, 0.5) range
WarpTex -= Vectors; // Inverse warp in the [-0.5, 0.5) range
WarpTex = saturate(WarpTex + 0.5); // Push and clamp into [0.0, 1.0) range
// Get center textures (this is for the spatial weighting)
float4 CenterT = Cache[Get1DIndexFrom2D(int2(2, 2), CacheWidth)];
float4 CenterI = tex2D(SampleI, WarpTex);
[unroll]
for (int i = 0; i < FetchGridSize; i++)
{
// Get cached data
float4 North = Cache[Get1DIndexFrom2D(P[i].zw + int2(0, -1), CacheWidth)];
float4 South = Cache[Get1DIndexFrom2D(P[i].zw + int2(0, 1), CacheWidth)];
float4 East = Cache[Get1DIndexFrom2D(P[i].zw + int2(-1, 0), CacheWidth)];
float4 West = Cache[Get1DIndexFrom2D(P[i].zw + int2(1, 0), CacheWidth)];
float4 R0 = Cache[Get1DIndexFrom2D(P[i].zw, CacheWidth)];
// Get R0 and R1 to calculate temporal gradient
bool IsCenter = (P[i].x == 0) && (P[i].y == 0);
float2 Offset = float2(P[i].xy);
// Get dynamic data
float4 R1 = IsCenter ? CenterI : tex2D(SampleI, WarpTex + (float2(P[i].xy) * PixelSize));
float4 It = 0.0;
// Calculate bilateral weighting
float Weight = exp2(-(abs(Offset.x) + abs(Offset.y)));
// Calculate range weights
if (!IsCenter)
{
It = R0 - CenterT;
Weight *= (1.0 / (1.0 + dot(It, It)));
It = R1 - CenterI;
Weight *= (1.0 / (1.0 + dot(It, It)));
}
// Immediately calculate spatial gradients
float4 Ix = (East * 0.5) - (West * 0.5);
float4 Iy = (South * 0.5) - (North * 0.5);
It = R1 - R0;
// Summate the weighted contributions
IxIx += (dot(Ix, Ix) * Weight);
IxIt += (dot(Ix, It) * Weight);
IyIy += (dot(Iy, Iy) * Weight);
IyIt += (dot(Iy, It) * Weight);
IxIy += (dot(Ix, Iy) * Weight);
SumW += Weight;
}
// Check if SumW is not 0
SumW = (SumW == 0.0) ? 0.0 : 1.0 / SumW;
// Normalized weighted variables
IxIx *= SumW;
IyIy *= SumW;
IxIy *= SumW;
IxIt *= SumW;
IyIt *= SumW;
/*
Calculate Lucas-Kanade matrix
---
[ Ix^2/D -IxIy/D] = [-IxIt]
[-IxIy/D Iy^2/D] [-IyIt]
*/
float2x2 A = float2x2(IxIx, IxIy, IxIy, IyIy);
float2 B = float2(IxIt, IyIt);
// Calculate C factor
float2 E = -B;
float N = dot(E, E);
float D = dot(E, mul(A, E));
float C = N / D;
// Calculate -C * B
float2 Flow = (abs(D) > 0.0) ? -C * B : 0.0;
// Normalize motion vectors
Flow *= PixelSize;
// Propagate normalized motion vectors in Norm Range
Vectors += Flow;
// Clamp motion vectors to restrict range to valid lengths
Vectors = clamp(Vectors, -1.0, 1.0);
return Vectors;
}
References¶
Baker, S., & Matthews, I. (2004). Lucas-kanade 20 years on: A unifying framework. International journal of computer vision, 56, 221-255.
Rojas, R. (2010). Lucas-kanade in a nutshell. Freie Universit at Berlinn, Dept. of Computer Science, Tech. Rep.
Titkov, V. V., Panin, S. V., Lyubutin, P. S., Chemezov, V. O., & Eremin, A. V. (2017). Application of Lucas-Kanade algorithm with weight coefficient bilateral filtration for the digital image correlation method. IOP Conference Series: Materials Science and Engineering, 177, 012039. https://doi.org/10.1088/1757-899X/177/1/012039
Wikipedia contributors. (2024, May 15). Lucas-Kanade method. In Wikipedia, The Free Encyclopedia. Retrieved 18:46, July 3, 2025, from https://en.wikipedia.org/w/index.php?title=Lucas%E2%80%93Kanade_method&oldid=1223913530