Using Dart's extension types in vector_math
With Dart 3.3, the Dart team released a new language feature: extension types. Extension types wrap existing Dart types and let you put a custom interface with your own methods around them. This is especially useful for native types like DOM elements. Since they are statically typed, they allow for further performance optimizations. vector_math is a wrapper around native float list types, and I want to see whether extension types can reduce overhead and improve the performance of critical code.
I really like Dart for its simple language design: it’s similar to JavaScript, but avoids the evolutionary flaws that are impossible to get rid of in the JavaScript ecosystem. I have used it a lot in the past to work on video games, where performance is crucial. Especially in 3D games, vector and matrix transformations are important for rendering and collision detection. Fortunately, John McCutchan had already written a math operations package for the Dart ecosystem, and it became a central part of it. It’s even used by Flutter, the core Dart ecosystem tool. Because math operations are often performed in critical execution paths, performance optimizations are important and provide big wins. Over time, many people (including me!) have contributed patterns that optimize the operations.
How does vector_math work internally?
The basic building blocks of vector_math are the VectorN and MatrixN classes such as Vector3 or Matrix4. The N denotes the dimension of the element. Internally, a VectorN is a wrapper around a Float32List. This has two advantages over separate fields per element or a simple array:
One advantage is that storage inside a Float32List is much more compact than separate double fields or a List<double>, since single precision takes half as many bytes as double precision, and the values are packed contiguously in memory. Dart provides no other way to force variables to use single-precision floating-point values.
The second advantage is that the compiler can be certain that the list contains only floating-point values, and can therefore perform additional performance optimizations. Since the Float32List has a fixed size, the compiler can skip the bounds check when accessing the list, which allows for even more optimizations (getting Dart2Js to actually do this is a bit tricky and needs workarounds such as distinct field names in the different classes). The resulting code looks something like this:
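import 'dart:typed_data';

class Vector3 {
  final Float32List _v3storage;

  Vector3.zero() : _v3storage = Float32List(3);

  factory Vector3(double x, double y, double z) =>
      Vector3.zero()..setValues(x, y, z);

  // Writes all three components into the backing list.
  void setValues(double x, double y, double z) {
    _v3storage[2] = z;
    _v3storage[1] = y;
    _v3storage[0] = x;
  }

  // Adds arg to this vector in place.
  void add(Vector3 arg) {
    final argStorage = arg._v3storage;
    _v3storage[2] += argStorage[2];
    _v3storage[1] += argStorage[1];
    _v3storage[0] += argStorage[0];
  }
}

// add() mutates the receiver and returns nothing, hence the cascade.
final v0 = Vector3(1.0, 2.0, 3.0)..add(Vector3(1.0, 1.0, 1.0));
// v0 is [2.0, 3.0, 4.0]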
While the internals are quite ugly to write, this still keeps a nice interface for the consumers of the library, with great performance!
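The storage argument above can be checked with a few lines of Dart. This is only a small sketch: toSinglePrecision is a hypothetical helper name, not something vector_math provides.

```dart
import 'dart:typed_data';

/// Rounds [value] to single precision by storing it in a one-element
/// Float32List and reading it back (hypothetical helper for illustration).
double toSinglePrecision(double value) => (Float32List(1)..[0] = value)[0];

void main() {
  // Three single-precision values need half the bytes of three doubles.
  print(Float32List(3).lengthInBytes); // 12
  print(Float64List(3).lengthInBytes); // 24

  // 0.5 is exactly representable in single precision, 0.1 is not.
  print(toSinglePrecision(0.5) == 0.5); // true
  print(toSinglePrecision(0.1) == 0.1); // false
}
```

The last line also shows the price of the compact storage: values read back from a vector can differ slightly from the doubles that were written into it.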
What are extension types?
Extension types are a compile-time abstraction that wraps an existing type with a custom interface. Similar to extension methods, they allow an existing type to be extended with additional methods, but they also allow methods of the underlying type to be hidden. They serve the same purpose as a wrapper class, but because they exist only at compile time and are removed at runtime, they have zero cost and no overhead.
The following Id wrapper is a small example of an extension type. It has a constructor that is similar to a class constructor, and it provides a printToConsole method. Internally the id is stored as a string value, but normal string operations are not possible on the Id type.
import 'dart:math';

extension type Id(String id) {
  factory Id.generate() => Id(Random().nextInt(1 << 32).toString());

  void printToConsole() => print(id);
}

void main() {
  final id = Id.generate();
  id.printToConsole();
}
Compared to a wrapper class, extension types also have some limitations. For example, it is not possible to override core methods such as toString. Since nothing about the type is preserved at runtime, dynamic type checking is not possible.
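A short sketch of what this erasure means in practice. It reuses the shape of the Id type with an empty body; runtimeTypeOf is a hypothetical helper added only for this illustration.

```dart
/// Same representation as the Id type above, without the helper methods.
extension type Id(String id) {}

/// Reports the runtime type as seen through an untyped reference
/// (hypothetical helper for illustration).
Type runtimeTypeOf(Object? value) => value.runtimeType;

void main() {
  final id = Id('abc');
  final Object? erased = id;

  print(erased is String); // true: only the String exists at runtime
  print(runtimeTypeOf(id)); // String
  print(<Id>[id] is List<String>); // true: type arguments are erased too

  // A cast from the representation type succeeds, so nothing stops a caller
  // from bypassing the constructors.
  final sneaky = 'not generated' as Id;
  print(sneaky.id); // not generated
}
```

So an `is Id` check cannot tell an Id apart from any other String, which is the kind of dynamic type checking a wrapper class would still support.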
Applying extension types to vector_math
Having wrapper classes around the actual values is a central thing I always questioned when it comes to the performance of vector_math, as this might introduce additional overhead. The promise that extension types have zero cost sounds interesting and is definitely something I would like to try out. I’m hoping that extension types help the compiler to perform even better optimizations.
Let’s apply extension types to our Vector3 class from above (with some additional methods):
import 'dart:typed_data';

extension type Vector3.fromFloat32List(Float32List _storage) {
  Vector3.zero() : this.fromFloat32List(Float32List(3));

  factory Vector3(double x, double y, double z) =>
      Vector3.zero()..setValues(x, y, z);

  // Assumed helper: clone() calls Vector3.copy, which the listing omitted.
  factory Vector3.copy(Vector3 other) {
    final s = other._storage;
    return Vector3.zero()..setValues(s[0], s[1], s[2]);
  }

  void add(Vector3 arg) {
    final argStorage = arg._storage;
    _storage[2] += argStorage[2];
    _storage[1] += argStorage[1];
    _storage[0] += argStorage[0];
  }

  void setValues(double x, double y, double z) {
    _storage[2] = z;
    _storage[1] = y;
    _storage[0] = x;
  }

  Vector3 clone() => Vector3.copy(this);
}
The usage stays the same, so at most call sites this change would be a drop-in replacement. Because of the limitations of extension types, however, it would still be a big breaking change for the package as a whole.
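To make the breaking part more concrete, here is a minimal sketch of one behavior that changes. Vec3 and equalsValues are hypothetical names used only for this illustration. A wrapper class can define its own == and toString, while an extension type cannot declare members of Object and falls back to whatever the underlying Float32List does.

```dart
import 'dart:typed_data';

/// Hypothetical, trimmed-down extension type used only for this illustration.
extension type Vec3.fromFloat32List(Float32List _storage) {
  Vec3(double x, double y, double z)
      : this.fromFloat32List(Float32List.fromList([x, y, z]));

  /// Value comparison has to be an ordinary method, because operator ==
  /// cannot be declared on an extension type.
  bool equalsValues(Vec3 other) =>
      _storage[0] == other._storage[0] &&
      _storage[1] == other._storage[1] &&
      _storage[2] == other._storage[2];
}

void main() {
  final a = Vec3(1.0, 2.0, 3.0);
  final b = Vec3(1.0, 2.0, 3.0);

  print(a == b); // false: falls back to the identity of the Float32List
  print(a.equalsValues(b)); // true
  print(a); // formatted by the Float32List, not by custom code
}
```

Code that relies on value equality, for example vectors used as map keys or compared in tests, would silently change behavior after such a migration.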
(Micro-)Benchmarking
But does it really make a difference compared to normal wrapper classes? To find out, we need to benchmark it and compare the results. Unfortunately the benchmark suite in the vector_math repository doesn’t cover that much, so I went ahead and created some new benchmarks.
I want to compare some simple operations using the Vector3 class from above on the different execution platforms for Dart code: the VM, Dart2Js and AoT compiled code. To make the whole process easily reproducible, I created a small script that compiles the executables and runs the benchmark on each platform. You can find my experiments in this GitHub branch.
To start, I created two simple benchmarks, each measuring a single operation:
import 'package:benchmark_harness/benchmark_harness.dart';

// Vector3 is the wrapper class or the extension type from above.

class Vector3CloneBenchmark extends BenchmarkBase {
  const Vector3CloneBenchmark() : super('Vector3Clone');

  @override
  void run() {
    for (double i = -500; i <= 500; i += 0.75) {
      for (double j = -500; j <= 500; j += 0.75) {
        final _ = Vector3(j, i, 0).clone();
      }
    }
  }
}

class Vector3AddBenchmark extends BenchmarkBase {
  const Vector3AddBenchmark() : super('Vector3Add');

  @override
  void run() {
    final vec = Vector3.zero();
    for (double i = -500; i <= 500; i += 0.75) {
      for (double j = -500; j <= 500; j += 0.75) {
        vec.add(Vector3(i, j, i + j));
      }
    }
  }
}
When I run the benchmarks multiple times, I end up with the following results:
Benchmark | Platform | Wrapper Classes – Runtime | Extension Types – Runtime | Improvement – Runtime |
---|---|---|---|---|
Vector3Clone | VM | 75.52ms | 75.95ms | -0.57% |
Vector3Clone | AoT | 148.50ms | 121.16ms | +18.41% |
Vector3Clone | Dart2Js | 915.70ms | 929.80ms | -1.54% |
Vector3Add | VM | 108.29ms | 107.67ms | +0.57% |
Vector3Add | AoT | 107.00ms | 106.16ms | +0.78% |
Vector3Add | Dart2Js | 495.40ms | 493.18ms | +0.45% |
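The improvement column appears to be the relative runtime reduction, with the wrapper-class time as the baseline. That reading is an assumption, but it reproduces the rows checked here. A small helper (improvementPercent is a hypothetical name) makes the arithmetic explicit:

```dart
/// Relative runtime reduction in percent. A positive value means
/// [candidateMs] is faster than [baselineMs]. The formula is an assumption
/// about how the table was computed, not taken from the benchmark script.
double improvementPercent(double baselineMs, double candidateMs) =>
    (baselineMs - candidateMs) / baselineMs * 100;

void main() {
  // Vector3Clone on AoT: wrapper classes 148.50 ms, extension types 121.16 ms.
  print(improvementPercent(148.50, 121.16).toStringAsFixed(2)); // 18.41

  // Vector3Clone on the VM: 75.52 ms vs. 75.95 ms.
  print(improvementPercent(75.52, 75.95).toStringAsFixed(2)); // -0.57
}
```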
Except for the Vector3Clone benchmark on AoT, the results are quite similar. A difference of around one percent is probably measurement noise rather than a real change. I was actually hoping for bigger differences. The good result for the AoT benchmark is a bit surprising; it looks like the compiler is able to optimize the code better with extension types. In the end, the results are still interesting, as they show that extension types don’t have a negative impact on performance.
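For readers who want to try the comparison without setting up benchmark_harness, here is a self-contained sketch that times the add loop for both variants with a Stopwatch. It is not the benchmark script from the branch: ClassVector3, ExtVector3 and timeMs are hypothetical names, and a single timed run like this is much noisier than repeated harness runs.

```dart
import 'dart:typed_data';

/// Wrapper-class variant (hypothetical name).
class ClassVector3 {
  final Float32List _storage;
  ClassVector3.zero() : _storage = Float32List(3);
  factory ClassVector3(double x, double y, double z) =>
      ClassVector3.zero()..setValues(x, y, z);
  double get z => _storage[2];
  void setValues(double x, double y, double z) {
    _storage[2] = z;
    _storage[1] = y;
    _storage[0] = x;
  }
  void add(ClassVector3 arg) {
    final argStorage = arg._storage;
    _storage[2] += argStorage[2];
    _storage[1] += argStorage[1];
    _storage[0] += argStorage[0];
  }
}

/// Extension-type variant with the same interface (hypothetical name).
extension type ExtVector3.fromFloat32List(Float32List _storage) {
  ExtVector3.zero() : this.fromFloat32List(Float32List(3));
  factory ExtVector3(double x, double y, double z) =>
      ExtVector3.zero()..setValues(x, y, z);
  double get z => _storage[2];
  void setValues(double x, double y, double z) {
    _storage[2] = z;
    _storage[1] = y;
    _storage[0] = x;
  }
  void add(ExtVector3 arg) {
    final argStorage = arg._storage;
    _storage[2] += argStorage[2];
    _storage[1] += argStorage[1];
    _storage[0] += argStorage[0];
  }
}

/// Runs the add loop with the wrapper class and returns the final z value.
double runClassVariant() {
  final vec = ClassVector3.zero();
  for (double i = -500; i <= 500; i += 0.75) {
    for (double j = -500; j <= 500; j += 0.75) {
      vec.add(ClassVector3(i, j, i + j));
    }
  }
  return vec.z;
}

/// Runs the same loop with the extension type.
double runExtensionVariant() {
  final vec = ExtVector3.zero();
  for (double i = -500; i <= 500; i += 0.75) {
    for (double j = -500; j <= 500; j += 0.75) {
      vec.add(ExtVector3(i, j, i + j));
    }
  }
  return vec.z;
}

/// Times a single call of [body] and returns the elapsed milliseconds.
double timeMs(double Function() body) {
  final stopwatch = Stopwatch()..start();
  body();
  stopwatch.stop();
  return stopwatch.elapsedMicroseconds / 1000;
}

void main() {
  // Warm up both variants so the VM has a chance to optimize the loops.
  for (var n = 0; n < 5; n++) {
    runClassVariant();
    runExtensionVariant();
  }
  print('wrapper class:  ${timeMs(runClassVariant).toStringAsFixed(2)} ms');
  print('extension type: ${timeMs(runExtensionVariant).toStringAsFixed(2)} ms');
}
```

Both loops return the accumulated z component so the work cannot be discarded entirely, and so the two variants can be checked for identical results.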
I also measured the size of the executables. Both benchmarks are compiled into a single executable:
Platform | Wrapper Classes – Size | Extension Types – Size | Improvement – Size |
---|---|---|---|
AoT | 5395568 bytes | 5395568 bytes | ±0.00% |
Dart2Js | 129649 bytes | 129738 bytes | -0.07% |
I’m not 100% sure what I would expect here. Smaller executables aren’t necessarily better, as bigger executables can actually be faster when more inlining happens. Either way, the sizes are practically identical, so extension types don’t seem to have a negative impact on executable size either.
However, the small difference in the Dart2Js output sizes actually made me wonder what has changed, so I diffed the files.
 A.Vector3CloneBenchmark.prototype = {
   run$0() {
-    var i, j, t1, t2;
+    var i, j, _this, _this0;
     for (i = -500; i <= 500; i += 0.75)
       for (j = -500; j <= 500; j += 0.75) {
-        t1 = new Float32Array(3);
-        t1[2] = 0;
-        t1[1] = i;
-        t1[0] = j;
-        t2 = new Float32Array(3);
-        t2[2] = t1[2];
-        t2[1] = t1[1];
-        t2[0] = t1[0];
+        _this = new Float32Array(3);
+        _this[2] = 0;
+        _this[1] = i;
+        _this[0] = j;
+        _this0 = new Float32Array(3);
+        _this0[2] = _this[2];
+        _this0[1] = _this[1];
+        _this0[0] = _this[0];
       }
   }
 };
 A.Vector3AddBenchmark.prototype = {
   run$0() {
-    var i, j, t2,
-      t1 = new Float32Array(3);
+    var i, j, _this0,
+      _this = new Float32Array(3);
     for (i = -500; i <= 500; i += 0.75)
       for (j = -500; j <= 500; j += 0.75) {
-        t2 = new Float32Array(3);
-        t2[2] = i + j;
-        t2[1] = j;
-        t2[0] = i;
-        t1[2] = t1[2] + t2[2];
-        t1[1] = t1[1] + t2[1];
-        t1[0] = t1[0] + t2[0];
+        _this0 = new Float32Array(3);
+        _this0[2] = i + j;
+        _this0[1] = j;
+        _this0[0] = i;
+        _this[2] = _this[2] + _this0[2];
+        _this[1] = _this[1] + _this0[1];
+        _this[0] = _this[0] + _this0[0];
       }
   }
 };
There’s no real difference between the two, just the naming of the variables. With gzip compression, this will probably not make much of a difference. This supports my assumption that the runtime differences on Dart2Js are measurement errors. The compiler seems to have inlined the clone and add methods pretty well already.
What conclusion would I draw from this? The benchmarks show that extension types don’t have a negative impact on performance. But I wouldn’t say that there is a big performance benefit, at least not in my artificial micro-benchmark. Maybe it’s not a good idea to rely on a micro-benchmark here, because the behavior of the compiler on the different platforms is far too complex to observe in such a small scope.
As a next step, I should create a more complex benchmark to examine the performance difference in a more real-world situation. This will require implementing more methods of the Vector3 type and possibly extending the benchmark to other types. There are some questions I would like to answer in the next step:
- How does it work for larger functions with more complex operations? This could be tested on something like Vector3.angleToSigned, which calls into a bunch of other methods.
- How does it behave if the Vector3 type is used inside another class, like an Aabb3? For example, in a collision detection operation?
- How does it work for larger types like Matrix3?
It’s already good to see that it doesn’t make things worse – I’m curious to see how it behaves in a real-world example. Stay tuned for a follow-up post.
Tags: performance, vector_math, open-source, math, dart