DP4a built-in functions support in WGSL
DP4a (Dot Product of 4 Elements and Accumulate) refers to a set of GPU instructions commonly used in deep learning inference for quantization. It efficiently performs 8-bit integer dot products to accelerate the computation of such int8-quantized models. It can save (up to 75%) of the memory and network bandwidth and improve the performance of any machine learning models in inferencing compared with their f32 version. As a result, it's now heavily used within many popular AI frameworks.
When the "packed_4x8_integer_dot_product"
WGSL language extension is present in navigator.gpu.wgslLanguageFeatures
, you can now use 32-bit integer scalars packing 4-component vectors of 8-bit integers as inputs to the dot product instructions in your WGSL shader code with the dot4U8Packed
and dot4I8Packed
built-in functions. You can also use packing and unpacking instructions with packed 4-component vectors of 8-bit integers with pack4xI8
, pack4xU8
, pack4xI8Clamp
, pack4xU8Clamp
, unpack4xI8
, and unpack4xU8
WGSL built-in functions.
It's recommended to use a requires-directive to signal the potential for non-portability with requires packed_4x8_integer_dot_product;
at the top of your WGSL shader code. See the following example and issue tint:1497.
if (!navigator.gpu.wgslLanguageFeatures.has("packed_4x8_integer_dot_product")) {
throw new Error(`DP4a built-in functions are not available`);
}
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();
const shaderModule = device.createShaderModule({ code: `
requires packed_4x8_integer_dot_product;
fn main() {
const result: u32 = dot4U8Packed(0x01020304u, 0x02040405u); // 42
}`,
});
Special thanks to Intel's Web Graphics team in Shanghai for driving this specification and implementation to completion!
Unrestricted pointer parameters in WGSL
The "unrestricted_pointer_parameters"
WGSL language extension loosens restrictions on which pointers can be passed to WGSL functions:
Parameter pointers of
storage
,uniform
andworkgroup
address spaces to user declared functions.Passing pointers to structure members and array elements to user declared functions.
Check out Pointers As Function Parameters | Tour of WGSL to learn more about it.
This feature can be feature-detected using navigator.gpu.wgslLanguageFeatures
. It's recommended to always use a requires-directive to signal the potential for non-portability with requires unrestricted_pointer_parameters;
at the top of your WGSL shader code. See the following example, the WGSL spec changes, and issue tint:2053.
if (!navigator.gpu.wgslLanguageFeatures.has("unrestricted_pointer_parameters")) {
throw new Error(`Unrestricted pointer parameters are not available`);
}
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();
const shaderModule = device.createShaderModule({ code: `
requires unrestricted_pointer_parameters;
@group(0) @binding(0) var<storage, read_write> S : i32;
fn func(pointer : ptr<storage, i32, read_write>) {
*pointer = 42;
}
@compute @workgroup_size(1)
fn main() {
func(&S);
}`
});
Syntax sugar for dereferencing composites in WGSL
When the "pointer_composite_access"
WGSL language extension is present in navigator.gpu.wgslLanguageFeatures
, your WGSL shader code now supports access to components of complex data types using the same dot (.
) syntax whether you're working directly with the data or with a pointer to it. Here's how it works:
If
foo
is a pointer:foo.bar
is a more convenient way to write(*foo).bar
. The asterisk (*
) would normally be needed to turn the pointer into a "reference" that can be dereferenced, but now both pointers and references are much more similar and almost interchangeable.If
foo
is not a pointer: The dot (.
) operator works exactly as you're used to for directly accessing members.
Similarly, if pa
is a pointer that stores the starting address of an array, then using pa[i]
gives you direct access to the memory location where the 'i
'th element of that array is stored.
It's recommended to use a requires-directive to signal the potential for non-portability with requires pointer_composite_access;
at the top of your WGSL shader code. See the following example and issue tint:2113.
if (!navigator.gpu.wgslLanguageFeatures.has("pointer_composite_access")) {
throw new Error(`Pointer composite access is not available`);
}
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();
const shaderModule = device.createShaderModule({ code: `
requires pointer_composite_access;
fn main() {
var a = vec3f();
let p : ptr<function, vec3f> = &a;
let r1 = (*p).x; // always valid.
let r2 = p.x; // requires pointer composite access.
}`
});
Separate read-only state for stencil and depth aspects
Previously, read-only depth-stencil attachments in render passes required both aspects (depth and stencil) to be read-only. This limitation has been lifted. Now, you can use the depth aspect in read-only fashion, for contact shadow tracing for instance, while the stencil buffer is written to identify pixels for further processing. See issue dawn:2146.
Dawn updates
The uncaptured error callback set with wgpuDeviceSetUncapturedErrorCallback()
is now called immediately when the error happens. This is what developers consistently expect and want for debugging. See change dawn:173620.
The wgpuSurfaceGetPreferredFormat()
method from the webgpu.h API has been implemented. See issue dawn:1362.
This covers only some of the key highlights. Check out the exhaustive list of commits.
What's New in WebGPU
A list of everything that has been covered in the What's New in WebGPU series.
Chrome 132
- Texture view usage
- 32-bit float textures blending
- GPUDevice adapterInfo attribute
- Configuring canvas context with invalid format throw JavaScript error
- Filtering sampler restrictions on textures
- Extended subgroups experimentation
- Improving developer experience
- Experimental support for 16-bit normalized texture formats
- Dawn updates
Chrome 131
- Clip distances in WGSL
- GPUCanvasContext getConfiguration()
- Point and line primitives must not have depth bias
- Inclusive scan built-in functions for subgroups
- Experimental support for multi-draw indirect
- Shader module compilation option strict math
- Remove GPUAdapter requestAdapterInfo()
- Dawn updates
Chrome 130
- Dual source blending
- Shader compilation time improvements on Metal
- Deprecation of GPUAdapter requestAdapterInfo()
- Dawn updates
Chrome 129
Chrome 128
- Experimenting with subgroups
- Deprecate setting depth bias for lines and points
- Hide uncaptured error DevTools warning if preventDefault
- WGSL interpolate sampling first and either
- Dawn updates
Chrome 127
- Experimental support for OpenGL ES on Android
- GPUAdapter info attribute
- WebAssembly interop improvements
- Improved command encoder errors
- Dawn updates
Chrome 126
- Increase maxTextureArrayLayers limit
- Buffer upload optimization for Vulkan backend
- Shader compilation time improvements
- Submitted command buffers must be unique
- Dawn updates
Chrome 125
Chrome 124
- Read-only and read-write storage textures
- Service workers and shared workers support
- New adapter information attributes
- Bug fixes
- Dawn updates
Chrome 123
- DP4a built-in functions support in WGSL
- Unrestricted pointer parameters in WGSL
- Syntax sugar for dereferencing composites in WGSL
- Separate read-only state for stencil and depth aspects
- Dawn updates
Chrome 122
- Expand reach with compatibility mode (feature in development)
- Increase maxVertexAttributes limit
- Dawn updates
Chrome 121
- Support WebGPU on Android
- Use DXC instead of FXC for shader compilation on Windows
- Timestamp queries in compute and render passes
- Default entry points to shader modules
- Support display-p3 as GPUExternalTexture color space
- Memory heaps info
- Dawn updates
Chrome 120
- Support for 16-bit floating-point values in WGSL
- Push the limits
- Changes to depth-stencil state
- Adapter information updates
- Timestamp queries quantization
- Spring-cleaning features
Chrome 119
- Filterable 32-bit float textures
- unorm10-10-10-2 vertex format
- rgb10a2uint texture format
- Dawn updates
Chrome 118
- HTMLImageElement and ImageData support in
copyExternalImageToTexture()
- Experimental support for read-write and read-only storage texture
- Dawn updates
Chrome 117
- Unset vertex buffer
- Unset bind group
- Silence errors from async pipeline creation when device is lost
- SPIR-V shader module creation updates
- Improving developer experience
- Caching pipelines with automatically generated layout
- Dawn updates
Chrome 116
- WebCodecs integration
- Lost device returned by GPUAdapter
requestDevice()
- Keep video playback smooth if
importExternalTexture()
is called - Spec conformance
- Improving developer experience
- Dawn updates
Chrome 115
- Supported WGSL language extensions
- Experimental support for Direct3D 11
- Get discrete GPU by default on AC power
- Improving developer experience
- Dawn updates
Chrome 114
- Optimize JavaScript
- getCurrentTexture() on unconfigured canvas throws InvalidStateError
- WGSL updates
- Dawn updates