Fullscreen filters often require input pixels to be fetched multiple times!
  Depth of Field, SSAO, Blur, etc.
Memory bandwidth often still a bottleneck
  Post-processing, compression, etc.
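To put a rough number on the repeated fetches, the sketch below counts texel reads for a one-dimensional blur. It is only an illustration under stated assumptions: the filter reads 2*radius + 1 texels per output pixel, and the "shared" variant assumes a thread group loads its row segment plus a border of radius texels on each side once into memory shared between its threads. The function names are hypothetical and not taken from the slides.

```cpp
#include <cassert>
#include <cstdint>

// Texel fetches when every output pixel reads its whole filter footprint
// directly (assumed footprint: 2*radius + 1 texels per pixel).
inline std::uint64_t fetches_direct(std::uint64_t pixels, std::uint64_t radius) {
    return pixels * (2 * radius + 1);
}

// Texel fetches when a group of groupSize threads first loads its segment
// plus a border of `radius` texels on each side into shared memory
// (assumed scheme), and the filter then reads from that shared copy.
inline std::uint64_t fetches_shared(std::uint64_t groupSize, std::uint64_t radius) {
    assert(groupSize > 0);
    return groupSize + 2 * radius;
}
```

For a group of 128 pixels and a radius of 8, the direct version issues 128 * 17 = 2176 fetches, while the shared version issues 128 + 16 = 144.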
pDev11->Dispatch(nX, nY, nZ)

void MyCS(
    uint3 groupID : SV_GroupID,
    uint3 groupThreadID : SV_GroupThreadID,
    uint3 dispatchThreadID : SV_DispatchThreadID,
    uint groupIndex : SV_GroupIndex)

groupID.xyz: group offsets from Dispatch()
  groupID.xyz ∈ (0..nX-1, 0..nY-1, 0..nZ-1)
  Constant within a CS thread group invocation
groupThreadID.xyz: thread ID in group
  groupThreadID.xyz ∈ (0..X-1, 0..Y-1, 0..Z-1)
  Independent of Dispatch() parameters
dispatchThreadID.xyz: global thread offset
  = groupID.xyz*(X,Y,Z) + groupThreadID.xyz
groupIndex: flattened version of groupThreadID
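The ID relationships above can be checked on the CPU. The following is a minimal sketch, not shader code: UInt3, dispatch_thread_id and group_index are hypothetical helper names, and (X, Y, Z) is passed in as groupSize. The flattening order used for groupIndex (z*X*Y + y*X + x) matches the documented definition of SV_GroupIndex.

```cpp
#include <cassert>
#include <cstdint>

// Plain stand-in for HLSL's uint3 (hypothetical helper type).
struct UInt3 { std::uint32_t x, y, z; };

// dispatchThreadID = groupID * (X, Y, Z) + groupThreadID
inline UInt3 dispatch_thread_id(UInt3 groupID, UInt3 groupThreadID, UInt3 groupSize) {
    // groupThreadID must lie inside the group: (0..X-1, 0..Y-1, 0..Z-1)
    assert(groupThreadID.x < groupSize.x);
    assert(groupThreadID.y < groupSize.y);
    assert(groupThreadID.z < groupSize.z);
    return UInt3{ groupID.x * groupSize.x + groupThreadID.x,
                  groupID.y * groupSize.y + groupThreadID.y,
                  groupID.z * groupSize.z + groupThreadID.z };
}

// groupIndex = z*X*Y + y*X + x (flattened groupThreadID)
inline std::uint32_t group_index(UInt3 groupThreadID, UInt3 groupSize) {
    return groupThreadID.z * groupSize.x * groupSize.y
         + groupThreadID.y * groupSize.x
         + groupThreadID.x;
}
```

With an 8x8x1 group, thread (3, 5, 0) of group (2, 1, 0) gets dispatchThreadID (19, 13, 0) and groupIndex 5*8 + 3 = 43.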
CS 5.0 Features (continued)
Texture sampling and filtering instructions
  Explicit derivatives required
Execution not limited to fixed input/output
Thread model execution
  Full control on the number of times the CS runs
Read/write access to “on-cache” memory
  Thread Local Storage (TLS)
  Shared between threads
  Synchronization support
Random access writes
  At last! Enables new possibilities (scattering)

CS Threads
A thread is the basic CS processing element
CS declares the number of threads to operate on (the “thread group”)
  void MyCS(…)
To kick off CS execution:
  pDev11->Dispatch( nX, nY, nZ )
  nX, nY, nZ: number of thread groups to execute
Number of thread groups can be written out to a Buffer as pre-pass
  pDev11->DispatchIndirect(LPRESOURCE *hBGroupDimensions, DWORD dwOffsetBytes)
  Useful for conditional execution
CS 5.0: X*Y*Z threads per group
[Figure: thread groups for Dispatch(3, 2, 1)]
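Since Dispatch() takes a number of thread groups rather than a number of threads, host code usually has to derive nX and nY from the data size. The sketch below shows one common way to do that, assuming a 2D image processed with one thread per pixel. GroupCounts, div_round_up, groups_for_image and total_threads are hypothetical names introduced for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Arguments for Dispatch(nX, nY, nZ) (hypothetical helper type).
struct GroupCounts { std::uint32_t nX, nY, nZ; };

// Smallest number of groups of size groupDim that covers `extent` items.
inline std::uint32_t div_round_up(std::uint32_t extent, std::uint32_t groupDim) {
    assert(groupDim > 0);
    return (extent + groupDim - 1) / groupDim;
}

// Group counts for one thread per pixel with an X-by-Y-by-1 thread group.
inline GroupCounts groups_for_image(std::uint32_t width, std::uint32_t height,
                                    std::uint32_t X, std::uint32_t Y) {
    return GroupCounts{ div_round_up(width, X), div_round_up(height, Y), 1 };
}

// Total threads launched: (nX*nY*nZ) groups of X*Y*Z threads each.
inline std::uint64_t total_threads(GroupCounts g, std::uint32_t X,
                                   std::uint32_t Y, std::uint32_t Z) {
    return static_cast<std::uint64_t>(g.nX) * g.nY * g.nZ
         * (static_cast<std::uint64_t>(X) * Y * Z);
}
```

A 1920x1080 image with 16x16 groups needs 120 x 68 groups. Because 1080 is not a multiple of 16, the last row of groups contains threads that fall outside the image, so the shader would need a bounds check on dispatchThreadID. Dispatch(3, 2, 1) with 8x8x1 groups launches 6 * 64 = 384 threads.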
Compute Shader Intro
A new programmable shader stage in DX11
  Independent of the graphic pipeline
New industry standard for GPGPU applications
CS enables general processing operations
  Post-processing
  Video filtering
  Sorting/Binning
  Setting up resources for rendering
  Etc.
AI, pathfinding, physics, compression…

CS 5.0 Features
Supports Shader Model 5.0 instructions

SM5.0 Basics (continued)
  Vertex Shader
  Hull Shader
  Domain Shader
  Geometry Shader
  Pixel Shader
Some instructions/declarations/system values are shader-specific
Pull Model
Shader subroutines

Uniform Indexing
Can now index resource inputs
  Buffer and Texture resources
  Constant buffers
  Texture samplers
Indexing occurs on the slot number
  E.g. indexing across constant buffer slots
Index must be a constant expression
  Texture2D txDiffuse : register(t0)
  Texture2D txDiffuse1 : register(t1)
  static uint Indices =

Append Buffer allows new data to be written at the end of the buffer
  Raw and Structured Buffers only
  Useful for building lists, stacks, etc.
Declaration
  AppendBuffer MyAppendBuf
Access to write counter (Raw Buffer only)
  uint uCounter = MyRawAppendBuf.IncrementCounter()
Append data to buffer
  MyRawAppendBuf.Store(uWriteCounter, value)
  MyStructuredAppendBuf.Append(StructElement)
Can specify counters’ start offset
Similar API for Consume and reading back a buffer

Atomic Operations
PS and CS support atomic operations
Can be used when multiple threads try to modify the same data location (UAV or TLS)
  Avoid contention
  InterlockedAdd
  InterlockedAnd/InterlockedOr/InterlockedXor
  InterlockedCompareExchange
  InterlockedCompareStore
  InterlockedExchange
  InterlockedMax/InterlockedMin
Can optionally return original value
Potential cost in performance
  Especially if original value is required
  More latency hiding required
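The append-buffer and atomic-operation slides describe the same underlying pattern: an atomic increment returns the original counter value, and that value becomes the writer's private slot. The sketch below is a CPU-side analogy using std::atomic, not Direct3D code. AppendList and its members are hypothetical names, and the fixed capacity with a failed append on overflow is an assumption of this sketch rather than something the slides specify.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU analogy of an append buffer: a fixed-size array plus a write counter.
class AppendList {
public:
    explicit AppendList(std::size_t capacity) : data_(capacity), counter_(0) {}

    // Safe to call from several threads: fetch_add hands each caller a
    // distinct slot by returning the counter value *before* the increment
    // (the "original value" an Interlocked operation can optionally return).
    bool append(std::uint32_t value) {
        const std::uint32_t slot = counter_.fetch_add(1, std::memory_order_relaxed);
        if (slot >= data_.size()) {
            return false;  // assumption of this sketch: drop writes past capacity
        }
        data_[slot] = value;
        return true;
    }

    // Number of elements actually stored (the counter may overshoot capacity).
    std::size_t count() const {
        const std::size_t c = counter_.load(std::memory_order_relaxed);
        return c < data_.size() ? c : data_.size();
    }

    // Read back an element; only meaningful once all writers have finished.
    std::uint32_t at(std::size_t i) const {
        assert(i < count());
        return data_[i];
    }

private:
    std::vector<std::uint32_t> data_;
    std::atomic<std::uint32_t> counter_;
};
```

Needing the returned slot index is what makes this the more expensive flavour of atomic the slide warns about: each writer has to wait for the result of the increment before it can store its data.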
Will be released alongside Windows 7
  Runs on Vista as well
Supports downlevel hardware
  DX9, DX10, DX11-class HW supported
  Exposed features depend on GPU
Allows the use of the same API for multiple generations of GPUs
  However Vista/Windows7 required
Lots of new features…

SM5.0 Basics
All shader types support Shader Model 5.0