文件列表見:Rust 移動端跨平臺複雜圖形渲染專案開發系列總結(目錄)
gfx-rs/gfx是一個Rust編寫的底層、跨平臺圖形抽象庫,包含如下層或元件:
- gfx-HAL
- gfx-backend-
- Metal
- Vulkan
- OpenGL,開發中,由於GL與下一代介面Vulkan差異過大,這個模組可能做不完
- OpenGL ES,開發中,由於GL與下一代介面Vulkan差異過大,這個模組可能做不完
- DirectX 11
- DirectX 12
- WebGL,開發中,由於GL與下一代介面Vulkan差異過大,這個模組可能做不完
- gfx-warden
本文件只考慮master分支,對應HAL新介面,忽略pre-II老介面。 另外,只考慮用gfx-hal實現離線渲染和計算著色器功能,即,渲染到紋理和GPGPU。渲染到視窗及滑鼠、鍵盤事件處理可參考gfx自帶DEMO。
初始化具體圖形庫後端
gfx-hal介面近乎1:1仿造Vulkan介面,可以參考Vulkan各種教程,Vulkan的操作從Instance建立開始。
#[cfg(any(feature = "vulkan", feature = "dx12", feature = "metal"))]
let instance = backend::Instance::create("name", 1 /* version */);
複製程式碼
建立不同的裝置和佇列需要介面卡滿足不同的能力要求,下面逐一描述。
建立不同功能的裝置
整體流程為backend::Instance::create()
-> enumerate_adapters()
-> open_with()
。
enumerate_adapters()
是目前可用的Adapter列表,在這一步我們先選擇支援指定佇列功能要求的介面卡。在下一步開啟具體邏輯Device時直接返回true。open_with()
第一個引數count表示要開啟的QueueFamily數量,目前移動裝置通常只有一個GPU,此時傳遞1即可,我目前主觀認為開啟多個QueueFamily並不能提高整個App的圖形效能。
配置完在macOS上執行可得到類似如下資訊,配置細節參考後面內容。
AdapterInfo {
name: "Intel Iris Pro Graphics",
vendor: 0,
device: 0,
device_type: IntegratedGpu }
Limits {
max_texture_size: 4096,
max_texel_elements: 16777216,
... }
Memory types: [
MemoryType { properties: DEVICE_LOCAL, heap_index: 0 },
MemoryType { properties: COHERENT | CPU_VISIBLE, heap_index: 1 },
...]
複製程式碼
只支援渲染的裝置
渲染是圖形裝置存在的意義,故簡單粗暴地取出第1個介面卡進行後面操作。
let mut adapter = instance.enumerate_adapters().remove(0);
let (mut device, mut queue_group) = adapter
.open_with::<_, Graphics>(1, |_family| true
.unwrap();
複製程式碼
只支援計算的裝置
考慮到低版本的OpenGL不支援Compute Shader,此時需要過濾,如果只編譯Metal/Vulkan,和上面一樣enumerate_adapters().remove(0)
即可。相應地,open_with()
作了調整。
let mut adapter = instance
.enumerate_adapters()
.into_iter()
.find(|a| {
a.queue_families
.iter()
.any(|family| family.supports_compute())
})
.expect("Failed to find a GPU with compute support!");
let (mut device, mut queue_group) = adapter
.open_with::<_, Compute>(1, |_family| true)
.unwrap();
複製程式碼
同時支援渲染+計算的裝置
同上,介面卡過濾條件和open_with()
都調整成同時滿足渲染與計算要求。
let mut adapter = instance
.enumerate_adapters()
.into_iter()
.find(|a| {
a.queue_families
.iter()
.any(|family| family.supports_graphics() && family.supports_compute())
}).expect("Failed to find a GPU with graphics and compute support!");
let (mut device, mut queue_group) = adapter
.open_with::<_, General>(1, |_family| true)
.unwrap();
複製程式碼
有了Device和QueueGroup可開始建立Image(可看作Vulkan版Texture)、Pipeline等資源。
建立資源
Buffer
建立Buffer
Buffer和Image本身並不儲存資料,它們表達了儲存資料要滿足的條件,這些條件用於建立Memory。
let usage = buffer::Usage::TRANSFER_SRC | buffer::Usage::TRANSFER_DST;
let unbound = device.create_buffer(required_size_in_bytes, usage).unwrap();
複製程式碼
required_size_in_bytes
:需要分配的記憶體大小,單位位元組,在此只作為一個標誌,實際建立儲存空間操作由後面介紹的Memory實現。usage
:根據Buffer的實際用途進行配置,不同組合對效能影響較大,需注意。
銷燬Buffer
device.destroy_buffer(buffer);
複製程式碼
連線Buffer與Memory
buffer = device.bind_buffer_memory(&memory, 0, unbound).unwrap();
複製程式碼
bind_buffer_memory
後Buffer物件才擁有實際的儲存空間,但是,資料還是存在Memory物件中。後續更新Buffer掛接的資料只需要對映Memory進行修改。
連線Buffer和Memory後,在macOS上通常輸出如下資訊,其中length = 256
的256與前面輸出的Limits某一項相關,具體內容視顯示卡而定:
buffer = Buffer { raw: <MTLIGAccelBuffer: 0x7fa9a472d470>
label = <none>
length = 256
cpuCacheMode = MTLCPUCacheModeDefaultCache
storageMode = MTLStorageModeShared
resourceOptions = MTLResourceCPUCacheModeDefaultCache MTLResourceStorageModeShared
purgeableState = MTLPurgeableStateNonVolatile, range: 0..6, options: CPUCacheModeDefaultCache | StorageModeShared }
memory = Memory { heap: Public(MemoryTypeId(1), <MTLIGAccelBuffer: 0x7fa9a472d470>
label = <none>
length = 256
cpuCacheMode = MTLCPUCacheModeDefaultCache
storageMode = MTLStorageModeShared
resourceOptions = MTLResourceCPUCacheModeDefaultCache MTLResourceStorageModeShared
purgeableState = MTLPurgeableStateNonVolatile), size: 256 }
複製程式碼
建立BufferView
let format = Some(format::Format::Rg4Unorm);
let size = data_source.len();
let buffer_view = device.create_buffer_view(buffer, format, 0..size);
複製程式碼
銷燬BufferView
device.destroy_buffer_view(buffer_view);
複製程式碼
Memory
Memory分配用於儲存Buffer和Image所需資料的記憶體空間。
建立Memory
// A note about performance: Using CPU_VISIBLE memory is convenient because it can be
// directly memory mapped and easily updated by the CPU, but it is very slow and so should
// only be used for small pieces of data that need to be updated very frequently. For something like
// a vertex buffer that may be much larger and should not change frequently, you should instead
// use a DEVICE_LOCAL buffer that gets filled by copying data from a CPU_VISIBLE staging buffer.
let upload_type = memory_types
.iter()
.enumerate()
.position(|(id, mem_type)| {
mem_req.type_mask & (1 << id) != 0 && mem_type.properties.contains(memory::Properties::CPU_VISIBLE)
})
.unwrap()
.into();
let mem_req = device.get_buffer_requirements(&unbound);
let memory = device.allocate_memory(upload_type, mem_req.size).unwrap();
複製程式碼
Memory寫入
Memory的讀寫都要對映相應的Write/Reader,為了執行緒安全,需要手工加上合適的Fence。
let mut data_target = device.acquire_mapping_writer::<T>(&memory, 0..size).unwrap();
data_target[0..data_source.len()].copy_from_slice(data_source);
device.release_mapping_writer(data_target);
複製程式碼
Memory讀取
let reader = device.acquire_mapping_reader::<u32>(&staging_memory, 0..staging_size).unwrap();
println!("Times: {:?}", reader[0..numbers.len()].into_iter().map(|n| *n).collect::<Vec<u32>>());
device.release_mapping_reader(reader);
複製程式碼
Image
建立Image
類似Buffer物件,Image物件本身也不儲存實際的紋理資料。
let kind = image::Kind::D2(dims.width as image::Size, dims.height as image::Size,
1/* Layer */, 1/* NumSamples */);
let unbound = device
.create_image(
kind,
1,
ColorFormat::SELF,
image::Tiling::Optimal,
image::Usage::TRANSFER_DST | image::Usage::SAMPLED,
image::StorageFlags::empty(),
)
.unwrap();
複製程式碼
同樣,建立Image時指定的Usage
也要根據Image的實際用途來組合,不合理的組合會降低效能。
銷燬Image
device.destroy_image_view(image_view);
複製程式碼
連線Image到Memory
let image = device.bind_image_memory(&memory, 0, unbound).unwrap();
複製程式碼
建立ImageView
建立Sampler
組織繪製命令
建立Submission
hal-buffer建立、讀寫
buffer::Usage::TRANSFER_SRC | buffer::Usage::TRANSFER_DST,
複製程式碼
功能、區別
let (staging_memory, staging_buffer, staging_size) = create_buffer::<back::Backend>(
&mut device,
&memory_properties.memory_types,
memory::Properties::CPU_VISIBLE | memory::Properties::COHERENT,
buffer::Usage::TRANSFER_SRC | buffer::Usage::TRANSFER_DST,
stride,
numbers.len() as u64,
);
複製程式碼
let (device_memory, device_buffer, _device_buffer_size) = create_buffer::<back::Backend>(
&mut device,
&memory_properties.memory_types,
memory::Properties::DEVICE_LOCAL,
buffer::Usage::TRANSFER_SRC | buffer::Usage::TRANSFER_DST | buffer::Usage::STORAGE,
stride,
numbers.len() as u64,
);
複製程式碼
{
let mut writer = device.acquire_mapping_writer::<u32>(&staging_memory, 0..staging_size).unwrap();
writer[0..numbers.len()].copy_from_slice(&numbers);
device.release_mapping_writer(writer);
}
複製程式碼
Metal模組
fn create_buffer(
&self, size: u64, usage: buffer::Usage
) -> Result<n::UnboundBuffer, buffer::CreationError> {
debug!("create_buffer of size {} and usage {:?}", size, usage);
Ok(n::UnboundBuffer {
size,
usage,
})
}
複製程式碼
fn get_buffer_requirements(&self, buffer: &n::UnboundBuffer) -> memory::Requirements {
let mut max_size = buffer.size;
let mut max_alignment = self.private_caps.buffer_alignment;
if self.private_caps.resource_heaps {
// We don't know what memory type the user will try to allocate the buffer with,
// so we test them all get the most stringent ones.
for (i, _mt) in self.memory_types.iter().enumerate() {
let (storage, cache) = MemoryTypes::describe(i);
let options = conv::resource_options_from_storage_and_cache(storage, cache);
let requirements = self.shared.device.lock()
.heap_buffer_size_and_align(buffer.size, options);
max_size = cmp::max(max_size, requirements.size);
max_alignment = cmp::max(max_alignment, requirements.align);
}
}
// based on Metal validation error for view creation:
// failed assertion `BytesPerRow of a buffer-backed texture with pixelFormat(XXX) must be aligned to 256 bytes
const SIZE_MASK: u64 = 0xFF;
let supports_texel_view = buffer.usage.intersects(
buffer::Usage::UNIFORM_TEXEL | buffer::Usage::STORAGE_TEXEL
);
memory::Requirements {
size: (max_size + SIZE_MASK) & !SIZE_MASK,
alignment: max_alignment,
type_mask: if !supports_texel_view || self.private_caps.shared_textures {
MemoryTypes::all().bits()
} else {
(MemoryTypes::all() ^ MemoryTypes::SHARED).bits()
},
}
}
複製程式碼
fn allocate_memory(&self, memory_type: hal::MemoryTypeId, size: u64) -> Result<n::Memory, AllocationError> {
let (storage, cache) = MemoryTypes::describe(memory_type.0);
let device = self.shared.device.lock();
debug!("allocate_memory type {:?} of size {}", memory_type, size);
// Heaps cannot be used for CPU coherent resources
//TEMP: MacOS supports Private only, iOS and tvOS can do private/shared
let heap = if self.private_caps.resource_heaps && storage != MTLStorageMode::Shared && false {
let descriptor = metal::HeapDescriptor::new();
descriptor.set_storage_mode(storage);
descriptor.set_cpu_cache_mode(cache);
descriptor.set_size(size);
let heap_raw = device.new_heap(&descriptor);
n::MemoryHeap::Native(heap_raw)
} else if storage == MTLStorageMode::Private {
n::MemoryHeap::Private
} else {
let options = conv::resource_options_from_storage_and_cache(storage, cache);
let cpu_buffer = device.new_buffer(size, options);
debug!("\tbacked by cpu buffer {:?}", cpu_buffer.as_ptr());
n::MemoryHeap::Public(memory_type, cpu_buffer)
};
Ok(n::Memory::new(heap, size))
}
複製程式碼
fn bind_buffer_memory(
&self, memory: &n::Memory, offset: u64, buffer: n::UnboundBuffer
) -> Result<n::Buffer, BindError> {
debug!("bind_buffer_memory of size {} at offset {}", buffer.size, offset);
let (raw, options, range) = match memory.heap {
n::MemoryHeap::Native(ref heap) => {
let resource_options = conv::resource_options_from_storage_and_cache(
heap.storage_mode(),
heap.cpu_cache_mode(),
);
let raw = heap.new_buffer(buffer.size, resource_options)
.unwrap_or_else(|| {
// TODO: disable hazard tracking?
self.shared.device
.lock()
.new_buffer(buffer.size, resource_options)
});
(raw, resource_options, 0 .. buffer.size) //TODO?
}
n::MemoryHeap::Public(mt, ref cpu_buffer) => {
debug!("\tmapped to public heap with address {:?}", cpu_buffer.as_ptr());
let (storage, cache) = MemoryTypes::describe(mt.0);
let options = conv::resource_options_from_storage_and_cache(storage, cache);
(cpu_buffer.clone(), options, offset .. offset + buffer.size)
}
n::MemoryHeap::Private => {
//TODO: check for aliasing
let options = MTLResourceOptions::StorageModePrivate |
MTLResourceOptions::CPUCacheModeDefaultCache;
let raw = self.shared.device
.lock()
.new_buffer(buffer.size, options);
(raw, options, 0 .. buffer.size)
}
};
Ok(n::Buffer {
raw,
range,
options,
})
}
複製程式碼
let size = data_source.len() as u64;
let mut data_target = device.acquire_mapping_writer::<T>(&memory, 0..size).unwrap();
data_target[0..data_source.len()].copy_from_slice(data_source);
let _ = device.release_mapping_writer(data_target);
複製程式碼
/// Acquire a mapping Writer.
///
/// The accessible slice will correspond to the specified range (in bytes).
fn acquire_mapping_writer<'a, T>(
&self,
memory: &'a B::Memory,
range: Range<u64>,
) -> Result<mapping::Writer<'a, B, T>, mapping::Error>
where
T: Copy,
{
let count = (range.end - range.start) as usize / mem::size_of::<T>();
self.map_memory(memory, range.clone()).map(|ptr| unsafe {
let start_ptr = ptr as *mut _;
mapping::Writer {
slice: slice::from_raw_parts_mut(start_ptr, count),
memory,
range,
released: false,
}
})
}
複製程式碼
fn map_memory<R: RangeArg<u64>>(
&self, memory: &n::Memory, generic_range: R
) -> Result<*mut u8, mapping::Error> {
let range = memory.resolve(&generic_range);
debug!("map_memory of size {} at {:?}", memory.size, range);
let base_ptr = match memory.heap {
n::MemoryHeap::Public(_, ref cpu_buffer) => cpu_buffer.contents() as *mut u8,
n::MemoryHeap::Native(_) |
n::MemoryHeap::Private => panic!("Unable to map memory!"),
};
Ok(unsafe { base_ptr.offset(range.start as _) })
}
複製程式碼