i stumbled upon this while tweaking my setup for the latest project - instead of dealing w/ those pesky multi-gpu setups or quantization headaches, blackwell just lets you run everything smoothly, even on bigger models. it's like they finally solved that age-old bottleneck.
but here's a question: has anyone tried using blackwell in conjunction with cuda-memory-pool? i bet there'd be some serious performance gains if we could optimize both together!
found this here:
https://www.freecodecamp.org/news/the-evolution-of-nvidia-blackwell-gpu-memory-architecture/