Updated: Aug 4, 2022
For quite some time now, Nvidia has been dominating the virtual graphics market. By virtual graphics I mean the graphical acceleration of either a Server Based Computing session, such as RDS or XenApp, or VDI, such as VMware View and Citrix XenDesktop. Nvidia has found a very scalable way of giving a large number of users per physical graphics card the acceleration they are going to require more and more. AMD has been announcing its capabilities for a while now, but has only recently brought something real to the market. Are they on par with Nvidia yet? Hmmmm, maybe not entirely, since there are a number of key differences in the setup. Let's go through some history first, to give you some context.
Graphics acceleration options in these kinds of environments have been numerous, ranging from CPU emulation to passthrough and everything in between. The use case for a number of them has dwindled seriously in recent years, due to the limitations of these solutions. A few have relied on using the CPU in the server to give users a basic form of acceleration, often granting little more than the ability to run the Windows Aero shell, but not much else. Others have been limited to certain versions of DirectX and OpenGL, making them obsolete as they did not evolve. There has always been the option of passthrough: giving a PCI or PCI-E device directly to a VM running on a hypervisor. Although this presents you with every feature the device in question offers, it is hardly scalable; putting a number of video cards in one server is a costly endeavour that does not leave much room to spare.
Nvidia created their vGPU Manager software, which sits between the hypervisor and the graphics cards and effectively grants the capability to overcommit GPUs, as we have long been doing with CPUs. Combined with putting several GPUs (up to four) on one board, a very scalable solution emerged, supported by both VMware and Citrix, making it possible to run even CAD, CAM, BIM and other graphics-heavy software in a VDI or SBC environment.
Each virtual machine is given a profile, which consists of time-sliced access to a GPU and a reserved portion of the video memory on the card. Scalability is high: this allows up to 64 VMs to use one card simultaneously (Nvidia GRID M10). I'm sure you can imagine, however, what would happen if all these VMs requested cycles from the GPUs at the same time... For this setup to work, you need to test it thoroughly. I don't mean benchmarking, since that would be useless; it would only show you what one GPU on the card can do when you grant it to one user. You need users to use the entire solution as they would in a production environment to see how it will actually perform.
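To make the sizing math concrete, here is a minimal sketch of how profile size translates into users per card. The numbers are illustrative assumptions (a board with 4 GPUs and 8 GB of framebuffer each, matching the GRID M10's 64-users-at-512-MB maximum), not official specifications:

```python
# Illustrative sizing math for a time-sliced vGPU setup.
# Assumption: a board with 4 GPUs, 8 GB of framebuffer each (32 GB total),
# carved into fixed per-VM framebuffer profiles. GPU cycles themselves
# are time-sliced among the VMs sharing a physical GPU.

TOTAL_FRAMEBUFFER_GB = 4 * 8  # hypothetical 4-GPU board, 8 GB per GPU

def users_per_card(profile_gb: float) -> int:
    """Maximum VMs the card can host when each VM reserves a
    fixed framebuffer slice of profile_gb gigabytes."""
    return int(TOTAL_FRAMEBUFFER_GB // profile_gb)

for profile in (0.5, 1, 2, 4):
    print(f"{profile} GB profile -> {users_per_card(profile)} VMs per card")
```

With a 0.5 GB profile this yields 64 VMs per card, which is exactly why a benchmark of one user on one GPU tells you nothing: the framebuffer is reserved, but the GPU cycles are shared.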
AMD has a very different approach. Like Nvidia, AMD offers a card sporting multiple GPUs on the same board, but there is no software in between the hardware and the hypervisor. The card and its GPUs are effectively split into sections, granting each virtual machine an equal portion of the card, both in GPU power and in video memory.
This gives very predictable performance and ensures users are not hindered by other users, since each is granted their respective portion of the card's potential. It does, of course, have a major drawback: it cuts the entire card into as many as 16 slices, leaving each user with only 1/16th of the card's power. It is also dependent on SR-IOV, a technology that gives one card in a PCI-E slot the ability to present itself as multiple devices. Presenting the devices to their respective VMs relies on direct passthrough, so there are limitations on the automated rollout of virtual desktop pools. One other major difference is that application certification is a lot simpler on the AMD solution, since there is no difference in the drivers used. If your application was certified on an AMD Radeon FirePro solution, you can be almost certain it will work flawlessly on the virtual implementation.
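The contrast with the time-sliced model above can be sketched the same way. Here the split is hard: each VM gets exactly one equal slice of both GPU time and memory, with no overcommit. The 16 GB memory figure is a hypothetical example, not a spec:

```python
# Sketch of AMD-style fixed partitioning: the card is divided into
# equal SR-IOV virtual functions, each passed through to one VM.
# No overcommit: both GPU share and memory are hard-allocated.
# Memory size is a hypothetical example.

def partition(total_mem_gb: float, slices: int) -> dict:
    """Each VM gets exactly 1/slices of the card's GPU time and memory."""
    return {
        "gpu_share": 1.0 / slices,
        "mem_gb_per_vm": total_mem_gb / slices,
    }

# At the maximum of 16 slices, every VM is guaranteed (and limited to)
# one sixteenth of the card, regardless of what the other VMs are doing.
print(partition(16, 16))
```

This is the trade-off in a nutshell: the Nvidia model can pack idle users densely but lets busy neighbours interfere, while the AMD model guarantees each user their slice at the cost of capping everyone at that slice.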
The use cases for the current solutions from Nvidia and AMD are somewhat different. AMD is a good fit when you want a few dedicated CAD desktops, hosted applications based on Citrix or RDS, or an accelerated Citrix environment, all without automated rollout. Nvidia is a good fit for an environment where desktops need to be rolled out automatically, where you need to scale up and down very fast, and where the demands your users place on the graphics card are predictable or known up front.
The best way to decide whether you need a solution like this, and whether it fits your needs, is still a good inventory and sizing of your requirements and application landscape, but how to accomplish that is a topic for another blog post. Any questions? Drop me a line...