3D Performance Benchmarks in Azure Virtual Desktop

Written by Toby Skerritt | Jul 23, 2021 11:45:00 AM

In this blog I’m going to look at 3D performance in Azure Virtual Desktop, using the SKUs available in the West Europe region at the time of writing. This includes offerings from both AMD and Nvidia, the current market leaders in this space.

I’m also going to provide some very old-school apples-to-apples comparisons using synthetic benchmarks (gasp). They can be controversial, as synthetic benchmarks don’t give a true view of how hardware will perform for real-world applications. But in this instance there is very little information out there to compare the available Azure options, so I wanted to provide some rough guidance on how the options compare against each other, including costs, which may help to provide an entry point.

The objective here is to compare performance in dedicated desktops rather than to investigate how a graphics-enabled machine would improve the experience on session hosts. I’ll leave that topic for a future blog.

What’s on Offer?

*** This blog has been updated to include the NCasT4_v3 series ***

Currently there are 2 classes of machine pitched as being “designed for remote visualization, streaming, gaming, encoding, and VDI scenarios”:

The NVas_v4 series are powered by AMD and offer 4 different variants, with either 1/8, 1/4, 1/2 or the whole of a virtualised Radeon Instinct MI25 GPU.
The NVs_v3 series are powered by Nvidia and offer either 1, 2 or 4 Tesla M60 GPUs (although, confusingly, 1 ‘Azure’ GPU equals 1/2 of a ‘physical’ card).
Additionally, Microsoft are now recommending the use of NCasT4_v3-series machines to customers with more demanding workloads. These machines pair newer-architecture AMD EPYC processors with a single Nvidia Tesla T4 GPU, and their performance advantage should show in these tests.
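The GPU allocations described above can be written down as a small lookup, which also makes the ‘Azure’ GPU versus ‘physical’ card distinction explicit. This is only a sketch: the dictionary and function names are made up for illustration, and it assumes the NV12s_v3, NV24s_v3 and NV48s_v3 map to 1, 2 and 4 ‘Azure’ GPUs in size order.

```python
from fractions import Fraction

# Share of one Radeon Instinct MI25 per NVas_v4 size (from the list above).
NVAS_V4_GPU_SHARE = {
    "NV4as_v4": Fraction(1, 8),
    "NV8as_v4": Fraction(1, 4),
    "NV16as_v4": Fraction(1, 2),
    "NV32as_v4": Fraction(1, 1),
}

# 'Azure' GPU count per NVs_v3 size (assumed to be 1, 2, 4 in size order).
NVS_V3_AZURE_GPUS = {"NV12s_v3": 1, "NV24s_v3": 2, "NV48s_v3": 4}

# One 'Azure' GPU is half of a physical Tesla M60 card.
AZURE_GPUS_PER_M60_CARD = 2


def physical_m60_cards(size: str) -> Fraction:
    """Convert a size's 'Azure' GPU count into physical M60 cards."""
    return Fraction(NVS_V3_AZURE_GPUS[size], AZURE_GPUS_PER_M60_CARD)


if __name__ == "__main__":
    for size in NVS_V3_AZURE_GPUS:
        print(size, "->", physical_m60_cards(size), "physical M60 card(s)")
```

Under these assumptions, the NV12s_v3 works out at half a physical M60 card and the NV48s_v3 at two.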

Testing

For the testing, I am going to use a combination of Unigine’s Heaven and Superposition benchmarks, as well as FurMark (yes, it’s old), which will give us a good view of raw computational power. I was planning to include 3DMark in this testing, but due to its strict benchmarking requirements and dislike of running within an RDP session, I had to drop it. All benchmarks were run in windowed mode at a window resolution of 1920x1080. The Heaven benchmark was run at Ultra settings, whereas Superposition was run at Medium settings.

FurMark - https://geeks3d.com/furmark/

Notice the temperature: all our AMD SKU machines ran at 88°C. Although this may be a misread, it would concern me if it were my hardware.

Unigine Heaven 4.0 - https://unigine.com/

Unigine Superposition - https://unigine.com/

NV4as_v4 (£0.35 per hour / £258.44 per month)

The NV4as_v4 is the cheapest, but also lowest spec of our test machines, sporting just 1/8 of a Radeon MI25 GPU. Performance was not acceptable for graphically intensive 3D applications, but this is to be expected as this machine is designed to lightly accelerate browsing, providing a reasonably smooth experience in web pages and for video playback.

NV8as_v4 (£0.71 per hour / £516.87 per month)

The NV8as_v4 includes 1/4 of a Radeon MI25 GPU. Performance was roughly double that achieved by the NV4as_v4, leading to very smooth browser use and video playback. Light 3D applications and basic 3D modelling are possible with this machine.

NV16as_v4 (£1.42 per hour / £1,034.29 per month)

The NV16as_v4 includes 1/2 of a Radeon MI25 GPU. Performance was double that achieved by the NV8as_v4. Smooth use of 3D applications and general 3D modelling are possible with this machine. AMD specify this as the sweet spot for CAD software usage on Azure.

NV32as_v4 (£2.83 per hour / £2,068.58 per month)

The NV32as_v4 includes a whole Radeon MI25 GPU. Performance was superb, with framerates exceeding what could be delivered across my network connection. This is an expensive machine, and should therefore only be used for complex, high-resolution modelling with high polygon counts.
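One way to compare the four NVas_v4 sizes on cost is to scale each hourly price up to a whole MI25. The sketch below does this using only the prices and GPU shares quoted above; the names `NVAS_V4` and `price_per_whole_gpu_hour` are illustrative rather than anything from Azure.

```python
# Hourly prices (GBP) and GPU shares for the NVas_v4 sizes, from the text above.
NVAS_V4 = {
    "NV4as_v4": (0.35, 1 / 8),
    "NV8as_v4": (0.71, 1 / 4),
    "NV16as_v4": (1.42, 1 / 2),
    "NV32as_v4": (2.83, 1.0),
}


def price_per_whole_gpu_hour(hourly_price: float, gpu_share: float) -> float:
    """Hourly price scaled up to one whole MI25, rounded to pence."""
    return round(hourly_price / gpu_share, 2)


if __name__ == "__main__":
    for size, (price, share) in NVAS_V4.items():
        cost = price_per_whole_gpu_hour(price, share)
        print(f"{size}: £{cost:.2f} per whole-GPU hour")
```

With the prices quoted here, every size works out at between £2.80 and £2.84 per whole-GPU hour, so the price tracks the GPU share almost linearly.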

NV12s_v3 (£1.47 per hour / £1,076.18 per month)

The first of the Nvidia cards tested, the NV12s_v3 performed very well, showing a 10-20% increase in performance over the similarly priced NV16as_v4. Smooth use of 3D applications and general to advanced 3D modelling are possible with this machine.

NV24s_v3 (£2.95 per hour / £2,151.27 per month)

The NV24s_v3 is equipped with 2 Tesla M60 GPUs, which show as discrete devices in Device Manager, and this is where I hit a problem. No matter which benchmarks I ran, I was unable to make use of the second GPU. I looked for SLI-related options and trawled the web briefly to see if guidance was available, but I came up blank. My assumption here is that the two GPUs are intended to provide a pool of aggregated GPU resources in a session host, rather than to work in an SLI-like configuration. As it stands, I couldn’t make use of the 2 GPUs in combination, so I ended up with the same results as the NV12s_v3, and results actually dropped by around 20% in FurMark.

If I have missed something, please let me know in the comments and I will revise this article. There is an additional NV48s_v3 size above this, which runs with 4 GPUs, but I did not test this due to the issues encountered here.

NC4as_T4_v3 (£0.63 per hour / £458.11 per month)

The NCas_T4 machines run a newer generation of hardware, and it shows. Performance for this relatively cheap machine was significantly above that of the older generation hardware.

NC8as_T4_v3 (£0.98 per hour / £727.81 per month)

The NC8as_T4_v3 doesn’t offer a GPU upgrade over the NC4as_T4_v3, but it does offer 4 additional CPU cores and double the RAM (56GB vs 28GB). In these tests we saw no difference in performance. However, if you are working with complex physical calculations such as fluid modelling, you may find that this larger machine is required.

NC16as_T4_v3 (£1.67 per hour / £1,235.43 per month)

This machine simply bumps memory from 56GB in the NC8as to 110GB. As we aren’t touching the sides of our memory capacity with these tests, we will skip this size. Safe to say, if you are facing RAM restrictions with NC8as machines, upgrade to NC16as.
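Since the NC4as, NC8as and NC16as sizes share the same GPU, it can help to see what the extra CPU and RAM cost relative to the smallest size. A minimal sketch using the hourly prices from the headings above (the names `SINGLE_T4_HOURLY` and `price_multiple` are illustrative):

```python
# Hourly prices (GBP) for the single-T4 sizes, from the headings above.
SINGLE_T4_HOURLY = {
    "NC4as_T4_v3": 0.63,
    "NC8as_T4_v3": 0.98,
    "NC16as_T4_v3": 1.67,
}


def price_multiple(size: str, baseline: str = "NC4as_T4_v3") -> float:
    """How many times the baseline's hourly price a given size costs."""
    return round(SINGLE_T4_HOURLY[size] / SINGLE_T4_HOURLY[baseline], 2)


if __name__ == "__main__":
    for size in SINGLE_T4_HOURLY:
        print(f"{size}: {price_multiple(size)}x the NC4as_T4_v3 hourly price")
```

On these prices, the NC8as_T4_v3 costs about 1.56 times as much per hour as the NC4as_T4_v3, and the NC16as_T4_v3 about 2.65 times as much, for the same GPU.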

NC64as_T4_v3 (£5.89 per hour / £4,302.55 per month)

This is the largest size on offer, and I’d like to take a look at it, as it includes a massive 4 Tesla T4 GPUs, 64 CPU cores and 440GB of RAM. After my failure to make use of the second GPU in the NV24s_v3, I’ll leave this one for a future blog.

Conclusions

Although important, framerate has less of a bearing on the CAD experience than it does on video playback or gaming, for example. 15fps should be the minimum target for a reasonable user experience, with 30fps providing smooth panning and rotation of models. However, for animation-focused activities, users will require a consistent minimum of 30-60fps (I haven’t tested this scenario in AVD). The graph below offers an all-up comparison showing averaged framerate and cost.

Yes, these are synthetic tests, and quite old ones at that, but they serve a purpose: they help us to judge the available SKUs with reference to each other. Of the older GPU generation machines, both the AMD-based NV16as_v4 and the Nvidia-based NV12s_v3 offer solid 3D performance, with the NV12s_v3 just edging it in terms of cost vs performance. As the cheapest option, the NV4as_v4 provides a smooth web browsing experience, with very little 3D capability.

In the previous version of this article, I stated that anyone looking to perform light 3D modelling in Azure should start with the NV8as_v4 and move to either an NV12s_v3 or NV16as_v4 if more performance is required. However, after testing the NC4as_T4_v3, it’s now obvious that this machine offers drastically improved performance at a lower cost, so I would now recommend the NC4as_T4_v3 for all 3D workloads. If you find that the machine becomes RAM constrained, moving to the NC8as_T4_v3 or NC16as_T4_v3 should resolve this.
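To put a number on the cost side of that recommendation, the sketch below compares the NC4as_T4_v3 hourly price with the three sizes recommended previously. It uses only the prices from the section headings; the names `HOURLY` and `saving_percent` are illustrative.

```python
# Hourly prices (GBP) from the section headings above.
HOURLY = {
    "NV8as_v4": 0.71,
    "NV16as_v4": 1.42,
    "NV12s_v3": 1.47,
    "NC4as_T4_v3": 0.63,
}


def saving_percent(old_size: str, new_size: str = "NC4as_T4_v3") -> float:
    """Percentage reduction in hourly price when moving from old_size to new_size."""
    old, new = HOURLY[old_size], HOURLY[new_size]
    return round((old - new) / old * 100, 1)


if __name__ == "__main__":
    for size in ("NV8as_v4", "NV16as_v4", "NV12s_v3"):
        print(f"{size} -> NC4as_T4_v3: {saving_percent(size)}% cheaper per hour")
```

On these prices, the NC4as_T4_v3 is about 11% cheaper per hour than the NV8as_v4, and roughly 56-57% cheaper than the NV16as_v4 and NV12s_v3.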

The generational improvements in NCas_T4 machines shone through both in terms of cost and performance, with the NC4as_T4_v3 in particular offering the best overall performance at one of the lowest price points. In my testing, these machines were capable of running the Unigine Superposition benchmark at 2160p (UHD) resolution with max settings at an average of 22fps. RAM beyond that of the NC4as_T4_v3 was not required here.
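The framerate targets given earlier in these conclusions (15fps as a minimum, 30fps for smooth panning and rotation) can be turned into a simple check for your own benchmark results. This is a rough sketch: the function name and the three labels are made up for illustration, and the thresholds apply to CAD-style work, not animation.

```python
def cad_experience(avg_fps: float) -> str:
    """Classify an average framerate against the CAD targets in this article."""
    if avg_fps < 15:
        return "below minimum"  # under the 15fps minimum target
    if avg_fps < 30:
        return "reasonable"     # usable, but short of smooth panning and rotation
    return "smooth"             # 30fps or more


if __name__ == "__main__":
    # 22fps is the UHD Superposition average quoted above for the NCas_T4 machines.
    print(cad_experience(22))
```

By this measure, the 22fps UHD result sits in the “reasonable” band: above the 15fps minimum but below the 30fps smooth target.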
