Notes From The Trenches : Horizon View Multimedia and Graphics Tuning
In the last few weeks I’ve spent a lot of time with customers from different verticals looking at the performance of multimedia playback and hardware graphics rendering for applications. I thought I already had a good grasp of how all of these technologies work and how best to tune them, but it’s not until you’re presented with varying requirements that your skills are really put to the test, both in finding bottlenecks and, in some cases, in turning what you thought you knew on its head.
This post relates solely to VMware Horizon View, but I guess it could be applied to any virtual desktop solution, whether that be XenDesktop, vWorkspace or anything else. From a View perspective, we are always hammered with the message “know your use cases”. In many ways, this message gets repeated so often that a touch of snow blindness kicks in and you just make some general assumptions about what will work and what won’t.
To recap, at the very least before starting a View proof of concept (PoC), you should be asking yourself the following questions :-
- How many users will require multimedia playback?
- What level of quality is expected? 480p? 720p?
- Will the videos be in full screen or will a smaller window suffice?
- What applications will require OpenGL or DirectX support?
By covering off the above points, you put yourself in a position where the proof of concept can address a high proportion of user requirements. Don’t beat yourself up though if you don’t nail it 100% out of the gate; getting it right on day one is unheard of! Taking each point in turn, let’s look at why the responses to these points matter.
How many users will require multimedia playback?
Quite often, organisations limit access to video streaming sites such as YouTube, Vimeo, iPlayer and others to preserve office bandwidth. From a View perspective, this can do you a favour but there are always uses that may well slip under that radar and result in users having a degraded VDI experience that gives the solution an unmerited bad name. News websites such as Sky News or BBC News all have multimedia content these days, from radio interviews to video clips in small boxes with a full screen option.
The result of this question will help determine an appropriate pool design. If you have users that will use a lot of video (presenters, trainers, academics, for example), then it would make sense to have a dedicated pool of desktops for these users with more resource than would be given to “regular” users. Also, VMware’s recommendation is that good quality multimedia playback requires 2 vCPUs per virtual desktop, which in turn affects your infrastructure sizing requirements as this will impact your host:desktop consolidation ratios.
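As a rough illustration of how the 2 vCPU recommendation hits consolidation ratios, here is a minimal Python sketch. The host size (16 physical cores) and the planning ratio (6 vCPUs per physical core) are purely hypothetical figures I’ve picked for the example, not VMware guidance, so substitute your own numbers from the PoC.

```python
import math


def desktops_per_host(physical_cores: int,
                      vcpus_per_desktop: int,
                      vcpus_per_core: float) -> int:
    """Estimate how many desktops one host can carry for a planned vCPU:core ratio."""
    total_vcpus = physical_cores * vcpus_per_core
    return math.floor(total_vcpus / vcpus_per_desktop)


# Hypothetical host: 16 physical cores, planned at 6 vCPUs per core.
regular = desktops_per_host(16, vcpus_per_desktop=1, vcpus_per_core=6.0)
multimedia = desktops_per_host(16, vcpus_per_desktop=2, vcpus_per_core=6.0)

print(f"Regular pool (1 vCPU):    {regular} desktops per host")
print(f"Multimedia pool (2 vCPU): {multimedia} desktops per host")
```

Under those assumptions, doubling the vCPU count halves the number of desktops per host, which is exactly why a dedicated multimedia pool is worth sizing separately.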
If the answer to this question is “all users”, then the next step is to determine what quality is expected.
What level of quality is expected? 480p? 720p?
If there are users expecting full screen 1080p native performance at 60 fps (same as a Blu-ray player etc.) then this level of expectation should be reset. The nature of a virtual desktop solution means that this won’t be achieved due to bandwidth, compression, hardware resource and other potential bottlenecks along the way. Remember a Blu-ray player has a single cable running from it to the TV, so there isn’t a bunch of other traffic flying down that cable, and the cable length is typically less than a metre (or 3 feet 3⅜ inches in old money!), so it’s barely even an apples to oranges comparison.
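To put some rough numbers on why native Blu-ray style playback is a tall order, here is a back of the envelope sketch in Python. It assumes uncompressed 24-bit colour, which is my assumption for illustration only. A display protocol compresses the stream, so real session bandwidth will be far lower; the point is simply the scale of the raw pixel data involved.

```python
def raw_video_mbps(width: int, height: int, fps: int, bits_per_pixel: int = 24) -> float:
    """Uncompressed video data rate in Mbit/s (illustrative only, no compression)."""
    return width * height * bits_per_pixel * fps / 1_000_000


full_hd = raw_video_mbps(1920, 1080, fps=60)  # 1080p at 60 fps
hd_720 = raw_video_mbps(1280, 720, fps=30)    # 720p at 30 fps

print(f"1080p60 raw: {full_hd:,.0f} Mbit/s")
print(f"720p30 raw:  {hd_720:,.0f} Mbit/s")
```

Even before compression enters the picture, the 1080p60 stream carries roughly four and a half times the raw data of 720p at 30 fps, which is why resetting expectations to 720p buys you so much headroom.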
In a View environment, PCoIP is the display protocol of choice. RDP is also available, but generally lacks the flexibility and tuning options afforded to us by PCoIP. In addition to this, there are several hardware options to help augment and improve the PCoIP experience. If you recall, Teradici are the owners of PCoIP and VMware use this protocol in View. Teradici also have both host card based and zero client based options to offload PCoIP processing to dedicated hardware.
One other thing worth bearing in mind is that in my experience, the human eye can’t really tell the difference between 720p and 1080p, so if you get mired in discussions with end users about this, you’re kind of missing the point. If you can deliver video at a good resolution (720p) and a good frame rate, the rest is just splitting hairs in my opinion.
Will the video be in full screen or smaller screen?
This fact obviously matters because the larger the video playback surface area, the more resource required to push it along. In the tests I’ve done, standard 480p video in a full screen uses around 20MB more GPU RAM than a video embedded into a web page (BBC News is a good example of this). If full screen high quality video playback is required, you need to factor this into pool designs and the PoC. As always though, benchmark it yourself during the PoC phase, as your mileage will inevitably vary.
What applications will require OpenGL or DirectX support?
As we ascend to the more demanding groups of users, it’s key to know who requires extra grunt on their virtual desktop. This again affects pool design, but it also means that some consideration will need to be given to specialist graphics hardware so these users can work as effectively in a virtual environment as they do in the physical world. Some examples of applications requiring this level of support include but are not limited to:-
(All OpenGL apps)
- Adobe After Effects, Photoshop CS3/CS4, Premiere Pro
- Google Earth
- Google SketchUp
DirectX seems to be typically used in gaming, so it’s debatable whether or not there’s a use case there in general environments. That being said, one recent customer teaches computer game design and coding, which would place this use case right in the middle of your View deployment.
What are my options?
There is a very good white paper from VMware that discusses graphics acceleration in View and is well worth a read. It focusses primarily on the NVIDIA GRID solution which I’ve discussed previously and is available here. So let’s say you’ve done your scoping exercise and completed a desktop assessment. Where do you go from here? If your Google-Fu is up to snuff, there are lots of resources out there that will tell you about the different varieties of hardware available, so I’m just going to provide a simple reference based on my testing.
For good multimedia performance, look for a thin or zero client with the Teradici chip installed. I recently tested the 10Zig V1200P Zero Client with a customer and I have to say the multimedia performance was exceptional. This was partly due to the excellent bandwidth to desktop that the customer had, but also, with the Tera2 chip installed, the PCoIP processing gets offloaded to a dedicated hardware device, in a similar way to how, say, a TCP/IP Offload Engine works on a network card. It probably goes without saying that dedicated hardware will generally outperform software processing, and that was certainly the case in my experience.
If you can’t stretch that far budget wise, ensure you follow best practice and have 2 vCPU per virtual desktop, as without offload hardware video playback is a purely CPU based operation. If you run up Task Manager on two desktops, one uniprocessor and one multiprocessor, you should see the difference in CPU spiking during multimedia playback. Remember though not to oversubscribe vCPU. There is not a linear improvement the more vCPUs you add to a desktop, and in actual fact you could slow the whole environment down. This is well known in the virtualisation community and there is a good explanation as to why here. So in short? No more than 2 vCPUs per desktop unless a specific use case calls for it, which would be dependent on multiprocessor desktop applications.
Applications requiring OpenGL or DirectX support
Primarily in the case of OpenGL, I have seen improved performance using NVIDIA GRID K1 cards operating in vSGA mode. As stated in a previous post, make sure you have the latest versions of ESXi, Horizon View and the NVIDIA VIB on the ESXi host. So how much better is hardware acceleration? A picture says a thousand words, so I’ve pictured below the output from Passmark 3D test that shows a desktop using software rendering and 128 MB video RAM and a desktop using hardware rendering with 128MB video RAM.
Apologies for the general rubbishness of the pictures, they were taken with a phone camera, but it gives you a sense as to what kind of performance you can expect with hardware acceleration. Thankfully Passmark is a free tool and can be downloaded from here.
How much Video RAM should I allocate to my pool?
Great question and somewhat subjective. VMware state that if you are using Windows 7 with Aero (and let’s be honest, most people are) you should set a value between 64MB and 128MB in the pool settings (I couldn’t find any change in this advice for View 6). In my experience, this is fine for basic use cases, but where multimedia is required, and especially good quality video playback, that won’t be enough. We talk all the time about right sizing View deployments, but one real gap in the analysis for me is how to right size video RAM per pool.
I found a free utility called GPU-Z which is really useful for benchmarking performance during multimedia playback and determining the high watermark for video RAM usage. This can then be taken forward into the View design and ensure multimedia users have enough resource for their use case. GPU-Z can be downloaded here and is pretty simple to use. For my testing purposes, I ran it up during normal Windows navigation (starting and closing apps, web pages etc.)
The above screenshot shows the tool running on a laptop and gives you an idea of the kind of information that can be gleaned. The Sensors tab is the one with all the key information, and the screenshot below shows the “idle” state of the GPU during normal operations.
As you can see above, we’re already at 206MB of video RAM and we’re not really doing anything. If your pool is set to the recommended 128MB RAM, there is an obvious pinch point there already. This will lead to degraded multimedia performance as there isn’t sufficient resource. Playing a small screen video from the BBC News website, the video RAM usage climbs to 284MB as shown below.
And then finally on the full screen version of the video, the usage climbs still further to 363MB. So in this particular case, you’d be looking to set an initial high pool watermark of 370MB video RAM (for example) to give the user sufficient horsepower for video playback in View. That being said, the results will most likely be lower in a virtual environment, so make sure you continue to monitor usage during the PoC phase to ensure that the pool video RAM size is neither under nor over specced.
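The watermark exercise above boils down to “take the peak reading and round it up a bit”. Here is a minimal Python sketch of that, using the three readings from my laptop test. The function name and the 10MB rounding step are just my own illustrative choices, and in practice you would feed in readings gathered from your own PoC desktops rather than these figures.

```python
import math


def pool_video_ram_mb(samples_mb, round_to_mb: int = 10) -> int:
    """Return the peak video RAM reading rounded up to the next multiple of round_to_mb."""
    peak = max(samples_mb)
    return math.ceil(peak / round_to_mb) * round_to_mb


# Readings taken from the GPU-Z Sensors tab during the laptop test above (MB).
observed = {
    "idle desktop": 206,
    "embedded video": 284,
    "full screen video": 363,
}

suggested = pool_video_ram_mb(observed.values())
print(f"Peak observed: {max(observed.values())}MB, suggested starting value: {suggested}MB")
```

Treat the result as a starting value for the pool setting only, then keep monitoring during the PoC and adjust up or down.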
In summary, I’d say look at the following:-
- Review end user requirements for graphics performance, including multimedia and application support
- Spend time tuning PCoIP if you find that bandwidth is a constraint, but remember this is usually a balance and/or trade off between picture quality and playback smoothness. Audio quality is also affected by tuning maximum audio bandwidth
- Conduct a PoC to set expectations appropriately
- Use free tools such as Passmark and GPU-Z to accurately benchmark the environment and right size the capacity
- Obtain eval units if you are going down the thin/zero client route and test out all the different use cases you know of to see which unit is most appropriate
- vSGA with NVIDIA GRID cards can be very cost effective when applications require additional 3D resource
- Consider the use of a Teradici APEX 2800 card to offload some graphics processing to dedicated hardware in the host (caveat : I haven’t tested this and Teradici – feel free to loan me one!)