This is a relatively short blog post, but hopefully it will be useful to those doing stateless desktop implementations. One of the big problems in a stateless desktop deployment is identity persistence. By that, I mean that when a user logs into a desktop, the stateless model works like the “next cab off the rank”: we have no idea which desktop will be used, or by whom – it’s essentially a random process.
This can and does work well in the vast majority of cases. Where profile management solutions are in use (such as AppSense, RES, or the vendor-specific implementations from VMware or Citrix), we can inject user-specific information back into the environment at logon and/or application open/close. This means users are not materially impacted by the stateless model and can work as they normally would, with all their apps and settings available, irrespective of which desktop they happen to pluck out of the pool.
Where this tends to come unstuck is where the idea of identity needing to be persistent or predictable comes into play. Let me use a real life example. The engagement I’m currently working on is for a customer whose entire business is underpinned by an application on a mainframe server. This is not uncommon for businesses with long histories, and to be honest, it generally “just works” as it’s a solution that’s been in play for a number of years (sometimes decades) and has been refined over time to become an indispensable business tool. That’s all well and good and worked like a charm in the days of green screen “dumb terminals” (which I’m sad to say I’m just about old enough to remember!) and even as Windows and other GUIs came in, we still had the ability to use terminal emulation software to connect back to the mainframe and carry on as usual.
In my particular case, the customer is using a suite of connectivity products called Hummingbird. There isn’t anything spectacularly exotic about this, but the issue then becomes one of identity. In the “fat client” world, it’s easy enough to configure a connection and save it locally, so no matter who logs on, the “All Users” profile means that the correct settings (including LU name, which is the specific slot on the mainframe reserved for this workstation) are always available and never have to be fiddled with.
The difference of course in a stateless environment is that we never really know which desktop a user will connect to the mainframe from. It could be Desktop01, or Desktop02 or even Desktop50. Because of this, we need a way to tie down the LU identity so we know that regardless of which user logs into a thin client and regardless of which virtual desktop they are given, that particular thin client will always use the same connection details to the mainframe.
I thought long and hard about this, and came up with several solutions which, while they worked, never felt truly elegant and required updating each time a new thin client was brought into the environment. When you’re talking about a couple of thousand devices, that design quickly becomes impractical. In this particular environment, we are using Wyse T10 devices, which have a factory-preset device name (a string value starting with WT) that can be altered in the config mode to pretty much anything you like. As well as this, we’re using Citrix XenDesktop 5.6 and AppSense Environment Manager 8.2.
A colleague stumbled across a Windows utility called SETX.EXE. Apparently it’s been around for years, but no-one seems to have heard of it before. Essentially, it creates a persistent environment variable from a literal value, a registry key or a file. Citrix writes registry entries recording where the desktop session originated, called ClientName and ClientIPAddress. What we did was use SETX.EXE to read these values from the registry and store them in a custom environment variable.
What we then did was to copy the Hummingbird configuration files to a network share (one folder per thin client) and use an AppSense Environment Manager policy to copy the appropriate configuration files from the network to the virtual desktop using the User Logon node. The logic was basically thus:-
Use SETX.EXE to create the environment variable ClientName and populate it with the corresponding value from the registry
Copy \\share\configs\%ClientName%\*.hep to C:\ProgramData\Hummingbird\Connectivity\13.00\Profile\Startup
It’s as simple as it is elegant, and it means it doesn’t matter how many thin clients we add or what we call them: as long as the share exists, is populated with the correct files, and permissions are correct, when Hummingbird is started the sessions will start automatically (hence the use of the Startup folder). Hint – to do this, you need to add the “-*” switch to the desktop shortcut.
I know I’ve rambled a bit, but hopefully SETX.EXE can be a useful Swiss Army Knife tool you can store in your VDI deployment armory for future use!
In the previous blog post, I discussed how LoginVSI can help benchmark your VDI or SBC environment and provide some performance metrics on where the performance bottlenecks are likely to occur when the solution is heavily loaded. As discussed previously, you’ll have the following components set up and configured:-
LoginVSI share (hosted on a Windows Server or Samba share, where Windows 7’s 20-concurrent-connection restriction does not apply)
LoginVSI Launcher workstations (with the Launcher setup run in advance)
LoginVSI Target desktop pools (with the Target setup run in advance and Microsoft Office installed)
Active Directory script run to configure the required LoginVSI users and groups and add the Group Policy settings to those users (turns off UAC, amongst other things)
Ensure statistics logging is working properly on vCenter (assuming a vSphere infrastructure)
Once the environment has been configured and you have your pool of desktops spun up, it is recommended that all virtual desktops be left to “sit” idle for a while so that they reach “steady state” before the tests commence. Steady state is essentially where all desktops have started, launched all start-up services (anti-virus scanners, “call home” services or applications, Windows services) and disk activity has settled down to an idle tick, rather than thrashing as it does at start-up. It’s worth bearing in mind that if all virtual desktops are on the same datastore, it may take several minutes for steady state to be reached, depending on disk latencies. In my particular tests, I had between 100 and 120 desktops spun up at once and I left the pool to sit for around 20 minutes before running any LoginVSI workloads.
How do you know if steady state has been reached? I used the vSphere client to look at CPU and memory usage of each virtual machine and waited until the utilisation dropped down to a minimum. After a few test runs, you will start to get an idea of where steady state is, as each desktop build is slightly different, depending on applications and services installed. It’s not imperative you do this, but if you read the white papers produced by the major VDI stack vendors (Microsoft, Citrix, VMware, NetApp etc.), you will find this is something they tend to do.
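Eyeballing CPU charts works, but the same check can be scripted against exported statistics. A minimal sketch of the idea, assuming you can pull per-VM CPU utilisation samples; the 5% threshold and five-sample window are my own illustrative assumptions, not vCenter or LoginVSI values:

```python
def is_steady_state(cpu_samples, threshold=5.0, window=5):
    """Return True when the last `window` CPU readings (percent)
    are all below `threshold`, i.e. boot-time churn has subsided."""
    if len(cpu_samples) < window:
        return False
    return all(s < threshold for s in cpu_samples[-window:])

# Utilisation falling from boot-time thrash to an idle tick
samples = [85, 70, 40, 20, 8, 4, 3, 2, 3, 2]
print(is_steady_state(samples))  # True – last five samples sit under 5%
```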
At this stage, it’s often prudent to perform a few test runs, just to ensure that everything is running as you expect. You can also use these test runs to perform some workload tuning, such as time delays between sessions starting. As discussed in the previous post, if you set this value too aggressively, you can saturate your hypervisor host very quickly, and this can negatively skew results. Plus, is this the reality of how your users will use your VDI environment? Is it likely that you will have 100 users logging in near-simultaneously within a three or four minute window? In most cases you’d probably say no. The obvious exception would be an educational environment, in which dozens (even hundreds in a university or college setting) of users log in at the same time and start several applications after login. In a commercial or non-academic environment, users generally log in over a much larger time frame and, even once logged in, are far more inclined to make long phone calls or make a coffee, resulting in significant periods of idle time.
As a tip, use the calculator built into the Management Console to compute the time delays between the number of sessions and make sure they represent “real life” numbers, such as a login every 6 minutes etc.
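The arithmetic behind the built-in calculator is simple enough to sanity-check by hand (the function name here is my own):

```python
def session_delay_seconds(total_sessions, window_minutes):
    """Evenly space `total_sessions` logins across `window_minutes`,
    returning the delay between consecutive session starts in seconds."""
    return (window_minutes * 60) / total_sessions

# 100 users trickling in over a 10-hour day: one login every 6 minutes
print(session_delay_seconds(100, 600))  # 360.0 seconds
```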
During my testing with a customer, we would make a single environmental change and then analyse the results – for example, changing the amount of memory given to the virtual desktops (1.5GB vs 2GB, for example), or an extra vCPU, or a change to the underlying storage fabric. In this respect, LoginVSI can also be used to model environmental changes, a “what if” type of analysis. This can be especially useful if you are conducting a performance analysis of new storage to validate a vendor’s claims, or a “what if we add 20 more virtual desktops to this host” scenario.
The end goal is the VSI Max result, which is essentially the “tipping point” of performance. This is established in a way that I still don’t truly understand (and I read the explanation several times!), but in essence it is calculated by capturing the delay intervals between tasks performed in the target workload. Embedded timers within the workloads spawn activities such as reading Outlook messages or playing a Flash video, and the intervals between activities are randomised so as to imitate real-life usage. A baseline average response time is calculated and, when delays increase beyond it, the VSI Max value is obtained. This value basically represents the maximum number of virtual desktops per host before performance significantly degrades.
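Based on that loose understanding, the calculation can be sketched roughly as follows. This is an illustrative approximation only: the five-session baseline window and the doubling threshold are my assumptions, not LoginVSI’s actual formula.

```python
def estimate_vsimax(response_times_ms, baseline_sessions=5, factor=2.0):
    """Estimate the session count at which response times tip over.

    `response_times_ms[i]` is the measured response time with i+1
    active sessions. The baseline is the average over the first few
    sessions; the estimate is the last session count before a response
    time exceeds baseline * factor. Returns None if never exceeded.
    """
    baseline = sum(response_times_ms[:baseline_sessions]) / baseline_sessions
    for i, rt in enumerate(response_times_ms, start=1):
        if rt > baseline * factor:
            return i - 1
    return None

# Synthetic run: responses hover near 400ms, then climb sharply
times = [400, 410, 405, 395, 410, 450, 500, 600, 820, 1100]
print(estimate_vsimax(times))  # 8 – the last count before times double
```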
In our particular test case, we were looking to achieve a density of 100 desktops per vSphere blade. This figure was reached after a capacity planning exercise – so VMware’s Capacity Planner was deployed to a bunch of workstations in a “knowledge worker” use case – users who generally have medium to high task demands – using Outlook to send messages, opening large spreadsheets, manipulating graphics intensive slide decks etc. As a result, 100 desktops was considered an appropriate density based on the Capacity Planner results and the specification of the hypervisor hardware.
The VSIMax validates the design of the solution and gives both the solution architect and the end users/customers confidence that the VDI solution is fit for purpose. The graphic below shows the output from three test runs that validate the design for 100 desktops. You will need to install the VSI Analyser to compare the results, using the Comparison Wizard:-
Running The Tests
I’d recommend running at least three iterations of your test cycle to ensure a reliable result. What you should find is that the results are generally quite close together, and this way you can average out the VSIMax over the three runs. That being said, on odd occasions you may see freak results (generally at the lower end of the performance spectrum), and it’s worth discarding such a result and performing another test iteration. This can happen for a variety of reasons, such as the pool not being at steady state; several simultaneous power cycle operations on a pool can cause performance degradation, for example.
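The averaging-with-discard step can be sketched as follows; the 80%-of-median cut-off for a “freak” run is my own assumption:

```python
def average_vsimax(runs, outlier_ratio=0.8):
    """Average VSImax across test runs, discarding freak low results.

    A run is discarded if it falls below `outlier_ratio` times the
    median of all runs (e.g. the pool was not at steady state).
    Assumes at least one run survives the filter.
    """
    ordered = sorted(runs)
    median = ordered[len(ordered) // 2]
    kept = [r for r in runs if r >= median * outlier_ratio]
    return sum(kept) / len(kept)

# Three clean runs plus one freak low result, which is discarded
print(average_vsimax([98, 102, 100, 55]))  # 100.0
```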
So let’s say you’ve built your solution to meet the needs of 100 simultaneous virtual desktop connections, but your VSIMax figure averages out well below that figure (worryingly so!). Where do you go from here? At this stage, performance of the hypervisor host comes into play. In our particular test, the hypervisor in use is vSphere. This is good because vCenter automatically collects performance statistics and stores them in its database, so we don’t need to babysit real-time statistics to know where the bottleneck is – we can just look back retrospectively in vCenter.
The main areas to look at first for performance bottlenecks include:-
– Processor (CPU)
– Memory
– Datastore (disk)
There are other metrics we can look at, but it’s likely that in a high proportion of cases the bottleneck has been caused by one of the three main resources listed above. Looking at processor first, we can obtain graphs from vCenter for the lifetime of the test run (so please make sure you make a note of the start and stop times of the tests!). Export the information and select the processor, memory and datastore check boxes so we keep data to a minimum to start with.
Looking at the graph above from vCenter, we can see variable saturation of processor resource. The main takeaway from this result is that CPU utilisation never exceeds ~65%, so we can see quite clearly from the off that CPU is not the limiting factor in this particular test scenario.
To continue the investigation, we now need to take a look at memory to see if this is the constraining resource. As we can see from the chart, again memory is not the issue: although memory usage hovers near the maximum, it stays a little below it.
20GB of physical RAM is available in the ESXi host, and as we can see by the performance chart, memory is heavily utilised for most of the test but does not max out. So taking into account CPU and memory performance during the testing, we have enough spare capacity in these resources to service 100 virtual desktops. We’re making good progress in ruling out the performance bottleneck, but we haven’t found it yet! Onwards to the datastore performance charts!
Looking at the performance charts for the datastore, we can clearly see an issue with performance straight away. The chart shows high latencies for both read and write performance, in the worst case we can see a latency of 247ms for write operations to one datastore in use.
So the question here is, what is an acceptable disk latency? In broad terms, the following values are a reasonable rule of thumb:-
Sub 10 ms – excellent, should be the target performance level
10-20 ms – indicates a problem, may cause noticeable application/infrastructure issues
20 ms or greater – indicates unacceptable performance, applications and services such as virtual desktops will exhibit significant performance issues
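Encoded as a helper for a monitoring or reporting script, the rule of thumb above looks like this (the band labels are mine):

```python
def classify_latency(latency_ms):
    """Map an average datastore latency (ms) onto the rule-of-thumb
    bands: sub-10ms excellent, 10-20ms a problem, 20ms+ unacceptable."""
    if latency_ms < 10:
        return "excellent"
    if latency_ms < 20:
        return "problem"
    return "unacceptable"

print(classify_latency(1))    # excellent
print(classify_latency(15))   # problem
print(classify_latency(247))  # unacceptable – the worst write latency we saw
```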
Depending on your workload, you may well see spikes in latency at the storage level. These spikes can be acceptable as, by definition, they are sporadic and rare and generally do not impact long-term performance. Microsoft lists an acceptable disk latency spike for SQL Server as 50ms, for example. I don’t know that I especially agree with this number, but they know SQL Server a lot better than I do!
Looking at the performance charts, we can see that the disk is the bottleneck. The latencies at the disk level are quite severe, and would result in a much lower VSIMax value than what was originally planned for. If we can add bandwidth to the disk layer, we can improve the density of virtual desktops per hypervisor host. In this case, we had local SAS disks in a RAID1 configuration. Even though third party storage appliances were in use to try and improve throughput, the physical disks themselves could not sustain the level of performance required.
As such, the desktop pool was moved to SAN-based storage, in this case Fibre Channel storage on a NetApp device. One LUN was configured to host the desktop pool datastore, in a one-to-one relationship, as per best practices. As the storage now in use is enterprise grade, we would expect the disk latencies to be significantly reduced. As mentioned before, LoginVSI can be a really useful tool for modelling configuration changes and their impact, and this is a good example. We’ve already proved that CPU and memory are not fully utilised, and that the disk latencies are causing a lower than expected VSIMax value.
The performance graph for a virtual desktop datastore on the NetApp storage shows a much reduced latency of (on average) under 1ms. As stated previously, any latency under 10ms is excellent; anything sub-1ms is jet propelled! Now we have identified and removed the performance bottleneck, our VDI solution will scale to the required number of 100 desktops, as per the original design. Obviously CPU, memory and datastore are only a subset of the possible performance metrics we could have collected, but any bottleneck is most likely to be found in those resources.
We could also look at metrics such as network, but we’d be most likely to examine those if, for example, mouse movement was delayed or keystrokes were slow. In a LoginVSI test scenario, as the virtual desktops are designed to be “stand alone”, there should be minimal network traffic anyway.
Hopefully the two posts on LoginVSI have provided some guidance on how you can benchmark your environment, and also identify and rectify any bottlenecks that prevent you from scaling to the designed limits. I’d quite like to present this topic as a slide deck at a VMUG somewhere, sometime. Please let me know if that’s something you’d like to see!
As I mentioned in a previous post, I’ve spent the last few weeks working with a product called Login VSI. What does it do? Well, it essentially forms part of a virtual desktop deployment toolkit, in the sense that it helps to benchmark the performance of a VDI or SBC (Server Based Computing, such as Remote Desktop Services/Terminal Server) environment and provides accurate end-user performance metrics (OS and application response times) to outline the “tipping point” in performance of a VDI deployment.
For those who’ve already done several VDI deployments, you’ll already know the level of detail (and in some senses, educated guesswork) that goes into designing a solution. The types of questions posed include:-
– How many desktops do I need?
– How many IOPS do I need?
– How many physical disks do I need to provide the amount of IOPS?
– What sort of user metrics do I have from desktop assessment phases of the project?
– What are the requirements on the network fabric?
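A couple of those questions (IOPS and spindle count) reduce to straightforward arithmetic once the desktop assessment has produced per-desktop figures. A hedged sketch: the per-spindle IOPS figures and RAID write penalties below are generic rules of thumb, not vendor specifications.

```python
import math

# Commonly quoted steady-state IOPS per spindle (rules of thumb)
IOPS_PER_DISK = {"7.2k_sata": 75, "10k_sas": 125, "15k_sas": 175}

# Backend writes generated per frontend write for common RAID levels
RAID_WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}

def disks_required(desktops, iops_per_desktop, write_ratio, raid, disk_type):
    """Estimate the spindle count needed for a desktop pool.

    Backend IOPS = reads + writes * RAID write penalty, then divide
    by the per-spindle figure and round up.
    """
    frontend = desktops * iops_per_desktop
    writes = frontend * write_ratio
    reads = frontend - writes
    backend = reads + writes * RAID_WRITE_PENALTY[raid]
    return math.ceil(backend / IOPS_PER_DISK[disk_type])

# 100 desktops at 10 IOPS each, a write-heavy 70/30 VDI profile,
# RAID5 on 15k SAS spindles
print(disks_required(100, 10, 0.7, "raid5", "15k_sas"))  # 18
```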
There are a lot more questions along similar lines, but all are important in the design of the solution to ensure it is fit for purpose. Once all numbers have been crunched, a design comes out of the other end that we hope will cut the mustard when it’s put into production.
Login VSI can help in this instance because it simulates users logging into the SBC/VDI environment and performing tasks expected of end users. As such, there are several pre-defined workloads that can be used to simulate real life examples. For example, the medium workload (which comes with the free licence) simulates a user logging in, browsing their Outlook mailbox, manipulating a Word document, PowerPoint presentation, Excel spreadsheet, PDF document, ZIP archive and website browsing with a Flash component (Kick-Ass trailer, which is a very funny movie if you haven’t seen it already!). Timers are built into the process to simulate random wait times when a user drinks coffee, sends a text or talks to a colleague, for example. There’s nothing so random as a human being, so it’s not precise but it does represent a “scattered” workload as you’d see in reality.
The refreshing approach from Login VSI is that you don’t need to spin up a SQL Server to capture your performance metrics and environment configuration (don’t you just get tired of having to commission a SQL box every time you need to fart?). This means that as well as reduced initial cost, the complexity is lower and the time to be up and running is reduced. All you need to provide are four elements:-
– LoginVSI Share (can be anywhere on the network, but must be reachable and writable by all devices used in the test)
– Login VSI Launcher (Windows machine that can be physical or virtual, which essentially performs the logins and spawns the test workloads)
– Login VSI Target (Windows machine that has MS Office pre-installed, along with some other tools such as Flash Player, BullZip, Internet Explorer)
– Active Directory (a Login VSI OU is created, along with a Group Policy Object and some scripts that get copied into the NETLOGON share)
The good news is that you don’t need to rummage around dusty corners of the internet to get these tools: each of the four parts above comes with its own installer. A handy graphic lifted from Login VSI’s website below illustrates the simple architecture of the product:-
One thing that caught me out was that my VSI Share was on a Windows 7 machine. This would be fine on a very small scale, but Windows 7 shares do not permit more than 20 simultaneous connections. Login VSI exhibits the behaviour that the target session starts and the user logs in, but the desktop just sits there and does not spawn any application sessions. This had me confused for quite a while, as there are no error messages as such. If you go to one of the stalled desktops, unlock KidKeyLock (by typing vsiquit) and type the UNC path of the VSI Share into Start | Run, you will see an error about the number of concurrent connections to a Windows 7 share. Save yourself a lot of time and put the VSI Share on a Windows server or a Samba share!
In a VMware View or XenDesktop environment, you need to run the target setup routine on your master image before you spin up a desktop pool/catalog. This ensures that all of the desktops to be tested have all the appropriate software installed. You also need to ensure you have Microsoft Office installed in advance. Any version from 2003 upwards is fine, but if you’re testing Office 2007, it’s recommended to install SP2 beforehand, as there are some known issues with Outlook that are resolved by this patch.
Once you have your VSI Share, your launcher workstation(s) (each launcher will take a maximum of 50 targets, though my testing tended to work better with a maximum of around 35) and your targets, you’re pretty much set. The next stage is to configure your environment using the Management Console. The main points of interest here are configuring the launcher names and the workload settings, such as workload type (light, medium etc.) and peripheral settings such as the Microsoft Office version (if the wrong version is listed, this can prevent the automated workload from running successfully). The management console itself is pretty straightforward and self-explanatory.
The screen shot above shows the test configuration. This is where the workload type is selected (Light, Medium etc.) and also connection settings to the VDI environment. As you can see from the above screen shot, Python is being used to connect to a Citrix XenDesktop web interface. This is because the login screen for Web Interface had been customised, and the Citrix connector for Login VSI could not recognise buttons on the screen such as login and selecting the available desktops. Citrix themselves provide some Python scripts to provide connectivity and these work just fine. In a View environment, the existing Login VSI connector would probably work just fine, as would a “vanilla” XenDesktop environment.
The next step before actually getting to the testing phase is to define your launcher machines (use the Windows NetBIOS name, rather than a DNS name or IP address, or you’ll likely see a few errors) and configure what settings you want for the workload itself. In my experience, the only setting you really need to look at is the Office version string, so 14 for Office 2010, 12 for Office 2007 and 11 for Office 2003. The screenshot below illustrates the settings.
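Those version strings are worth keeping in a lookup if you script your configuration (the three values are simply the ones listed above):

```python
# Office version strings expected by the Login VSI workload settings,
# keyed by the marketing year of the Office release
OFFICE_VERSION_STRING = {"2003": "11", "2007": "12", "2010": "14"}

def office_version_string(office_year):
    """Return the version string to enter for a given Office release."""
    return OFFICE_VERSION_STRING[office_year]

print(office_version_string("2010"))  # 14
```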
You also have the option of creating custom workloads, but this is not something I have experience of and to be honest, not something I really had a need to use. If you just need some general benchmarks from your VDI environment, the Medium workload is recommended and used by most vendors when they produce performance white papers for their VDI solution (See Microsoft, Citrix and Equalogic for examples).
At this stage, I’m not going to get too invested in the nuts and bolts of how the whole process works, but needless to say if you’ve got this far, you’re pretty much ready to go. None of the workloads require access to the internet, nor do they require a connection to an Exchange server or any other network location. All workloads are fully isolated and self contained. If you’ve done all the setup and configuration successfully, you’re now at the stage where you can actually run some tests. Consult the Login VSI documentation for session specific settings, such as number of sessions, time delays between starting sessions (try and make this value sensible, so you don’t saturate your VDI hypervisor within a few minutes of starting the test, although if you’re simulating an academic environment, this may be important to you).
Once you’re ready to start the sessions, you should have the launcher agent running on your launcher workstations (a command prompt box that pings the VSI share for work to do) and all target machines spun up and ready to be logged into. In part two of this blog, I’ll tell you more about how to interpret the results. Stay tuned!