Archive for November, 2010

Virtualization – Under the Hood (Part II)

November 19, 2010

This is a continuation of my last post (Virtualization – Under the Hood (Part I)).

Q: Can you actually have more memory allocated than available physical memory? And how?

Short Answer: Yes, through several techniques, including transparent page sharing, page reclamation, and memory ballooning (via a balloon driver).

Long Answer: You can start many VMs whose total allocated memory exceeds the physical memory on the server, because not all applications use 100% of their requested memory at all times. Page reclamation allows the hypervisor to reclaim unused (previously committed) pages from one VM and give them to another. Another technique is to let VMs share memory without knowing it: when several VMs hold pages with identical contents, the hypervisor keeps a single physical copy and gives a VM its own private copy only when it writes to that page. Sounds scary, but the hypervisor manages it. This allows more VMs to start with their full allocated memory, even though some of their pages may be shared with other VMs. Lastly, there is ballooning. This is a more cooperative approach: the hypervisor asks the running VMs for memory, and a balloon driver inside each guest “inflates,” voluntarily handing back pages the guest is not using. When a VM needs that memory again, the hypervisor returns it after obtaining memory from other VMs using any of the methods above. Swapping pages out at the hypervisor level is an expensive operation. That is why you should always start your VM with a preset Reservation (the minimum amount of memory the hypervisor guarantees to the VM). However, the more you reserve at startup, the fewer VMs can be started on the same physical host.
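To make the “share memory without knowing it” idea concrete, here is a minimal Python sketch of content-based page sharing, assuming a toy hypervisor that keeps simulated physical frames in a dictionary. The class `SharedPageStore` and its methods are hypothetical illustrations, not any vendor's API; real hypervisors are far more involved (for example, they typically find duplicates by scanning memory periodically rather than on every write). The sketch hashes each guest page, maps identical pages to one frame, and gives a VM a private frame as soon as it writes different contents (copy-on-write).

```python
import hashlib

PAGE_SIZE = 4096  # bytes; 4 KiB is the common small-page size on x86


class SharedPageStore:
    """Toy model of content-based page sharing (hypothetical, not a real hypervisor API)."""

    def __init__(self):
        self.frames = {}     # frame id -> page contents (simulated physical memory)
        self.refcount = {}   # frame id -> number of guest pages mapped to that frame
        self.by_hash = {}    # SHA-256 digest of contents -> frame id
        self.mapping = {}    # (vm, guest page number) -> frame id
        self._next_frame = 0

    def write_page(self, vm, page, data):
        """Map a guest page, reusing an existing frame whose contents are identical."""
        self._unmap(vm, page)  # copy-on-write: other VMs keep the old shared frame
        digest = hashlib.sha256(data).digest()
        fid = self.by_hash.get(digest)
        if fid is None or self.frames[fid] != data:  # full compare guards against collisions
            fid = self._next_frame
            self._next_frame += 1
            self.frames[fid] = data
            self.refcount[fid] = 0
            self.by_hash[digest] = fid
        self.refcount[fid] += 1
        self.mapping[(vm, page)] = fid

    def read_page(self, vm, page):
        return self.frames[self.mapping[(vm, page)]]

    def frames_in_use(self):
        return len(self.frames)

    def _unmap(self, vm, page):
        fid = self.mapping.pop((vm, page), None)
        if fid is None:
            return
        self.refcount[fid] -= 1
        if self.refcount[fid] == 0:  # last user gone: free the frame
            digest = hashlib.sha256(self.frames[fid]).digest()
            if self.by_hash.get(digest) == fid:
                del self.by_hash[digest]
            del self.frames[fid]
            del self.refcount[fid]


# Usage: three VMs each touch four zero-filled pages.
store = SharedPageStore()
zero_page = bytes(PAGE_SIZE)
for vm in ("vm1", "vm2", "vm3"):
    for page in range(4):
        store.write_page(vm, page, zero_page)
print(store.frames_in_use())  # 12 guest pages backed by 1 frame -> 1

store.write_page("vm2", 0, b"\x01" * PAGE_SIZE)  # vm2 diverges and gets a private frame
print(store.frames_in_use())  # -> 2
```

The 12 guest pages cost one physical frame until a VM writes something different, which is exactly why the total allocated memory can exceed what the host physically has.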

Q: How do you make sure your application is highly available?

Short Answer: It depends on the approach and the virtualization suite you are using. You either take things into your own hands, clustering your application over two or more VMs and replicating the necessary data to the redundant VM(s), or you use the tools provided by the virtualization suite to move the VM to a less utilized (less crowded) host.

Long Answer: High availability of VMs can be jeopardized in one of two ways:

1. Your VM is running on a highly utilized host, making your applications less responsive. In this case, you can use the virtualization suite to migrate your running VM to a less crowded host. vSphere provides vMotion, which migrates your VM to another host without taking it down! It copies the VM's memory to the target host while the VM keeps running, keeping track of all pages “dirtied” since the last pass so it can re-send them and keep the copy consistent on the new host. Once few enough dirty pages remain, the source hypervisor briefly pauses the VM, sends the remaining pages and the CPU state, and the hypervisor on the target host resumes it. Microsoft added Live Migration to the R2 release of HyperV to do the same. There are many metrics and thresholds that can be configured to trigger such a migration. VMware's Distributed Resource Scheduler (DRS) lets you set those parameters across the cluster; DRS then moves your VM from one host to another to keep it as available and responsive as possible.

2. When a host goes down, another VM needs to start up and take over the requests. This can be done with the virtualization suite's tools (only when data replication is not required). However, when your data must be clustered as well, you need to introduce data replication yourself, for example with Microsoft SQL Server clustering. This allows the standby VM to serve requests as soon as the first VM goes down. Of course, there also needs to be some sort of switching control, either at the virtualization suite's management level or in an external load-balancing (content-switching) appliance such as Citrix NetScaler.
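The iterative copy described in point 1 is commonly called pre-copy live migration. Below is a toy Python simulation of it, assuming the guest re-dirties a fixed fraction of pages for every page sent (because it keeps running while each round is in flight). The function name, stop threshold, and dirty-rate model are illustrative assumptions, not how vMotion or Live Migration are actually tuned. It shows why migration converges only when the guest dirties memory more slowly than the hypervisor can copy it.

```python
import random


def precopy_migrate(num_pages, dirty_ratio, stop_threshold=64, max_rounds=30, seed=0):
    """Toy simulation of iterative pre-copy live migration.

    Assumption: the guest dirties `dirty_ratio` pages for every page sent.
    Returns (pre-copy rounds, total pages sent, pages sent while the VM is paused).
    """
    rng = random.Random(seed)
    dirty = set(range(num_pages))  # first round: every page must be copied
    sent = 0
    rounds = 0
    while len(dirty) > stop_threshold and rounds < max_rounds:
        sent += len(dirty)  # copy this round's dirty pages while the VM keeps running
        n_new = min(int(len(dirty) * dirty_ratio), num_pages)
        dirty = set(rng.sample(range(num_pages), n_new))  # pages re-dirtied meanwhile
        rounds += 1
    # Stop-and-copy: pause the VM on the source, send the last dirty pages
    # (plus CPU and device state, not modeled here), then resume on the target.
    paused_pages = len(dirty)
    sent += paused_pages
    return rounds, sent, paused_pages


print(precopy_migrate(10_000, dirty_ratio=0.25))  # converges -> (4, 13320, 39)
print(precopy_migrate(10_000, dirty_ratio=1.5))   # never converges: stops at max_rounds
```

With a low dirty rate, only 39 pages have to move while the VM is paused. With a high dirty rate, the loop gives up after `max_rounds` and the pause has to cover the whole memory, which is why a write-heavy VM is harder to migrate without noticeable downtime.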

Q: Is virtualization the equivalent of parallel processing on multiple processors?

Short Answer: Yes and no.

Long Answer: Virtualization lets a VM borrow CPU cycles from any of the CPUs physically available on the host. A side effect is something that looks like parallel processing. However, the only reason a hypervisor would borrow cycles from another CPU is that the CPU it was using is fully utilized. So, technically, the goal is not to run things in parallel, but to give your application as many CPU cycles as it needs to run its single- or multi-threaded code.

Q: Since we are running multiple VMs on the same host, doesn’t that mean we share the same LAN? Wouldn’t that be a security threat if an application on one VM were misconfigured to access the IP of a service on another VM?

Short Answer: Yes. But we don’t have to share the same network even among VMs on the same host.

Long Answer: You can create virtual LANs even between VMs running on the same physical host. You can even use firewalls between VMs running on the same host. This way you can create DMZs that keep your applications (within your VM) safe no matter which VMs are running on the same host.
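As a rough illustration of how VLANs plus a firewall keep co-located VMs apart, here is a minimal Python model of a virtual switch on one host. The class names, VLAN IDs, and the default-deny policy between VLANs are assumptions made for the sketch, not any product's configuration model; 1433 is SQL Server's default port, used only as an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    src_vlan: int
    dst_vlan: int
    port: int


class VirtualSwitch:
    """Toy virtual switch on a single host: VMs on the same VLAN can talk freely,
    traffic between VLANs is denied unless a firewall rule allows it."""

    def __init__(self):
        self.vlan_of = {}   # VM name -> VLAN ID
        self.rules = set()  # allowed (src VLAN, dst VLAN, port) combinations

    def attach(self, vm: str, vlan: int) -> None:
        self.vlan_of[vm] = vlan

    def permit(self, src_vlan: int, dst_vlan: int, port: int) -> None:
        self.rules.add(Rule(src_vlan, dst_vlan, port))

    def can_reach(self, src_vm: str, dst_vm: str, port: int) -> bool:
        src, dst = self.vlan_of[src_vm], self.vlan_of[dst_vm]
        if src == dst:
            return True  # same VLAN: no firewall in the path in this model
        return Rule(src, dst, port) in self.rules  # default deny between VLANs


# Usage: a DMZ web server, a database, and an unrelated HR app share one host.
vswitch = VirtualSwitch()
vswitch.attach("web", 10)  # DMZ
vswitch.attach("db", 20)
vswitch.attach("hr", 30)
vswitch.permit(10, 20, 1433)  # the web tier may reach the database port only

print(vswitch.can_reach("web", "db", 1433))  # True
print(vswitch.can_reach("web", "db", 22))    # False
print(vswitch.can_reach("hr", "db", 1433))   # False: a misconfigured IP is still blocked
```

Even if the HR application were misconfigured to point at the database's IP, the default-deny rule between VLANs would stop the traffic, which is the protection the question asks about.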

Q: Since a hypervisor emulates hardware, does that mean that my guest operating systems are portable, EVEN among different hardware architectures?

Short Answer: Yes, with caveats.

Long Answer: Most virtualization suites target x86 architectures, whose processors now include hardware virtualization extensions (Intel VT-x and AMD-V). Portability also depends on the virtualization suite you are using and which guest OSes it supports (for example, vSphere does not support AIX). In practice, a guest moves between hosts with the same processor architecture; running it on a different architecture requires full emulation rather than virtualization. Additionally, although in theory guest OSes and their hosted applications are portable, behavior also depends on the hypervisor’s own drivers: the hypervisor drives the physical hardware with its own set of drivers, not the drivers installed inside the guest OS. Those implementations can vary from one system to another and from one device to another. So you may well see different behavior or performance on different hardware, even with the same VM.

Note: The Open Virtualization Format (OVF) is a DMTF standard for packaging VMs so you can move them to another virtualization suite (not just to other hardware!). However, not many virtualization tools support this format yet.

Q: What about security? Who controls access to VMs?

Short Answer: Virtualization suites provide their own user management, separate from your applications’ users.

Long Answer: A virtualization suite has several layers of user roles and permission management, depending on the suite itself. Typically, you can create users and define their roles, which VMs they can access, and what permissions they get. You can even group VMs into pools and apply the same role/permission combination to the whole pool. This eliminates having to manage security and authentication on each individual hypervisor; instead, you manage it across all of them at once.
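The pool idea can be sketched as plain role-based access control. The role names, actions, and the `AccessControl` class below are hypothetical and do not mirror any particular suite's permission model; the point is that one grant on a pool covers every VM placed in it.

```python
ROLES = {
    "viewer": {"view"},
    "operator": {"view", "power"},
    "admin": {"view", "power", "configure", "delete"},
}


class AccessControl:
    """Toy role-based access control over pools of VMs (hypothetical, not a vendor API)."""

    def __init__(self):
        self.pool_of = {}  # VM name -> pool name
        self.grants = {}   # (user, pool) -> role name

    def add_vm(self, vm, pool):
        self.pool_of[vm] = pool

    def grant(self, user, pool, role):
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        self.grants[(user, pool)] = role

    def allowed(self, user, vm, action):
        # Look up the VM's pool, then the user's role on that pool.
        role = self.grants.get((user, self.pool_of.get(vm)))
        return role is not None and action in ROLES[role]


acl = AccessControl()
for vm in ("web01", "web02"):
    acl.add_vm(vm, "web-pool")
acl.add_vm("db01", "db-pool")
acl.grant("alice", "web-pool", "operator")  # one grant covers every VM in the pool

print(acl.allowed("alice", "web02", "power"))   # True
print(acl.allowed("alice", "web02", "delete"))  # False
print(acl.allowed("alice", "db01", "view"))     # False
```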

Q: Ok, ok, ok. How about the dark side of virtualization?

Short Answer: There are many problems or potential problems with virtualization. It is not all roses.

Long Answer: There are several reasons you might not want to virtualize, including:

1. With virtualization, you are collapsing the system administration and networking teams (and possibly security as well) into one team. Most (if not all) virtualization suites do not provide separate management roles for the virtualized datacenter along those divisions. Once someone has administrator access to the virtualized datacenter, those divisions no longer apply. Some see this as a good thing. However, it is mostly a bad thing, because a system administrator is not necessarily highly specialized in segmenting the network among the various applications according to the enterprise’s requirements and capabilities.

2. Upgrading or updating one application or VM requires much more knowledge of its requirements and its potential effects on other VMs on the same host. For example, if an application doubles its memory requirements, the IT administrator managing the virtual environment must know about it, even if the host has enough physical memory, because that memory is shared with the other VMs on the host. In a traditional environment, the administrator deploying the update or upgrade does not necessarily need to know the application’s new memory requirements, as long as the server already has enough physical memory and none needs to be added. This change forces administrators of a virtual environment to be more aware of the applications and VMs running on their systems, which is not a hard requirement in traditional systems.

3. If the number of VMs falls below 10 or so per host, you may be adding more overhead than you gain from virtualizing your machines.

4. Testing and debugging system events is much more involved, as an administrator has to chase the VM wherever it goes to follow the trail of events across those hosts, and also look at the guest OS event logs to complete the picture before understanding the problem.

5. VMs require storage space as well (for the VM files themselves). This is an overhead, and if it is not managed carefully you may exhaust your storage capacity by creating too many VMs.

6. Miscellaneous: expensive management tools, new IT skills to learn, a single point of failure (if one host goes down, it takes many VMs down with it), bandwidth and I/O spikes when a physical host starts up and many VMs boot at the same time, etc.
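Points 2 and 5 are both capacity questions, and a couple of lines of arithmetic make them concrete. The sketch below assumes memory is planned by reservations only (ignoring overcommit), a flat hypervisor overhead, and a per-VM swap file sized to the unreserved part of the VM's memory (vSphere behaves this way; other suites may differ). All names and figures are hypothetical.

```python
def fits_after_change(host_mb, reservations_mb, vm, new_mb, hypervisor_overhead_mb=0):
    """Return True if raising one VM's memory reservation still fits on the host.

    reservations_mb maps VM name -> guaranteed memory in MB; the hypervisor's own
    overhead is an assumed flat figure here.
    """
    others = sum(mb for name, mb in reservations_mb.items() if name != vm)
    return others + new_mb + hypervisor_overhead_mb <= host_mb


def vm_footprint_gb(disk_gb, memory_gb, reservation_gb, snapshot_gb=0):
    """Rough per-VM storage estimate: virtual disk + swap file + snapshots.

    Assumption: the swap file is sized to the unreserved part of the VM's memory.
    """
    swap_gb = memory_gb - reservation_gb
    return disk_gb + swap_gb + snapshot_gb


host_mb = 65_536  # a 64 GB host
reservations = {"app": 16_384, "db": 24_576, "web": 8_192}

# Doubling "app" to 32 GB looks harmless on a 64 GB host, but not next to its neighbors.
print(fits_after_change(host_mb, reservations, "app", 32_768, hypervisor_overhead_mb=2_048))  # False
print(fits_after_change(host_mb, reservations, "web", 16_384, hypervisor_overhead_mb=2_048))  # True

per_vm = vm_footprint_gb(disk_gb=40, memory_gb=8, reservation_gb=2, snapshot_gb=5)
print(per_vm, 20 * per_vm)  # 51 GB per VM, 1020 GB for 20 such VMs
```

The first check is the point-2 trap: the upgraded application alone would fit on the host, yet with the other VMs' reservations it does not. The second function ties point 5 back to reservations: every gigabyte of memory left unreserved may show up again as disk space.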


Virtualization – Under the Hood (Part I)

November 19, 2010

Sometimes we take long pauses thinking about something someone said. We are not discarding what is being said, but we are not acknowledging it either. Sometimes the person repeats the same sentence, maybe in a different tone or structure. And we may continue to pause, although we may shift our eyes to meet the speaker’s, a less formal way of saying: I heard you, but I am still thinking about it! It may take some time before we let the first word out, just to start the conversation.

Virtualization is not a new concept by any means. It started in the 1960s as a way to virtualize memory, tricking system applications into thinking there was more memory to play with than there actually was. This evolved into time-sharing on mainframes, where individual applications were made to believe they had all the resources they needed, although those resources were shared and restricted in time. I am going to skip a few generations and talk about today’s virtualization rather than duplicate the well-maintained definitions and history lessons available all over the Internet. However, our long pause over the concept of virtualization and its potential lasted generations rather than a few seconds before we realized what we really had at hand.

VMware is certainly the market leader today with its vSphere offerings, with Microsoft’s HyperV behind but making long strides to catch up. There are many other virtualization suites offered by Cisco (mainly around unified communication services) and Citrix (mainly around virtual desktops), as well as open source hypervisors such as Xen.

We hear a lot about virtualization. But it is one of those topics that you “know” but don’t really “know”. We all know that objects fall towards the center of the Earth, but we don’t really know “why” they fall (we can name gravity, but we don’t truly understand why masses attract each other). We always hear about virtualization. It is a catchy word in an industry with a long list of names, acronyms and abbreviations. But do we all know the basics of how it works? What it is? How it can answer some of the main enterprise questions that haunt administrators in their sleep? How it can add its own set of issues that make an administrator’s nightmare not so … virtual? I explored some of those questions and decided to take a few minutes from my sleep (that is the only way my hypervisor can lend me those valuable resources) to share them here. I will address them as questions and answers, simply because that is the easiest way to get to the point without dancing around it.

Q: Is virtualization a software or a hardware technology?

Short Answer: Both.

Long Answer: There are two types of hypervisors: Type 1 runs directly on the hardware, and Type 2 runs on a host OS. Type 1 is a hardware virtualization solution because it fits our definition of hardware (we define hardware to be anything below the supervisor code in an OS, and since Type 1 hypervisors sit below the supervisor code, they are a hardware solution). A Type 2 hypervisor, by contrast, runs on top of the host OS as just another application. This is less common in the datacenter for several reasons, including the need to modify the OS to accommodate virtualization, OR accepting major overhead as the hypervisor translates everything into host OS terms and cannot optimize drivers. The conversation is too deep for this post, but there are many types of hardware virtualization (hardware-assisted virtualization, paravirtualization, partial virtualization, etc.), some of which combine processor support for virtualization with software hypervisors.

Q: Why virtualize?

Short Answer: To get the most out of our idle resources today.

Long Answer: How much time do you have? There are a lot of reasons to virtualize your OS and its applications. Here is a short list:

1. Most applications today use only around 10-15% of the CPU. Consolidating them as VMs on fewer hosts raises that utilization. However, “Virtualization is extraordinarily expensive if the number of VMs fall below 10 on a physical machine” (Mike Rubesch, Purdue University Director of IT Infrastructure Systems). You also have to account for the overhead of running the hypervisor itself, which consumes some CPU cycles as well.

2. Less physical space. If you combine 10+ applications on the same physical host, you could potentially decrease the number of your servers by a factor of 10+.

3. Quick turn-around. It is much faster to create a VM from a VM template than to build a machine.

4. Leaner processes. Leasing VMs with expiration dates lets IT administrators build VMs for a predetermined length of time. In traditional systems, a machine built for a temporary need may go unused after a while, adding overhead and headache.

5. Easier disaster recovery (DR), high availability (HA), and fail-over, as most virtualization suites include tools to manage those concerns (such as the HA and fail-over features of vSphere and HyperV, along with live migration through vMotion and Live Migration).

6. And many other reasons I won’t go into here, including one-stop security management, lower energy costs, self-contained infrastructure requirements, portability of machines, etc.
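The consolidation argument in points 1 and 2 is simple arithmetic. The sketch below assumes every workload's CPU use is expressed as a percentage of the same host's capacity (as if the old servers and the new host were equally powerful), a 5% hypervisor overhead, and an 80% target utilization; all of these figures are assumptions chosen for illustration, not measurements.

```python
def max_vms(avg_util_pct, target_util_pct, hypervisor_overhead_pct):
    """How many VMs averaging avg_util_pct of the host's CPU fit under a target utilization."""
    return (target_util_pct - hypervisor_overhead_pct) // avg_util_pct


def host_util_pct(num_vms, avg_util_pct, hypervisor_overhead_pct):
    """Expected host CPU utilization after consolidation."""
    return num_vms * avg_util_pct + hypervisor_overhead_pct


n = max_vms(avg_util_pct=12, target_util_pct=80, hypervisor_overhead_pct=5)
print(n)                        # 6 VMs
print(host_util_pct(n, 12, 5))  # 77 (% busy, versus ~12% for each server on its own)
```

Six lightly used servers collapse into one host running at 77% instead of six hosts idling at 12%; the hypervisor's 5% is the price of admission.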

Q: What are applications that are best kept un-virtualized?

Short Answer: Applications that need the fastest possible response times, and heavily I/O-bound or CPU-bound applications.

Long Answer: You can virtualize everything, but you cannot offer more I/O throughput than your server can physically handle. Applications that require heavy I/O, such as databases and other applications that write heavily to disk, may be slowed down considerably by virtualization. I have come across a few companies that refuse to virtualize their database servers. Furthermore, since virtualization adds overhead (an additional layer between the application and the hardware that manages and does work across all VMs on the same physical machine), your application may actually run slower. The worst cases saw no more than a 20% decline in performance; nowadays you rarely see declines that high, but the overhead is definitely above 0%. So, if your application depends on nanosecond-precision readings of the hardware clock, it may not be a good idea to run it on a virtualized machine.

More to come…