Virtual machines need to share memory

A big trend in systems operation these days is the use of virtual machines -- software systems which emulate a standalone machine so you can run a guest operating system as a program on top of another (host) OS. This has become particularly popular for companies selling web hosting. They take one fast machine and run many VMs on it, so that each customer has the illusion of a standalone machine, on which they can do anything. It's also used for security testing and honeypots.

Virtual hosting is great. Typical web activity is "bursty": you would like to run at a low level most of the time, but occasionally burst to higher capacity. A good VM environment will do that well. With a dedicated machine you pay for full capacity all the time, even though you rarely need it. Cloud computing goes beyond this.

However, the main limit to a virtual machine's capacity is memory. Virtual host vendors price their machines mostly on how much RAM they get, and a virtual host with twice the RAM often costs twice as much. This is all based on the machine's physical RAM. A typical vendor might take a machine with 4 GB, keep 256 MB for the host and then sell 15 virtual machines with 256 MB of RAM each. They will also let you "burst" your RAM, either into spare capacity or into what the other customers are not using at the time, but if you do this for too long they will just randomly kill processes on your machine, so you don't want to depend on this.
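The packing arithmetic in that example is easy to check. Here is a minimal sketch in Python; the function name and its parameters are my own, and it assumes fixed-size VMs with no overcommit or bursting.

```python
MB = 1            # work in megabytes
GB = 1024 * MB


def vms_per_host(physical_mb, host_reserve_mb, vm_mb):
    """Number of fixed-size VMs that fit once the host's reserve is set aside.

    Hypothetical helper for illustration: no overcommit, no bursting.
    """
    return (physical_mb - host_reserve_mb) // vm_mb


# The example from the text: 4 GB machine, 256 MB kept for the host,
# 256 MB per virtual machine.
print(vms_per_host(4 * GB, 256 * MB, 256 * MB))
```

Because the division is on physical RAM alone, every megabyte a guest wastes on a duplicate copy of something directly reduces how many guests fit.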

The problem is that when they give you 256 MB of RAM, that's all you get. A dedicated Linux server with 256 MB of RAM will actually run fairly well, because it pages to disk. The server loads many programs, but a lot of the memory used for these programs (particularly the code) is used rarely, if ever, and swaps out to disk. So your 256 MB holds the most important pages. If you have more than 256 MB of important, regularly used memory, you'll thrash (but not die) and know you need to buy more.

The virtual machines, however, typically don't give you swap space. Everything stays in RAM. And the host doesn't swap it either, because that would not be fair: if one VM were regularly swapping to disk, it would slow the whole system down for everybody. One could build a fair allocation scheme for that, but I have not heard of one.

In addition, another big memory saving is lost -- shared memory. In a typical system, when two processes use the same shared library or the same program, it is loaded into memory only once. It's read-only, so you don't need two copies. But on a big virtual machine host, we have 15 copies of all the standard stuff -- 15 kernels, 15 MySQL servers, 15 web servers, 15 of just about everything. It's very wasteful.

So I wonder if it might be possible to do one of the following:

  • Design the VM so that all binaries and shared libraries can be mounted from a special read-only filesystem which is actually on the host. This would be an overlay filesystem so that individual virtual machines could change it if need be. The guest kernel, however, would be able to load pages from these files, and they would be shared with any other virtual machine loading the same file.
  • Write a daemon that regularly uses spare CPU to scan the pages of each virtual machine, hashing them. When two pages turn out to be identical, release one and have both VMs use the common copy. Mark it so that if one writes to it, a duplicate is created again. When new programs start it would take extra RAM, but within a few minutes the memory would be shared.
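The second idea can be sketched as a toy model. This is only an illustration in Python, not how any real virtualizer does it: the Host class, its method names and the 4096-byte page size are all assumptions of mine. A scan pass hashes every mapped page, merges frames with identical contents, and a later write to a shared frame gets a private copy again.

```python
import hashlib
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size in bytes


@dataclass
class Host:
    """Toy host: physical frames plus one page table per VM (all hypothetical)."""
    frames: dict = field(default_factory=dict)       # frame id -> page contents
    refcount: dict = field(default_factory=dict)     # frame id -> number of mappings
    page_tables: dict = field(default_factory=dict)  # vm name -> {page no -> frame id}
    next_frame: int = 0

    def _alloc(self, data):
        fid = self.next_frame
        self.next_frame += 1
        self.frames[fid] = data
        self.refcount[fid] = 1
        return fid

    def _release(self, fid):
        self.refcount[fid] -= 1
        if self.refcount[fid] == 0:
            del self.frames[fid]
            del self.refcount[fid]

    def load(self, vm, page_no, data):
        """A guest brings in a page: it always starts out as a private frame."""
        self.page_tables.setdefault(vm, {})[page_no] = self._alloc(data)

    def write(self, vm, page_no, data):
        """Copy-on-write: writing to a shared frame creates a private copy."""
        table = self.page_tables[vm]
        fid = table[page_no]
        if self.refcount[fid] > 1:
            self._release(fid)
            table[page_no] = self._alloc(data)
        else:
            self.frames[fid] = data

    def scan(self):
        """One daemon pass: merge identical frames, return how many were freed."""
        canonical = {}  # content digest -> frame id kept for that content
        freed = 0
        for table in self.page_tables.values():
            for page_no, fid in table.items():
                data = self.frames[fid]
                digest = hashlib.sha256(data).digest()
                keep = canonical.setdefault(digest, fid)
                # Compare the bytes as well, so a hash collision cannot merge
                # two different pages.
                if keep != fid and self.frames[keep] == data:
                    table[page_no] = keep
                    self.refcount[keep] += 1
                    before = len(self.frames)
                    self._release(fid)
                    freed += before - len(self.frames)
        return freed


if __name__ == "__main__":
    host = Host()
    libc_page = b"\x7fELF" + bytes(PAGE_SIZE - 4)  # stand-in for a shared library page
    for vm in ("vm1", "vm2", "vm3"):
        host.load(vm, 0, libc_page)
    print("frames before scan:", len(host.frames))
    print("frames freed by scan:", host.scan())
    host.write("vm2", 0, b"x" * PAGE_SIZE)         # vm2 gets its own copy back
    print("frames after vm2 writes:", len(host.frames))
```

A real implementation would have to do this on live guest memory, with write protection enforced by the hardware rather than by a polite write() method, which is where the "very clever virtualizer" comes in.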

These techniques require either a very clever virtualizer or modified guests, but their savings are so worthwhile that everybody would want to do it this way on any highly loaded virtual machine. Of course, that goes against the concept of "run anything you like" and makes it "run what you like, but certain standard systems are much cheaper."

This, together with some form of fair swapping, could seriously improve both the performance and the cost of VMs.

Comments

Why is RAM so expensive in these shared hosting setups? What's determining these costs and why is it coming down so slowly?

Yes, most hosting companies charge much more for RAM than it costs you in the store. Some of that is mark-up, to make a profit. And part of it is as a proxy. Since RAM limits what you can do with your server, the more RAM you want, the more you plan to do, and the more you are willing to pay. You see this a lot. ISPs charge more for extra IP addresses, not because they cost much, but because they are a sign you will be a higher-end user, and use more of other things that are harder to bill for.

A hosting company has mostly fixed costs and shares resources. So to decide how many people they can put on a machine, they need some measure to go by, and RAM is a common choice.

You can have swap in a VM. You can use a file or an LVM volume and expose it to the VM as a swap disk.

Also, resource isolation is one of the main reasons you want to have a VM in the first place.

Yes, as I noted you can have swap, but it is not typically done because a VM that is swapping will take away resources from other VMs in a way that's harder to contain. VM services I have bought have disabled swap for this reason.

There are many reasons for virtual machines. And if you could get it to work properly, swapping would actually help the resource allocation. In any event, there is no virtue in having every VM use its RAM for its own copy of every running binary. Sharing this RAM would benefit all VMs and the system as a whole.

Sun Solaris Zones virtualisation creates isolated execution environments for operating system instances within a single host. The instances appear separate to applications, but actually share a number of core resources. One of the side effects is that they also use a common virtual memory pool and necessarily only have single instances of shared objects (page 21 of Solaris Containers Technology Architecture Guide at http://www.sun.com/blueprints/0506/819-6186.pdf).

This software virtualisation permits sharing between instances with no performance impact, reduces the overall memory footprint and introduces economies of scale for the hosting companies. I doubt that they would pass this saving on to customers, as there seems to be a major problem in accounting for the shared use of these resources with the standard tools.
