What is cloud computing?
Werner Vogels' definition of cloud computing is as follows:
- public infrastructure: it must be accessible to everyone. That does not mean it needs to be publicly owned.
- Internet accessible: that is the cloud part of the definition. It is "somewhere on the Internet".
- location independent: you do not need to know where your data are, or where your calculation is done. This is not absolutely true, though. In the new version of Amazon's services, you can indicate regions in the world. This way you can make sure that part of your computing or data is processed in one region and another part in another region, for greater reliability. Another issue is that in some cases, for privacy reasons, data may not be processed outside a particular country.
- access to raw resources: if you use a Globus-based Grid, you have to validate and adjust your programs against the Globus APIs. With cloud computing you get access to raw resources, such as a virtual machine in which you can run whatever you want.
- acquire resources on demand: when you need more computing power, you can get it instantly.
- pay for what you use: the only thing you need to start with cloud computing, Werner Vogels said, is a credit card. You can use what you pay for, and you only pay for what you use.
- metered like a utility: it is analogous to getting electricity or water.
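To make "pay for what you use" and "metered like a utility" concrete, here is a minimal sketch of usage-based billing. The resource names and unit prices are hypothetical placeholders, not Amazon's actual price list; the point is only that the bill is the sum of metered quantities times unit rates, so zero usage costs nothing.

```python
from dataclasses import dataclass

# Hypothetical unit prices in US dollars; not Amazon's actual prices.
RATES = {
    "instance-hours": 0.10,
    "gb-month-stored": 0.15,
    "gb-transferred": 0.10,
}


@dataclass
class UsageRecord:
    resource: str     # one of the keys in RATES
    quantity: float   # metered amount consumed in the billing period


def monthly_bill(usage: list[UsageRecord]) -> float:
    """Charge only for what was metered: sum of quantity * unit rate."""
    total = sum(record.quantity * RATES[record.resource] for record in usage)
    return round(total, 2)


if __name__ == "__main__":
    usage = [
        UsageRecord("instance-hours", 720),   # one machine for a 30-day month
        UsageRecord("gb-month-stored", 50),
        UsageRecord("gb-transferred", 20),
    ]
    print(monthly_bill(usage))  # prints 81.5 (72.0 + 7.5 + 2.0)
```

Like an electricity meter, nothing is charged for capacity that was not used: an empty usage list produces a bill of zero.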
Of course, Werner Vogels said, many of these items hold for Grid computing too, depending on your definition of Grid computing.
What, then, is the main difference between Grid computing and cloud computing? Cloud computing has no middleware, so you get direct access to raw computing resources. There is no Globus or anything else running that you need to learn.
Why is Amazon in the business of cloud computing? Isn't Amazon a bookseller? Yes, Werner Vogels said, but it is important to realize that Amazon is a retailer. It is very good at doing things at a large scale with low margins, and it is very good at operating large-scale compute environments.
Amazon has a massively distributed system to support its complex retail environment. Today it sells almost anything. To achieve good customer satisfaction you need to deliver 99.999% uptime. Amazon reaches this number because its services are connected internally through SLAs. The numbers are reached even when a whole data centre fails, thanks to the SLA-based service delivery and the built-in redundancy needed to meet the SLAs.
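Why redundancy lets a service survive the loss of a whole data centre can be illustrated with a back-of-the-envelope calculation. This sketch assumes that data centres fail independently of each other, which real deployments only approximate, and the 99% per-site availability in the example is a hypothetical figure.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Probability that at least one of `replicas` copies is up,
    assuming each copy is up with probability `single`, independently."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # The service is down only if all copies are down at the same time.
    return 1.0 - (1.0 - single) ** replicas


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, f"{combined_availability(0.99, n):.6%}")
```

With a hypothetical 99% available site, two independent replicas give 99.99% and three give 99.9999%, which is above the 99.999% target. Correlated failures (shared power, shared software bugs) would make the real figure lower.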
From the outside, about a thousand different services are available. Internally, the whole architecture is also service-based. According to Werner Vogels, Amazon was doing this long before service-oriented architectures came into vogue. The Amazon architecture is based on a number of foundation services that give access to large data sources, and a layer of aggregated services that build more specialized services out of these foundation services.
As an example: a single request to the Amazon gateway fans out to about three hundred independent services, whose results are collected and presented to you as one page. Hence, Werner Vogels claims, Amazon has really mastered how to set up a reliable, dynamic, multi-service, large-scale distributed system.
Scalability, availability, performance, and cost-effectiveness
To measure how well they are doing, Amazon looks at the following metrics: scalability, availability, performance, and cost-effectiveness.
Scalability, Werner Vogels said, is not just about a service being able to run on 1000 processors, of which perhaps only 500 are actually used. Scalability, as Amazon sees it, means you can add processors to a service one at a time, when needed, without the users noticing.
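The idea of adding capacity one processor at a time can be sketched as follows. The load figures, per-worker capacity and 20% headroom are hypothetical; the sketch only shows a pool growing in single-worker steps towards the size the current load requires.

```python
import math


def workers_needed(load: float, capacity_per_worker: float, headroom: float = 0.2) -> int:
    """Smallest pool that carries `load` with a fraction `headroom` to spare."""
    return max(1, math.ceil(load * (1 + headroom) / capacity_per_worker))


def scale_up(current: int, load: float, capacity_per_worker: float) -> list[int]:
    """Add workers one at a time until the pool can carry the load.

    Returns the pool size after each single-worker step; an empty list
    means the current pool is already large enough."""
    target = workers_needed(load, capacity_per_worker)
    steps = []
    while current < target:
        current += 1  # bring one more worker into service
        steps.append(current)
    return steps


if __name__ == "__main__":
    # 950 requests/s, 100 requests/s per worker, 20% headroom -> 12 workers
    print(scale_up(8, load=950, capacity_per_worker=100))  # prints [9, 10, 11, 12]
```

Growing in small steps keeps each change small, which is what lets capacity be added without users noticing.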
Availability is defined as a promise: 99.999% availability, although Amazon tries to reach 100%. This also means that if you build your business on top of Amazon, there is a 0.001% chance that your business will not run because of Amazon. If that matters to you, you have to go to your insurance company to cover that 0.001%.
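The 0.001% is easier to grasp as time. A small calculation, assuming a 365-day year, converts an availability percentage into the maximum downtime it allows per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def max_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year a service may be down while still meeting the target."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        print(f"{target}%: {max_downtime_minutes(target):.1f} minutes per year")
```

At 99.999%, this comes to roughly 5.3 minutes per year; at 99.9%, it is about 8.8 hours.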
Performance is steered by SLAs. Having SLAs between the services is key to building a reliable system of this size. However, the SLAs only come into play when things are under pressure: a service can then look at what it has promised.
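The talk does not say how Amazon formulates its internal SLAs. As an assumption, this sketch expresses an SLA as a latency bound that must hold for a given percentile of requests, and lets a service check its recent latencies against that promise using the nearest-rank percentile. The 99.9th percentile and 300 ms figures are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class LatencySLA:
    percentile: float  # e.g. 99.9 means "for 99.9% of requests"
    max_ms: float      # the response must arrive within this many ms


def nearest_rank_percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def keeps_promise(samples: list[float], sla: LatencySLA) -> bool:
    """True if the observed latencies meet the SLA the service promised."""
    return nearest_rank_percentile(samples, sla.percentile) <= sla.max_ms


if __name__ == "__main__":
    sla = LatencySLA(percentile=99.9, max_ms=300.0)  # hypothetical promise
    healthy = [20.0] * 999 + [450.0]      # one slow outlier in 1000 requests
    stressed = [20.0] * 990 + [450.0] * 10
    print(keeps_promise(healthy, sla), keeps_promise(stressed, sla))
```

Stating the promise as a high percentile rather than an average means a few slow requests are tolerated, but a systematic slowdown under pressure is detected.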
Cost-effectiveness, explained Werner Vogels, is important when it comes to running a business with small margins as Amazon does. Sometimes that is difficult to explain to technical people. They look, for instance, for the best algorithm to solve a problem. That may not be the most cost-effective algorithm from a business perspective. They have to learn that.
A major property that sets Amazon apart, according to Werner Vogels, is operational excellence. Amazon is very good at operating these large numbers of services. Each service is run by a team that is responsible not only for developing the service but also for operating it. This helps to deliver high-quality services.
The idea behind running Software as a Service, as opposed to more traditional Software as Bits, is that it is much easier. In the traditional software industry, you have a develop, test, release, install, configure and operate cycle. In SaaS this is said to be reduced to develop, test, operate; users then access the software over the Internet. But, Werner Vogels said, this is a big lie. Something is missing: between test and operate there is a phase of undifferentiated heavy lifting, which includes hardware costs, software costs, maintenance, load balancing, scaling, bandwidth management, server hosting, and more. At Amazon, this phase turned out to take up 70% of the development teams' time.
Hence Amazon was looking for ways to virtualize these activities: give a development team access to the configurable virtual resources it needs to build and operate its specific services, and centralize the burden of this heavy lifting, making it easier and more cost-effective. That is why Amazon invented web-scale computing, today called cloud computing.
After analysing what was needed, Amazon decided on three environments: a compute environment, which grew into EC2; a storage environment, which grew into S3; and a messaging environment, which grew into SQS. These were the first three internal services Amazon built, and also the first three it launched externally.
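How the three building blocks can fit together is illustrated here with in-memory stand-ins. This is a hypothetical sketch, not the AWS APIs: a dictionary plays the part of an S3-like object store, a `queue.Queue` plays the part of an SQS-like message queue, and a worker function plays the part of code running on an EC2-like instance.

```python
import queue

# Hypothetical in-memory stand-ins, not the AWS APIs:
object_store: dict[str, bytes] = {}      # plays the role of an S3-like store
work_queue: queue.Queue = queue.Queue()  # plays the role of an SQS-like queue


def submit(key: str, data: bytes) -> None:
    """Producer: store the input object, then announce it with a message."""
    object_store[key] = data
    work_queue.put(key)


def run_worker() -> int:
    """Compute node (EC2-like role): drain the queue, process each object,
    write the result back to the store, and return how many were processed."""
    processed = 0
    while True:
        try:
            key = work_queue.get_nowait()
        except queue.Empty:
            return processed
        object_store[key + ".upper"] = object_store[key].upper()
        processed += 1


if __name__ == "__main__":
    submit("greeting", b"hello")
    submit("place", b"world")
    print(run_worker(), object_store["greeting.upper"])
```

Because the producer and the worker share only the store and the queue, more workers can be added without changing the producer; that decoupling is what a messaging service contributes alongside compute and storage.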
New services have been released since then, and more are still coming. In storage, for instance, there is SimpleDB, which adds indexing capabilities to storage.
This cloud environment can be used by researchers and by companies. Instead of setting up and operating your own Grid, you just use Amazon's EC2, for instance. You can then concentrate on your work or your research. Werner Vogels believes that Amazon can do this much more cheaply and better than anyone else, because it can handle the scale. And if you want to operate a Grid anyway, for good reasons, you can do it on top of EC2, and many people do.
Setting up and maintaining large-scale computer facilities is difficult. You cannot just buy servers off the shelf. It is much easier to get a Wii than to get the right server configuration within a few weeks. And, of course, one data centre is not enough: when disaster strikes, you are out of business for some time. Even during normal operations everything fails: disks, memory, etc. Everything fails all of the time, Werner Vogels emphasized. Especially when you are operating large facilities, you need people running around replacing disks and other components. So running these facilities is a nightmare: do not do it, Werner Vogels said, let us do that for you.
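"Everything fails all of the time" follows from simple arithmetic on fleet size. This sketch assumes a hypothetical fleet of 10,000 disks, each with a constant annualized failure rate of 3%, and computes the expected number of failures.

```python
def expected_failures(units: int, annual_failure_rate: float, days: float = 365.0) -> float:
    """Expected number of failures among `units` components over `days`,
    assuming a constant annualized failure rate per component."""
    return units * annual_failure_rate * days / 365.0


if __name__ == "__main__":
    # Hypothetical fleet: 10,000 disks with a 3% annualized failure rate.
    per_year = expected_failures(10_000, 0.03)
    per_day = expected_failures(10_000, 0.03, days=1)
    print(f"{per_year:.0f} per year, {per_day:.2f} per day")
```

Even with each individual disk being quite reliable, a fleet of this hypothetical size loses around 300 disks a year, close to one every day, which is why someone always has to be replacing components.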
Why would you use Amazon cloud computing services?
What is a good reason to turn to Amazon cloud computing services? Whenever you have spikes in your operational needs. Walmart, for instance, has them around Christmas, but you have them in science too, especially around the submission deadlines of important conferences. Werner Vogels mentioned the example of a student who needed a lot of computing power. Instead of submitting an application for time on the TeraGrid, which would have taken a long time, he bought computing time on a hundred or so computers from Amazon and delivered the paper in time.
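The economics of a spike can be compared with a simple cost model. All numbers here are hypothetical: owned capacity is assumed to cost 5 cents per server-hour but must be sized for the peak all month, while on-demand capacity is assumed to cost 10 cents per server-hour but is paid only while it runs. Costs are kept in integer cents to avoid floating-point rounding.

```python
def cost_owned_cents(peak_servers: int, hours: int, cents_per_server_hour: int) -> int:
    """Owned capacity must be sized for the peak and is paid for every hour."""
    return peak_servers * hours * cents_per_server_hour


def cost_on_demand_cents(servers_per_hour: list[int], cents_per_server_hour: int) -> int:
    """On-demand capacity is paid only for the servers running in each hour."""
    return sum(servers_per_hour) * cents_per_server_hour


if __name__ == "__main__":
    # Hypothetical 30-day month: 2 servers normally, 100 during a 48-hour spike.
    demand = [2] * (720 - 48) + [100] * 48
    owned = cost_owned_cents(max(demand), len(demand), cents_per_server_hour=5)
    rented = cost_on_demand_cents(demand, cents_per_server_hour=10)
    print(f"owned: ${owned / 100:.2f}, on demand: ${rented / 100:.2f}")
```

Under these assumptions, owning peak capacity costs $3,600.00 for the month, while renting it on demand costs $614.40, even though the hourly rate for rented servers is twice as high. The gap comes from paying for 100 servers during the 672 hours when only 2 are needed.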
The basic service that Amazon provides in computing is a machine image, that is, a virtual machine. You can take one of the predefined images, or define your own, for instance with Ubuntu as the operating system, Apache as the web server and Perl as the scripting environment. You upload such an image to Amazon and run it as one instance or as a hundred instances, each with its own IP address and SSH keys, as long as you pay with your credit card. Around these basic services, a whole ecosystem of companies and organisations providing specialized services is emerging. Some of these services are free, for others you have to pay, but there is a whole community out there that probably already provides parts of what you need for your services.
Recently several new features for Amazon EC2 have been announced. You can now choose between different machine configurations, including multicore ones. Another new feature is "elastic IP addresses": you get a number of addresses and can associate them with the instances you want. Currently these are IPv4 addresses only, the standard Internet addresses. Amazon would like to use the more advanced IPv6 addresses, which are ideally suited to this use. The feature of availability zones has already been mentioned: you can ask for systems in certain regions, and systems within a zone have low latency to each other. A further new feature is that you can define your own Linux kernel.
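The idea behind elastic IP addresses, that an address belongs to your account rather than to one instance, can be shown with a toy model. This class is hypothetical and is not the EC2 API; the instance IDs are made up and the address comes from the 203.0.113.0/24 documentation range.

```python
from typing import Optional


class ElasticAddressPool:
    """Toy model: addresses are allocated to the account, not to an instance,
    so an address can be re-pointed at a replacement instance."""

    def __init__(self, addresses: list[str]) -> None:
        # address -> instance ID (None while unassociated)
        self._targets: dict[str, Optional[str]] = {a: None for a in addresses}

    def associate(self, address: str, instance_id: str) -> None:
        if address not in self._targets:
            raise KeyError(f"{address} is not allocated to this account")
        self._targets[address] = instance_id  # replaces any previous association

    def target(self, address: str) -> Optional[str]:
        return self._targets[address]


if __name__ == "__main__":
    pool = ElasticAddressPool(["203.0.113.10"])
    pool.associate("203.0.113.10", "i-old")
    # The old instance fails; point the same address at its replacement.
    pool.associate("203.0.113.10", "i-new")
    print(pool.target("203.0.113.10"))
```

Clients keep using the same address throughout; only the mapping behind it changes, which is what makes replacing a failed instance invisible from the outside.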
In S3, the Simple Storage Service, there are currently 14 billion objects stored. The bandwidth needed to serve S3 is larger than that of all Amazon retail sites in the world (amazon.com, amazon.co.uk, etc.) combined.
Grids on clouds
Werner Vogels mentioned two specific examples of Grids on clouds: Hadoop and Globus.
The US academic community is quite active in getting Grid computing to work on EC2. Hadoop, an Apache project, for instance, is already running on it. The Globus guys have versions of Globus that run in EC2. And, according to Werner Vogels, there is also a lot of commercial interest in this large-scale computing infrastructure.
One of the questions from the audience was about the "no middleware" statement. It was pointed out that there is obviously a lot of middleware involved, but it is all inside Amazon. How open is all of this? Can I, for instance, run an Amazon system image somewhere else? Werner Vogels answered that nobody has tried this yet, because it is hard to do. But the system image is just a Xen image with some additional metadata, so yes, you can run it somewhere else.
One area where Grids have done really groundbreaking work, Werner Vogels said, is thinking about how to federate resources. That is something they have really done well.
In the USA power production has just been deregulated, so you can generate power yourself and sell it back to the power company. Is that a model that could work for computing too? That when your machine is idle, for instance in the night, you sell it back to the Grid or the cloud? That is not really science fiction, Werner Vogels concluded. That is highly likely. We may get into a situation where Amazon not only sells computing time of its own but also acts as a reseller of capacity on your infrastructure.