Syllabus
Characteristics of virtualized environments, taxonomy of virtualization techniques, virtualization and Cloud Computing, pros and cons of virtualization, technology examples (Xen, VMware), building blocks of containers, container platforms (LXC, Docker), container orchestration, Docker Swarm and Kubernetes, public cloud VM (e.g. Amazon EC2) and container (e.g. Amazon Elastic Container Service) offerings.
Topic-wise textbooks/references
VIRTUALIZATION:
3.1 Characteristics of virtualized environments
3.2 Taxonomy of virtualization techniques
3.3 Virtualization and Cloud Computing
3.4 Pros and Cons of Virtualization
3.5 Technology
(a) XEN
(b) VMware
CONTAINERS:
3.6 Building blocks of containers
3.7 Container Platforms
(a) LXC
(b) Docker
3.8 Container Orchestration
(a) Docker Swarm
(b) Kubernetes
3.9 Public Cloud
(a) VM (e.g., Amazon EC2)
(b) Container (e.g., Amazon Elastic Container Service) offerings
Virtualization requires a separate guest OS, while containers share the host OS kernel itself.
Virtualization refers to the creation of a virtual version of hardware, software environments, storage, or networks. A virtualized environment consists of three major components:
the guest (the system component interacting with the virtualization layer)
the host (the original environment)
the virtualization layer (software that recreates the environment for the guest).
Major Characteristics of the virtualized environments are:
(a) Security prevents malicious operations from affecting the host.
(b) Managed execution enables sharing, aggregation, emulation, and isolation.
Sharing allows a single host to support multiple guests, maximizing resource utilization.
Aggregation combines separate hosts into a single virtual resource.
Emulation: Guest systems run inside a software-controlled virtualization layer that emulates different hardware environments, enabling execution and testing of operating systems with specific requirements that are not available on the physical host.
Isolation is a critical feature, ensuring that multiple guests run concurrently in separate environments without interfering with each other. This separation allows the virtualization layer to filter harmful activity.
(c) Portability: a guest can be packaged into a virtual image file and safely moved to different physical machines, facilitating straightforward application deployment. Furthermore, virtualization allows for performance tuning: the virtualization layer can control the resources exposed to a guest to meet specific Quality of Service requirements.
Fig: Taxonomy of Virtualization Techniques
Virtualization techniques are classified according to the type of service or entity being emulated, primarily execution environments, storage, and networks. Since execution virtualization is the oldest and most widely used form, it is further categorized into two major types according to the kind of host platform they require.
(a) Execution virtualization
(i) Machine Reference Model
(ii)Hardware-Level Virtualization
(iii)Hardware Virtualization Techniques: Hardware-assisted Virtualization, Full Virtualization, Paravirtualization, Partial Virtualization
(iv) Operating System Level Virtualization
(v) Programming-Language-Level Virtualization
(vi) Application-Level Virtualization: Interpretation, Binary Translation
(b) Other Types of Virtualization
(i) Storage Virtualization
(ii) Network Virtualization
(iii) Desktop Virtualization
(iv) Application-Server Virtualization
(a) Execution virtualization is the most prominent category and is further divided into process-level techniques (running on top of an operating system) and system-level techniques (running directly on hardware). Modern hardware extensions ensure that sensitive instructions execute only in privileged mode, enabling multiple isolated operating systems on the same machine.
(i) Machine Reference Model: Modern computing systems are organized as layered abstractions consisting of API, ABI, ISA, operating system, and hardware, where each layer hides implementation details and exposes well-defined interfaces. Virtualization works by replacing one of these layers and intercepting calls directed to it, emulating its interface while coordinating with the underlying layer. The ISA forms the boundary between hardware and software, the ABI separates applications from the operating system, and the API provides high-level access for applications. Operations initiated at the application level are translated through API, ABI, and ISA into machine-level instructions executed by hardware. To enforce security and isolation, processors distinguish between privileged and non-privileged instructions and support execution modes such as user mode and supervisor (kernel) mode. Hypervisors rely on this model to safely virtualize execution environments.
A possible implementation features a hierarchy of privileges in the form of ring-based security: Ring 0, Ring 1, Ring 2, and Ring 3, where Ring 0 is the most privileged level and Ring 3 the least privileged. Ring 0 is used by the kernel of the OS, Rings 1 and 2 by OS-level services, and Ring 3 by user applications. Recent systems support only two levels, with Ring 0 for supervisor mode and Ring 3 for user mode.
Conceptually, the hypervisor runs above supervisor mode, which is where the prefix hyper comes from. In reality, hypervisors run in supervisor mode, and the division between privileged and non-privileged instructions has posed challenges in designing virtual machine managers.
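The privileged/non-privileged split described above underlies the classic trap-and-emulate approach: ordinary instructions run directly on hardware, while sensitive ones trap into the hypervisor, which simulates their effect against virtual state. The sketch below is a toy model of that dispatch; all instruction names and state fields are invented for illustration, not taken from any real ISA.

```python
# Toy trap-and-emulate dispatcher (illustrative only, not a real hypervisor).
# Non-privileged instructions "run directly"; privileged ones trap into a
# hypervisor handler that updates the VM's *virtual* state.

PRIVILEGED = {"HLT", "LOAD_CR3"}  # hypothetical sensitive instructions

def hypervisor_emulate(instr, vm_state):
    """Simulate a privileged instruction against the VM's virtual state."""
    if instr == "LOAD_CR3":
        vm_state["page_table"] = "virtual-pt"  # touch virtual, not real, state
    elif instr == "HLT":
        vm_state["halted"] = True
    return vm_state

def execute(instr, vm_state):
    if instr in PRIVILEGED:
        return hypervisor_emulate(instr, vm_state)  # trap into the hypervisor
    vm_state.setdefault("log", []).append(instr)    # "runs directly on hardware"
    return vm_state

state = {}
for ins in ["ADD", "LOAD_CR3", "MOV", "HLT"]:
    state = execute(ins, state)

print(state["page_table"], state["halted"], state["log"])
# → virtual-pt True ['ADD', 'MOV']
```

The cost of each trap is exactly the overhead that hardware-assisted virtualization (e.g., Intel VT) and paravirtualization try to reduce.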
Fig: Machine Reference Model
Fig: Security Rings and Privileged Modes
Fig: Hardware Virtualization Reference Model.
Fig: Hosted (left) and Native (right) Virtual Machine.
(ii) Hardware-level virtualization: In this model, the guest is represented by the operating system, the host by the physical computer hardware, the virtual machine by its emulation, and virtual machine manager by the hypervisor. The hypervisor is generally a program, or a combination of software and hardware, that allows the abstraction of the underlying physical hardware.
It provides an abstract hardware environment for a guest OS and is managed by a hypervisor:
Type I hypervisors: run directly on top of the hardware; this type is also called a native virtual machine, since it runs natively on hardware.
Type II hypervisors: also called hosted virtual machines, since they are hosted within an operating system.
(iii) Hardware Virtualization Techniques: This category includes full virtualization (running unmodified guests via emulation), paravirtualization (modifying guests to interact with a thin hypervisor), and hardware-assisted virtualization (leveraging architectural support like Intel VT).
(iv) Operating system-level virtualization creates isolated user-space instances, such as containers, within a single kernel, offering low overhead but requiring all guests to share the same OS.
(v) Programming-language-level virtualization uses virtual machines (e.g., JVM, .NET) to execute bytecode, ensuring cross-platform portability.
(vi) Application-level virtualization executes applications in environments that do not natively support them, often using interpretation or binary translation (e.g., Wine).
(b) Other types of virtualization:
Storage virtualization decouples logical data from physical hardware (e.g., SANs).
Network virtualization aggregates or partitions network resources (e.g., VLANs).
Desktop virtualization uses a client–server model to provide remote access to a desktop environment that is hosted on centralized servers, delivering the experience of a locally installed system while storing data and execution remotely. It relies on hardware virtualization to run multiple desktop virtual machines on shared infrastructure, offering high availability, persistence, accessibility, and centralized management.
Application-server virtualization aggregates multiple application servers into a single virtual server using load balancing and high-availability mechanisms to improve service quality rather than to emulate a different execution environment.
Virtualization is a fundamental component of Cloud Computing because it allows for the customization, security, and isolation required to deliver IT services on demand. It enables different cloud service models: Hardware virtualization is the enabling factor for Infrastructure-as-a-Service (IaaS) by offering configurable computing environments, while programming-language virtualization is leveraged in Platform-as-a-Service (PaaS) to provide managed, sandboxed execution environments.
A major benefit of virtualization in the cloud is server consolidation. By creating isolated and controllable environments, cloud providers can aggregate multiple virtual machines onto fewer physical servers, maximizing resource utilization and reducing energy consumption. This process is enhanced by live migration, which allows running virtual machine instances to move between physical servers without service interruption, optimizing efficiency dynamically.
Beyond computation, storage virtualization allows providers to harness huge storage facilities and offer them as dynamic, scalable slices. Additionally, desktop virtualization (VDI) has been revamped by cloud computing, allowing complete virtual computers to be hosted on provider infrastructure and accessed remotely via thin clients. Ultimately, virtualization provides the essential abstraction and manageability that make cloud infrastructure flexible and scalable.
Fig: Live Migration and Server Consolidation
(a) Advantages of Virtualization
(b) The Other Side of the Coin: Disadvantages
(i) Performance Degradation
(ii) Inefficiency and Degraded User Experience
(iii) Security Holes and New Threats
(a) Advantages of Virtualization:
Virtualization offers significant advantages, primarily managed execution and isolation. These features create secure "sandbox" environments that prevent harmful operations from crossing into the host, making it ideal for running untrusted code.
Portability is another major pro; virtual machines are often self-contained files that can be easily transported and deployed across different physical systems, simplifying maintenance and administration.
Virtualization drives efficiency through server consolidation, allowing multiple systems to securely share resources, which reduces the number of active servers and lowers power consumption.
(b) The Other Side of the Coin: Disadvantages
(i) Performance Degradation: a primary concern due to the overhead introduced by the abstraction layer, which must manage virtual processors, memory, and privileged instructions. While hardware-assisted techniques have reduced this, overhead remains a factor. Typical sources of overhead include maintaining the status of virtual processors, trapping and simulating privileged instructions, supporting paging within the VM, and console functions.
(ii) Inefficiency and Degraded User Experience: can also arise if the virtualization layer cannot fully expose host features, such as advanced graphics, leading to a degraded user experience.
(iii) Security Holes and New Threats: Malicious programs, such as "BluePill" or "SubVirt," can target the hypervisor itself, potentially gaining control of the host or extracting sensitive information from guest operating systems.
BluePill is malware targeting the AMD processor family; it moves the execution of the installed OS into a virtual machine.
The original version of SubVirt was developed as a prototype by Microsoft in collaboration with the University of Michigan. SubVirt infects the guest OS, and when the virtual machine is rebooted, it gains control of the host.
Xen is an open-source platform largely based on paravirtualization, which requires guest operating systems to be modified to interact with the hypervisor via "hypercalls".
VMware is a pioneer in full virtualization, allowing unmodified operating systems to run by replicating the underlying hardware. Historically, VMware used binary translation to dynamically handle sensitive x86 instructions that could not be virtualized natively. VMware offers a wide range of solutions:
Type II hypervisors like VMware Workstation for desktops
Type I hypervisors like ESX/ESXi for servers. ESXi utilizes a thin "VMkernel" for efficient resource management. VMware's ecosystem includes
vSphere for infrastructure management
vCloud for creating IaaS clouds, supporting features like live migration and disaster recovery.
Xen: Xen is an open-source initiative implementing a virtualization platform based on paravirtualization. Paravirtualization requires the operating system codebase to be modified: the hypervisor runs in Ring 0 and the guest operating system is moved to Ring 1. Xen exhibits some limitations with legacy hardware and legacy operating systems.
It is also offered as a commercial solution, XenSource, by Citrix, and powers cloud computing solutions by means of the Xen Cloud Platform (XCP).
Paravirtualization eliminates the performance loss incurred when executing instructions that require special management. This is done by modifying the portions of the guest operating system that execute such instructions. Therefore, it is not a transparent solution for implementing virtualization (most machines and servers are x86-based).
Example 1: Open-source operating systems such as Linux can be easily modified, since their code is publicly available, and Xen provides full support for their virtualization.
Example 2: Components of the Windows family are generally not supported by Xen, unless hardware-assisted virtualization is available.
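The key difference between paravirtualization and trap-and-emulate is that a paravirtualized guest is modified to call the hypervisor explicitly instead of executing a sensitive instruction and being trapped. The toy sketch below illustrates that idea; the hypercall name and state fields are invented for illustration and do not correspond to Xen's actual hypercall interface.

```python
# Illustrative sketch of the paravirtualization idea: the guest kernel is
# modified to invoke the hypervisor directly ("hypercall") at the points
# where it would otherwise execute a sensitive instruction.

HYPERCALL_TABLE = {}

def hypercall(name):
    """Decorator registering a handler in the hypervisor's hypercall table."""
    def register(fn):
        HYPERCALL_TABLE[name] = fn
        return fn
    return register

@hypercall("update_page_table")
def _update_pt(vm, new_pt):
    vm["page_table"] = new_pt  # hypervisor updates virtual state safely

def guest_kernel(vm):
    # A paravirtualized guest's code is rewritten at exactly these points:
    HYPERCALL_TABLE["update_page_table"](vm, "pt-v2")  # explicit hypercall
    return vm

vm = guest_kernel({"page_table": "pt-v1"})
print(vm["page_table"])  # → pt-v2
```

Because the call is explicit, no trap occurs, which is where the performance gain comes from; the price is that the guest source must be modifiable, matching the Linux vs. Windows examples above.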
Fig: Xen Architecture and Guest OS Management.
The fundamental architecture of containers relies on two essential Linux kernel features: Control Groups (Cgroups) and Namespaces. Cgroups, originally contributed by Google, manage the resource consumption of processes. They allow administrators to allocate, limit, and monitor resources such as CPU, memory, disk I/O, and network bandwidth for a group of processes. This ensures that a single application cannot exhaust system resources, providing fine-grained control and protection against issues like "fork bomb" attacks.
While Cgroups handle resource usage, Namespaces are responsible for isolation. They partition kernel resources so that one set of processes sees a specific view of the system, distinct from other processes. Key namespaces include the PID namespace, which isolates process IDs so containers can't see each other's processes; the Network namespace, which provides isolated network stacks (interfaces, IP addresses); and the Filesystem namespace, which isolates the directory tree, typically using a chroot operation to change the root directory for a process. Together, these features allow multiple isolated containers to run securely on a single host kernel.
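The PID-namespace behaviour described above can be pictured as a per-container translation table: each container numbers its processes from 1, while the kernel keeps the real host PIDs. The sketch below is a toy model of that mapping only; real namespaces are created in the kernel via clone()/unshare() with flags such as CLONE_NEWPID, and the host PIDs here are made up.

```python
# Toy model of PID-namespace isolation (illustrative; real PID namespaces
# are a kernel feature, created with clone()/unshare() and CLONE_NEWPID).

class PidNamespace:
    def __init__(self):
        self._next = 1   # each namespace numbers its PIDs starting from 1
        self.procs = {}  # container-local PID -> host PID

    def spawn(self, host_pid):
        local = self._next
        self._next += 1
        self.procs[local] = host_pid
        return local

ns_a, ns_b = PidNamespace(), PidNamespace()
ns_a.spawn(4101)  # host PID 4101 appears as PID 1 inside container A
ns_b.spawn(4102)  # host PID 4102 appears as PID 1 inside container B

# Each container sees only its own process table, starting at PID 1:
print(ns_a.procs)  # → {1: 4101}
print(ns_b.procs)  # → {1: 4102}
```

Neither namespace can name, signal, or even observe the other's processes, which is exactly the isolation property containers rely on.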
LXC (Linux Containers), introduced in 2008, was the first complete runtime for operating system-level virtualization. It allows multiple isolated Linux systems to run on a single control host using Cgroups and namespaces without requiring patches. LXC focuses on system containers that behave like lightweight virtual machines.
Docker, launched in 2013, revolutionized containerization by creating a comprehensive ecosystem for developing, shipping, and running applications. Originally based on LXC, Docker later developed its own library, libcontainer. Docker employs a client-server architecture consisting of a Client (CLI), a Daemon (dockerd) that manages objects like images and containers, and a Registry (like Docker Hub) for sharing images. A key innovation of Docker is its use of layered images based on a union filesystem, which makes images lightweight and portable; changes are stored in a writable top layer while the base layers remain read-only,. This structure allows applications to be packaged with all dependencies, ensuring they run reliably across different environments.
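The layered-image behaviour described above (read-only base layers shared between containers, plus a writable top layer per container) can be sketched as a simple union lookup. This is a conceptual toy, not Docker's actual storage driver (which uses union filesystems such as overlay2); the layer contents are invented for illustration.

```python
# Toy union-filesystem lookup (illustrative): reads search layers top-down,
# writes go only to the per-container writable top layer (copy-on-write).

class LayeredImage:
    def __init__(self, *base_layers):
        self.layers = list(base_layers)  # read-only, shareable between containers
        self.top = {}                    # this container's writable layer

    def read(self, path):
        for layer in [self.top] + self.layers[::-1]:  # newest layer wins
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        self.top[path] = data            # base layers are never modified

base = {"/etc/os-release": "debian"}
app = {"/app/main.py": "print('hi')"}
c = LayeredImage(base, app)

c.write("/etc/os-release", "patched")  # shadows the base layer's copy
print(c.read("/etc/os-release"))       # → patched
print(base["/etc/os-release"])         # → debian (base layer unchanged)
```

Because base layers stay read-only, many containers can share them on disk, which is why layered images are lightweight to store and fast to ship.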
As organizations moved from monolithic to microservice architectures, managing hundreds or thousands of containers manually became impractical. Container orchestration is the automated process of managing the lifecycle of containerized applications, including deployment, networking, scaling, and health monitoring.
In a production environment, orchestration tools ensure high availability and fault tolerance. If a container fails, the orchestrator automatically replaces it; if traffic spikes, it scales the number of replicas up or down. Orchestration platforms enable declarative configuration, where developers define the "desired state" (e.g., "run three copies of this service"), and the tool works to maintain that state. This automation eliminates the need for manual intervention in tasks like rolling updates, service discovery, and load balancing. The most prominent tools in this space are Docker Swarm and Kubernetes, which abstract the underlying infrastructure, allowing developers to treat a cluster of machines as a single deployment target.
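The "desired state" idea above boils down to a reconciliation loop: the orchestrator repeatedly compares the declared replica count with what is actually running and converges toward it. The sketch below is a minimal toy of that loop, not the scheduler of Swarm or Kubernetes; the replica naming is invented for illustration.

```python
# Minimal sketch of declarative reconciliation (illustrative only):
# compare desired vs. actual replicas and converge toward the desired state.
import itertools

_ids = itertools.count()  # unique IDs for newly started replicas

def reconcile(desired, running):
    """Return the container set after one reconciliation pass."""
    running = list(running)
    while len(running) < desired:              # scale up / replace failures
        running.append(f"replica-{next(_ids)}")
    while len(running) > desired:              # scale down
        running.pop()
    return running

state = reconcile(3, [])       # initial deployment of three replicas
state.remove(state[0])         # simulate a container crashing
state = reconcile(3, state)    # the orchestrator restores the desired state
print(len(state))  # → 3
```

Real orchestrators run this loop continuously and also reconcile networking, placement, and health checks, but the contract is the same: the operator declares *what*, the loop figures out *how*.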
Docker Swarm and Kubernetes
Docker Swarm is a container orchestration engine integrated natively into the Docker ecosystem. It uses a manager-worker architecture where manager nodes handle scheduling and cluster state (maintained via the Raft consensus algorithm), while worker nodes execute the containers. Swarm is known for its simplicity and ease of setup, utilizing the standard Docker API. It uses a "Gossip network" for fast communication between nodes and supports services defined with specific replica counts.
Kubernetes (K8s) is a robust, open-source platform originally developed by Google. It is more complex than Swarm but offers extensive features for automated deployment, scaling, and operations. Kubernetes organizes containers into Pods (the smallest deployable units), which are managed by Deployments and exposed via Services. Its architecture separates the Control Plane (API Server, Scheduler, Controller Manager) from Worker Nodes (running Kubelet and Kube-Proxy). Kubernetes supports advanced capabilities like self-healing, secret management, and automated rollouts/rollbacks, making it the industry standard for large-scale container management.
Public cloud providers offer both infrastructure and managed container services to simplify deployment. Amazon Web Services (AWS) allows users to run containers on Amazon EC2 instances, which provides full control over the underlying virtual machines. For a managed experience, Amazon Elastic Container Service (ECS) handles orchestration, allowing deployment on clusters of EC2 instances or via AWS Fargate, a serverless engine that removes the need to manage servers entirely.
Microsoft Azure offers Azure Container Instances (ACI) for running containers quickly without provisioning VMs, and Azure Kubernetes Service (AKS) for managed Kubernetes clusters. ACI charges based on execution time, offering a "serverless" feel for containers.
Google Cloud leverages its history with Kubernetes through Google Kubernetes Engine (GKE), a fully managed service for deploying containerized applications. Google also offers Google Compute Engine for running containers on standard VMs with container-optimized OS images, and Cloud Run, a managed platform for stateless containers that scales to zero, billing only for execution time.
Download Docker and Start using it - Link
Docker - Reading article - https://www.codecentric.de/en/knowledge-hub/blog/docker-demystified