CLOUD COMPUTING-INTRODUCTION



SYLLABUS:  SYSTEMS MODELING, CLUSTERING AND VIRTUALIZATION
Scalable Computing over the Internet, Technologies for Network based systems, System models
for Distributed and Cloud Computing, Software environments for distributed systems and clouds, Performance, Security and Energy Efficiency

SCALABLE COMPUTING OVER THE INTERNET
Ø  Over the past 60 years, computing technology has undergone a series of platform and environment changes.
Ø  We assess evolutionary changes in machine architecture, operating system platform, network connectivity, and application workload.
Ø  Instead of using a centralized computer to solve computational problems, a parallel and distributed computing system uses multiple computers to solve large-scale problems over the Internet. Thus, distributed computing becomes data-intensive and network-centric.
The Age of Internet Computing:
Ø  Billions of people use the Internet every day. As a result, supercomputer sites and large data centers must provide high-performance computing services to huge numbers of Internet users concurrently.
Ø  Because of this high demand, the Linpack Benchmark for high-performance computing (HPC) applications is no longer optimal for measuring system performance.
Ø  The emergence of computing clouds instead demands high-throughput computing (HTC) systems built with parallel and distributed computing technologies.
Ø  We have to upgrade data centers using fast servers, storage systems, and high-bandwidth networks.
1. The Platform Evolution:
Computer technology has gone through five generations of development, with each generation lasting from 10 to 20 years.
Ø  From 1950 to 1970, a handful of mainframes, including the IBM 360 and CDC 6400, were built to satisfy the demands of large businesses and government organizations.
Ø  From 1960 to 1980, lower-cost minicomputers such as the DEC PDP 11 and VAX Series became popular among small businesses and on college campuses.
Ø  From 1970 to 1990, we saw widespread use of personal computers built with VLSI microprocessors.


Ø  From 1980 to 2000, massive numbers of portable computers and pervasive devices appeared in  both wired and wireless applications.
Ø  Since 1990, the use of both HPC and HTC systems hidden in clusters, grids, or Internet clouds has proliferated. These systems are employed by both consumers and high-end web-scale computing and information services.
2.HPC:
Ø  On the HPC side, supercomputers (massively parallel processors or MPPs) are gradually being replaced by clusters of cooperative computers out of a desire to share computing resources.
Ø  The cluster is often a collection of homogeneous compute nodes that are physically connected in close range to one another.
Ø  For many years, HPC systems have emphasized raw speed performance. The speed of HPC systems increased from Gflops in the early 1990s to Pflops in 2010. This improvement was driven mainly by demands from the scientific, engineering, and manufacturing communities.

3.HTC:
Ø  On the HTC side, peer-to-peer (P2P) networks are formed for distributed file sharing and content delivery applications.
Ø  A P2P system is built over many client machines. Peer machines are globally distributed in nature. P2P, cloud computing, and web service platforms are more focused on HTC applications than on HPC applications.
Ø  This HTC paradigm pays more attention to high-flux computing.

Ø  The main application for high-flux computing is in Internet searches and web services by millions or more users simultaneously.
Ø  The performance goal thus shifts to measure high throughput or the number of tasks completed per unit of time.
Clustering and P2P technologies lead to the development of computational grids or data grids.
4.Three New Computing Paradigms:
  1. With the introduction of SOA, Web 2.0 services became available.
  2. Advances in virtualization make it possible to see the growth of Internet clouds as a new computing paradigm.
  3. The maturity of radio-frequency identification (RFID), Global Positioning System (GPS), and sensor technologies has triggered the development of the Internet of Things (IoT).
5.Computing Paradigm Distinctions:
Ø  The high-technology community has argued for many years about the precise definitions of centralized computing, parallel computing, distributed computing, and cloud computing.
Ø  In general, distributed computing is the opposite of centralized computing.
Ø  The field of parallel computing overlaps with distributed computing to a great extent.
Ø  Cloud computing overlaps with distributed, centralized, and parallel computing.
  1. Centralized computing: This is a computing paradigm by which all computer resources are centralized in one physical system. All resources (processors, memory, and storage) are fully shared and tightly coupled within one integrated OS. Many data centers and supercomputers are centralized systems, but they are used in parallel, distributed, and cloud computing applications.
  2. Parallel computing: In parallel computing, all processors are either tightly coupled with centralized shared memory or loosely coupled with distributed memory.
  3. Distributed computing: This is a field of computer science/engineering that studies distributed systems. A distributed system consists of multiple autonomous computers, each having its own private memory, communicating through a computer network. Information exchange in a distributed system is accomplished through message passing.
  4. Cloud computing: An Internet cloud of resources can be either a centralized or a distributed computing system. The cloud applies parallel or distributed computing, or both. Clouds can be built with physical or virtualized resources over large data centers that are centralized or distributed.



6.Distributed System Families:
Ø  In the future, both HPC and HTC systems will demand multi-core or many-core processors that can handle large numbers of computing threads per core.
Ø  Both HPC and HTC systems emphasize parallelism and distributed computing. Future HPC and HTC systems must be able to satisfy this huge demand in computing power in terms of throughput, efficiency, scalability, and reliability.
Ø  The system efficiency is decided by speed, programming, and energy factors (i.e., throughput per watt of energy consumed).
Ø  Meeting these goals requires satisfying the following design objectives:
1)      Efficiency measures the utilization rate of resources in an execution model by exploiting massive parallelism in HPC. For HTC, efficiency is more closely related to job throughput, data access, storage, and power efficiency.
2)      Dependability measures the reliability and self-management from the chip to the system and application levels. The purpose is to provide high-throughput service with Quality of Service (QoS) assurance, even under failure conditions.
3)      Adaptation in the programming model measures the ability to support billions of job requests over massive data sets and virtualized cloud resources under various workload and service models.
4)      Flexibility in application deployment measures the ability of distributed systems to run well in both HPC (science and engineering) and HTC (business) applications.

SCALABLE COMPUTING TRENDS & NEW PARADIGMS
Several predictable trends in technology are known to drive computing applications. Moore's law indicates that processor speed doubles every 18 months.
Gilder’s law indicates that network bandwidth has doubled each year in the past. This has also driven the adoption and use of commodity technologies in large-scale computing.
1. Degrees of Parallelism:
Ø  Bit-level parallelism (BLP): converts bit-serial processing to word-level processing gradually.
Ø  Instruction-level parallelism (ILP): Over the years, users graduated from 4-bit microprocessors to 8-, 16-, 32-, and 64-bit CPUs. This led us to the next wave of improvement, known as instruction-level parallelism (ILP), in which the processor executes multiple instructions simultaneously rather than only one instruction at a time.

Ø  Data-level parallelism (DLP): this was made popular through SIMD (single instruction, multiple data) and vector machines using vector or array types of instructions. DLP requires even more hardware support and compiler assistance to work properly (see the sketch after this list).
Ø  Task-level parallelism (TLP): A modern processor explores all of the aforementioned parallelism types. However, TLP is far from being very successful due to difficulty in programming and compilation of code for efficient execution on multicore CMPs.
Ø  Job-level parallelism (JLP): As we move from parallel processing to distributed processing, we will see an increase in computing granularity to job-level parallelism (JLP). It is fair to say that coarse-grain parallelism is built on top of fine-grain parallelism.
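The data-level parallelism item above can be made concrete with a minimal Python sketch (assuming the NumPy library is available; the array sizes are illustrative only). The loop processes one element per iteration, while the vector expression lets the hardware apply SIMD lanes across many elements at once.

import numpy as np

a = np.arange(1_000_000, dtype=np.float64)
b = np.arange(1_000_000, dtype=np.float64)

# Scalar path: one addition per loop iteration (no data-level parallelism)
c_scalar = [a[i] + b[i] for i in range(len(a))]

# Data-parallel path: a single vector operation over whole arrays,
# which NumPy and the underlying hardware can execute with SIMD instructions
c_vector = a + b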
2. Innovative applications:
Both HPC and HTC systems desire transparency in many application aspects. For example, data access, resource allocation, process location, concurrency in execution, job replication, and failure recovery should be made transparent to both users and system management. The following table highlights a few key applications that have driven the development of parallel and distributed systems over the years.

3. The trend towards utility computing:
Utility computing focuses on a business model in which customers receive computing resources from a paid service provider. All grid/cloud platforms are regarded as utility service providers. Cloud computing offers a broader concept than utility computing. Distributed cloud applications run on any available servers in some edge networks.



4. The Hype Cycle of New Technologies : Any new and emerging computing and information technology may go through a hype cycle, as illustrated in Figure. This cycle shows the expectations for the technology at five different stages. The expectations rise sharply from the trigger period to a high peak of inflated expectations. Through a short period of disillusionment, the expectation may drop to a valley and then increase steadily over a long enlightenment period to a plateau of productivity. The number of years for an emerging technology to reach a certain stage is marked by special symbols. The hollow circles indicate technologies that will reach mainstream adoption in two years. The gray circles represent technologies that will reach mainstream adoption in two to five years. The solid circles represent those that require five to 10 years to reach mainstream adoption, and the triangles denote those that require more than 10 years. The crossed circles represent technologies that will become obsolete before they reach the plateau. The cloud technology had just crossed the peak of the expectation stage in 2010, and it was expected to take two to five more years to reach the productivity stage.


5. The Internet of Things:
Ø  The traditional Internet connects machines to machines or web pages to web pages.
Ø  The IoT refers to the networked interconnection of everyday objects, tools, devices, or computers. One can view the IoT as a wireless network of sensors that interconnect all things in our daily life.
Ø  These things can be large or small and they vary with respect to time and place. The idea is to tag every object using RFID or a related sensor or electronic technology such as GPS.
Ø  This communication can be made between people and things or among the things themselves.
Ø  Three communication patterns co-exist, namely:
1)      H2H (human-to-human),
2)      H2T (human-to-thing)
3)      T2T (thing-to-thing)
6. Cyber-Physical Systems: A cyber-physical system (CPS) is the result of interaction between computational processes and the physical world. A CPS integrates “cyber” (heterogeneous, asynchronous) with “physical” (concurrent and information-dense) objects. A CPS merges the “3C” technologies of computation, communication, and control into an intelligent closed feedback system between the physical world and the information world, a concept which is actively explored in the United States. The IoT emphasizes various networking connections among physical objects, while the CPS emphasizes exploration of virtual reality (VR) applications in the physical world. We may transform how we interact with the physical world just like the Internet transformed how we interact with the virtual world.
TECHNOLOGIES FOR NETWORK-BASED SYSTEMS
In particular, we will focus on viable approaches to building distributed operating systems for handling massive parallelism in a distributed environment.
1.Multi core CPUs and Multithreading Technologies:
Both multi-core CPU and many-core GPU processors can handle multiple instruction threads at different magnitudes today. The figure shows the architecture of a typical multicore processor. Each core is essentially a processor with its own private cache (L1 cache). Multiple cores are housed in the same chip with an L2 cache that is shared by all cores. In the future, multiple CMPs could be built on the same CPU chip with even the L3 cache on the chip. Multicore and multithreaded designs are used in many high-end processors, including the Intel i7, Xeon, AMD Opteron, Sun Niagara, IBM Power 6, and X cell processors.

Multi core processor
Multi core CPU and Many-Core GPU Architectures:
Multicore CPUs may increase from the tens of cores to hundreds or more in the future. But the CPU has reached its limit in terms of exploiting massive DLP due to the memory wall problem (discussed under memory technology below).
This has triggered the development of many-core GPUs with hundreds or more thin cores.
Multithreading Technology:
  1. a four-issue superscalar processor
  2. a fine-grain multithreaded processor
  3. a coarse-grain multithreaded processor
  4. a dual-core chip multiprocessor (CMP)
  5. a simultaneous multithreaded (SMT) processor
Ø  The superscalar processor is single-threaded with four functional units. Each of the three multithreaded processors is four-way multithreaded over four functional data paths.
Ø  In the dual-core processor, assume two processing cores, each a single-threaded two-way superscalar processor.






Multi threading models
1)      Only instructions from the same thread are executed in a superscalar processor.
2)      Fine-grain multithreading switches the execution of instructions from different threads per cycle.
3)      Coarse-grain multithreading executes many instructions from the same thread for quite a few cycles before switching to another thread.
4)      The multicore CMP executes instructions from different threads completely separately on its two cores.
5)      The SMT allows simultaneous scheduling of instructions from different threads in the same cycle.

2. GPU Computing to Exascale and Beyond:
Ø  A GPU is a graphics coprocessor or accelerator mounted on a computer’s graphics card or video card. A GPU offloads the CPU from tedious graphics tasks in video editing applications.
Ø  The world’s first GPU, the GeForce 256, was marketed by NVIDIA in 1999.
Ø  How GPUs Work:
Ø  Early GPUs functioned as coprocessors attached to the CPU. Today, the NVIDIA GPU has been upgraded to 128 cores on a single chip.





GPU
Ø  Furthermore, each core on a GPU can handle eight threads of instructions. This translates to having up to 1,024 threads executed concurrently on a single GPU.
Ø  The GPU is optimized to deliver much higher throughput with explicit management of on-chip memory.
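As a rough illustration of how many lightweight GPU threads can be launched at once, the following sketch uses the Numba CUDA interface; this is an assumption (it requires an NVIDIA GPU and the numba package), and the array size, block size, and kernel are made up for illustration.

from numba import cuda
import numpy as np

@cuda.jit
def scale(data, factor):
    i = cuda.grid(1)              # global thread index
    if i < data.size:
        data[i] *= factor         # each thread handles one element

data = np.ones(1_000_000, dtype=np.float32)
d_data = cuda.to_device(data)                      # copy to GPU memory
threads_per_block = 256
blocks = (data.size + threads_per_block - 1) // threads_per_block
scale[blocks, threads_per_block](d_data, 2.0)      # thousands of threads run concurrently
result = d_data.copy_to_host()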

3.Memory, Storage, and Wide-Area Networking:
Memory Technology:
Ø  DRAM chip capacity grew from 16 KB in 1976 to 64 GB in 2011. This shows that memory chips have experienced roughly a 4x increase in capacity every three years.
Ø  For hard drives, capacity increased from 260 MB in 1981 to 250 GB in 2004. The Seagate Barracuda XT hard drive reached 3 TB in 2011. This represents an approximately 10x increase in capacity every eight years.
Ø  Faster processor speed and larger memory capacity result in a wider gap between processors and memory. The memory wall may become an even more serious problem limiting CPU performance in the future.
Disks and Storage Technology:
Ø  The rapid growth of flash memory and solid-state drives (SSDs) also impacts the future of HPC and HTC systems.
Ø  A typical SSD can handle 300,000 to 1 million write cycles per block. So the SSD can last for several years, even under conditions of heavy write usage. Flash and SSD will demonstrate impressive speedups in many applications.
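A rough back-of-the-envelope check of the endurance claim above, sketched in Python; the drive capacity, endurance rating, and daily write volume are assumed figures, and the calculation assumes ideal wear leveling (real drives last less because of write amplification).

# Assumed figures (not from the text): 512 GB drive, 300,000 write cycles per block,
# 100 GB written per day under heavy use, ideal wear leveling
capacity_gb = 512
write_cycles_per_block = 300_000
daily_writes_gb = 100

total_writable_gb = capacity_gb * write_cycles_per_block
lifetime_years = total_writable_gb / daily_writes_gb / 365
print(f"Idealized lifetime estimate: {lifetime_years:,.0f} years")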
System-Area Interconnects:
Ø  The nodes in small clusters are mostly interconnected by an Ethernet switch or a local area network (LAN).
Ø  A LAN typically is used to connect client hosts to big servers.

Ø  A storage area network (SAN) connects servers to network storage such as disk arrays.
Ø  Network attached storage (NAS) connects client hosts directly to the disk arrays.
Wide-Area Networking:
Ø  Ethernet bandwidth grew rapidly from 10 Mbps in 1979 to 1 Gbps in 1999, and to 40-100 GE in 2011. It was speculated that 1 Tbps network links would become available by 2013.
Ø  A factor-of-two increase per year in network performance was reported, which is faster than Moore's law for CPU speed (doubling every 18 months). The implication is that more computers will be used concurrently in the future.
Ø  High-bandwidth networking increases the capability of building massively distributed systems.
4. Virtual Machines and Virtualization Middleware:
Ø  Virtual machines (VMs) offer novel solutions to underutilized resources, application inflexibility, software manageability, and security concerns in existing physical machines.
1. Virtual Machines:




Ø  The host machine is equipped with the physical hardware, as shown at the bottom of the figure. An example is an x86 architecture desktop running its installed Windows OS, as shown in part (a) of the figure.
Ø  The VM can be provisioned for any hardware system. The VM is built with virtual resources managed by a guest OS to run a specific application. Between the VMs and the host platform, one needs to deploy a middleware layer called a virtual machine monitor (VMM).
Ø  Figure (b) shows a native VM installed with the use of a VMM called a hypervisor in privileged mode. For example, the hardware has x86 architecture running the Windows system.
Ø  The guest OS could be a Linux system and the hypervisor is the XEN system developed at Cambridge University. This hypervisor approach is also called bare-metal VM, because the hypervisor handles the bare hardware (CPU, memory, and I/O) directly.
Ø  Another architecture is the host VM shown in Figure (c). Here the VMM runs in non-privileged mode. The host OS need not be modified. The VM can also be implemented with a dual mode, as shown in Figure (d).
Ø  Part of the VMM runs at the user level and another part runs at the supervisor level. In this case, the host OS may have to be modified to some extent. Multiple VMs can be ported to a given hardware system to support the virtualization process.
Ø  The VM approach offers hardware independence of the OS and applications. The user application running on its dedicated OS could be bundled together as a virtual appliance that can be ported to any hardware platform.
Ø  The VM could run on an OS different from that of the host computer.

2. VM Primitive Operations:
1)      Multiplexing: VMs can be multiplexed between hardware machines, as shown in Figure (a).
2)      Suspension: a VM can be suspended and stored in stable storage, as shown in Figure (b).
3)      Provision: a suspended VM can be resumed or provisioned to a new hardware platform, as shown in Figure (c).
4)      Live migration: a VM can be migrated from one hardware platform to another, as shown in Figure (d).
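These four primitive operations can be driven from ordinary management tooling. Below is a minimal sketch using the libvirt command-line client virsh through Python's subprocess module; the domain name vm1, the saved-state path, and the destination URI are hypothetical, and a libvirt/KVM host is assumed.

import subprocess

VM = "vm1"                                 # hypothetical VM (domain) name
DEST = "qemu+ssh://host2/system"           # hypothetical destination host URI

def virsh(*args):
    subprocess.run(["virsh", *args], check=True)

virsh("suspend", VM)                        # suspension: pause the running VM
virsh("save", VM, "/var/tmp/vm1.state")     # store its state in stable storage
virsh("restore", "/var/tmp/vm1.state")      # provision: resume from the saved state
virsh("migrate", "--live", VM, DEST)        # live migration to another platform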



3. Virtual Infrastructure:
A virtual infrastructure provides dynamic mapping of system resources to specific applications. The result is decreased costs and increased efficiency and responsiveness. Virtualization for server consolidation and containment is a good example of this.
5.Data Center Virtualization for Cloud Computing:
Ø  Cloud architecture is built with commodity hardware and network devices.
Ø  Almost all cloud platforms choose the popular x86 processors. Low-cost terabyte disks and Gigabit Ethernet are used to build data centers.
Ø  Data center design emphasizes that storage and energy efficiency are more important than sheer speed performance.
1. Data Center Growth and Cost Breakdown:
Ø  A large data center may be built with thousands of servers. Smaller data centers are typically built with hundreds of servers.
Ø  According to a 2009 IDC report, typically only 30 percent of data center costs go toward purchasing IT equipment (such as servers and disks),
Ø  33 percent is attributed to the chiller (cooling system)
Ø  18 percent to the uninterruptible power supply (UPS),
Ø  9 percent to computer room air conditioning (CRAC),
Ø  And the remaining 7 percent to power distribution, lighting, and transformer costs.
Ø  Thus, about 60 percent of the cost to run a data center is allocated to management and maintenance.

 2. Low-cost design philosophy: High-end switches or routers may be too cost-prohibitive for building data centers. Thus, using high-bandwidth networks may not fit the economics of cloud computing. Commodity switches and networks are more desirable in data centers. Similarly, using commodity x86 servers is preferred over expensive mainframes. The software layer handles network traffic balancing, fault tolerance, and expandability. Currently, nearly all cloud computing data centers use Ethernet as their fundamental network technology.
   3. Convergence of Technologies: Essentially, cloud computing is enabled by the convergence of technologies in four areas: (1) hardware virtualization and multi-core chips, (2) utility and grid computing, (3) SOA, Web 2.0, and WS mashups, and (4) autonomic computing and data center automation.
SYSTEM MODELS FOR DISTRIBUTED AND CLOUD COMPUTING
Distributed and cloud computing systems are built over a large number of autonomous computer nodes. These node machines are interconnected by SANs, LANs, or WANs in a hierarchical manner.
Ø  Massive systems are classified into four groups:
1.      Clusters
2.      P2P networks
3.      Computing grids,
4.      Internet clouds over huge data centers.


1. Cluster Architecture: A collection of interconnected stand-alone computers that work cooperatively as a single integrated computing resource is called a cluster.
Ø  A typical server cluster is built around a low-latency, high-bandwidth interconnection network.
Ø  To build a larger cluster with more nodes, the interconnection network can be built with multiple levels of Gigabit Ethernet, Myrinet, or InfiniBand switches.
Ø  Through hierarchical construction using a SAN, LAN, or WAN, one can build scalable clusters with an increasing number of nodes.
Ø  The cluster is connected to the Internet via a virtual private network (VPN) gateway. The gateway IP address locates the cluster.
Ø  The system image of a computer is decided by the way the OS manages the shared cluster resources.
Ø  Most clusters have loosely coupled node computers. All resources of a server node are managed by their own OS.
Ø  Thus, most clusters have multiple system images as a result of having many autonomous nodes under different OS control.

Cluster of servers interconnected by high bandwidth SAN or LAN

Single-System Image:
Ø  Cluster should merge multiple system images into a single-system image (SSI).
Ø  Cluster designers desire a cluster operating system or some middleware to support SSI at various levels, including the sharing of CPUs, memory, and I/O across all cluster nodes.


Ø  An SSI is an illusion created by software or hardware that presents a collection of resources as one integrated, powerful resource.

Hardware, Software, and Middleware Support
Ø  Clusters exploring massive parallelism are commonly known as MPPs. Almost all HPC clusters in the Top 500 list are also MPPs. The building blocks are computer nodes (PCs, workstations, servers, or SMPs), special communication software such as PVM or MPI, and a network interface card in each computer node. Most clusters run under the Linux OS.
Ø  The computer nodes are interconnected by a high-bandwidth network (such as Gigabit Ethernet, Myrinet, InfiniBand, etc.).
Ø  Special cluster middleware support is needed to create SSI or high availability (HA). Both sequential and parallel applications can run on the cluster, and special parallel environments are needed to facilitate use of the cluster resources.

Major Cluster Design Issues
A cluster-wide OS for complete resource sharing is not available yet. Middleware or OS extensions were developed at the user space to achieve SSI at selected functional levels. Without this middleware, cluster nodes cannot work together effectively to achieve cooperative computing. The software environments and applications must rely on the middleware to achieve high performance. The cluster benefits come from scalable performance, efficient message passing, high system availability, seamless fault tolerance, and cluster-wide job management.
2. Grid Computing Infrastructures:
Grid families:
Ø  Grid systems are classified into essentially two categories: computational or data grids and P2P grids.




Computing grid
Computational Grids:
Ø  Like an electric utility power grid, a computing grid offers an infrastructure that couples computers, software/middleware, special instruments, and people and sensors together. The grid is often constructed across LAN, WAN, or Internet backbone networks at a regional, national, or global scale.
Ø  The resource sites offer complementary computing resources, including workstations, large servers, a mesh of processors, and Linux clusters to satisfy a chain of computational needs.
3. Peer-to-Peer Network Families:
Ø  In a traditional client/server architecture, client machines (PCs and workstations) are connected to a central server for compute, e-mail, file access, and database applications.
Ø  The P2P architecture offers a distributed model of networked systems. First, a P2P network is client-oriented instead of server-oriented.
Ø  P2P systems are introduced at the physical level and overlay networks at the logical level.

P2P Systems:
Ø  In a P2P system, every node acts as both a client and a server, providing part of the system resources.
Ø  Peer machines are simply client computers connected to the Internet. All client machines act autonomously to join or leave the system freely.
Ø  This implies that no master-slave relationship exists among the peers. No central coordination or central database is needed.
Ø  The system is self-organizing with distributed control.
Ø  Only the participating peers form the physical network at any time.
Ø  Unlike the cluster or grid, a P2P network does not use a dedicated interconnection network. The physical network is simply an ad hoc network formed at various Internet domains randomly using the TCP/IP and NAI protocols.
Overlay Networks:
Ø  Data items or files are distributed in the participating peers.
Ø  Based on communication or file-sharing needs, the peer IDs form an overlay network at the logical level. This overlay is a virtual network formed by mapping each physical machine with its ID, logically, through a virtual mapping.
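A tiny sketch of the logical mapping described above: each peer's physical address is hashed into an overlay ID, and the IDs, not the physical addresses, define the virtual links. The peer addresses and the size of the ID space are made up for illustration.

import hashlib

ID_BITS = 16                                   # small ID space for illustration

def overlay_id(address: str) -> int:
    # map a physical address to a logical overlay ID via hashing
    digest = hashlib.sha1(address.encode()).hexdigest()
    return int(digest, 16) % (2 ** ID_BITS)

peers = ["10.0.0.5:7000", "192.168.1.9:7000", "172.16.3.2:7000"]   # hypothetical peers
ring = sorted(overlay_id(p) for p in peers)
print(ring)   # logical positions on the overlay, independent of physical location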

Overlay networks built with virtual links
P2P Application Families:
Ø  Based on application, P2P networks are classified into four groups.
1)      Distributed File Sharing
2)      Collaboration Platform

3)      Distributed P2P Computing
4)      P2P Platform
P2P Computing Challenges:
Ø  P2P computing faces three types of heterogeneity problems in hardware, software, and network requirements.
Ø  Different network connections and protocols make it too complex to apply in real applications.
Ø  System scaling is directly related to performance and bandwidth. P2P networks do have these properties. Data locality, network proximity, and interoperability are three design objectives in distributed P2P applications.
Ø  Fault tolerance, failure management, and load balancing are other important issues in using overlay network.
Ø  Security, privacy, and copyright violations are major worries for those in the industry in terms of applying P2P technology in business applications.
Ø  By replicating data in multiple peers, one can more easily recover data lost in failed nodes.
Disadvantages of P2P networks: Because the system is not centralized, managing it is difficult.
Ø  In addition, the system lacks security. Anyone can log on to the system and cause damage or abuse.
Ø  Further, all client computers connected to a P2P network cannot be considered reliable or virus-free. P2P networks are reliable for a small number of peer nodes. They are only useful for applications that require a low level of security and have no concern for data sensitivity.
Cloud Computing over the Internet:
Ø  In the future, working with large data sets will typically mean sending the computations (programs) to the data, rather than copying the data to the workstations.
Ø  This reflects the trend in IT of moving computing and data from desktops to large data centers where there is on-demand provision of software, hardware, and data as a service.
Ø  This data explosion has promoted the idea of cloud computing.
Ø  A cloud allows workloads to be deployed and scaled out quickly through rapid provisioning of virtual or physical machines.
Ø  The cloud supports redundant, self-recovering, highly scalable programming models that allow workloads to recover from many unavoidable hardware/software failures.


Ø  Finally, the cloud system should be able to monitor resource use in real time to enable rebalancing of allocations when needed.

Internet Clouds:
Ø  Cloud computing applies a virtualized platform with elastic resources on demand by provisioning hardware, software, and data sets dynamically (see Figure). The idea is to move desktop computing to a service-oriented platform using server clusters and huge databases at data centers.

Virtualized resources from data centers to form a cloud
The Cloud Landscape:
Ø  Traditional computing systems have encountered several performance bottlenecks: constant system maintenance, poor utilization, and increasing costs associated with hardware/software upgrades.
Ø  Cloud computing as an on-demand computing paradigm resolves or relieves us from these problems.
Ø  The cloud landscape and major cloud players, based on three cloud service models.
1.      Infrastructure as a Service (IaaS)
2.      Platform as a Service (PaaS)
3.      Software as a Service (SaaS)
Ø  Infrastructure as a Service (IaaS): This model puts together infrastructures demanded by users—namely servers, storage, networks, and the data center fabric. The user can deploy and run specific applications on multiple VMs running guest OSes. The user does not manage or control the underlying cloud infrastructure, but can specify when to request and release the needed resources.

Ø  Platform as a Service (PaaS): This model enables the user to deploy user-built applications onto a virtualized cloud platform. PaaS includes middleware, databases, development tools, and some runtime support such as Web 2.0 and Java. The platform includes both hardware and software integrated with specific programming interfaces. The provider supplies the API and software tools (e.g., Java, Python, Web 2.0, .NET). The user is freed from managing the cloud infrastructure.
Ø  Software as a Service (SaaS): This refers to browser-initiated application software over thousands of paid cloud customers. The SaaS model applies to business processes, industry applications, customer relationship management (CRM), enterprise resource planning (ERP), human resources (HR), and collaborative applications. On the customer side, there is no upfront investment in servers or software licensing. On the provider side, costs are rather low, compared with conventional hosting of user applications.
The following list highlights eight reasons to adapt the cloud for upgraded Internet applications and web services:
1)      Desired location in areas with protected space and higher energy efficiency
2)      Sharing of peak-load capacity among a large pool of users, improving overall utilization
3)      Separation of infrastructure maintenance duties from domain-specific application development
4)      Significant reduction in cloud computing cost, compared with traditional computing paradigms
5)      Cloud computing programming and application development
6)      Service and data discovery and content/service distribution
7)      Privacy, security, copyright, and reliability issues
8)      Service agreements, business models, and pricing policies

 

SOFTWARE ENVIRONMENTS FOR DISTRIBUTED SYSTEMS AND CLOUDS

Service-Oriented Architecture (SOA):
Ø  In grids/web services, Java, and CORBA, an entity is, respectively, a service, a Java object, and a CORBA distributed object in a variety of languages.
Ø  These architectures build on the traditional seven Open Systems Interconnection (OSI) layers that provide the base networking abstractions.
Ø  On top of this we have a base software environment, which would be .NET or Apache Axis for web services, the Java Virtual Machine for Java, and a broker network for CORBA.
Ø  On top of this base environment one would build a higher level environment reflecting the special features of the distributed computing environment.
Ø  This starts with entity interfaces and inter-entity communication, which rebuild the top four OSI layers but at the entity and not the bit level.

Layered Architecture for Web Services and Grids:
1)      Higher Level services
2)      Service Context
3)      Service Internet
4)      Bit-Level Internet
Ø  The entity interfaces correspond to the Web Services Description Language (WSDL), Java method, and CORBA interface definition language (IDL) specifications in these example distributed systems.
Ø  These interfaces are linked with customized, high-level communication systems: SOAP, RMI, and IIOP in the three examples.
Ø  These communication systems support features including particular message patterns (such as Remote Procedure Call or RPC), fault recovery, and specialized routing (a minimal RPC sketch appears after this list).
Ø  Often, these communication systems are built on message-oriented middleware (enterprise bus) infrastructure such as WebSphere MQ or Java Message Service (JMS), which provides rich functionality and supports virtualization of routing, senders, and recipients.
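To make the RPC message pattern mentioned above concrete, here is a minimal sketch using Python's standard-library xmlrpc modules; the service name, port, and method are hypothetical, and real SOA stacks (SOAP, RMI, IIOP) add richer fault recovery and routing on top of this basic pattern.

# server side: exposes one remote procedure
from xmlrpc.server import SimpleXMLRPCServer

def get_quote(symbol):
    return {"symbol": symbol, "price": 42.0}     # dummy payload

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
server.register_function(get_quote, "get_quote")
server.serve_forever()

# client side (run separately): invokes the procedure as if it were local
# import xmlrpc.client
# proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
# print(proxy.get_quote("ACME"))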



The Evolution of SOA:


Ø  SOA applies to building grids, clouds, grids of clouds, clouds of grids, clouds of clouds (also known as inter-clouds), and systems of systems in general.
Ø  Raw data → Data → Information → Knowledge → Wisdom → Decisions
Ø  A large number of sensors provide data-collection services, denoted in the figure as SS.
Ø  A sensor can be a ZigBee device, a Bluetooth device, a WiFi access point, a personal computer, a GPS device, or a wireless phone, among other things.
Ø  All the SS devices interact with large or small computers, many forms of grids, databases, the compute cloud, the storage cloud, the filter cloud, the discovery cloud, and so on.
Ø  Filter services (fs in the figure) are used to eliminate unwanted raw data, in order to respond to specific requests from the web, the grid, or web services.
Ø  SOA aims to search for, or sort out, the useful data from the massive amounts of raw data items.
Ø  Processing this data will generate useful information, and subsequently, the knowledge for our daily use.
Ø  Finally, we make intelligent decisions based on both biological and machine wisdom.

Trends toward Distributed Operating Systems:
Ø  The computers in most distributed systems are loosely coupled. Thus, a distributed system inherently has multiple system images.
Ø  This is mainly due to the fact that all node machines run with an independent operating system.
Ø  To promote resource sharing and fast communication among node machines, it is best to have a distributed OS that manages all resources coherently and efficiently.
Distributed Operating Systems:
Ø  Tanenbaum identifies three approaches for distributing resource management functions in a distributed computer system.
1)      The first approach is to build a network OS over a large number of heterogeneous OS platforms.
2)      The second approach is to develop middleware to offer a limited degree of resource sharing, similar to the MOSIX/OS developed for clustered systems.
3)      The third approach is to develop a truly distributed OS to achieve higher use or system transparency.


Amoeba versus DCE:
Ø  DCE is a middleware-based system for distributed computing environments.
Ø  Amoeba was developed academically at the Free University in the Netherlands.
Ø  To balance the resource management workload, the functionalities of such a distributed OS should be distributed to any available server. In this sense, the conventional OS runs only on a centralized platform.
Ø  With the distribution of OS services, the distributed OS design should take a lightweight microkernel approach like the Amoeba or should extend an existing OS like the DCE by extending UNIX.
Ø  The trend is to free users from most resource management duties.
MOSIX2 for Linux Clusters:
Ø  MOSIX2 is a distributed OS, which runs with a virtualization layer in the Linux environment.
Ø  This layer provides a partial single-system image to user applications.
Ø  MOSIX2 supports both sequential and parallel applications, and discovers resources and migrates software processes among Linux nodes.
Ø  Flexible management of a grid allows owners of clusters to share their computational resources among multiple cluster owners.
Transparency in Programming Environments:
Ø  The user data, applications, OS, and hardware are separated into four levels.
Ø  Data is owned by users, independent of the applications.
Ø  The OS provides clear interfaces, standard programming interfaces, or system calls to application programmers.
Ø  In future cloud infrastructure, the hardware will be separated by standard interfaces from the OS.
Ø  Thus, users will be able to choose from different OSes on top of the hardware devices they prefer to use.
Ø  To separate user data from specific application programs, users can enable cloud applications as SaaS.
Ø  Thus, users can switch among different services.
Parallel and Distributed Programming Models:
1)      Message-Passing Interface (MPI):
Ø  This is the primary programming standard used to develop parallel and concurrent programs to run on a distributed system.

Ø  MPI is essentially a library of subprograms that can be called from C or FORTRAN to write parallel programs running on a distributed system.
Ø  The idea is to embody clusters, grid systems, and P2P systems with upgraded web services and utility computing applications.
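A minimal sketch of MPI-style message passing, here using the mpi4py Python binding rather than C or FORTRAN (an assumption; the package must be installed and the script launched with mpiexec). Each process contributes a value and a reduction gathers the result on one process.

# run with: mpiexec -n 4 python mpi_sum.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()           # this process's ID
size = comm.Get_size()           # total number of processes

# Each process contributes its rank; the reduce merges them on process 0
total = comm.reduce(rank, op=MPI.SUM, root=0)
if rank == 0:
    print(f"{size} processes, sum of ranks = {total}")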
2)      Map Reduce:
Ø  This is a web programming model for scalable data processing on large clusters over large data sets.
Ø  The model is applied mainly in web-scale search and cloud computing applications.
Ø  The user specifies a Map function to generate a set of intermediate key/value pairs.
Ø  Then the user applies a Reduce function to merge all intermediate values with the same intermediate key.
Ø  Map Reduce is highly scalable to explore high degrees of parallelism at different job levels.
Ø  A typical Map Reduce computation process can handle terabytes of data on tens of thousands or more client machines.
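A toy word count in plain Python illustrating the Map and Reduce roles described above; this is a single-machine sketch with made-up input, whereas real MapReduce frameworks distribute the same two functions across thousands of nodes.

from collections import defaultdict

documents = ["the cloud stores data", "the grid computes data"]   # toy input

# Map: emit intermediate (key, value) pairs
intermediate = []
for doc in documents:
    for word in doc.split():
        intermediate.append((word, 1))

# Shuffle: group values by intermediate key
groups = defaultdict(list)
for key, value in intermediate:
    groups[key].append(value)

# Reduce: merge all values that share the same intermediate key
counts = {key: sum(values) for key, values in groups.items()}
print(counts)   # e.g. {'the': 2, 'cloud': 1, 'stores': 1, 'data': 2, ...}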
3)      Hadoop Library:
Ø  The package enables users to write and run applications over vast amounts of distributed data.
Ø  Users can easily scale Hadoop to store and process petabytes of data in the web space.
Ø  It is efficient, as it processes data with a high degree of parallelism across a large number of commodity nodes.
Ø  It is reliable in that it automatically keeps multiple data copies to facilitate redeployment of computing tasks upon unexpected system failures.
4)      Open Grid Services Architecture:
Ø  OGSA is a common standard for general public use of grid services.
Ø  Genesis II is a realization of OGSA.
Ø  Key features include a distributed execution environment, Public Key Infrastructure (PKI) services using a local certificate authority (CA), trust management, and security policies in grid computing.
5)      Globus Toolkits and Extensions:
Ø  Globus is a middleware library.
Ø  This library implements some of the OGSA standards for resource discovery, allocation, and security enforcement in a grid environment.
Ø  The Globus packages support multisite mutual authentication with PKI certificates.
Ø  In addition, IBM has extended Globus for business applications.


PERFORMANCE, SECURITY, AND ENERGY EFFICIENCY
1. Performance Metrics:
Ø  In a distributed system, performance is attributed to a large number of factors.
Ø  System throughput is often measured in MIPS, Tflops (tera floating-point operations per second), or TPS (transactions per second).
Ø  Other measures include job response time and network latency. An interconnection network that has low latency and high bandwidth is preferred.
Ø  System overhead is often attributed to OS boot time, compile time, I/O data rate, and the runtime support system used.
Ø  Other performance-related metrics include the QoS for Internet and web services, system availability and dependability, and security resilience for system defense against network attacks.

Dimensions of Scalability:
1)      Size scalability:
Ø  This refers to achieving higher performance or more functionality by increasing the machine size.
Ø  The word “size” refers to adding processors, cache, memory, storage, or I/O channels.
Ø  The most obvious way to determine size scalability is to simply count the number of processors installed.
2)      Software scalability:
Ø  This refers to upgrades in the OS or compilers, adding mathematical and engineering libraries, porting new application software, and installing more user-friendly programming environments.
Ø  Some software upgrades may not work with large system configurations.
Ø  Testing and fine-tuning of new software on larger systems is a nontrivial job.
3)      Application scalability:
Ø  This refers to matching problem size scalability with machine size scalability.
Ø  Problem size affects the size of the data set or the workload increase. Instead of increasing machine size, users can enlarge the problem size to enhance system efficiency or cost-effectiveness.



4)      Technology scalability:
Ø  This refers to a system that can adapt to changes in building technologies, such as the component and networking technologies.
Ø  When scaling a system design with new technology one must consider three aspects: time, space, and heterogeneity.
1)      Time refers to generation scalability. When changing to new-generation processors, one must consider the impact on the motherboard, power supply, packaging and cooling, and so forth.
2)      Space is related to packaging and energy concerns. Technology scalability demands harmony and portability among suppliers.
3)      Heterogeneity refers to the use of hardware components or software packages from different vendors. Heterogeneity may limit the scalability.
2. Fault Tolerance and System Availability
In addition to performance, system availability and application flexibility are two other important  design goals in a distributed computing system.
System Availability:
Ø  HA (high availability) is desired in all clusters, grids, P2P networks, and cloud systems. A system is highly available if it has a long mean time to failure (MTTF) and a short mean time to repair (MTTR).
Ø  System availability is formally defined as follows:
System Availability = MTTF / (MTTF + MTTR)   (a worked example appears at the end of this subsection)
Ø  System availability is attributed to many factors. All hardware, software, and network components may fail. Any failure that will pull down the operation of the entire system is called a single point of failure. In general, as a distributed system increases in size, availability decreases due to a higher chance of failure and a difficulty in isolating the failures.
Ø  Both SMP and MPP are very vulnerable with centralized resources under one OS. NUMA machines have improved in availability due to the use of multiple OSes.
Ø  Most clusters are designed to have HA with failover capability. Meanwhile, private clouds are created out of virtualized data centers
Ø  A grid is visualized as a hierarchical cluster of clusters. Grids have higher availability due to the isolation of faults. Therefore, clusters, clouds, and grids have decreasing availability as the system increases in size. A P2P file-sharing network has the highest aggregation of client machines.
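A quick numerical check of the availability formula above, with MTTF and MTTR values assumed purely for illustration:

# Assumed values: MTTF = 1,000 hours, MTTR = 2 hours
mttf_hours = 1000.0
mttr_hours = 2.0

availability = mttf_hours / (mttf_hours + mttr_hours)
print(f"Availability = {availability:.4f} (about {availability * 100:.2f}%)")
# Lengthening MTTF or shortening MTTR pushes availability toward 1 (100 percent)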


3. Network Threats and Data Integrity
Clusters, grids, P2P networks, and clouds demand security and copyright protection if they are to be accepted in today's digital society.
1.      Threats to Systems and Networks
Ø  Network viruses have threatened many users in widespread attacks. These incidents have created a worm epidemic by pulling down many routers and servers, and are responsible for the loss of billions of dollars in business, government, and services.
Ø  Loss of data integrity may be caused by user alteration, Trojan horses, and service spoofing attacks. A denial of service (DoS) results in a loss of system operation and Internet connections. Lack of authentication or authorization leads to attackers' illegitimate use of computing resources. Open resources such as data centers, P2P networks, and grid and cloud infrastructures could become the next targets. Users need to protect clusters, grids, clouds, and P2P systems. Otherwise, users should not use or trust them for outsourced work.
2.      Security Responsibilities
Ø  Three security requirements are often considered: confidentiality, integrity, and availability for most Internet service providers and cloud users.
Ø  In the order of SaaS, PaaS, and IaaS, the providers gradually release the responsibility of security control to the cloud users. In summary, the SaaS model relies on the cloud provider to perform all security functions.
Ø  At the other extreme, the IaaS model wants the users to assume almost all security functions, but to leave availability in the hands of the providers. The PaaS model relies on the provider to maintain data integrity and availability, but burdens the user with confidentiality and privacy control.
Ø  We also need to deploy mechanisms to prevent online piracy and copyright violations of digital content. Security responsibilities are divided between cloud providers and users differently for the three cloud service models. The providers are totally responsible for platform availability.
Ø  The IaaS users are more responsible for the confidentiality issue. The IaaS providers are more responsible for data integrity. In PaaS and SaaS services, providers and users are equally responsible for preserving data integrity and confidentiality.

3.      Copyright Protection
Ø  Collusive piracy is the main source of intellectual property violations within the boundary of a P2P network. Paid clients (colluders) may illegally share copyrighted content files with unpaid clients (pirates).
Ø  Online piracy has hindered the use of open P2P networks for commercial content delivery.
One can develop a proactive content poisoning scheme to stop colluders and pirates from alleged copyright infringements in P2P file sharing.
Ø  Pirates are detected in a timely manner with identity-based signatures and time stamped tokens. This scheme stops collusive piracy from occurring without hurting legitimate P2P clients.
4.      System Defense Technologies
Ø  Three generations of network defense technologies have appeared in the past. In the first generation, tools were designed to prevent or avoid intrusions. These tools usually manifested themselves as access control policies or tokens, cryptographic systems, and so forth. However, an intruder could always penetrate a secure system because there is always a weak link in the security provisioning process.
Ø  The second generation detected intrusions in a timely manner to exercise remedial actions. These techniques included firewalls, intrusion detection systems (IDSes), PKI services, reputation systems, and so on.
Ø  The third generation provides more intelligent responses to intrusions.

5. Data Protection Infrastructure

Ø  Security infrastructure is required to safeguard web and cloud services. At the user level, one needs to perform trust negotiation and reputation aggregation over all users.
Ø  At the application end, we need to establish security precautions in worm containment and intrusion detection.
4.  Energy Efficiency in Distributed Computing:
Ø  Primary performance goals in conventional parallel and distributed computing systems are high performance and high throughput, considering some form of performance reliability (e.g., fault tolerance and security).
Ø  These systems recently encountered new challenging issues including energy efficiency, and workload and resource outsourcing. These emerging issues are crucial not only on their own, but also for the sustainability of large-scale computing systems in general.

Ø  Protection of data centers demands integrated solutions. Energy consumption in parallel and distributed computing systems raises various monetary, environmental, and system performance issues.
Energy Consumption of Unused Servers
Ø  To run a server farm (data center) a company has to spend a huge amount of money for hardware, software, operational support, and energy every year.
Ø  Therefore, companies should thoroughly identify whether their installed server farm (more specifically, the volume of provisioned resources) is at an appropriate level, particularly in terms of utilization.
Ø  It was estimated in the past that, on average, one-sixth (15 percent) of the full-time servers in a company are left powered on without being actively used (i.e., they are idling) on a daily basis. This indicates that with 44 million servers in the world, around 4.7 million servers are not doing any useful work.
Reducing Energy in Active Servers
In addition to identifying unused/underutilized servers for energy savings, it is also necessary to apply appropriate techniques to decrease energy consumption in active distributed systems with negligible influence on their performance. Power management issues in distributed computing platforms can be categorized into four layers: the application layer, middleware layer, resource layer, and network layer.
1.      Application Layer:
Ø  Most user applications in science, business, engineering, and financial areas tend to increase a system's speed or quality. By introducing energy-aware applications, the challenge is to design sophisticated multilevel and multi-domain energy management applications without hurting performance.
Ø  The first step toward this end is to explore a relationship between performance and energy consumption. Indeed, an application's energy consumption depends strongly on the number of instructions needed to execute the application and the number of transactions with the storage unit (or memory). These two factors (compute and storage) are correlated and they affect completion time.
2.      Middleware Layer:
Ø  The middleware layer acts as a bridge between the application layer and the resource layer. This layer provides resource broker, communication service, task analyzer, task scheduler, security  access, reliability control, and information service capabilities.
Ø  It is also responsible for applying energy-efficient techniques, particularly in task scheduling. Until recently, scheduling was aimed at minimizing makespan, that is, the execution time of a set of tasks. Distributed computing systems necessitate a new cost function covering both makespan and energy consumption.


3.      Resource Layer:
Ø  The resource layer consists of a wide range of resources including computing nodes and storage units. This layer generally interacts with hardware devices and the operating system; therefore, it is responsible for controlling all distributed resources in distributed computing systems.
Ø  Several mechanisms have been developed for more efficient power management of hardware and operating systems. The majority of them are hardware approaches particularly for processors.
Ø  Dynamic power management (DPM) and dynamic voltage-frequency scaling (DVFS) are two popular methods incorporated into recent computer hardware systems.
Ø  In DPM, hardware devices, such as the CPU, have the capability to switch from idle mode to one or more lower power modes.
Ø  In DVFS, energy savings are achieved based on the fact that the power consumption in CMOS circuits has a direct relationship with frequency and the square of the voltage supply. Execution time and power consumption are controllable by switching among different frequencies and voltages.
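A small worked example of the DVFS relationship above, using the standard dynamic-power approximation for CMOS circuits (power proportional to frequency and the square of the supply voltage); the capacitance, voltage, and frequency values are assumed for illustration.

def dynamic_power(c_farads, v_volts, f_hz):
    # CMOS dynamic power: proportional to frequency and the square of supply voltage
    return c_farads * (v_volts ** 2) * f_hz

C = 1e-9                                   # assumed effective switched capacitance
p_high = dynamic_power(C, 1.2, 3.0e9)      # full speed: 1.2 V at 3.0 GHz
p_low = dynamic_power(C, 0.9, 1.5e9)       # scaled down: 0.9 V at 1.5 GHz
print(f"Scaling voltage and frequency down cuts power by a factor of {p_high / p_low:.1f}")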
4.      Network Layer
Ø  Routing and transferring packets and enabling network services to the resource layer are the main responsibility of the network layer in distributed computing systems.
Ø  The major challenge to build energy-efficient networks is, again, determining how to measure, predict, and create a balance between energy consumption and performance. Two major challenges to designing energy-efficient networks are:
1)      The models should represent the networks comprehensively, as they should give a full understanding of interactions among time, space, and energy.
2)      New, energy-efficient routing algorithms need to be developed, and new, energy-efficient protocols should be developed against network attacks.
Ø  As information resources drive economic and social development, data centers become increasingly important in terms of where the information items are stored and processed, and where services are provided.
Ø  Data centers become another core infrastructure, just like the power grid and   transportation systems. Traditional data centers suffer from high construction and operational costs, complex resource management, poor usability, low security and reliability, and huge energy consumption. It is necessary to adopt new technologies in next-generation data-center designs.
