
A Report of Trends in Computer Architecture


Abstract

Computer system architecture has been, and always will be, significantly influenced by the underlying trends and capabilities of hardware and software technologies. The transition from electromechanical relays to vacuum tubes to transistors to integrated circuits has driven fundamentally different trade-offs in the architecture of computer systems. Advances in software, including the transition of the predominant approach to programming from machine language to assembly language to high-level procedural languages to object-oriented languages, have also resulted in new capabilities and design points. The impact of these technologies on computer system architectures past, present, and future is explored and projected in this report.

  

1.      Definitions of Computer Architecture

Computer architecture is a set of rules and methods that describe the functionality, organization, and implementation of computer systems. Some definitions of architecture define it as describing the capabilities and programming model of a computer but not a particular implementation. In other definitions, computer architecture involves instruction set architecture design, microarchitecture design, logic design, and implementation.

Computer architecture can be divided into five fundamental components: input/output, storage, communication, control, and processing. In practice, each of these components (sometimes called subsystems) is sometimes said to have an architecture, so, as usual, context contributes to usage and meaning.  

2.      History of Computer Architecture 

The first documented computer architecture was in the correspondence between Charles Babbage and Ada Lovelace describing the Analytical Engine. While building the Z1 computer in 1936, Konrad Zuse described in two patent applications for his future projects that machine instructions could be stored in the same storage used for data, i.e. the stored-program concept. Two other early and important examples are:

·        John von Neumann's 1945 paper, First Draft of a Report on the EDVAC, which described an organization of logical elements.

·         Alan Turing's more detailed Proposed Electronic Calculator for the Automatic Computing Engine, also from 1945, which cited von Neumann's paper.

The term “architecture” in computer literature can be traced to the work of Lyle R. Johnson, Frederick P. Brooks, Jr., and Mohammad Usman Khan, all members of the Machine Organization department in IBM’s main research center in 1959. Johnson had the opportunity to write a proprietary research communication about Stretch, an IBM-developed supercomputer for Los Alamos National Laboratory (at the time known as Los Alamos Scientific Laboratory). To describe the level of detail for discussing the luxuriously embellished computer, he noted that his description of formats, instruction types, hardware parameters, and speed enhancements was at the level of “system architecture” – a term that seemed more useful than “machine organization”.

Computer architecture, like other architecture, is the art of determining the needs of the user of a structure and then designing to meet those needs as effectively as possible within economic and technological constraints.

3.      Trends in Computer Architecture

Computer trends are changes or evolutions in the ways that computers are used which become widespread and integrated into popular thought with regard to these systems. These movements often begin with one or two companies adopting or promoting a new technology, which grabs the attention of others and becomes popular. Both hardware and software can be a part of computer trends, such as the development and proliferation of mobile devices including smartphones and tablets. Changes in the Internet, the development of new websites, and the expansion of cloud computing models are likely to be similar software trends throughout the early part of the 21st Century.

Much like changing fashions in clothing, trends in computers indicate the types of technology or concepts that are popular at a given time. This can occur in a number of ways, including a company introducing new technology to a market and customers finding that they can use certain products more effectively than others. As these changes happen, computer trends typically evolve and grow over time, so that technology that is popular one year may be considered outdated the next. Identifying the next major trend, and finding a way to get in on it ahead of time, can be substantially profitable for companies that work with technology.

Computer trends often involve hardware and the development or release of something new and innovative. The proliferation of smartphones throughout the first decade of the 21st Century, for example, is a major hardware trend that has changed the way in which many people access information. Mobile phones had already been established in the year 2000 as a major commodity and had gone beyond the niche item they may have been seen as in the 1980s. The development of smartphones in the years that followed, and their release as affordable products, established one of several major trends in which portability became a marketing factor for hardware developers.

Different types of software are often involved in computer trends since applications people use tend to evolve and change over time. Although the Internet has been around since the late 20th Century, the way in which it is used and how information can be presented on it continues to change. Developments in Internet coding and viewing continue to make its growth a major trend in the computer industry.

Computer trends with software often involve the way in which information is accessed and shared. Systems like cloud computing and similar methods for data storage and file sharing are likely to continue to grow and develop throughout the 21st Century. New applications for communicating, sharing information with family or business partners, and using every facet of innovative hardware are going to be popular for years to come.

 

3.1. Major Trends Affecting Microprocessor Performance and Design:

 

In a competitive processor market, some of the major trends affecting microprocessor performance and design are:

·         Increasing number of Cores  

·         Clock Speed 

·         Number of Transistors

 

3.2. Increasing Number of Cores:

 

A multi-core processor is a single computing component with two or more independent processing units called “cores”. Multi-core processors give users boosted performance, improved power efficiency, and parallel processing that allows multiple tasks to be performed simultaneously. The development of microprocessors for desktops and laptops continues to expand through families such as the Core i3, Core i5, and Core i7, with several cores integrated on each chip. One projection estimated that by 2017 embedded processors would sport 4,096 cores, servers would have 512 cores, and desktop chips would use 128 cores.
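
To illustrate the kind of parallelism that multiple cores make possible, here is a minimal Python sketch (not part of the original report; the task and input sizes are made up) that spreads independent CPU-bound tasks across the available cores using the standard multiprocessing module:

    # Minimal sketch: run independent CPU-bound tasks in parallel, one worker per core.
    from multiprocessing import Pool, cpu_count

    def heavy_task(n):
        # Stand-in for CPU-bound work; here, just a large sum.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        inputs = [1_000_000] * 8                    # eight independent tasks (made-up workload)
        with Pool(processes=cpu_count()) as pool:   # one worker process per available core
            results = pool.map(heavy_task, inputs)  # workers may run simultaneously on separate cores
        print(len(results), "tasks completed on", cpu_count(), "cores")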

 

3.3. Clock Speed:

Clock speed is defined as the frequency at which a processor executes instructions and processes data. The clock speed is measured in megahertz (MHz) or gigahertz (GHz). A quartz crystal vibrates and sends a pulse to each component that is synchronized with it (PC computer notes, 2003). A microprocessor whose clock is measured in megahertz completes millions of clock cycles per second, while a microprocessor that runs in the gigahertz range completes billions of clock cycles per second.

 In modern technology, most CPUs run in the gigahertz range. For instance, a 3 GHz microprocessor has six times the clock rate of a 500 MHz microprocessor, and a 3.6 GHz part is faster still. In general, the higher the clock frequency of the microprocessor, the faster the computer.
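
The relationship between clock frequency, cycle time, and relative clock rate can be checked with a small Python calculation (an illustrative sketch, not from the original report):

    # Minimal sketch: clock frequency vs. cycle time and relative clock rate.
    def period_ns(freq_hz):
        return 1e9 / freq_hz              # length of one clock period, in nanoseconds

    f_slow, f_fast = 500e6, 3e9           # 500 MHz vs. 3 GHz
    print(period_ns(f_slow))              # 2.0 ns per cycle
    print(round(period_ns(f_fast), 2))    # ~0.33 ns per cycle
    print(f_fast / f_slow)                # 6.0 -> six times the clock rate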

 

3.4. Number of Transistors:

 

The number of transistors available on a microprocessor has a massive effect on the performance of the CPU. For instance, the 8088 microprocessor took about 15 clock cycles, on average, to execute an instruction, and a single 16-bit multiplication on the 8088 took about 80 cycles.
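
Translating such cycle counts into wall-clock time only requires dividing by the clock frequency. The sketch below (not from the original report) assumes the 4.77 MHz clock at which the 8088 ran in the original IBM PC:

    # Minimal sketch: cycle count -> execution time for the 8088 multiplication example.
    clock_hz = 4.77e6                  # assumed 8088 clock rate (original IBM PC)
    cycles_per_multiply = 80           # approximate figure quoted above
    time_us = cycles_per_multiply / clock_hz * 1e6
    print(round(time_us, 1), "microseconds per 16-bit multiplication")   # ~16.8 µs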

 

According to Moore’s Law, the number of transistors on a chip roughly doubles every two years. As a result, feature sizes get smaller and transistor counts increase at a regular pace, providing improvements in integrated circuit functionality and performance while decreasing costs.
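
The doubling rule is easy to turn into a projection. The following Python sketch (illustrative only; the one-billion-transistor starting point is made up) compounds one doubling per two-year period:

    # Minimal sketch: projecting transistor counts under a doubling every two years.
    def projected_transistors(start_count, years):
        return start_count * 2 ** (years / 2)   # one doubling per two-year period

    start = 1e9                                 # hypothetical starting point: 1 billion transistors
    for years in (2, 4, 10):
        print(years, "years:", int(projected_transistors(start, years)))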

 

Increasing the number of transistors also enables a technique known as pipelining. In a pipelined architecture, the execution of instructions overlaps. For instance, even if each instruction takes five clock cycles to execute, five instructions can be in different stages of execution at the same time, so one instruction completes at every clock cycle. Most modern processors have multiple instruction decoders, each with its own pipeline, allowing multiple instruction streams and more than one completed instruction per clock cycle, at the cost of many additional transistors.
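
The throughput benefit of overlapping instruction execution can be seen with a simple cycle-count model (an illustrative sketch, assuming an ideal five-stage pipeline with no stalls):

    # Minimal sketch: total cycles with and without an ideal 5-stage pipeline.
    def cycles_unpipelined(n_instructions, stages=5):
        return n_instructions * stages           # each instruction runs start to finish alone

    def cycles_pipelined(n_instructions, stages=5):
        return stages + (n_instructions - 1)     # fill the pipeline once, then one completion per cycle

    n = 100
    print(cycles_unpipelined(n))                 # 500 cycles
    print(cycles_pipelined(n))                   # 104 cycles -> roughly 5x the throughput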

 

3.5. Microprocessor Design Goals for Laptops, Servers, Desktops, and Embedded Systems:

 

Microprocessors for laptops, servers, desktops, and embedded systems differ because each form factor imposes its own constraints. A laptop is small and portable, a computer designed for use anytime and anywhere, so its microprocessor design emphasizes power consumption. A laptop runs on battery power, and it would be inconvenient for users to carry the power adapter everywhere they go; the microprocessor in a laptop therefore consumes less power than the one in a desktop computer. The processor also helps keep the laptop cool, since the heat produced during use could damage the laptop's internal hardware. To meet the cooling requirement, the processor allows the laptop to lower its clock speed and bus speed, and to run at a lower operating voltage, which further reduces power consumption.
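
Why lowering voltage and frequency together saves so much power can be illustrated with the standard first-order model for dynamic CMOS power, P ≈ C·V²·f. The Python sketch below is illustrative only; the capacitance, voltages, and frequencies are made-up values:

    # Minimal sketch: first-order dynamic power model P ~ C * V^2 * f.
    def dynamic_power(c_farads, volts, freq_hz):
        return c_farads * volts ** 2 * freq_hz

    C = 1e-9                                   # illustrative switched capacitance (made up)
    full  = dynamic_power(C, 1.2, 3.0e9)       # full speed: 1.2 V at 3.0 GHz
    saver = dynamic_power(C, 0.9, 1.5e9)       # power-saving mode: 0.9 V at 1.5 GHz
    print(round(saver / full, 2))              # ~0.28 -> roughly a quarter of the dynamic power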

 

 A server is a computer or device on a network that manages network resources. Servers generally run 24x7 to keep a network functioning, and a disruption in server operations can be more disastrous than the failure of a desktop computer. The microprocessor design for a server therefore aims for stable uptime, constant availability, and reliability, for example by providing larger cache memory than desktop or embedded processors. Server designs also help control the heat released: a typical rack server chassis is 1U (1.75 in) or 2U (3.5 in) thick and accommodates a large cooling system, since the machine runs 24x7.

A desktop computer, by contrast, is a personal computer used regularly at a single location and is not portable. The microprocessor design goal for a desktop is to support job scheduling and multitasking, so that it can perform more than one job at a time.

The microprocessor design goals for an embedded system focus on power consumption and memory: an embedded processor uses a very small amount of power, which reduces the power consumption of the whole system, and memory is managed through code density, the amount of space occupied by executable programs. The microprocessor is designed to minimize the space that executable code occupies, i.e. to achieve high code density.

 

3.6. Optimizing Performance on POWER8 Processor-Based Systems:

 

The performance optimization guidance is organized into three broad categories:

 

3.7. Lightweight Tuning and Optimization Guidelines:

Lightweight tuning covers simple prescriptive steps for tuning application performance on POWER8 processor-based systems. These simple steps can be carried out without detailed knowledge of the internals of the application being optimized, and usually without modifying the application source code. Simple system utilization and performance tools are used for understanding and improving application performance.

 

3.8. Deployment Guidelines:

Deployment guidelines cover tuning considerations related to configuring a POWER8 processor-based system to deliver the best performance, along with some guidelines and preferred practices. Understanding logical partitions (LPARs), energy management, I/O configurations, and the use of multi-threaded cores are examples of typical system considerations that can impact application performance.

 

 

 

3.9. Deep Performance Optimization Guidelines:

Deep performance analysis covers performance tools and general strategies for identifying and fixing application bottlenecks. This type of analysis requires more familiarity with performance tools and analysis techniques, sometimes a deeper understanding of the application internals, and often a more dedicated and lengthy effort.

Another approach, which exploits economies of scale by using commodity components, is represented by Rack Scale and the Open Compute project. The Rack Scale architecture is usually described in terms of three key concepts: the disaggregation of compute, memory, and storage resources; the use of silicon photonics as a low-latency, high-speed fabric; and software that combines disaggregated hardware capacity over the fabric to create ‘pooled systems’. The term “rack scale” is not well defined: it can refer to a large unit filling part of a rack, to a single rack, or sometimes to a small number of racks. Several commercial products have tried to address the growing computing needs; these machines range in size from one to 10 rack units and sometimes contain more than 1,000 cores, divided between many small server units.

 

3.10.  Limitations of Current-Day Architectures:

The computing industry historically relied on increased microprocessor performance as transistor density doubled, until power density limits led to multi-processing. Common servers today consist of multiple processors, each consisting of multiple cores, and increasingly a single machine runs a hypervisor to support multiple virtual machines (VMs). A hypervisor provides each VM with an emulation of the resources of a physical computer; on each VM, a conventional operating system and application software can run. The hypervisor allocates memory and processor time to each VM. While a hypervisor gives access to other resources, e.g. network and storage, only limited guarantees (or constraints) are made on their usage or availability. Although VMs are popular, permitting consolidation and increasing the mean utilization of machines, the hypervisor has limited ability to isolate competing resource use or to mitigate the impact of one VM's usage on another.

Resource isolation is not the only challenge for scaling computing architectures. General-purpose central processing units (CPUs) are not designed to handle the high packet rates of new networks: doing useful work on a 100 Gbps data stream exceeds the limits of today's processors. This is despite the modern CPU intra-core/cache ring bus achieving a peak interface rate of 3 Tbps, and a peak aggregate throughput that grows proportionally with the number of cores. A data stream of 100 Gbps carrying 64-byte packets amounts to a packet rate of 148.8 million packets per second; a 3 GHz CPU thus has only about 20 cycles per packet, significantly less than is required even just to send or receive. The inefficiency of packet processing by the CPU remains a great challenge, and the current tendency is to offload it to an accelerator on the network interface itself.
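
The cycles-per-packet figure follows directly from the line rate. The sketch below (illustrative, assuming the usual 20 bytes of per-frame Ethernet overhead from the preamble, start delimiter, and inter-frame gap) reproduces the arithmetic:

    # Minimal sketch: 100 Gbps of 64-byte packets vs. the cycle budget of a 3 GHz core.
    line_rate_bps = 100e9
    bits_per_frame = (64 + 20) * 8                 # 672 bits on the wire per minimum-size frame
    packets_per_sec = line_rate_bps / bits_per_frame
    cycles_per_packet = 3e9 / packets_per_sec      # cycle budget for one 3 GHz core
    print(round(packets_per_sec / 1e6, 1), "Mpps")            # ~148.8 Mpps
    print(round(cycles_per_packet, 1), "cycles per packet")   # ~20 cycles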

In-memory processing and the use of remote direct memory access (RDMA) as the underlying communications system is a growing trend in large-scale computing. Architectures such as scale-out non-uniform memory access (NUMA) for rack-scale computers are very sensitive to latency and thus have latency-reducing designs. However, they have limited scalability due to the intrinsic physical limits on propagation delay between different elements of the system. A fiber used for inter-server connection has a propagation delay of 5 ns/m; thus, within a standard-height rack, the propagation delay between the top and bottom rack units is approximately 9 ns, and the round-trip time to fetch remote data is about 18 ns.
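
These figures follow from the per-metre delay. The small sketch below (illustrative; the 1.8 m top-to-bottom fiber run is an assumed length for a standard rack) reproduces them:

    # Minimal sketch: fiber propagation delay between the top and bottom of a rack.
    delay_ns_per_m = 5
    fiber_length_m = 1.8                     # assumed top-to-bottom run in a standard rack
    one_way_ns = delay_ns_per_m * fiber_length_m
    print(one_way_ns, "ns one way")          # ~9 ns
    print(2 * one_way_ns, "ns round trip")   # ~18 ns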

 

While this order of latency is reasonable for current-generation architectures, it indicates that scale-out NUMA machines at data-centre scale (with each round trip taking at least 1 µs) are not plausible, as the round-trip latency alone is many times the time-scale for retrieving data from local random access memory, and dwarfs the latency contribution of any other element in the system.

 

 Photonics has advanced hand in hand with network-capacity growth. However, photonics has its own limitations: the minimum size of photonic devices is determined by the wavelength of light, e.g. optical waveguides must be larger than one half of the wavelength of the light in use.

 

Limitations are faced at several levels in the system hierarchy: from the practical limitations of physics to the increasing impedance mismatch between processor clock speed and network data rates.

 

3.11.  The Gap Between Networking and Computing:

The silicon vendors for both computing and networking devices operate in the same technological ecosystem, and CPU manufacturers have often had access to the newest fabrication processes and the leading edge of shrinking gate sizes. Nevertheless, over the past 20 years the interconnect rate of networking devices doubled every 18 months, whereas computing system I/O throughput doubled approximately every 24 months. At the interface between network and processor, PCI Express, the dominant processor-I/O interconnect, whose third generation was released in 2010, achieves 128 Gbps over 16 serial links. The fourth generation, expected in 2016, aims to double this bandwidth. The limitation of existing computing interconnects vexes the major CPU vendors.
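
The quoted PCI Express 3.0 figure can be reconstructed from the per-lane signalling rate. The sketch below (illustrative; it uses the published PCIe 3.0 parameters of 8 GT/s per lane with 128b/130b encoding and ignores protocol overheads) shows the arithmetic for a 16-lane link:

    # Minimal sketch: usable bandwidth of a PCI Express 3.0 x16 link, per direction.
    lanes = 16
    transfers_per_s = 8e9                    # PCIe 3.0 signals at 8 GT/s per lane
    encoding_efficiency = 128 / 130          # 128b/130b line encoding
    usable_bps = lanes * transfers_per_s * encoding_efficiency
    print(round(usable_bps / 1e9, 1), "Gbps per direction")   # ~126 Gbps, quoted above as 128 Gbps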

 

General-purpose processors are extremely complex devices whose traits cannot be reduced to specifications such as data path bandwidth or I/O interconnect speed. Consequently, the performance of CPUs is evaluated using the Standard Performance Evaluation Corporation (SPEC) CPU2006 benchmark and contrasted with the improvement in network-switching devices and computing interconnects.

 

4.      Summary

Computer architectures have evolved to optimally exploit the underlying hardware and software technologies to achieve increasing levels of performance. Computer performance has increased faster than the underlying performance of the digital switches from which computers are built, by exploiting the increase in density of digital switches and storage. This is accomplished through replication and speculative execution within systems, that is, through the exploitation of parallelism. Three levels of parallelism have been exploited: instruction-level parallelism, task/process-level parallelism, and algorithmic parallelism. These three levels are not mutually exclusive and will likely all be used in concert to improve performance in future systems. The limit of replication and speculation is to compute all possible outcomes simultaneously and to select the right final answer while discarding all of the other computations. This is analogous to the approach DNA uses to improve life on our planet: all living entities are a parallel computation of which only the "best" answers survive to the next round of computation. These are the concepts driving research into "biological" or "genetic" algorithms for application design, as well as research into using DNA to perform computations. In the limit, molecular-level computing provides huge increases in density over current systems, but much more limited improvements in raw performance. Thus, we can expect continued advancement in approaches and architectures that exploit density and parallelism over raw performance.


 
