Abstract
The computer system architecture has been, and
always will be, significantly influenced by the underlying trends and
capabilities of hardware and software technologies. The transition from
electromechanical relays to vacuum tubes to transistors to integrated circuits
has driven fundamentally different trade-offs in the architecture of computer
systems. Additional advances in software, which includes the transition of the
predominant approach to programming from machine language to assembly language
to high-level procedural language to an object-oriented language, have also
resulted in new capabilities and design points. The impact of these
technologies on computer system architectures past, present, and future will be
explored and projected.
1. Definitions of Computer Architecture
Computer architecture is a set of rules
and methods that describe the functionality, organization, and implementation
of computer systems. Some definitions of architecture define it as describing
the capabilities and programming model of a computer but not a particular
implementation. In other definitions, computer architecture involves instruction set architecture design, microarchitecture design, logic design, and implementation.
Computer architecture can be divided into five
fundamental components: input/output, storage, communication, control, and processing.
In practice, each of these components (sometimes called subsystems) is
sometimes said to have an architecture, so, as usual, context contributes to
usage and meaning.
2. History of Computer Architecture
The first documented computer architecture
was in the correspondence between Charles Babbage and Ada Lovelace, describing
the Analytical Engine. While building the computer Z1 in 1936, Konrad Zuse described in
two patent applications for his future projects that machine instructions could
be stored in the same storage used for data, i.e. the stored-program concept. Two other early and important examples are:
· John von Neumann's 1945 paper, First Draft of a Report on the EDVAC, which described an organization of logical elements.
· Alan Turing's more detailed Proposed Electronic Calculator for the Automatic Computing Engine, also from 1945, which cited John von Neumann's paper.
The term “architecture” in computer literature
can be traced to the work of Lyle R. Johnson, Frederick P. Brooks, Jr., and Mohammad Usman Khan, all
members of the Machine Organization department in IBM’s main research center in
1959. Johnson had the opportunity to write a proprietary research communication
about the Stretch, an
IBM-developed supercomputer for Los
Alamos National Laboratory (at the time known as Los Alamos Scientific
Laboratory). To describe the level of detail for discussing the luxuriously embellished computer, he noted that his description of formats, instruction types, hardware parameters, and speed enhancements was at the level of “system architecture” – a term that seemed more useful than “machine organization”.
Computer architecture, like other architecture, is the art of determining the needs of the user of a structure and then designing to meet those needs as effectively as possible within economic and technological constraints.
3. Trends in Computer Architecture
Computer trends are
changes or evolutions in the ways that computers are used which become
widespread and integrated into popular thought with regard to these systems.
These movements often begin with one or two companies adopting or promoting a
new technology, which grabs the attention of others and becomes popular. Both
hardware and software can be a part of computer trends, such as the development
and proliferation of mobile devices including smartphones and tablets. Changes
in the Internet, the development of new websites, and the expansion of cloud computing models are likely to be similar software
trends throughout the early part of the 21st Century.
Much like changing fashions in clothing, trends in
computers indicate the types of technology or concepts that are popular at a
given time. This can occur in a number of ways, including a company introducing
new technology to a market and customers finding that they can use certain
products more effectively than others. As these changes happen, computer trends
typically evolve and grow over time, so that technology popular one year may be considered outdated the next. Identifying the next major trend, and finding
a way to get in on it ahead of time, can be substantially profitable for
companies that work with technology.
Computer trends often involve hardware and the
development or release of something new and innovative. The proliferation of
smartphones throughout the first decade of the 21st Century, for example, is a
major hardware trend that has changed the way in which many people access
information. Mobile phones had already been established in the year 2000 as a
major commodity and had gone beyond the niche item they may have been seen as
in the 1980s. The development of smartphones in the years that followed, and
their release as affordable products, established one of several major trends
in which portability became a marketing factor for hardware developers.
Different types of
software are often involved in computer trends since applications people use
tend to evolve and change over time. Although the Internet has been around
since the late 20th Century, the way in which it is used and how information
can be presented on it continues to change. Developments in Internet coding and
viewing continue to make its growth a major trend in the computer industry.
Computer trends with software often involve the way
in which information is accessed and shared. Systems like cloud computing and
similar methods for data storage and file sharing are likely to continue to
grow and develop throughout the 21st Century. New applications for
communicating, sharing information with family or business partners, and using
every facet of innovative hardware are going to be popular for years to come.
3.1. Major Trends Affecting Microprocessor Performance and Design:
In a competitive processor market, some of the major trends affecting microprocessor performance are:
· Increasing number of cores
· Clock speed
· Number of transistors
3.2. Increasing Number of Cores:
A multi-core processor is a single computing component with two or more independent central processing units, called “cores”. A multi-core processor gives users boosted performance, improved power consumption, and parallel processing that allows multiple tasks to be performed simultaneously. The development of microprocessors for desktops and laptops is currently expanding through the Core i3, Core i5, and Core i7 lines, which place several processing cores on a single chip. One projection estimated that by 2017 embedded processors would sport 4,096 cores, servers would have 512 cores, and desktop chips would use 128 cores.
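As a minimal sketch of the kind of task-level parallelism a multi-core processor enables, the following Python example spreads independent tasks across the available cores; the worker function and its inputs are hypothetical placeholders, not part of the original text.

import os
from concurrent.futures import ProcessPoolExecutor

def work(n):
    # CPU-bound placeholder task: sum the first n integers.
    return sum(range(n))

if __name__ == "__main__":
    inputs = [10_000_000] * 8                      # eight independent tasks
    cores = os.cpu_count()                         # cores available on this machine
    with ProcessPoolExecutor(max_workers=cores) as pool:
        results = list(pool.map(work, inputs))     # tasks execute in parallel across the cores
    print(f"{cores} cores completed {len(results)} tasks")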
3.3. Clock Speed:
Clock speed is defined as the frequency at which a processor executes instructions and processes data. Clock speed is measured in megahertz (MHz) or gigahertz (GHz). The clock signal is generated by a quartz crystal that vibrates and sends a pulse to each component synchronized with it (PC computer notes, 2003). A microprocessor running at one megahertz completes one million clock cycles per second, while a microprocessor running in the gigahertz range completes billions of cycles per second.
Most modern CPUs run in the gigahertz range. For instance, a 3 GHz microprocessor completes six times as many clock cycles per second as a 500 MHz microprocessor. All else being equal, the higher the frequency of the microprocessor, the faster the computer.
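A small worked example in Python, assuming one instruction per clock cycle (an illustrative simplification), makes the comparison concrete:

# Compare a 3 GHz and a 500 MHz processor, assuming one instruction per cycle.
freq_fast_hz = 3.0e9     # 3 GHz
freq_slow_hz = 500e6     # 500 MHz

print(f"Speed ratio: {freq_fast_hz / freq_slow_hz:.0f}x")    # 6x

instructions = 1e9       # time to retire one billion instructions
print(f"3 GHz:   {instructions / freq_fast_hz:.2f} s")       # about 0.33 s
print(f"500 MHz: {instructions / freq_slow_hz:.2f} s")       # 2.00 s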
3.4. Number of Transistors:
The number of transistors available on a microprocessor has a massive effect on the performance of the CPU. For instance, the 8088 microprocessor took about 15 clock cycles to execute a typical instruction, and a single 16-bit multiplication on the 8088 took about 80 cycles.
According to Moore’s Law, the number of transistors on a chip roughly doubles every two years. As a result, feature sizes shrink and transistor counts increase at a regular pace, providing improvements in integrated circuit functionality and performance while decreasing cost.
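A back-of-the-envelope Python projection of the doubling rule; the starting transistor count and time horizon below are hypothetical:

# Project transistor counts under Moore's Law: doubling roughly every two years.
start_count = 1e9          # assume a chip with one billion transistors today
doubling_period = 2        # years per doubling
years = 10                 # projection horizon

projected = start_count * 2 ** (years / doubling_period)
print(f"After {years} years: {projected:.1e} transistors")   # 32x the starting count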
Increasing the number of transistors also enables a technique known as pipelining. In a pipelined architecture, the execution of instructions overlaps. For instance, even if each instruction takes five clock cycles to execute, five instructions can be in different stages of execution at once, so that one instruction completes on every clock cycle. Most modern processors have multiple instruction decoders, each with its own pipeline, allowing multiple instruction streams in which more than one instruction completes per clock cycle, at the cost of many more transistors in the microprocessor.
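A toy Python calculation, assuming a five-stage pipeline and a stream of independent instructions with no stalls (both illustrative assumptions), shows why throughput approaches one instruction per cycle:

# Non-pipelined vs. pipelined execution time for a stream of instructions.
stages = 5                 # classic five-stage pipeline
instructions = 100

non_pipelined = stages * instructions        # each instruction runs start to finish
pipelined = stages + (instructions - 1)      # fill the pipeline, then one completes per cycle

print(f"Non-pipelined: {non_pipelined} cycles")              # 500
print(f"Pipelined:     {pipelined} cycles")                  # 104
print(f"Speedup:       {non_pipelined / pipelined:.2f}x")    # about 4.8x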
3.5. Microprocessor Design Goals for Laptops, Servers, Desktops, and Embedded Systems:
The microprocessors in laptops, servers, and desktops vary because each form factor has its own requirements. A laptop is small and portable, designed to function as a computer for use anytime and anywhere. Its microprocessor design goal emphasizes low power consumption. A laptop runs on battery power, and it would be inconvenient for laptop users to carry a power adapter wherever they go; thus, the microprocessor in a laptop is designed to consume less power than a desktop processor. The processor also helps keep the laptop cool, since the heat produced during use could damage the internal hardware. To meet cooling requirements, the processor allows the laptop to lower its clock speed and bus speed, and to run at a lower operating voltage, which also reduces power consumption.
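The benefit of lowering clock frequency and operating voltage follows from the standard dynamic power relation P = C * V^2 * f; a short Python sketch with hypothetical capacitance, voltage, and frequency values illustrates the effect:

# Dynamic CPU power scales roughly as P = C * V^2 * f.
def dynamic_power(c, v, f):
    return c * v ** 2 * f

c = 1e-9                                     # effective switched capacitance (farads), illustrative
full = dynamic_power(c, 1.2, 3.0e9)          # 1.2 V at 3.0 GHz
throttled = dynamic_power(c, 0.9, 1.5e9)     # 0.9 V at 1.5 GHz

print(f"Full speed: {full:.2f} W, throttled: {throttled:.2f} W")
print(f"Power saved: {(1 - throttled / full) * 100:.0f}%")   # roughly 72%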
A server is a computer or device on a network that works together with the network resources. Generally, servers run 24/7 to function efficiently in a network, and a disruption in server operations can be far more disastrous than the failure of a desktop computer. The microprocessor design for a server ensures that the server’s uptime is stable, always available, and reliable by providing larger cache memory; the cache memory in a server is larger than in desktops and embedded systems. The design implemented for servers also helps control the heat released; a server chassis is typically 1U (1.75 in) or 2U (3.5 in) high, which permits servers to implement large cooling systems since they run 24/7. A desktop computer, by contrast, is a personal computer used regularly at a single location and is not portable. The microprocessor design goal for a desktop ensures that it supports job scheduling and multitasking, which helps it perform more than one job at a time. The microprocessor design goal for an embedded system focuses on power consumption; the power consumption of an embedded microprocessor is related to its small size, and an embedded system uses a very small amount of power, which reduces the overall power consumption of the system. Another design goal for an embedded microprocessor is memory management through code density, i.e. the amount of space occupied by executable programs in an embedded system; the microprocessor aims to keep this footprint as small as possible.
3.6. Optimizing Performance on POWER8 Processor-Based Systems:
The performance optimization guidance is organized into three broad categories:
3.7. Lightweight Tuning and Optimization Guidelines:
Lightweight tuning covers simple
prescriptive steps for tuning application performance on POWER8 processor-based
systems. These simple steps can be carried out without detailed knowledge of
the internals of the application that is being optimized and usually without
modifying the application source code. Simple system utilization and
performance tools are used for understanding and improving your application
performance.
3.8. Deployment Guidelines:
Deployment guidelines cover tuning considerations related to the configuration of a POWER8 processor-based system to deliver the best performance. This section presents some guidelines and preferred practices. Understanding logical partitions (LPARs), energy management, I/O configurations, and the use of multi-threaded cores are examples of typical system considerations that can impact application performance.
3.9. Deep Performance Optimization Guidelines:
Deep performance analysis covers
performance tools and general strategies for identifying and fixing application
bottlenecks. This type of analysis requires more familiarity with performance
tools and analysis techniques, sometimes requiring a deeper understanding of
the application internals, and often requiring a more dedicated and lengthy
effort.
Another approach that exploits economies of scale by using commodity components is represented by Rack Scale and the Open Compute Project. The Rack Scale architecture is usually described by three key concepts: the disaggregation of the compute, memory, and storage resources; the use of silicon photonics as a low-latency, high-speed fabric; and, finally, software that combines disaggregated hardware capacity over the fabric to create ‘pooled systems’. Rack scale is not well defined: it can refer to a large unit filling part of a rack, to a single rack, or sometimes to a small number of racks. Several commercial products have tried to address the growing computing needs. These machines range in size from one to 10 rack units and sometimes contain more than 1,000 cores, divided between many small server units.
3.10. Limitations of Current-Day Architectures:
The computing industry historically
relied on increased microprocessor performance as transistor density doubled,
while power density limits led to multi-processing. Common servers today
consist of multiple processors, each consisting of multiple cores, and
increasingly a single machine runs a hypervisor to support multiple virtual
machines (VMs). A hypervisor provides to each VM an emulation of the resources
of a physical computer. Upon each VM, a more typical operating system and
application software may operate. The hypervisor allocates each VM memory and
processor time. While a hypervisor gives access to other resources, e.g.
network and storage, limited guarantees (or constraints) are made on their
usage or availability. While VMs are popular, permitting consolidation and
increasing the mean utilization of machines, the hypervisor has limited ability
to isolate competing resource use or mitigate the impact of usage between VMs.
Resource isolation is not the only challenge for scaling computing architectures. General purpose central processing units (CPUs) are not designed to handle the high packet rates of new networks. Doing useful work on a 100 Gbps data stream exceeds the limits of today’s processors. This is despite the modern CPU intra-core/cache ring-bus achieving a peak interface rate of 3 Tbps, and a peak aggregate throughput that grows proportionally with the number of cores. A data stream of 100 Gbps, with 64-byte packets, corresponds to a packet rate of 148.8M packets per second; thus a 3 GHz CPU has only about 20 cycles per packet, significantly less than required even just to send or receive. The inefficiency of packet processing by the CPU remains a great challenge, with a current tendency to offload to an accelerator on the network interface itself.
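The cycle budget quoted above can be reproduced with a short Python calculation; the 20 bytes of per-packet framing overhead (Ethernet preamble plus inter-frame gap) is the usual assumption behind the 148.8M packets-per-second figure:

# Cycles-per-packet budget for a 100 Gbps link carrying minimum-size (64-byte) packets.
line_rate_bps = 100e9
frame_bits = (64 + 20) * 8                   # 64-byte frame plus 20 bytes of framing overhead

packet_rate = line_rate_bps / frame_bits     # about 148.8 million packets per second
cpu_hz = 3.0e9                               # a 3 GHz core

print(f"Packet rate:       {packet_rate / 1e6:.1f} Mpps")
print(f"Cycles per packet: {cpu_hz / packet_rate:.1f}")      # roughly 20 cycles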
In-memory processing and the use of
remote direct memory access as the underlying communications system is a
growing trend in large-scale computing. Architectures such as scale-out non-uniform
memory access (NUMA) for rack-scale computers are very sensitive to latency and
thus have latency-reducing designs. However, they have limited scalability due
to intrinsic physical limitations of the propagation delay among different
elements of the system. A fiber used for interserver connection has a
propagation delay of 5 ns/m; thus, within a standard height rack, the
propagation delay between the top and bottom rack units is approximately 9 ns,
and the round-trip time to fetch remote data is 18 ns.
While for current-generation architectures this order of latency is reasonable, it indicates that scale-out NUMA machines at data-centre scale (with each round-trip taking at least 1 µs) are not plausible, as the round-trip latency alone is orders of magnitude larger than the time-scale for retrieving data from local random access memory or the latency contribution of any other element in the system.
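A quick Python calculation reproduces these latency figures; the 1.8 m rack height and the 100 m data-centre span used below are illustrative assumptions:

# Fiber propagation delay of about 5 ns per metre, at rack and data-centre scale.
delay_ns_per_m = 5.0

rack_height_m = 1.8                          # roughly the height of a standard rack
rack_round_trip_ns = 2 * delay_ns_per_m * rack_height_m     # about 18 ns

dc_span_m = 100.0                            # a modest cross-data-centre cable run
dc_round_trip_ns = 2 * delay_ns_per_m * dc_span_m           # 1,000 ns, i.e. about 1 microsecond

print(f"Rack round trip:        {rack_round_trip_ns:.0f} ns")
print(f"Data-centre round trip: {dc_round_trip_ns:.0f} ns")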
Photonics has advanced hand in hand with network-capacity growth. However, photonics has its own limitations: the minimum size of photonic devices is determined by the wavelength of light, e.g. optical waveguides must be larger than one half of the wavelength of the light in use.
Limitations are faced at several
levels in the system hierarchy: from the practical limitations of physics to
the increasing impedance mismatch between processor clock speed and network
data rates.
3.11. The Gap Between Networking and Computing:
The silicon vendors for both
computing and networking devices operate in the same technological ecosystem.
CPU manufacturers often had access to the newest fabrication processes and the
leading edge of shrinking gate size. Furthermore, in the past 20 years, the
interconnect rate of networking devices doubled every 18 months, whereas
computing system I/O throughput doubled approximately every 24 months. At the
interface between network and processor, PCI Express, the dominant processor-I/O interconnect, whose third generation was released in 2010, achieves 128 Gbps over 16 serial links. The fourth generation, expected in 2016, aims to double this bandwidth. The limitation of existing computing interconnects vexes major CPU vendors.
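The different doubling periods compound over time; a short Python calculation under the stated 18-month versus 24-month assumptions shows how wide the gap grows over 20 years:

# Compound growth of network interconnect rate vs. computing I/O throughput,
# assuming doublings every 18 months and every 24 months respectively.
years = 20
network_growth = 2 ** (years * 12 / 18)      # network-rate doublings over 20 years
compute_growth = 2 ** (years * 12 / 24)      # compute I/O doublings over 20 years

print(f"Network rate grows:   {network_growth:,.0f}x")       # about 10,321x
print(f"Compute I/O grows:    {compute_growth:,.0f}x")       # 1,024x
print(f"Resulting gap factor: {network_growth / compute_growth:,.1f}x")   # about 10x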
General purpose processors are extremely complex devices whose traits cannot be reduced to specifications such as data path bandwidth or I/O interconnect. Subsequently, we evaluate the performance of CPUs using the Standard Performance Evaluation Corporation (SPEC) CPU2006 benchmark and contrast this with the improvement in network-switching devices and computing interconnect.
4. Summary
Computer architectures have evolved to
optimally exploit the underlying hardware and software technologies to achieve
increasing levels of performance. Computer performance has increased faster than the underlying increase in performance of the digital switches from which computers are built, by exploiting the increase in density of digital switches and storage. This is accomplished by the use of replication and speculative
execution within systems through the exploitation of parallelism. Three levels
of parallelism have been exploited: instruction-level parallelism,
task/process-level parallelism, and algorithmic parallelism. These three levels
of parallelism are not mutually exclusive and will likely all be used in
concert to improve performance in future systems. The limit of replication and
speculation is to compute all possible outcomes simultaneously and to select
the right final answer while discarding all of the other computations. This is
analogous to the approach DNA uses to improve life on our planet. All living
entities are a parallel computation of which only the "best" answers
survive to the next round of computation. These are the concepts driving
research in the area of using "biological" or “genetic” algorithms
for application design as well as the research into using DNA to do
computations. In the limit, molecular-level computing provides huge increases
in density over current systems, but much more limited improvements in raw
performance. Thus, we can expect continued advancements in those approaches and
architectures that exploit density and parallelism over raw performance.