# CENG 3420 Computer Organization and Design

**Lecture 01: Introduction**

Bei Yu

CENG3420 L01 Intro.1 Spring 2016

# **Grading Information**

| Grade determinant | Weight |
|-------------------|--------|
| Attendance | 5% |
| Homework | 10% |
| Two Quizzes (Feb. 18 & Mar. 21) | 15% |
| Three Labs (individual projects) | 30% |
| Final Exam | 40% |

- Late submissions are penalized 10% per day.
- What's new: Q&A bonus
- A student must obtain at least 40% of the full marks in each part in order to pass the course.

# **General References**

#### Textbook:

- Computer Organization and Design, 5th Edition, ©2013
- Soft copy, amazon.cn, or amazon.com

#### Manuals:

- LC-3 Instruction Set Architecture (ISA)
- Lab tutorials (slides)

#### Slides:

- on the course web page before lecture
- summary may be uploaded afterwards

## **Course Administration**

- Instructor: Bei Yu (byu@cse.cuhk.edu.hk)
  - Office: SHB 914
  - Office Hrs: H14:00-16:00
- TA:
  - Wen Zong (wzong@cse.cuhk.edu.hk)
  - Yichen Wang (xblwyc@163.com)

## **Course Contents**

- Introduction to the major components of a computer system and how they function together in executing a program
- Introduction to CPU datapath and control unit design
- Introduction to techniques to improve performance and energy-efficiency of computer systems
- Introduction to multiprocessor architecture

To learn what determines the capabilities and performance of computer systems, and to understand the interactions between the computer's architecture and its software, so that future software designers (compiler writers, operating system designers, database programmers, application programmers, ...)
can achieve the best cost-performance trade-offs, and so that future architects understand the effects of their design choices on software.

# Why Learn This Stuff?

- You want to call yourself a "computer scientist/engineer"
- You want to build HW/SW people use (so you need performance/power)
- You need to make a purchasing decision or offer "expert" advice
- Both hardware and software affect performance/power
  - Algorithm determines the number of source-level statements
  - Language/compiler/architecture determine the number of machine-level instructions
  - Processor/memory determine how fast and how power-hungry machine-level instructions are executed

# What You Should Already Know

- Basic logic design & machine organization
  - logical minimization, FSMs, component design
  - processor, memory, I/O
- Create, assemble, run, and debug programs in an assembly language
  - will be introduced in tutorial
- Create, compile, and run C (C++, Java) programs
- Create, organize, and edit files and run programs on Unix/Linux
- Create, simulate, and debug hardware structures
  - will be introduced in tutorial

# **Computer Organization and Design**

- This course is all about how computers work
- But what do we mean by a computer?
  - Different types: embedded, laptop, desktop, server
  - Different uses: automobiles, graphics, finance, genomics...
  - Different manufacturers: Intel, Apple, IBM, Sony, Oracle...
  - Different underlying technologies and different costs!
- Analogy: consider a course on "automotive vehicles"
  - Many similarities from vehicle to vehicle (e.g., wheels)
  - Huge differences from vehicle to vehicle (e.g., gas vs. electric)
- Best way to learn:
  - Focus on a specific instance and learn how it works
  - While learning general principles and historical perspectives

# **A Computer**

Are there other kinds of computers?
# **Classes of Computers**

#### Desktop computers

Designed to deliver good performance to a single user at low cost, usually executing 3<sup>rd</sup>-party software, and usually incorporating a graphics display, a keyboard, and a mouse.

#### Servers

Used to run larger programs for multiple, simultaneous users; typically accessed only via a network; place a greater emphasis on dependability and (often) security.

#### Supercomputers

A high-performance, high-cost class of servers with hundreds to thousands of processors, terabytes of memory, and petabytes of storage, used for high-end scientific and engineering applications.

#### Embedded computers (processors)

A computer inside another device, used for running one predetermined application.

# **Supercomputers**

- Tianhe-2 (MilkyWay-2, 天河-2)
  - Over 3 million cores
  - Power: 17.6 MW (24 MW with cooling)
  - Speed: 33.86 PFLOPS (peta = 10<sup>15</sup>)

# **Embedded Computers in Your Car**

## **PostPC Era**

- Personal Mobile Device (PMD)
  - Battery-operated device with wireless connectivity
- Warehouse Scale Computer (WSC)
  - Datacenter containing hundreds of thousands of servers providing software as a service (SaaS)

# **Growth in Cell Phone Sales (Embedded)**

Embedded growth >> desktop growth

Where else are embedded processors found?

# **The Evolution of Computer Hardware**

When was the first transistor invented?

**1947**: the bipolar transistor, by Bardeen *et al.* at Bell Laboratories

# **The Evolution of Computer Hardware**

When was the first IC (integrated circuit) invented?
**1958**, by Jack Kilby at Texas Instruments: several transistors, resistors, and capacitors assembled by hand on a single substrate

# **The Evolution of Computer Hardware**

When was the first microprocessor?

**1971**, Intel 4004

# The IC Manufacturing Process

- Yield: proportion of working dies per wafer

# **AMD Opteron X2 Wafer**

300 mm wafer, 117 chips, 90 nm technology

# **Integrated Circuit Cost**

$$\text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}}$$

$$\text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}}$$

$$\text{Yield} = \frac{1}{\left(1 + \text{Defects per area} \times \frac{\text{Die area}}{2}\right)^2}$$

- Nonlinear relation to area and defect rate
  - Wafer cost and area are fixed
  - Defect rate determined by manufacturing process
  - Die area determined by architecture and circuit design

# **Impacts of Advancing Technology**

- Processor
  - logic capacity: increases about 30% per year
  - performance: 2x every 1.5 years
- Memory
  - DRAM capacity: 4x every 3 years, about 60% per year
  - memory speed: 1.5x every 10 years
  - cost per bit: decreases about 25% per year
- Disk
  - capacity: increases about 60% per year

#### Moore's Law for CPUs and DRAMs

From: "Facing the Hot Chips Challenge Again", Bill Holt, Intel, presented at Hot Chips 17, 2005.

# Main driver: device scaling ...

From: "Facing the Hot Chips Challenge Again", Bill Holt, Intel, presented at Hot Chips 17, 2005.

# **Technology Scaling Road Map (ITRS)**

| Year | 2004 | 2006 | 2008 | 2010 | 2012 |
|------|------|------|------|------|------|
| Feature size (nm) | 90 | 65 | 45 | 32 | 22 |
| Integration capacity (billions of transistors) | 2 | 4 | 6 | 16 | 32 |

#### Fun facts about 45nm transistors

- 30 million can fit on the head of a pin
- You could fit more than 2,000 across the width of a human hair
- If car prices had fallen at the same rate as the price of a single transistor has since 1968, a new car today would cost about 1 cent

# **Highest Clock Rate of Intel Processors**

- Clock rate increases came from:
  - Process improvements
  - Deeper pipelines
  - Circuit design techniques
- What if the exponential increase had kept up? Why did it not?

## **EX: Power Issue**

$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

- Example: for a simple processor, if the capacitive load is reduced by 15% and the voltage is reduced by 15% while the frequency stays the same, by how much is power consumption reduced?

Note: here we consider only dynamic power, not static power.

# A Sea Change is at Hand

- The power challenge has forced a change in the design of microprocessors
  - Since 2002, the rate of improvement in the response time of programs on desktop computers has slowed from a factor of 1.5 per year to less than a factor of 1.2 per year
- As of 2006, all desktop and server companies are shipping microprocessors with multiple processors (<u>cores</u>) per chip

| Product | AMD Barcelona | Intel Nehalem | IBM Power 6 | Sun Niagara 2 |
|---------|---------------|---------------|-------------|---------------|
| Cores per chip | 4 | 4 | 2 | 8 |
| Clock rate | ~2.5 GHz | ~2.5 GHz | 4.7 GHz | 1.4 GHz |
| Power | 120 W | ~100 W | ~100 W | 94 W |

- Plan of record is to double the number of cores per chip per generation (about every two years) add two cores

## **Intel Core i7 Processor**

45 nm technology, 18.9 mm × 13.6 mm, 0.73 billion transistors, 2008

# What is a Computer?
- Components:
  - processor (datapath, control)
  - input (mouse, keyboard)
  - output (display, printer)
  - memory (cache (SRAM), main memory (DRAM), disk drive, CD/DVD)
  - network
- Our primary focus: the processor (datapath and control) and its interaction with memory systems
  - Implemented using tens/hundreds of millions of transistors
  - Impossible to understand by looking at each transistor
  - We need abstraction!

# **Major Components of a Computer**

# **Machine Organization**

- Capabilities and performance characteristics of the principal Functional Units (FUs)
  - e.g., register file, ALU, multiplexors, memories, ...
- The ways those FUs are interconnected
  - e.g., buses
- Logic and means by which information flow between FUs is controlled
- The machine's Instruction Set Architecture (ISA)
- Register Transfer Level (RTL) machine description

# **Processor Organization**

- Control needs to have circuitry to
  - Decide which is the next instruction and input it from memory
  - Decode the instruction
  - Issue signals that control the way information flows between datapath components
  - Control what operations the datapath's functional units perform
- Datapath needs to have circuitry to
  - Execute instructions: functional units (e.g., adder) and storage locations (e.g., register file)
  - Interconnect the functional units so that the instructions can be executed as required
  - Load data from and store data to memory

## **Below the Program**

- System software
  - Operating system: supervising program that interfaces the user's program with the hardware (e.g., Linux, iOS, Windows)
    - Handles basic input and output operations
    - Allocates storage and memory
    - Provides for protected sharing among multiple applications
  - Compiler: translates programs written in a high-level language (e.g., C, Java) into instructions that the hardware can execute
# **Advantages of Higher-Level Languages?**

- Higher-level languages
  - Allow the programmer to think in a more natural language suited to the intended use (Fortran for scientific computation, Cobol for business programming, Lisp for symbol manipulation, Java for web programming, ...)
  - Improve programmer productivity: more understandable code that is easier to debug and validate
  - Improve program maintainability
  - Allow programs to be independent of the computer on which they are developed (compilers and assemblers can translate high-level language programs to the binary instructions of any machine)
  - Emergence of optimizing compilers that produce very efficient assembly code optimized for the target machine
- As a result, very little programming is done today at the assembler level

You can become programmers programming programs that program programs!

# **Below the Program**

High-level language program (in C) → C compiler (one-to-many) → Assembly language program (for MIPS) → assembler (one-to-one) → Machine (object) code (for MIPS)

Assembly language program (for MIPS):

```
swap: sll $2, $5, 2
      add $2, $4, $2
      lw  $15, 0($2)
      lw  $16, 4($2)
      sw  $16, 0($2)
      sw  $15, 4($2)
      jr  $31
```

Machine (object) code (for MIPS), first two instructions:

```
000000 00000 00101 0001000010000000
000000 00100 00010 0001000000100000
```

## **Below the Program**

High-level language program (in C):

```
swap (int v[], int k)
```

. . .

Machine (object) code (for MIPS):

```
000000 00000 00101 0001000010000000
000000 00100 00010 0001000000100000
100011 00010 01111 0000000000000000
100011 00010 10000 0000000000000100
101011 00010 10000 0000000000000000
101011 00010 01111 0000000000000100
000000 11111 00000 0000000000001000
```

Max # of operations?
# **Input Device Inputs Object Code**

```
000000 00000 00101 0001000010000000
000000 00100 00010 0001000000100000
100011 00010 01111 0000000000000000
100011 00010 10000 0000000000000100
101011 00010 10000 0000000000000000
101011 00010 01111 0000000000000100
000000 11111 00000 0000000000001000
```

## **Object Code Stored in Memory**

### **Processor Fetches an Instruction**

Processor fetches an instruction from memory.

## **Control Decodes the Instruction**

Control decodes the instruction to determine what to execute.

## **Datapath Executes the Instruction**

Datapath executes the instruction as directed by control.

# **What Happens Next?**

(Figure: processor (control, datapath); memory holding the object code above; input and output devices; network.)

### **Processor Fetches the Next Instruction**

Processor fetches the *next* instruction from memory.

How does it know which location in memory to fetch from next?
# **Output Data Stored in Memory**

At program completion the data to be output resides in memory.

# **Output Device Outputs Data**

## The Instruction Set Architecture (ISA)

The interface description separating the software and hardware

# **Instruction Set Architecture (ISA)**

- ISA, or simply architecture: the abstract interface between the hardware and the lowest-level software that includes all the information necessary to write a machine language program, including instructions, registers, memory access, I/O, ...
  - Enables implementations of varying cost and performance to run identical software
- The combination of the basic instruction set (the ISA) and the operating system interface is called the application binary interface (ABI)
  - ABI: the user portion of the instruction set plus the operating system interfaces used by application programmers; defines a standard for binary portability across computers

## The MIPS ISA

- Instruction Categories
  - Load/Store
  - Computational
  - Jump and Branch
  - Floating Point (coprocessor)
  - Memory Management
  - Special
- Registers: R0 - R31, PC, HI, LO
- 3 Instruction Formats: all 32 bits wide

| Format | Fields (bit widths) |
|--------|---------------------|
| R | OP (6), rs (5), rt (5), rd (5), sa (5), funct (6) |
| I | OP (6), rs (5), rt (5), immediate (16) |
| J | OP (6), jump target (26) |

## **How Do the Pieces Fit Together?**

- Coordination of many levels of abstraction
- Under a rapidly changing set of forces
- Design, measurement, and evaluation