|Wednesday, 20 September 2006|
Microprocessors run application programs. An application program comprises a group of instructions. Microprocessors may overlap a fetch stage, a decode stage, an execute stage, and possibly a write-back stage. These stages are controlled by a periodic clock signal; the period of the clock signal is the processor cycle time. A microprocessor receives the instructions composing a program from a storage device and decodes them with its decoder to control its constituent devices, such as a calculation device, an input device, an output device, a storage device, or a control device, depending on the contents of the instructions, thereby carrying out processing in sequence. In general, a microprocessor's operation includes a process for initializing, or beginning, its own internal logic and/or intended software application, also known as a boot method. Within the microprocessor's internal initialization logic is a reset vector that points to the location of the first instructions to be executed for the operating system or other application software. Microprocessors are often required to manipulate binary data with a wide range of bit lengths, from a single logic bit to high-precision arithmetic operations involving data that may be more than 128 bits in length. A significant portion of the operations performed by a microprocessor consists of reading data from or writing data to memory. Reading data from memory is commonly referred to as a load, and writing data to memory is commonly referred to as a store. A microprocessor generates load and store operations in response to an instruction that accesses memory; load and store operations can also be generated for other reasons necessary to the microprocessor's operation. A microprocessor, such as an image processing processor, may execute a program while reordering reads and writes. 
Such a microprocessor executes instructions by reordering the read and write instructions that have been issued from a central processing unit to a main memory or a peripheral system, in order to improve processing performance.
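The reordering described above can be sketched with a deliberately simplified model (the rule used here is an assumption for illustration, not any particular processor's policy): a load may be hoisted above earlier stores only if none of those stores writes the address the load reads.

```python
# Minimal sketch of read/write reordering under a simplified rule:
# a load may move ahead of earlier stores to *different* addresses,
# but never past a store to the same address (that would change results).

def reorder(ops):
    """ops: list of ("load"|"store", address) tuples, in program order.
    Returns a schedule with each load moved as early as safely possible."""
    scheduled = []
    for op in ops:
        kind, addr = op
        pos = len(scheduled)
        if kind == "load":
            # walk backwards past stores that touch other addresses
            while pos > 0:
                prev_kind, prev_addr = scheduled[pos - 1]
                if prev_kind == "store" and prev_addr != addr:
                    pos -= 1
                else:
                    break
        scheduled.insert(pos, op)
    return scheduled

ops = [("store", 0x10), ("store", 0x20), ("load", 0x30), ("load", 0x20)]
print(reorder(ops))
# The load of 0x30 is hoisted to the front; the load of 0x20 stays after
# the store to 0x20 that it depends on.
```

The payoff in a real design is that hoisted loads can start their slow memory access earlier while buffered stores drain in the background.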
A typical microprocessor includes a memory storing a program and various data, a processor core executing the program stored in the memory, an external bus interface serving as an interface to an external bus, a processor bus interconnecting the processor core, memory, and external bus interface, and an external event request signal input terminal connected to a test circuit. A computer system with a microprocessor architecture capable of supporting multiple processors typically comprises a memory, a memory system bus comprising data, address, and control signal buses, an input/output (I/O) bus comprising data, address, and control signal buses, a plurality of I/O devices, and a plurality of microprocessors. The I/O devices may comprise a direct memory access (DMA) controller-processor, an Ethernet chip, and various other I/O devices. Typical microprocessors have registers, arithmetic logic units, memory, input/output circuits, and other similar components which are hard-wired together. The microprocessor is coupled to the other components of the system by a processor bus, over which it communicates with those devices, such as by transferring data. The processor bus is a collection of signals that enables the microprocessor to transfer data in relatively large chunks. When the microprocessor executes program instructions that perform computations on data stored in the memory, it must fetch that data from memory over the processor bus. The processor bus operates at one clock frequency, while the circuitry inside the microprocessor operates internally at a much higher clock frequency, commonly referred to as the core clock frequency. A typical microprocessor includes a number of different functional units, each of which requires current state to perform its functions during the processing of microinstructions. 
Examples of such functional units in the Intel Architecture are the checker and retirement unit (CRU), the instruction translation lookaside buffer (ITLB), the trace cache (TC), and the segment and address translation unit (SAAT).
In a present-day microprocessor, the speed at which data can be transferred between internal logic blocks is an order of magnitude faster than the speed of external memory accesses. As microprocessor clock speeds continue to increase at an exponential rate, processor performance is increasingly constrained by the delays involved in transferring instructions and data between memory and the computational circuitry within the processor core. Cache memories can be used to reduce memory access time by bridging the speed difference between microprocessors and memories. They allow the microprocessor to access instructions and data items from local cache memories without the significant delay involved in accessing an off-chip main memory. Consequently, a hierarchy of cache structures has evolved in recent years to allow high-performance microprocessors to run at speed without having to execute transactions over a slow memory bus every time data is to be read or written. A cache memory is a relatively small memory inside the processor that stores a subset of the data in the system memory in order to reduce data access time, since accesses to the cache memory are much faster than accesses to the system memory. A typical cache line size is 32 bytes, and cache lines are arranged on cache-line-size memory address boundaries. A cache memory is situated between the processing core of a processing device and the main memory. The main memory is generally much larger than the cache, but also significantly slower. Each time the processing core requests information from the main memory, the cache controller checks the cache memory to determine whether the address being accessed is currently in the cache memory. The cache memory exploits two principles of locality, one being temporal locality and the other being spatial locality. Temporal locality means that a previously accessed memory address is likely to be accessed again within a short time. 
Spatial locality means that addresses adjacent to an accessed memory address are likely to be accessed soon.
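Both forms of locality can be demonstrated with a toy cache model. The parameters below (a direct-mapped cache with eight 32-byte lines) are illustrative assumptions chosen to match the 32-byte line size mentioned above, not a description of any real part.

```python
# Toy direct-mapped cache: 8 sets, 32-byte lines (illustrative parameters).
# Counts hits and misses for a stream of byte addresses.

LINE_SIZE = 32
NUM_SETS = 8

def simulate(addresses):
    tags = [None] * NUM_SETS          # one cached tag per set
    hits = misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE      # which 32-byte line holds this byte
        index = line % NUM_SETS       # which set the line maps to
        tag = line // NUM_SETS
        if tags[index] == tag:
            hits += 1                 # line already in the cache
        else:
            misses += 1               # fetch the line from main memory
            tags[index] = tag
    return hits, misses

# Spatial locality: 64 consecutive byte addresses share just two lines,
# so only the first access to each line misses.
print(simulate(list(range(64))))      # (62, 2)

# Temporal locality: re-reading one address misses once, then always hits.
print(simulate([0x100] * 10))         # (9, 1)
```

The contrast between the two streams is exactly why caches pay off: both patterns turn almost every slow main-memory access into a fast cache hit.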
Microprocessors determine the speed and power of personal computers and a growing number of other computational devices by handling most of the data processing in the device. Many factors influence a microprocessor system's operational speed, including, for example, clock frequency, circuit delay, and thermal throttling. The clock frequency is the frequency of the periodic pulses that schedule the operation of the processor device, and may also be referred to as the operating speed of the processor device. The clock frequency for a processor device is typically set during the testing stage, and will likely be preset based on the operating temperature that the processor device is expected to experience during its operation. Processor devices typically can operate at higher speeds when the operating temperature is relatively low. For example, if the operating temperature of a microprocessor is expected to be relatively low, then the clock frequency may be set at a higher rate than if the operating temperature is projected to be relatively high. Exception handling is another factor influencing a microprocessor system's operational speed. Exceptions arise in a microprocessor system whenever the normal flow of program execution needs to be broken, for example so that the microprocessor can be diverted to handle an interrupt from an external I/O device. An interrupt causes a microprocessor to suspend execution of the current task in order to execute a specific software routine, known as an interrupt service routine (ISR). This routine comprises a set of software instructions typically unrelated to the instructions being executed by the microprocessor at the time the interrupt is signaled. 
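The suspend-service-resume behavior of an interrupt can be sketched in a few lines. This is a hypothetical software model of the mechanism, not real ISR code (real handlers run in hardware-defined contexts): the "CPU" saves its place in the current task, runs the ISR to completion, then resumes exactly where it left off.

```python
# Sketch of interrupt servicing: execute a task instruction by instruction;
# when an interrupt fires, save context, run the ISR, restore, and resume.

def run(task, isr, interrupt_at):
    """task, isr: lists of instruction labels; interrupt_at: the program
    counter value at which the interrupt is signaled."""
    trace = []
    pc = 0                            # program counter into the task
    while pc < len(task):
        if pc == interrupt_at:
            saved_pc = pc             # save context (here, just the PC)
            trace.extend(isr)         # execute the ISR to completion
            pc = saved_pc             # restore context and resume the task
            interrupt_at = None       # interrupt has been serviced
        trace.append(task[pc])
        pc += 1
    return trace

print(run(["t0", "t1", "t2", "t3"], ["isr0", "isr1"], interrupt_at=2))
# -> ['t0', 't1', 'isr0', 'isr1', 't2', 't3']
```

Note how the ISR instructions are interleaved into the task's execution without the task losing any of its own work, which is precisely the cost and the guarantee of interrupt handling.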
The time taken by a microprocessor to complete a program is determined by at least three factors: the number of instructions required to execute the program, the average number of processor cycles required to execute an instruction, and the processor cycle time. Microprocessor performance is improved by reducing the time taken to complete the program, which means reducing one or more of these three factors. In response to the need for improved performance, several techniques have been used to extend the capabilities of microprocessors, including pipelining, superpipelining, superscalar execution, and speculative instruction execution.
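The three factors multiply directly into execution time, which a short worked example makes concrete. The numbers below are illustrative, not measurements of any real processor.

```python
# Execution time from the three factors in the text:
#   time = instruction_count * cycles_per_instruction * cycle_time

def execution_time(instruction_count, cpi, cycle_time_s):
    return instruction_count * cpi * cycle_time_s

# 1 million instructions, average CPI of 2, 100 MHz clock (10 ns cycle):
print(execution_time(1_000_000, 2.0, 10e-9))   # 0.02 seconds

# Halving the average CPI (e.g. via pipelining) halves execution time,
# even with the same instruction count and clock:
print(execution_time(1_000_000, 1.0, 10e-9))   # 0.01 seconds
```

The example shows why the techniques listed above target CPI and cycle time rather than instruction count, which is largely fixed by the program and its instruction set.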
Pipeline processing is a technique that provides simultaneous, or parallel, processing within a computer. Pipelining is a key design technique employed to improve processing speed: multiple instructions are overlapped in execution. Within a pipelined microprocessor, the functional units necessary for executing different stages of an instruction operate simultaneously on multiple instructions, achieving a degree of parallelism that yields performance gains over non-pipelined microprocessors. The pipeline moves the data associated with a process through the processor as the processor executes the process. A pipeline consists of sequential stages, each of which completes a part of the execution of an instruction passing through the pipeline. The pipeline is divided into segments, and each segment can execute its operation concurrently with the other segments. When a segment completes an operation, it passes the result to the next segment in the pipeline and fetches the next operation from the preceding segment. The final results of the instructions emerge at the end of the pipeline in rapid succession. Pipelining reduces the average number of cycles to execute an instruction by overlapping instructions, thus permitting the processor to handle more than one instruction at a time. During execution of a pipelined process, multiple registers typically store context information associated with the execution. Pipelined designs increase the rate at which instructions can be executed by allowing a new instruction to begin execution before a previous instruction has finished. Pipelined processors were developed to increase processing speed and to improve instruction flow through a microprocessor. Pipelined architectures have been extended to "superpipelined" or "extended pipeline" architectures, in which each execution pipeline is broken down into even smaller stages. 
Superpipelining increases the number of instructions that can be executed in the pipeline at any given time.
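The cycle-count arithmetic behind the pipelining argument above is simple enough to compute directly. The model below is the idealized textbook case, assuming no stalls or hazards: an unpipelined processor needs k cycles per instruction, while a full k-stage pipeline retires one instruction per cycle after a k-cycle fill.

```python
# Idealized cycle counts for n instructions on a k-stage design (no stalls).

def unpipelined_cycles(n_instructions, k_stages):
    # each instruction occupies the whole processor for k cycles
    return n_instructions * k_stages

def pipelined_cycles(n_instructions, k_stages):
    # k cycles to fill the pipeline, then one instruction completes per cycle
    return k_stages + (n_instructions - 1)

n, k = 100, 5
print(unpipelined_cycles(n, k))   # 500
print(pipelined_cycles(n, k))     # 104
```

For large n the speedup approaches k, which is why superpipelining (larger k with shorter stages) looks attractive, though in practice hazards and stalls keep real designs well below this ideal.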
Speeds within microprocessors have increased with the use of scalar computation, with superscalar technology being the next logical step in the evolution of the microprocessor. Superscalar microprocessors are a class of microprocessor architectures that include multiple pipelines processing instructions in parallel. The term superscalar describes an implementation that improves performance by concurrent execution of scalar instructions, the type of instruction typically found in general-purpose microprocessors. Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. Superscalar microprocessors typically execute more than one instruction per clock cycle, on average, allowing parallel instruction execution in two or more instruction execution pipelines; the number of instructions that may be processed is increased due to this parallel execution. In a superscalar processor, instructions of the same or different types are dispatched and executed in parallel in multiple execution units. Each instruction dispatch port is typically connected to one execution pipe, and certain types of instructions are always issued to a specific port since they can only be executed in a specific execution unit. The execution units work independently and in parallel, and any dependencies among instructions are detected before a group of instructions is formed and dispatched. In a superscalar processor, a decoder feeds an instruction queue, from which the maximum allowable number of instructions is issued per cycle to the available execution units. This is called the grouping of the instructions. A typical pipelined scalar microprocessor executes one instruction per processor cycle. 
A superscalar microprocessor further reduces the average number of cycles per instruction beyond what is possible in a pipelined scalar processor, by concurrent execution of several instructions in different pipelines.
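The grouping step described above can be sketched with a toy dual-issue model. The instruction encoding (a destination register plus source registers) and the two-wide issue width are assumptions for illustration; real group-formation logic also accounts for port restrictions and structural hazards.

```python
# Toy superscalar grouping: up to ISSUE_WIDTH independent instructions are
# dispatched per cycle; a group is cut when an instruction reads or writes
# a register written earlier in the same group (a dependency).

ISSUE_WIDTH = 2

def group(instructions):
    """instructions: list of (dest_register, source_registers) in
    program order. Returns a list of issue groups, one per cycle."""
    cycles = []
    current, written = [], set()
    for dest, srcs in instructions:
        dependent = dest in written or any(s in written for s in srcs)
        if dependent or len(current) == ISSUE_WIDTH:
            cycles.append(current)            # close the current group
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        cycles.append(current)
    return cycles

prog = [("r1", ("r2", "r3")),   # r1 = r2 op r3
        ("r4", ("r5", "r6")),   # independent -> issues in the same cycle
        ("r7", ("r1", "r4"))]   # reads r1 and r4 -> must wait a cycle
print(group(prog))              # two issue groups
```

The three instructions complete in two cycles instead of three, which is the whole point of superscalar dispatch: independent instructions share a cycle, dependent ones wait.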
The management of power consumption within microprocessors is becoming increasingly crucial. Microelectronic power regulation systems generally include a power regulator configured to supply a desired, regulated power to a microelectronic device such as a microprocessor, microcontroller, or memory device. The system may also include capacitors located near and/or packaged with the microprocessor to supply additional charge during its operation. As operating frequencies and circuit densities have increased, heat generation within microprocessors has also increased. The heat generated by a microprocessor is especially problematic in multiple-processor systems, including many server systems, in which multiple processors are located on a single motherboard. Microprocessors utilized in mobile applications are particularly sensitive to power dissipation and generally require the lowest power dissipation. This heat must be continuously removed, or the microprocessor may overheat, resulting in damage to the device and/or a reduction in operating performance. Because most microprocessors do not have a physical structure of their own to remove the heat they generate, many computer systems include a heat sink placed near the microprocessor to dissipate that heat. Heat sinks operate by conducting heat from the processor into the heat sink and then radiating it into the air. To be most effective in dissipating heat generated by the microprocessor, a heat sink must be placed in close proximity to the surface of the microprocessor package. Microprocessors typically also use cooling fans mounted to the processor package to ensure that the processor continues to operate within acceptable temperature limits.
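The thermal throttling mentioned earlier ties these concerns together, and its control logic can be sketched simply. The temperature limit and frequency steps below are made-up values for illustration; real processors use vendor-defined trip points and many more operating states.

```python
# Hedged sketch of thermal throttling: step the clock down when the die
# is too hot, and back up once it has cooled. All thresholds are invented.

MAX_TEMP_C = 85                   # hypothetical thermal limit
FREQS_MHZ = [800, 1600, 2400]     # hypothetical available clock steps

def throttle(current_freq, temp_c):
    i = FREQS_MHZ.index(current_freq)
    if temp_c > MAX_TEMP_C and i > 0:
        return FREQS_MHZ[i - 1]   # too hot: drop to the next lower step
    if temp_c <= MAX_TEMP_C and i < len(FREQS_MHZ) - 1:
        return FREQS_MHZ[i + 1]   # cool enough: recover one step
    return current_freq

freq = 2400
for temp in [70, 90, 92, 80, 75]:
    freq = throttle(freq, temp)
    print(temp, freq)
```

Running the loop shows the clock falling through 1600 to 800 MHz as the temperature climbs past the limit, then recovering as it cools, trading performance for heat exactly as the text describes.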