EMBEDDED Intel486™ PROCESSOR HARDWARE REFERENCE MANUAL

The bus interface unit contains the following architectural features:

Address Transceivers and Drivers — The A31–A2 address signals are driven on the 
processor bus, together with their corresponding byte-enable signals, BE3#–BE0#. The 
high-order 28 address signals (A31–A4) are bidirectional, allowing external logic to drive cache 
invalidation addresses into the processor.

Data Bus Transceivers — The D31–D0 data signals are driven onto and received from the 
processor bus. On the Ultra-Low Power Intel486 GX processor, only D15–D0 form the data bus.

Bus Size Control — Three sizes of external data bus can be used: 32, 16, and 8 bits wide. 
Two inputs from external logic, BS16# and BS8#, specify the width to be used. Bus size can be 
changed on a cycle-by-cycle basis. The Ultra-Low Power Intel486 GX processor does not support 
dynamic bus sizing; its external data bus is fixed at 16 bits.
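The two bus-size inputs can be modeled as a small decoder. This sketch assumes, per the Intel486 datasheets, that BS8# takes priority when both inputs are asserted; check that against the bus-operation chapter before relying on it. Both inputs are active low, so the hypothetical parameters `bs16_n` and `bs8_n` carry pin levels (`true` = high = deasserted). The second function only counts cycles for an aligned 4-byte operand; misaligned operands can need additional cycles.

```c
#include <stdbool.h>

/* Hypothetical decoder for the bus-size inputs. Parameters are sampled pin
   levels: true = high = deasserted (both signals are active low).
   Assumption: BS8# wins when both inputs are asserted. */
unsigned bus_width_bits(bool bs16_n, bool bs8_n)
{
    if (!bs8_n)
        return 8;
    if (!bs16_n)
        return 16;
    return 32;      /* neither asserted: full 32-bit bus */
}

/* Bus cycles for an aligned 4-byte operand at a given width (32, 16, 8). */
unsigned cycles_for_aligned_dword(unsigned width_bits)
{
    return 32u / width_bits;
}
```

On the Ultra-Low Power Intel486 GX processor the decoder does not apply: the width is always 16 bits, so an aligned dword always takes two cycles.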

Write Buffering — Up to four write requests can be buffered, allowing many internal 
operations to continue without waiting for write cycles to be completed on the processor 
bus.
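A four-entry write buffer behaves like a small FIFO between the core and the bus. The sketch below is a hypothetical model (the type and function names are invented for illustration): the core posts writes until all four entries are full, and the bus side drains them oldest first. It omits the handling of I/O writes and the re-ordering of reads ahead of buffered writes, so it shows only the queueing behavior.

```c
#include <stdbool.h>
#include <stdint.h>

#define WB_DEPTH 4u          /* four entries, per the text */

typedef struct {
    uint32_t addr;           /* write address */
    uint32_t data;           /* 32-bit write data */
    uint8_t  be_n;           /* BE3#-BE0# pin levels for the write */
} wb_entry;

typedef struct {
    wb_entry slot[WB_DEPTH];
    unsigned head;           /* index of the oldest entry */
    unsigned count;          /* number of occupied entries */
} write_buffer;

void wb_init(write_buffer *wb)
{
    wb->head = 0;
    wb->count = 0;
}

/* Core side: returns false when all entries are full (the core would wait). */
bool wb_post(write_buffer *wb, wb_entry e)
{
    if (wb->count == WB_DEPTH)
        return false;
    wb->slot[(wb->head + wb->count) % WB_DEPTH] = e;
    wb->count++;
    return true;
}

/* Bus side: removes the oldest entry in FIFO order; false if empty. */
bool wb_drain(write_buffer *wb, wb_entry *out)
{
    if (wb->count == 0)
        return false;
    *out = wb->slot[wb->head];
    wb->head = (wb->head + 1) % WB_DEPTH;
    wb->count--;
    return true;
}
```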

Bus Cycles and Bus Control — A large selection of bus cycles and control functions is 
supported, including burst transfers, non-burst transfers (single- and multiple-cycle), bus 
arbitration (bus request, bus hold, bus hold acknowledge, bus locking, bus pseudo-locking, 
and bus backoff), floating-point error signalling, interrupts, and reset. Two software-controlled 
outputs, PCD and PWT, control page-level caching on a cycle-by-cycle basis. One input (BRDY#) 
and one output (BLAST#) are provided for controlling burst read transfers.
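A burst moves one 16-byte line as four dword transfers, with BRDY# acknowledging each transfer and BLAST# marking the last one. The Intel486 family fetches the four dwords in a fixed order that depends on the first address (0-4-8-C, 4-0-C-8, 8-C-0-4, C-8-4-0). That order comes from the Intel486 datasheets rather than from this section, so confirm it against the burst-cycle description in Chapter 4. The hypothetical `burst_order` helper generates it by XORing the starting dword offset with 0, 4, 8, and C.

```c
#include <stdint.h>

/* Hypothetical helper: fill order[0..3] with the addresses of the four
   dword transfers of a burst whose first transfer is at first_addr,
   using the Intel486 burst sequence (start offset XOR 0, 4, 8, C). */
void burst_order(uint32_t first_addr, uint32_t order[4])
{
    uint32_t line  = first_addr & ~UINT32_C(0xF);  /* 16-byte line base */
    uint32_t start = first_addr & UINT32_C(0xC);   /* dword offset in line */
    for (uint32_t i = 0; i < 4; i++)
        order[i] = line | (start ^ (i << 2));      /* XOR with 0, 4, 8, C */
}
```

A burst starting at 0x1008, for instance, visits 0x1008, 0x100C, 0x1000, and 0x1004.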

Parity Generation and Control — Even parity is generated on data written by the processor and 
checked on data it reads. An error signal, PCHK#, indicates a read parity error.
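Even parity is computed separately for each data byte, with one parity bit (DP0 through DP3) per byte, chosen so that each byte plus its parity bit contains an even number of 1s. The sketch below is illustrative only: the function names are hypothetical, and `parity_error` compares all four bytes and ignores the byte enables.

```c
#include <stdint.h>

/* Even-parity bit for one byte: 1 when the byte has an odd number of 1s,
   so that byte + parity bit always holds an even count. */
static unsigned even_parity_bit(uint8_t b)
{
    unsigned ones = 0;
    for (; b != 0; b &= (uint8_t)(b - 1))   /* clear lowest set bit */
        ones++;
    return ones & 1u;
}

/* DP3-DP0 for a 32-bit data word: bit n covers data byte n (D[8n+7:8n]). */
unsigned data_parity(uint32_t data)
{
    unsigned dp = 0;
    for (unsigned n = 0; n < 4; n++)
        dp |= even_parity_bit((uint8_t)(data >> (8 * n))) << n;
    return dp;
}

/* Read-side check: nonzero means the received parity does not match,
   the condition that PCHK# reports. */
int parity_error(uint32_t data, unsigned dp_received)
{
    return data_parity(data) != (dp_received & 0xFu);
}
```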

Cache Control — Cache control and consistency operations are supported. Three inputs 
allow the external system to control the consistency of data stored in the internal cache unit. 
Two special bus cycles allow the processor to control the consistency of an external cache.
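When external logic drives an invalidation address on A31–A4, the processor must find any matching line in its internal cache. This sketch assumes the geometry given elsewhere in the manual: four ways, 16-byte lines, and a cache of 8 Kbytes (IntelDX2) or 16 Kbytes (IntelDX4), which gives 128 or 256 sets. The `tag_array` structure and the function names are hypothetical; a real tag store also keeps LRU bits and, on the Write-Back Enhanced IntelDX4 processor, the state needed for write-back operation, all of which the sketch omits.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 16u   /* bytes per cache line */
#define WAYS        4u   /* four-way set associative */
#define MAX_SETS  256u   /* enough sets for a 16-Kbyte cache */

/* Hypothetical tag store: one tag and one valid bit per way per set. */
typedef struct {
    uint32_t cache_bytes;               /* 8192 or 16384 */
    uint32_t tag[MAX_SETS][WAYS];
    bool     valid[MAX_SETS][WAYS];
} tag_array;

static uint32_t num_sets(uint32_t cache_bytes)
{
    return cache_bytes / (WAYS * LINE_BYTES);   /* 128 or 256 */
}

/* The bits just above the 4-bit byte offset select the set... */
uint32_t set_index(uint32_t addr, uint32_t cache_bytes)
{
    return (addr / LINE_BYTES) % num_sets(cache_bytes);
}

/* ...and the remaining high-order bits form the tag. */
uint32_t tag_of(uint32_t addr, uint32_t cache_bytes)
{
    return addr / (LINE_BYTES * num_sets(cache_bytes));
}

/* Invalidate the line holding addr, if present. Returns true on a hit. */
bool snoop_invalidate(tag_array *c, uint32_t addr)
{
    uint32_t s = set_index(addr, c->cache_bytes);
    uint32_t t = tag_of(addr, c->cache_bytes);
    for (uint32_t w = 0; w < WAYS; w++) {
        if (c->valid[s][w] && c->tag[s][w] == t) {
            c->valid[s][w] = false;
            return true;
        }
    }
    return false;
}
```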

3.2.1 Data Transfers

To support the cache, the bus interface unit reads operands, instructions, and other cacheable data
from the processor bus in 16-byte transfers and passes them to the cache unit. When the cache uses a
write-through policy and its contents are updated from an internal source, such as a register, the bus
interface unit also writes the updated information to the external system. Non-cacheable read
transfers are passed through the cache to the integer or floating-point units.
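Because cacheable reads are satisfied in 16-byte units, a read that misses the cache fills the whole aligned line containing the requested byte. The two hypothetical helpers below compute that line's base address and the number of data transfers a fill needs at each bus width: four on a 32-bit bus, eight on a 16-bit bus, and sixteen on an 8-bit bus. This is simple arithmetic under the assumption that every transfer carries the full bus width.

```c
#include <stdint.h>

/* Base address of the 16-byte line that a cacheable read of addr fills. */
uint32_t line_base(uint32_t addr)
{
    return addr & ~UINT32_C(0xF);
}

/* Data transfers needed to move one 16-byte line over a bus of the
   given width (32, 16, or 8 bits), assuming full-width transfers. */
unsigned line_fill_transfers(unsigned bus_width_bits)
{
    return 16u / (bus_width_bits / 8u);
}
```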

During instruction prefetch, the bus interface unit reads instructions on the processor bus and
passes them to both the instruction prefetch unit and the cache. The instruction prefetch unit may
then obtain its inputs directly from the cache.

3.2.2 Write Buffers

The bus interface unit has temporary storage for buffering up to four 32-bit write transfers to
memory. Addresses, data, or control information can be buffered. Single I/O-mapped writes are
not buffered, although multiple I/O writes may be buffered. The buffers can accept memory

Summary of Contents for Embedded Intel486

Page 1: ...ual. The embedded Intel486 processors may contain design defects known as errata, which may cause the products to deviate from published specifications. Currently characterized errata are available on request. Release Date: July 1997. Order Number: 273025-001...

Page 2: ...or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications. Intel retains the right to make changes to specifications and product descriptions at any time, without notice. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which ...

Page 3: ...INTRODUCTION 2 1 PROCESSOR FEATURES 2 2 2 2 Intel486 PROCESSOR PRODUCT FAMILY 2 4 2 2 1 Operating Modes and Compatibility 2 5 2 2 2 Memory Management 2 5 2 2 3 On chip Cache 2 6 2 2 4 Floating Point Unit 2 6 2 2 5 Upgrade Power Down Mode 2 7 2 3 SYSTEM COMPONENTS 2 7 2 4 SYSTEM ARCHITECTURE 2 7 2 4 1 Single Processor System 2 8 2 4 2 Loosely Coupled Multi Processor System 2 9 2 4 3 External Cache ...

Page 4: ... 15 3 10 PAGING UNIT 3 16 CHAPTER 4 BUS OPERATION 4 1 DATA TRANSFER MECHANISM 4 1 4 1 1 Memory and I O Spaces 4 1 4 1 1 1 Memory and I O Space Organization 4 2 4 1 2 Dynamic Data Bus Sizing 4 3 4 1 3 Interfacing with 8 16 and 32 Bit Memories 4 5 4 1 4 Dynamic Bus Sizing During Cache Line Fills 4 9 4 1 5 Operand Alignment 4 10 4 2 BUS ARBITRATION LOGIC 4 12 4 3 BUS FUNCTIONAL DESCRIPTION 4 15 4 3 1...

Page 5: ...ng Point Error Handling for the IntelDX2 and IntelDX4 Processors 4 46 4 3 14 1 Floating Point Exceptions 4 46 4 3 15 IntelDX2 and IntelDX4 Processors Floating Point Error Handling in AT Compatible Systems 4 47 4 4 ENHANCED BUS MODE OPERATION WRITE BACK MODE FOR THE WRITE BACK ENHANCED IntelDX4 PROCESSOR4 50 4 4 1 Summary of Bus Differences 4 50 4 4 2 Burst Cycles 4 50 4 4 2 1 Non Cacheable Burst O...

Page 6: ...ernal Cache 6 2 6 3 CACHE TRADE OFFS 6 2 6 3 1 Cache Size and Performance 6 3 6 3 2 Associativity and Performance Issues 6 5 6 3 3 Block Line Size 6 10 6 3 4 Replacement Policy 6 11 6 4 UPDATING MAIN MEMORY 6 11 6 4 1 Write Through and Buffered Write Through Systems 6 12 6 4 2 Write Back System 6 13 6 4 3 Cache Consistency 6 13 6 5 NON CACHEABLE MEMORY LOCATIONS 6 15 6 6 CACHE AND DMA OPERATIONS 6...

Page 7: ...Recovery Time 7 27 7 2 8 Non Cacheability of Memory Mapped I O Devices 7 27 7 2 9 Intel486 Processor On Chip Cache Consistency 7 28 7 3 I O CYCLES 7 29 7 3 1 Read Cycle Timing 7 29 7 3 2 Write Cycle Timings 7 31 7 4 DIFFERENCE BETWEEN THE Intel486 DX PROCESSOR FAMILY AND Intel386 PROCESSORS 7 33 7 5 INTERFACING TO x86 PERIPHERALS 7 34 7 5 1 Universal Peripheral Interface 7 34 7 5 2 82C59A Interfac...

Page 8: ... the EBC 8 11 8 3 4 1 EBC and EISA Bus Interface Signals 8 11 8 3 4 2 EBC and ISA Bus Interface Signals 8 12 8 3 5 EBC and ISP Interface 8 13 8 3 6 EBC and EBB Data and Address Buffer Controls 8 14 8 3 6 1 Functions of the ISP 8 16 8 3 6 2 ISP to Host Interface 8 17 8 3 7 ISP to EISA Interface 8 17 8 4 PCI BUS SYSTEM DESIGN EXAMPLE 8 19 8 4 1 Introduction to PCI Architecture 8 19 8 4 2 Example PCI...

Page 9: ... 4 ON CHIP WRITE BUFFERS 9 7 9 5 EXTERNAL MEMORY CONSIDERATIONS 9 8 9 5 1 Introduction 9 8 9 5 2 Wait States in Burst and Non Burst Modes 9 9 9 5 3 Impact of Wait States on Performance 9 10 9 5 4 Bus Utilization and Wait States 9 10 9 6 SECOND LEVEL CACHE PERFORMANCE CONSIDERATIONS 9 11 9 6 1 Advantages of a Second Level Cache 9 11 9 6 2 An Example of a Second Level Cache 9 12 9 6 3 System Perform...

Page 10: ...10 24 10 3 2 4 Vias Feed Through Connections 10 25 10 3 3 Interference 10 25 10 3 3 1 Electromagnetic Interference EMI 10 25 10 3 3 2 Minimizing Electromagnetic Interference 10 26 10 3 3 3 Electrostatic Interference 10 28 10 3 4 Propagation Delay 10 29 10 4 LATCH UP 10 30 10 5 CLOCK CONSIDERATIONS 10 30 10 5 1 Requirements 10 31 10 5 2 Routing 10 31 10 6 THERMAL CHARACTERISTICS 10 33 10 7 DERATING...

Page 11: ...mories 4 9 4 7 Single Master Intel486 Processor System 4 12 4 8 Single Intel486 Processor with DMA 4 13 4 9 Single Intel486 Processor with Multiple Secondary Masters 4 14 4 10 Basic 2 2 Bus Cycle 4 16 4 11 Basic 3 3 Bus Cycle 4 17 4 12 Non Cacheable Non Burst Multiple Cycle Transfers 4 20 4 13 Non Cacheable Burst Cycle 4 21 4 14 Non Burst Cacheable Cycles 4 23 4 15 Burst Cacheable Cycle 4 24 4 16 ...

Page 12: ...er BOFF Overlaying a Pseudo Locked Cycle 4 73 5 1 Typical Burst Cycle 5 3 5 2 Burst Cycle KEN Normally Active 5 4 5 3 Intel386 Processor Bus Cycle Mix Intel486 Processor Bus Cycle Mix 5 5 6 1 A Fully Associative Cache Organization 6 5 6 2 Direct Mapped Cache Organization 6 7 6 3 Two Way Set Associative Cache Organization 6 8 6 4 Sector Buffer Cache Organization 6 9 6 5 The Cache Data Organization ...

Page 13: ...am of Integrated System Peripheral ISP 8 8 8 4 EBB Byte Transfer 8 15 8 5 Example System Block Diagram 8 20 8 6 System Controller Block Diagram 8 22 8 7 ISA Bridge Block Diagram 8 23 8 8 Internal DMA Controller 8 34 9 1 Cache Hit Rate for Various Programs 9 6 9 2 Intel486 Processor Bus Cycle Mix with On Chip Cache 9 7 9 3 Effect of Wait States on Performance 9 10 9 4 Effect of External Bus Utiliza...

Page 14: ...Example 10 23 10 19 Use of Series Termination to Avoid Impedance Mismatch 10 24 10 20 Daisy Chaining 10 24 10 21 Avoiding 90 Degree Angles 10 25 10 22 Typical Layout 10 26 10 23 Removing Closed Loop Signal Paths 10 28 10 24 Typical Clock Timings 10 31 10 25 Clock Routing 10 32 10 26 Star Connection 10 32 10 27 Typical Heat Sinks 10 35 10 28 Heat Sink Dimensions 10 36 10 29 Derating Curves for the ...

Page 15: ...e Fill or Replacement Cycle 4 54 5 1 Access Length of Typical CPU Functions 5 2 5 2 Clock Latencies for DRAM Functions 5 6 6 1 Level 1 Cache Hit Rates 6 3 7 1 Next Byte Enable Values for the BSx Cycles 7 4 7 2 Valid Data Lines for Valid Byte Enable Combinations 7 5 7 3 PLD Input Signals 7 9 7 4 Equations 7 9 7 5 32 Bit to 8 Bit Steering 7 9 7 6 PLD Input Signals 7 12 7 7 PLD Output Signals 7 12 7 ...

Page 17: ... GUIDE TO THIS MANUAL. Chapter Contents: 1.1 Manual Contents (1-1); 1.2 Text Conventions (1-3); 1.3 Special Terminology (1-4); 1.4 Electronic Support Systems (1-5); 1.5 Technical Support (1-5); 1.6 Product Literature (1-6)...

Page 19: ...uct frequency, voltage, and package offerings. Chapter 3, Internal Architecture: This chapter describes the Intel486 processor internal architecture, with a description of the processor's functional units. Chapter 4, Bus Operation: This chapter describes the features of the processor bus, including bus cycle handling, interrupt and reset signals, cache control, and floating-point error control. Chapter 5, Memory...

Page 20: ...s an overview of system bus design considerations, including implementing the EISA and PCI system buses. Chapter 9, Performance Considerations: This chapter focuses on the system parameters that affect performance. External L2 caches are also examined as a means of improving memory system performance. Chapter 10, Physical Design and System Debugging: The higher clock speeds of Intel486 processor system...

Page 21: ...se Numbers: Hexadecimal numbers are represented by a string of hexadecimal digits followed by the character H. A zero prefix is added to numbers that begin with A through F. For example, FF is shown as 0FFH. Decimal and binary numbers are represented by their customary notations. That is, 255 is a decimal number and 1111 1111 is a binary number. In some cases, the letter B is added for clarity. Units of Mea...

Page 22: ...OLOGY. The following terms have special meanings in this manual. Assert and Deassert: The terms assert and deassert refer to the acts of making a signal active and inactive, respectively. The active polarity (high/low) is defined by the signal name. Active-low signals are designated by the pound symbol (#) suffix; active-high signals have no suffix. To assert RD# is to drive it low; to assert HOLD is to drive it ...

Page 23: ...448 Hong Kong; 886 2 514 0815 Taiwan; 822 767 2594 Korea; 61 2 975 3922 Australia; 1 503 264 6835 Worldwide. Think of the FaxBack service as a library of technical documents that you can access with your phone. Just dial the telephone number and respond to the system prompts. After you select a document, the system sends a copy to your fax machine. 1.4.2 World Wide Web: Intel offers a variety of informatio...

Page 24: ...atasheet (272770-001); Embedded Ultra-Low Power Intel486 SX Processor datasheet (272731-001); Embedded Ultra-Low Power Intel486 GX Processor datasheet (272755-001); Embedded Write-Back Enhanced IntelDX4 Processor datasheet (272771-001); MultiProcessor Specification (242016-005). Manuals: Intel Architecture Software Developer's Manual, Volumes 1 and 2 (243190-001, 243191-001); Embedded Intel486 Processor Family Develop...

Page 25: ...isted. Document Name / Web Site: Standard 1149.1-1990, IEEE Standard Test Access Port and Boundary-Scan Architecture, and its supplement, Standard 1149.1a-1993: contact the IEEE at http://www.ieee.org. PCI Local Bus Specification, Revisions 2.0 and 2.1: contact the PCI Special Interest Group at http://www.pcisig.com...

Page 27: ...2 Introduction. Chapter Contents: 2.1 Processor Features (2-2); 2.2 Intel486 Processor Product Family (2-4); 2.3 System Components (2-7); 2.4 System Architecture (2-7); 2.5 Systems Applications (2-11)...

Page 29: ... like the other Intel486 processors, supports dynamic data bus sizing for 8-, 16-, or 32-bit bus sizes, whereas the Ultra-Low Power Intel486 GX processor has a 16-bit external data bus. The entire Intel486 processor family incorporates energy-efficient SL Technology for mobile and fixed embedded computing. SL Technology enables system designs that exceed the Environmental Protection Agency's (EPA) Energy S...

Page 30: ...for both data and instructions. Cache hits provide zero-wait-state access times for data within the cache. Bus activity is tracked to detect alterations in the memory represented by the internal cache. The internal cache can be invalidated or flushed so that an external cache controller can maintain cache consistency. External Cache Control: Write-back and flush controls for an external cache are provi...

Page 31: ...HALT instruction, the Intel486 processor issues a normal Halt bus cycle, and the clock input to the Intel486 processor core is automatically stopped, causing the processor to enter the Auto HALT Power Down state. Upgrade Power Down Mode: When an Intel486 processor upgrade is installed, the Upgrade Power Down Mode detects the presence of the upgrade, powers down the core, and three-states all outputs of th...

Page 32: ... product may have 1x 2x or 3x clock Please con tact Intel for the latest product availability and specifications Table 2 1 Product Options Intel486 Processor VCC Processor Frequency MHz 168 Pin PGA 208 Lead SQFP 196 Lead PQFP 176 Lead TQFP VCCP 16 20 25 33 40 50 66 75 100 1x Clock Intel486 SX Processor 3 3 V 5 V Ultra Low Power Intel486 SX Processor 2 4 3 3 2 7 3 3 Ultra Low Power Intel486 GX Proc...

Page 33: ...anism. The addressing mechanism is more sophisticated in Protected Mode than in Real Mode. Virtual 8086 Mode, a sub-mode of Protected Mode, allows 8086 programs to be run with the segmentation and paging protection mechanisms of Protected Mode. This mode offers more flexibility than Real Mode for running 8086 programs. Using this mode, the Intel486 processor can execute 8086 operating systems and app...

Page 34: ...processor from reading invalid data from its own internal cache or from external caches. For example, when the Intel486 processor attempts to read an operand from memory that is also held in the cache of another bus master, the other bus master is forced to write its cached data back to memory before the Intel486 processor can complete its read from memory. This is done because the cached version of ...

Page 35: ...387 DX numerics coprocessor, with some extensions. The processor eliminates the need for an external memory management unit, and the on-chip cache minimizes the need for external cache and associated control logic. The remaining chapters of this manual detail the Intel486 processor's architecture, hardware functions, and interfacing. For more information on the architecture and software interface, see the...

Page 36: ...e system. Typical applications include personal computers, small desktop workstations, and embedded controllers. Such applications are implemented as a single board, usually called a motherboard; the processor bus does not extend beyond the board occupied by the Intel486 processor. Figure 2-2 shows an example of such a system. In a single-processor system, devices that share the processor bus must be sele...

Page 37: ...Coupled Multi-Processor System. Loosely coupled multi-processor systems include board-level products that communicate with one another through a standard system bus. In this architecture, each board contains a processor and associated logic. There is typically only one processor per board. Components within each board communicate on either a processor bus or on the buffered system bus. The system bus us...

Page 38: ... fast SRAM and cache control logic. External cache systems typically provide access to the cache from both the processor and the system buses. This is shown in Figure 2-4. These caches typically monitor processor memory accesses, processor access time, and consistency between cache and memory. The cache controller is responsible for maintaining an optimal mix of data and instructions in cache. Intel486...

Page 39: ...n goals and constraints, as described in the following sections. Software running on the processor, even in stand-alone embedded applications, should use a standard operating system such as DOS, Windows 95, Windows NT, OS/2, or UNIX System V/386 to facilitate debugging, documentation, and transportability. [Figure residue: external cache controller, processor bus, system bus, i486 processor, DRAM controller, SRAM, DRAM]...

Page 40: ...plication. Figure 2-5. Embedded Personal Computer and Embedded Controller Example. External cache is optional in such environments, particularly if system performance is not a critical parameter. Where an external cache is used, memory access speeds improve only if the cache is designed as a write-back system and memory access has zero to one wait states. 2.5.2 Embedded Controllers: Most embedded control...

Page 41: ...urpose. Frequently used routines and variables, such as interrupt handlers and interrupt stacks, can be locked in the processor's internal cache so they are always available quickly. Embedded controllers usually require less memory than other applications, and control programs are usually tightly written machine-level routines that need optimal performance in a limited variety of tasks. The processor t...

Page 43: ...ipelining (3-6); 3.2 Bus Interface Unit (3-7); 3.3 Cache Unit (3-10); 3.4 Instruction Prefetch Unit (3-13); 3.5 Instruction Decode Unit (3-14); 3.6 Control Unit (3-14); 3.7 Integer Datapath Unit (3-14); 3.8 Floating-Point Unit (3-15); 3.9 Segmentation Unit (3-15); 3.10 Paging Unit (3-16)...

Page 45: ...it external data bus. The Ultra-Low Power Intel486 GX also has advanced power management features. Table 3-1 lists the functional units of the embedded Intel486 processors. Figure 3-1 is a block diagram of the embedded IntelDX2 and IntelDX4 processors. Note that the cache unit is 8 Kbytes for the IntelDX2 processor and 16 Kbytes for the IntelDX4 processor. Figure 3-2 is a block diagram of the embedded ...

Page 46: ...[Figure residue: block diagram labels, including 16-Kbyte cache (DX4), clock multiplier, floating-point register file, control ROM, address drivers, data bus transceivers, and bus interface signals such as D31–D0, A31–A2, BE3#–BE0#, ADS#, RDY#, HOLD, HLDA, and interrupt and reset inputs]...

Page 47: ...[Figure residue: block diagram labels, including cache control, control ROM, address drivers, data bus transceivers, request sequencer, bus size control, parity generation and control, boundary scan control, and bus interface signals such as BRDY#, BLAST#, BS16#, BS8#, KEN#, FLUSH#, AHOLD, EADS#, and DP3...]

Page 48: ... and follows a write-through policy. The Write-Back Enhanced IntelDX4 processor can be set to use an on-chip write-back cache pol... [Figure residue: internal pipeline diagram labels, including paging unit, prefetcher, 32-byte code queue (2 x 16 bytes), barrel shifter, cache unit, burst bus control, bus control, write buffers (4 x 32), 64-bit interunit transfer bus, register file, ALU, segmentation unit, descriptor registers, and limit and attribute PLA]...

Page 49: ...memory management unit (MMU) consists of a segmentation unit and a paging unit, which perform address generation. The segmentation unit translates logical addresses and passes them to the paging and cache units on a 32-bit linear address bus. Segmentation allows management of the logical address space by providing easy relocation of data and code and efficient sharing of global resources. The paging mec...

Page 50: ...tware. The Intel486 processor also has features that facilitate high-performance hardware designs. The 1X bus clock input eases high-frequency board-level designs. The clock multiplier on IntelDX2 and IntelDX4 processors improves execution performance without increasing board design complexity. The clock multiplier enhances all operations operating out of the cache that are not blocked by external bu...

Page 51: ...frame pointer, and the additional clock is not used very often. Compilers often place an unrelated instruction between one that changes an addressing register and one that uses the register. Such code is compatible with the Intel386 processor, and the Intel486 processor provides special stack increment/decrement hardware and an extra register port to execute back-to-back stack push/pop instructions i...

Page 53: ...tten from the write buffers. To ensure that no more than one such re-ordering is done for a given set of buffered writes, all buffered writes are re-flagged as cache misses when a read request is re-ordered ahead of them. Buffered writes thus marked are propagated to the processor bus before the next read request is acted upon. Invalidation of data in the internal cache also causes all pending writes ...

Page 54: ...a cache miss, the information is read into the cache in one or more 16-byte cacheable data transfers, called cache line fills. An internal write request to an area currently in the cache causes two distinct actions if the cache is using a write-through policy: the cache is updated, and the write is also passed through the cache to memory. If the cache is using a write-back policy, then the internal write...

Page 55: ...they indicate whether a 16-byte cache line is stored for that physical address. The low-order 4 bits of the physical address select the byte within the cache line. Finally, a 4-bit valid field, one for each way within a given set, indicates whether the cached data at that physical address is currently valid. [Figure residue: cache organization showing a data block, a tag block, and a valid/LRU block, each divided into Way 0 through Way 3]...

Page 56: ...ers the memory area mapped to the internal cache. When the IntelDX4 processor is enabled for normal caching and write-back operation, an internal write only causes the cache to be updated. The modified data is stored for the future update of main memory and is not immediately written to memory. 3.3.3 Cache Replacement: Replacement in the cache is handled by a pseudo-LRU (least recently used) mechanism. Th...

Page 57: ...fer to Chapter 6, Cache Subsystem. 3.4 INSTRUCTION PREFETCH UNIT: When the bus interface unit is not performing bus cycles to execute an instruction, the instruction prefetch unit uses the bus interface unit to prefetch instructions. By reading instructions before they are needed, the processor rarely needs to wait for an instruction prefetch cycle on the processor bus. Instruction prefetch cycles read ...

Page 58: ...mory access. This allows execution of a two-instruction sequence that loads and operates on data in just two clocks, as described in Section 3.2. The decode unit simultaneously processes instruction prefix bytes, opcodes, modR/M bytes, and displacements. The outputs include hardwired microinstructions to the segmentation, integer, and floating-point units. The instruction decode unit is flushed whenever th...

Page 59: ...ranscendental functions (e.g., tangent, sine, cosine, and log functions). The floating-point unit fully conforms to the ANSI/IEEE standard 754-1985 for floating-point arithmetic. All software written for the Intel386 processor, the Intel387 math coprocessor, and previous members of the 86/87 architectural family runs on these processors without modifications. 3.9 SEGMENTATION UNIT: A segment is a protected indepen...

Page 60: ...paging is not enabled, the physical address is identical to the linear address. The paging unit includes a translation lookaside buffer (TLB) that stores the 32 most recently used page table entries. Figure 3-7 shows the TLB data structures. The paging unit looks up linear addresses in the TLB. If the paging unit does not find a linear address in the TLB, the unit generates requests to fill the TLB with ...

Page 61: ...cache can be used to supply data for the TLB, although this may not be desirable when external logic monitors TLB updates. Unlike segmentation, paging is invisible to application programs and does not provide the same kind of protection against programs altering data outside a restricted part of memory. Paging is visible to the operating system, which uses it to satisfy application program memory requi...

Page 63: ... Chapter Contents: 4.1 Data Transfer Mechanism (4-1); 4.2 Bus Arbitration Logic (4-12); 4.3 Bus Functional Description (4-15); 4.4 Enhanced Bus Mode Operation (Write-Back Mode) for the Write-Back Enhanced IntelDX4 Processor (4-50)...

Page 65: ...1–A2. The byte enables BE3#–BE0# form the low-order address and provide linear selects for the four bytes of the 32-bit address bus. The byte enable outputs are asserted when their associated data bus bytes are involved with the present bus cycle, as listed in Table 4-1. Byte enable patterns that have a deasserted byte enable separating two or three asserted byte enables never occur (see Table 4-5 on pa...

Page 66: ...be 32, 16, or 8 bits wide. The byte enable signals BE3#–BE0# allow byte granularity when addressing any memory or I/O structure, whether 8, 16, or 32 bits wide. Table 4-2, Generating A31–A0 from BE3#–BE0# and A31–A2: A31–A2 pass through unchanged; BE3#–BE0# = X X X 0 gives A1 A0 = 0 0; X X 0 1 gives 0 1; X 0 1 1 gives 1 0; 0 1 1 1 gives 1 1. [Figure residue: Physical Memory, 4 Gbyte]...

Page 67: ...nd I/O Space Organization: 16-bit memories are organized as arrays of two bytes each. Each two bytes begins at addresses divisible by two. The byte enables BE3#–BE0# must be decoded to A1, BLE#, and BHE# to address 16-bit memories. To address 8-bit memories, the two low-order address bits A0 and A1 must be decoded from BE3#–BE0#. The same logic can be used for 8- and 16-bit memories because the decoding logic f...

Page 68: ...ificantly different from that of the Intel386 processor. Unlike the Intel386 processor, the Intel486 processor requires that data bytes be driven on the addressed data pins. The simplest example of this function is a 32-bit aligned BS16# read. When the Intel486 processor reads the two high-order bytes, they must be driven on data bus pins D31–D16. The Intel486 processor expects the two low-order byte...

Page 69: ...asserted for all bus cycles involving the 32-bit array. For 16- and 8-bit memories, byte-swapping logic is required for routing data to the appropriate data lines, and logic is required for generating BHE#, BLE#, and A1. In systems where mixed memory widths are used, extra address decoding logic is necessary to assert BS16# or BS8#. Figure 4-3. Intel486 Processor with 32-Bit Memory. Table 4-4. Data Pins Read with...

Page 70: ...e decoded to produce A0 and A1. The same byte select logic can be used in 16- and 8-bit systems because BLE# is exactly the same as A0 (see Table 4-5). Figure 4-4. Addressing 16- and 8-Bit Memories. BE3#–BE0# can be decoded as shown in Table 4-5. The byte select logic necessary to generate BHE# and BLE# is shown in Figure 4-5. [Figure residue: Intel486 processor, BS16#, BS8#, address bus A31–A2 and BE3#–BE0#, BHE#, BLE#, A1, A0]...

Page 71: ...[Table body not recoverable; several BE3#–BE0# combinations are marked "not contiguous bytes."] NOTES: 1. BLE# asserted when D7–D0 of 16-bit bus is asserted. 2. BHE# asserted when D15–D8 of 16-bit bus is asserted. 3. A1 low for all even words; A1 high for all odd words. KEY: x = don't care; a non-occur...

Page 72: ...deasserted byte enables. These combinations are don't-care conditions in the decoder. A decoder can use the non-occurring BE3#–BE0# combinations to its best advantage. Figure 4-6 shows an Intel486 processor data bus interface to 16- and 8-bit wide memories. External byte-swapping logic is needed on the data lines so that data is supplied to and received from the Intel486 processor on the correct data pi...

Page 73: ...e of a cache line fill. The Intel486 processor generates proper byte enables for subsequent cycles in the line fill. Table 4-6 shows the appropriate A0 (BLE#), A1, and BHE# for the various combinations of the Intel486 processor byte enables on both the first and subsequent cycles of the cache line fill. The marks all combinations of byte enables that are generated by the Intel486 processor during a cache ...

Page 74: ...lignment and data bus sizing. When multiple cycles are required to transfer a multibyte logical operand, the highest-order bytes are transferred first. For example, when the processor executes a 4-byte unaligned read beginning at byte location 11 in the 4-byte aligned space, the three high-order bytes are read in the first bus cycle. The low byte is read in a subsequent bus cycle. Table 4-6. Generating A0...

Page 75: ...ed by the upper byte. In the final cycle, the lower byte of the 4-byte operand is transferred, as shown in the 32-bit example above. Table 4-7, Transfer Bus Cycles for Bytes, Words, and Dwords. Over a 32-bit bus: a 1-byte operand at any address (xx) takes b; a 2-byte operand at low-order address bits 00, 01, or 10 takes w, and at 11 takes hb, lb; a 4-byte operand at 00 takes d, at 01 takes hb, l3, at 10 takes hw, lw, and at 11 takes h3, lb. Transfer Cycles over 16-Bit Bus, BS 1...

Page 76: ...implementations range from single-master designs to those with multiple masters and DMA devices. Figure 4-7 shows a simple system in which only one master controls the bus and accesses the memory and I/O devices. Here, no arbitration is required. Figure 4-7. Single Master Intel486 Processor System. [Figure residue: Intel486 processor, I/O, memory, control bus, data bus, address bus]...

Page 77: ...DMA wants control of the bus, it asserts the HOLD request to the processor. The processor then responds with an HLDA output when it is ready to relinquish bus control to the DMA device. Once the DMA device completes its bus activity cycles, it negates the HOLD signal to relinquish the bus and return control to the processor. Figure 4-8. Single Intel486 Processor with DMA. [Figure residue: Intel486 processor, DMA, memory, I/O, ad]...

Page 78: ...ys it to the requesting devices. Figure 4-9. Single Intel486 Processor with Multiple Secondary Masters. As systems become more complex and include multiple bus masters, hardware must be added to arbitrate and assign the management of bus time to each master. The second master may be a DMA controller that requires bus time to perform memory transfers, or it may be a second processor that requires the bu...

Page 79: ...processor can acquire the bus. Otherwise, if HOLD is asserted, then the Intel486 processor has to wait for HOLD to be deasserted before acquiring the bus. If the Intel486 processor does not have the bus, then its address, data, and status pins are 3-stated. However, the processor can execute instructions out of the internal cache or instruction queue and does not need control of the bus to remain active. T...

Page 80: ...available on the cycle definition lines and address bus. Figure 4-10. Basic 2-2 Bus Cycle. The non-burst ready input (RDY#) is asserted by the external system in the second clock. RDY# indicates that the external system has presented valid data on the data pins in response to a read, or that the external system has accepted data in response to a write. The Intel486 processor samples RDY# at the end of the second...

Page 81: ...rt a wait state. Figure 4-11 illustrates a simple non-burst, non-cacheable cycle with one wait state added. Any number of wait states can be added to an Intel486 processor bus cycle by keeping RDY# deasserted. Figure 4-11. Basic 3-3 Bus Cycle. The burst ready input (BRDY#) must be deasserted on all clock edges where RDY# is deasserted for proper operation of these simple non-burst cycles. 4.3.2 Multiple...
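RDY# determines cycle length, and that relationship can be captured in a few lines. The sketch below is a hypothetical helper, not part of any Intel tool. It assumes a simple non-burst cycle: BRDY# stays deasserted throughout, and BOFF#, AHOLD, and HOLD are not active. It counts one clock for T1 plus one clock for each T2 until RDY# is sampled asserted. That rule reproduces the 2-2 and 3-3 cycles of Figures 4-10 and 4-11.

```python
def nonburst_cycle_clocks(rdy_samples):
    """Clocks taken by one simple non-burst bus cycle (hypothetical helper).

    rdy_samples[i] is True when RDY# is sampled asserted at the end of the
    i-th T2 clock. BRDY# is assumed deasserted throughout.
    """
    clocks = 1  # T1: address, status and ADS# are driven
    for rdy_asserted in rdy_samples:
        clocks += 1  # one T2 clock
        if rdy_asserted:
            return clocks
    raise ValueError("RDY# was never sampled asserted")


print(nonburst_cycle_clocks([True]))         # 2: basic 2-2 cycle
print(nonburst_cycle_clocks([False, True]))  # 3: one wait state (3-3 cycle)
```

Each extra False in the list models one more clock with RDY# held deasserted, that is, one more wait state.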

Page 82: ... The Intel486 processor indicates that it is willing to perform a burst cycle by holding the burst last signal (BLAST#) deasserted in the second clock of the cycle. The external system indicates its willingness to do a burst cycle by asserting the burst ready signal (BRDY#). The addresses of the data items in a burst cycle all fall within the same 16-byte aligned area (corresponding to an internal Intel48...

Page 83: ...rted. The Intel486 processor determines how many cycles a transfer will take based on its internal information and inputs from the external system. BLAST# is not valid in the first clock of a bus cycle, because the Intel486 processor cannot determine the number of cycles a transfer will take until the external system asserts KEN#, BS8#, and BS16#. BLAST# should only be sampled in the second T2 state and su...

Page 84: ...nto a burst cycle by asserting BRDY# rather than RDY# in the first cycle of the transfer. This is illustrated in Figure 4-13. There are several features to note in the burst read. ADS# is asserted only during the first cycle of the transfer. RDY# must be deasserted when BRDY# is asserted. BLAST# behaves exactly as it does in the non-burst read: BLAST# is deasserted in the second clock of the first cycle of th...

Page 85: ...only memory reads or prefetches into a cache fill. KEN# is ignored during write or I/O cycles. Memory writes are stored in the on-chip cache only if there is a cache hit. I/O space is never cached in the internal cache. To transform a read or a prefetch into a cache line fill, the following conditions must be met: 1. The KEN# pin must be asserted one clock prior to RDY# or BRDY# being asserted for the first ...

Page 86: ...e Enables during a Cache Line Fill. For the first cycle in the line fill, the state of the byte enables should be ignored. In a non-cacheable memory read, the byte enables indicate the bytes actually required by the memory or code fetch. The Intel486 processor expects to receive valid data on its entire bus (32 bits) in the first cycle of a cache line fill. Data should be returned with the assumption tha...

Page 87: ...cle would be a single bus cycle if KEN# were not sampled asserted at the end of the first clock. The subsequent three reads would not have happened, since a cache fill was not requested. The BLAST# output is invalid in the first clock of a cycle. BLAST# may be asserted during the first clock due to earlier inputs. Ignore BLAST# until the second clock. During the first cycle of the cache line fill, the externa...

Page 88: ...sserts KEN# at the end of the first clock in the cycle. The external system informs the Intel486 processor that it will burst the line in by asserting BRDY# at the end of the first cycle in the transfer. Note that during a burst cycle, ADS# is only driven with the first address. Figure 4-15. Burst Cacheable Cycle [timing diagram 242202-036] ...

Page 89: ...in the clock before BRDY# or RDY# to determine if a bus cycle would be a cache line fill. Similarly, it uses the value of KEN# in the last cycle before early RDY# to load the line just retrieved from memory into the cache. KEN# is sampled every clock, and it must satisfy setup and hold times. KEN# can also change multiple times before a burst cycle, as long as it arrives at its final value one clock before BR...

Page 90: ...ntel486 processor strobes data into the chip only when either RDY# or BRDY# is asserted. Deasserting BRDY# and RDY# adds a wait state to the transfer. A burst cycle where two clocks are required for every burst item is shown in Figure 4-17. Figure 4-17. Slow Burst Cycle [timing diagram 242202-038: Ti, T1, then eight T2 states] ...
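The clock counts for a 16-byte line fill follow from the rules above. A non-burst transfer needs T1 plus at least one T2, while each burst transfer after the first needs only the clock in which BRDY# is sampled. The hypothetical helper below assumes the same number of wait states on every transfer and no interruption by RDY#, BOFF#, or bus sizing. Under those assumptions, the slow burst of Figure 4-17 comes out at nine clocks, which matches its T1 plus eight T2 states.

```python
def line_fill_clocks(burst, wait_states=0, transfers=4):
    """Clocks to read a 16-byte line as four doubleword transfers.

    Non-burst: every transfer is a T1 + T2 cycle (2 clocks + wait states).
    Burst: the first transfer takes 2 clocks, later ones 1 clock each,
    with the same number of wait states added to every transfer.
    """
    first = 2 + wait_states
    later = (1 if burst else 2) + wait_states
    return first + later * (transfers - 1)


print(line_fill_clocks(burst=False))                 # 2-2-2-2 -> 8
print(line_fill_clocks(burst=True))                  # 2-1-1-1 -> 5
print(line_fill_clocks(burst=True, wait_states=1))   # 3-2-2-2 -> 9
```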

Page 91: ... is shown in Figure 4-18. Figure 4-18. Burst Cycle Showing Order of Addresses. The sequences shown in Table 4-8 accommodate systems with 64-bit buses as well as systems with 32-bit data buses. The sequence applies to all bursts, regardless of whether the purpose of the burst is to fill a cache line, perform a 64-bit read, or perform a prefetch. If either BS8# or BS16# is asserted, the Intel486 processor com...

Page 92: ...t a burst cycle by asserting RDY# instead of BRDY#. RDY# can be asserted after any number of data cycles terminated with BRDY#. An example of an interrupted burst cycle is shown in Figure 4-19. The Intel486 processor immediately asserts ADS# to initiate a new bus cycle after RDY# is asserted. BLAST# is deasserted one clock after ADS# begins the second bus cycle, indicating that the transfer is not complete. Fi...

Page 93: ...w request by asserting ADS# to address 100. If the external system terminates the second cycle with BRDY#, the Intel486 processor next requests (expects) address 10C. The correct order is determined by the first cycle in the transfer, which may not be the first cycle in the burst if the system mixes RDY# with BRDY#. Figure 4-20. Interrupted Burst Cycle with Non-Obvious Order of Addresses. 4.3.5 8- and 16-Bit C...
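Table 4-8 itself is not reproduced in this excerpt. The sketch below assumes the usual Intel486 burst sequences within a 16-byte line: 0-4-8-C, 4-0-C-8, 8-C-0-4, and C-8-4-0. These can be generated by XORing the first address with 0, 4, 8, and C. If the interrupted transfer described above began at 104, the sequence is 104, 100, 10C, 108, so the cycle to 100 is followed by a request for 10C. The function name is hypothetical.

```python
def burst_order(first_address):
    """Addresses of a four-transfer burst, starting at first_address.

    Assumed ordering: first_address XOR 0x0, 0x4, 0x8, 0xC, which gives
    0-4-8-C, 4-0-C-8, 8-C-0-4 and C-8-4-0 within a 16-byte line.
    """
    if first_address % 4:
        raise ValueError("burst transfers are doubleword aligned")
    return [first_address ^ step for step in (0x0, 0x4, 0x8, 0xC)]


print([hex(a) for a in burst_order(0x104)])  # ['0x104', '0x100', '0x10c', '0x108']
```

The XOR form also explains why the order is fixed by the first cycle: every later address depends only on where the transfer started.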

Page 94: ...ransfer. Figure 4-21. 8-Bit Bus Size Cycle. Extra cycles forced by the BS16# and BS8# signals should be viewed as independent bus cycles. BS16# and BS8# should be asserted for each additional cycle unless the addressed device can change the number of bytes it can return between cycles. The Intel486 processor keeps BLAST# deasserted until the last cycle before the transfer is complete. Refer to Section 4.1.2, Dynamic Dat...
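Dynamic bus sizing can be illustrated by counting how many cycles one transfer splits into. This is a simplified sketch built on the following assumptions:

- The hypothetical `sized_cycles` function assumes one cycle per enabled byte when BS8# is asserted.
- When BS16# is asserted, it assumes one cycle per 16-bit half (BE1#–BE0# or BE3#–BE2#) that contains enabled bytes.
- The device is assumed to report the same width on every cycle.

```python
def sized_cycles(be_n, bus_width):
    """Bus cycles needed for one transfer under dynamic bus sizing.

    be_n:      BE3#-BE0# as a 4-bit value, active low (bit 0 = BE0#).
    bus_width: 32 (neither input asserted), 16 (BS16#) or 8 (BS8#).
    """
    active = [lane for lane in range(4) if not (be_n >> lane) & 1]
    if not active:
        return 0
    if bus_width == 32:
        return 1
    if bus_width == 16:
        # one cycle for each 16-bit half (lanes 0-1, lanes 2-3) with data
        return len({lane // 2 for lane in active})
    if bus_width == 8:
        return len(active)  # one cycle per enabled byte
    raise ValueError("bus_width must be 32, 16 or 8")


print(sized_cycles(0b0000, 8))   # full doubleword on an 8-bit bus -> 4
print(sized_cycles(0b0000, 16))  # full doubleword on a 16-bit bus -> 2
```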

Page 95: ...ycles. Locked cycles are also generated when the LOCK instruction prefix is used with selected instructions. Locked cycles are implemented in hardware with the LOCK# pin. When LOCK# is asserted, the Intel486 processor is performing a read-modify-write operation, and the external bus should not be relinquished until the cycle is complete. Multiple reads or writes can be locked. A locked cycle is shown in Fi...

Page 96: ...t be aligned for correct operation of a pseudo-locked cycle. PLOCK# need not be examined during burst reads. A 64-bit aligned operand can be retrieved in one burst (note that this is only valid in systems that do not interrupt bursts). The system must examine PLOCK# during 64-bit writes, since the Intel486 processor cannot burst write more than 32 bits. However, burst can be used within each 32-bit write cy...

Page 97: ... several times during a cycle, settling to its final value in the clock in which RDY# is asserted. 4.3.7.1 Floating-Point Read and Write Cycles. For IntelDX2 and Write-Back Enhanced IntelDX4 processors, 64-bit floating-point read and write cycles are also examples of operand transfers that take more than one bus cycle. Figure 4-24. Pseudo-Lock Timing. 4.3.8 Invalidate Cycles. Invalidate cycles keep the Int...

Page 98: ...validation cycle. The Intel486 processor recognizes AHOLD on one CLK edge and floats the address bus in response. To allow the address bus to float and avoid contention, EADS# and the invalidation address should not be driven until the following CLK edge. The Intel486 processor reads the address over its address lines. If the Intel486 processor finds this address in its internal cache, the cache entry is...

Page 99: ... need not track bus activity. Alternatively, systems can request one invalidate per clock, provided that the bus is monitored. 4.3.8.2 Running Invalidate Cycles Concurrently with Line Fills. Precautions are necessary to avoid caching stale data in the Intel486 processor cache in a system with a second-level cache. An example of a system with a second-level cache is shown in Figure 4-27. An external devic...

Page 100: ...rocessor is reading data from the same address in the second-level cache. The system must force an invalidation cycle to invalidate the data that the Intel486 processor has requested during the line fill. Figure 4-27. System with Second-Level Cache [block diagram: Intel486 processor, second-level cache, system bus, external memory, and external bus master, linked by address, data, and control buses] ...

Page 101: ... RDY# or BRDY# is asserted, or any subsequent clock in the line fill, the data is read into the Intel486 processor input buffers, but it is not stored in the on-chip cache. This is illustrated by the asserted EADS# signal labeled 2 in Figure 4-28. The stale data is used to satisfy the request that initiated the cache fill cycle. Figure 4-28. Cache Invalidation Cycle Concurrent with Line Fill (242202-093). NOTES: 1. ...

Page 102: ...ample of a HOLD/HLDA transaction is shown in Figure 4-29. Unlike the Intel386 processor, the Intel486 processor can respond to HOLD by floating its bus and asserting HLDA while RESET is asserted. Figure 4-29. HOLD/HLDA Cycles. Note that HOLD is recognized during unaligned writes (less than or equal to 32 bits), with BLAST# being asserted for each write. For a write greater than 32 bits or an unaligned wri...

Page 103: ...erted is unimportant, as long as both are asserted prior to the first RDY#/BRDY# asserted by the system. Figure 4-30 shows the case where HOLD is asserted first; HOLD could be asserted simultaneously with or after BOFF# and still be acknowledged. The pins floated during bus hold are BE3#–BE0#, PCD, PWT, W/R#, D/C#, M/IO#, LOCK#, PLOCK#, ADS#, BLAST#, D31–D0, A31–A2, and DP3–DP0. Figure 4-30. HOLD Request Acknowledged during BOFF# (242...

Page 104: ...ts of the data bus. The Intel486 processor has 256 possible interrupt vectors. The state of A2 distinguishes the first and second interrupt acknowledge cycles. The byte address driven during the first interrupt acknowledge cycle is 4 (A31–A3 low, A2 high, BE3#–BE1# high, and BE0# low). The address driven during the second interrupt acknowledge cycle is 0 (A31–A2 low, BE3#–BE1# high, BE0# low). Each of the interrupt a...

Page 105: ...ss of 2. BE0# and BE2# are the only signals that distinguish the HALT indication from the shutdown indication, which drives an address of 0. During the HALT cycle, undefined data is driven on D31–D0. The HALT indication cycle must be acknowledged by RDY# asserted. A halted Intel486 processor resumes execution when INTR (if interrupts are enabled), NMI, or RESET is asserted. 4.3.11.2 Shutdown Indication Cycle. The Intel...

Page 106: ...ecial Bus Cycle Encoding

Cycle Name               M/IO#  D/C#  W/R#  BE3#–BE0#  A4–A2
Write-Back               0      0     1     0111       000
First Flush Ack Cycle    0      0     1     0111       001
Flush                    0      0     1     1101       000
Second Flush Ack Cycle   0      0     1     1101       001
Shutdown                 0      0     1     1110       000
HALT                     0      0     1     1011       000
Stop Grant Ack Cycle     0      0     1     1011       100

These cycles are specific to the Write-Back Enhanced IntelDX4 processor. The FLUSH cycle is applicable to all Intel486 processors. See ap...
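All of these special cycles share M/IO# = 0, D/C# = 0, and W/R# = 1. A decoder therefore only has to look at BE3#–BE0# and A4–A2 to tell them apart. The lookup below is a sketch of how system logic might classify them, transcribed from the encoding table. The names are hypothetical, and real decode logic would work on signals sampled at the appropriate clock edge rather than on Python integers.

```python
# (BE3#-BE0#, A4-A2) -> cycle name; all entries have M/IO#=0, D/C#=0, W/R#=1
SPECIAL_CYCLES = {
    (0b0111, 0b000): "Write-Back",
    (0b0111, 0b001): "First Flush Ack Cycle",
    (0b1101, 0b000): "Flush",
    (0b1101, 0b001): "Second Flush Ack Cycle",
    (0b1110, 0b000): "Shutdown",
    (0b1011, 0b000): "HALT",
    (0b1011, 0b100): "Stop Grant Ack Cycle",
}


def decode_special_cycle(m_io, d_c, w_r, be_n, a4_a2):
    """Return the special cycle name, or None if the encoding does not match."""
    if (m_io, d_c, w_r) != (0, 0, 1):
        return None  # not a special bus cycle
    return SPECIAL_CYCLES.get((be_n, a4_a2))


print(decode_special_cycle(0, 0, 1, 0b1011, 0b000))  # HALT
print(decode_special_cycle(0, 0, 1, 0b1011, 0b100))  # Stop Grant Ack Cycle
```

HALT and Stop Grant Ack share the same byte enables, so A4 is the only signal that separates them.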

Page 107: ...FF# is asserted, the Intel486 processor floats its address, data, and status pins in the next clock (see Figures 4-33 and 4-34). Any bus cycle in progress when BOFF# is asserted is aborted, and any data returned to the processor is ignored. The pins that are floated in response to BOFF# are the same as those that are floated in response to HOLD. HLDA is not generated in response to BOFF#. BOFF# has higher priori...

Page 108: ...ing out the address and status and asserting ADS#. The bus cycle then continues as usual. Asserting BOFF# during a burst, BS8#, or BS16# cycle forces the Intel486 processor to ignore data returned for that cycle only. Data from previous cycles is still valid. For example, if BOFF# is asserted on the third BRDY# of a burst, the Intel486 processor assumes the data returned with the first and second BRDY# is correc...
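The restart rule can be sketched with the burst order. Data returned before BOFF# is kept, and the restarted cycle picks up at the first transfer that did not complete. The helper below is hypothetical. It assumes the standard Intel486 burst order (first address XOR 0, 4, 8, C), which this excerpt does not reproduce.

```python
def restart_after_boff(first_address, completed):
    """Doubleword addresses still to transfer after a burst is backed off.

    first_address: address of the first transfer in the burst.
    completed:     transfers already terminated by BRDY# before BOFF#.
    Assumes the burst order first_address XOR 0x0, 0x4, 0x8, 0xC.
    """
    if not 0 <= completed < 4:
        raise ValueError("completed must be 0 to 3")
    order = [first_address ^ step for step in (0x0, 0x4, 0x8, 0xC)]
    return order[completed:]


# BOFF# on the third BRDY# of a burst starting at 100: the first two
# doublewords are kept, and the restarted cycle begins with 108.
print([hex(a) for a in restart_after_boff(0x100, 2)])  # ['0x108', '0x10c']
```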

Page 109: ...5 Bus State Diagram [state diagram 240950-069: bus states Ti, T1, T2, T1b, and Tb, with transitions conditioned on request pending or no request, HOLD, AHOLD, BOFF#, RDY#, and BRDY#/BLAST#] ...

Page 110: ...of the two. In some cases FERR# is asserted when the next floating-point instruction is encountered, and in other cases it is asserted before the next floating-point instruction is encountered, depending upon the execution state of the instruction causing the exception. 4.3.14.1 Floating-Point Exceptions. The following class of floating-point exceptions drives FERR# at the time the exception occurs (i.e., be...

Page 111: ...riving the IGNNE# pin low when clearing the interrupt request, the interrupt handler can allow execution of a floating-point instruction within the interrupt handler before the error condition is cleared by FNCLEX, FNINIT, FNSAVE, or FNSTENV. If execution of a non-control floating-point instruction within the floating-point interrupt handler is not needed, the IGNNE# pin can be tied high. 4.3.15 IntelDX2 ...

Page 112: ...GNNE# signal is also activated by the decoder output. 5. Usually, the ISR then executes an FNINIT instruction or other control instruction before restarting the program. FNINIT clears the FERR# output. Figure 4-36 illustrates a sample circuit that performs the function described above. Note that this circuit has not been tested and is included as an example of required error handling logic. Note that the I...

Page 113: ...Figure 4-36. DOS-Compatible Numerics Error Circuit [schematic: Intel486 processor FERR#, IGNNE#, and INTR; I/O port 0F0H address decoder on the processor bus; two D flip-flops; 8259A Programmable Interrupt Controller (IRQ13); RESET; VCC = 5 V] ...
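The numbered steps describe a handshake that can be traced as a sequence of signal levels. The class below is a behavioural sketch only, not a model of the flip-flops in Figure 4-36. The class and method names are hypothetical. The sketch assumes that IGNNE# is released once FERR# is cleared by FNINIT.

```python
class NumericsErrorModel:
    """Hypothetical behavioural model of the DOS-compatible error logic.

    Booleans mean "asserted"; only signal levels are tracked.
    """

    def __init__(self):
        self.ferr = False   # FERR# from the processor
        self.irq13 = False  # request into the 8259A (which drives INTR)
        self.ignne = False  # IGNNE# back to the processor

    def floating_point_error(self):
        # An unmasked floating-point error asserts FERR#, which the
        # external logic turns into an IRQ13 request.
        self.ferr = True
        self.irq13 = True

    def write_port_0f0h(self):
        # The ISR writes I/O port 0F0H: the decoder clears the request
        # and asserts IGNNE# while FERR# is still asserted.
        self.irq13 = False
        self.ignne = self.ferr

    def fninit(self):
        # FNINIT clears FERR#; assumed here to release IGNNE# as well.
        self.ferr = False
        self.ignne = False
```

A typical error-handling sequence calls `floating_point_error()`, then `write_port_0f0h()` from the ISR, and finally `fninit()` before the program restarts.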

Page 114: ...se signals function the same as the equivalent signals on the Pentium OverDrive processor pins. 3. The SRESET signal has been modified so that it does not write back, invalidate, or disable the cache. Special test modes are also not initiated through SRESET. 4. The FLUSH# signal behaves the same as the WBINVD instruction. Upon assertion, FLUSH# writes back all modified lines, invalidates the cache, and issues ...

Page 115: ...fies a replacement or snoop write-back cycle. These cycles consist of four doubleword transfers, either burst or non-burst. The signals KEN# and WB/WT# are not sampled during write-back cycles, because the processor does not attempt to redefine the cacheability of the line. 4.4.2.2 Burst Cycle Signal Protocol. The signals from ADS# through BLAST#, which are shown in Figure 4-37, have the same function and t...

Page 116: ...e an output pin to indicate a snoop hit to an S-state line or an E-state line. However, the Write-Back Enhanced IntelDX4 processor invalidates the line if the system snoop hits an S-state, E-state, or M-state line, provided INV was driven high during snooping. If INV is driven low during a snoop cycle, a modified line is written back to memory and remains in the cache as a write-back line; a write-throug...

Page 117: ...IntelDX4 processor can accept EADS# in every clock period while in Standard Bus mode. In Enhanced Bus mode, the Write-Back Enhanced IntelDX4 processor can accept EADS# every other clock period or until a snoop hits an M-state line. The Write-Back Enhanced IntelDX4 processor does not accept any further snoop cycle inputs until the previous snoop write-back operation is completed. All write-back cycles ...

Page 118: ... with line fill. Complete replacement write-back if the cycle is burst. Processor does not initiate a snoop write-back, but asserts HITM# until the replacement write-back is completed. If the replacement cycle is non-burst, the snoop write-back is reordered ahead of the replacement write-back cycle. The processor does not continue with the replacement write-back cycle. Complete replacement write-back if ...

Page 119: ...s allows snoop-forced write-backs to be backed off (BOFF#) when snooping under AHOLD. HITM# is guaranteed to remain asserted until the RDY# or BRDY# signals corresponding to the last doubleword of the write-back cycle are returned. HITM# is deasserted from the clock edge in which the last BRDY# or RDY# for the snoop write-back cycle is asserted. The write-back cycle could be a burst or non-burst cycle. In eith...

Page 120: ...ack cycle until the line fill is completed, because the line fill shown in Figure 4-39 is a burst cycle. In this figure, AHOLD is asserted one clock after ADS#. In the clock after AHOLD is asserted, the Write-Back Enhanced IntelDX4 processor floats the address bus (but not the byte enables). Hence, the memory controller must determine burst addresses in this period. The chipset must comprehend the special order...

Page 121: ...ON. Figure 4-39. Snoop Cycle Overlaying a Line Fill Cycle [timing diagram 242202-151: line fill of 0, 4, 8, C overlaid by a write-back from the processor] ...

Page 122: ...is asserted. The snoop write-back cycle is reordered ahead of an ongoing non-burst cycle. After the write-back cycle is completed, the fractured non-burst cycle continues. The snoop write-back ALWAYS precedes the completion of a fractured cycle, regardless of the point at which AHOLD is deasserted, and AHOLD must be deasserted before the fractured non-burst cycle can complete. Figure 4-40. Snoop Cycle...

Page 123: ...s the processor uses the operand that triggered the line fill. 3. If the snoop occurs when INV = 1, the processor never updates the cache with the fill data. 4. If the snoop occurs when INV = 0, the processor loads the line into the internal cache. 4.4.3.3 Snoop During Replacement Write-Back. If the cache contains valid data during a line fill, one of the cache lines may be replaced as determined by the Least...

Page 124: ...t a specific ADS# to initiate the write-back cycle. If there is a snoop hit to a different line from the line being replaced, and if the replacement write-back cycle is burst, the replacement cycle goes to completion. Only then is the snoop write-back cycle initiated. If the replacement write-back cycle is a non-burst cycle, and if there is a snoop hit to the same line as the line being replaced, it fract...

Page 125: ... the first uncompleted transfer. Snoops are permitted under BOFF#, but write-back cycles are not started until BOFF# is deasserted. Consequently, multiple snoop cycles can occur under a continuously asserted BOFF#, but only up to the first asserted HITM#. Snoop under BOFF# during Cache Line Fill. As shown in Figure 4-42, BOFF# fractures the second transfer of a non-burst cache line fill cycle. The system begi...

Page 126: ...minates and RDY# is ignored. Consequently, the Write-Back Enhanced IntelDX4 processor accepts only up to the x4h doubleword, and the line fill resumes with the x0h doubleword. ADS# initiates the resumption of the line fill operation in clock period 15. HITM# is deasserted in the clock period following the clock period in which the last RDY# or BRDY# of the write-back cycle is asserted. Hence, HITM# is guaran...

Page 127: ...op cycle. Snoop under BOFF# during Replacement Write-Back. If the system snoop under BOFF# hits the line that is currently being replaced (burst or non-burst), the entire line is written back as a snoop write-back line, and the replacement write-back cycle is not continued. However, if the system snoop hits a different line than the one currently being replaced, the replacement write-back cycle continues aft...

Page 128: ...ly, multiple snoop cycles are permitted under a continuously asserted HLDA, but only up to the first asserted HITM#. Snoop under HOLD during Cache Line Fill. As shown in Figure 4-44, HOLD asserted in clock two does not fracture the burst cache line fill cycle until the line fill is completed in clock five. Upon completing the line fill in clock five, the Write-Back Enhanced IntelDX4 processor asserts HLDA and...

Page 129: ... seven, which is the clock period in which the next RDY# is asserted. If the system snoop hits a modified line, the snoop write-back cycle begins after HOLD is released. After the snoop write-back cycle is completed, an ADS# is issued and the code prefetch cycle resumes. [timing diagram 242202-156] ...

Page 130: ...llision of snoop cycles under a HOLD during the replacement write-back cycle can never occur, because HLDA is asserted only after the replacement write-back cycle (burst or non-burst) is completed. [timing diagram 242202-157: prefetch cycle, write-back cycle, prefetch continued] ...

Page 131: ...the cache and is in an E or S state, it is invalidated. If the line is in the M state, the processor does a write-back and then invalidates the line. A locked cycle to an M-, S-, or E-state line is always forced out to the bus. If the operand is misaligned across cache lines, the processor could potentially run two write-back cycles before starting the first locked read. In this case, the sequence of bus cy...

Page 132: ...portion of the locked cycle is completed, the snoop write-back starts under HITM#. After the write-back is completed, the locked cycle continues. But during all this time, including the write-back cycle, the LOCK# signal remains asserted. Because HOLD is not acknowledged if LOCK# is asserted, snoop/lock collisions are restricted to AHOLD and BOFF# snooping. [timing diagram 242202-158] ...

Page 133: ...Depending on the number of modified lines in the cache, the flush could take a minimum of 1280 bus clocks (2560 processor clocks) and up to a maximum of 5000 bus clocks to scan the cache, perform the write-backs, invalidate the cache, and run the flush acknowledge cycles. FLUSH# is implemented as an interrupt in the Enhanced Bus mode and is recognized only on an instruction boundary. Write-back system de...
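The flush bounds can be converted to wall-clock time once a bus frequency is fixed. The bus frequencies below are hypothetical examples. The quoted 2560 processor clocks for 1280 bus clocks correspond to a 2× core-to-bus clock ratio.

```python
def flush_time_us(bus_clocks, bus_mhz):
    """Duration in microseconds of a number of bus clocks at bus_mhz."""
    return bus_clocks / bus_mhz


MIN_BUS_CLOCKS = 1280  # minimum quoted above
MAX_BUS_CLOCKS = 5000  # maximum quoted above

for mhz in (25.0, 33.0):  # hypothetical bus frequencies
    low = flush_time_us(MIN_BUS_CLOCKS, mhz)
    high = flush_time_us(MAX_BUS_CLOCKS, mhz)
    print(f"{mhz:.0f} MHz bus: {low:.1f} to {high:.1f} us")
# 25 MHz bus: 51.2 to 200.0 us
# 33 MHz bus: 38.8 to 151.5 us
```

Numbers of this size matter when FLUSH# is asserted from time-critical code, because the flush completes only after the write-backs and acknowledge cycles have run.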

Page 134: ...nd transfers that take more than one bus cycle. A 64-bit aligned operand can be read in one burst cycle or two non-burst cycles if BS8# and BS16# are not asserted. Figure 4-49 shows a 64-bit floating-point operand or Segment Descriptor read cycle, which is burst by the system asserting BRDY#. 4.4.6.1 Snoop under AHOLD during Pseudo-Locked Cycles. AHOLD can fracture a 64-bit transfer if it is a non-burst ...

Page 135: ...until clock four. After the 64-bit transfer is completed, the Write-Back Enhanced IntelDX4 processor writes back the modified line to memory if the snoop hits a modified line. If the 64-bit transfer is non-burst, the Write-Back Enhanced IntelDX4 processor can issue HLDA in between bus cycles for a 64-bit transfer. [timing diagram 242202-161] ...

Page 136: ...igure 4-51, BOFF# fractured a current 64-bit read cycle in clock four. If there is a snoop hit under BOFF#, the snoop write-back operation begins after BOFF# is deasserted. The fractured 64-bit cycle resumes after the snoop write-back operation completes. [timing diagram 242202-162] ...

Page 137: ...ure 4-51. Snoop under BOFF# Overlaying a Pseudo-Locked Cycle [timing diagram 242202-163] ...

Page 139: ...5 Memory Subsystem Design. Chapter Contents: 5.1 Introduction, 5-1; 5.2 Processor and Cache Feature Overview, 5-1 ...

Page 141: ... as multi-user office computers require this feature to meet performance goals. Single-user systems, on the other hand, may not warrant the extra cost. Due to the variety of applications incorporating the Intel486 processor, memory system architecture is very diverse. 5.2 PROCESSOR AND CACHE FEATURE OVERVIEW. The improvements made to the processor bus interface impact the memory subsystem design. It is im...

Page 142: ...at are not in the memory subsystem or are not capable of supporting burst cycles. The RDY# input is used, for example, to terminate an EPROM or I/O cycle. 5.2.2 The KEN# Input. The primary purpose of the KEN# input is to determine whether a cycle is to be cached. Only read data and code cycles can be cached; therefore, these cycles are the only cycles affected by the KEN# input. Figure 5-1 shows a typical burs...

Page 143: ...shed, a problem arises. Decode functions are inherently asynchronous; therefore, the decoded output that generates KEN# must be synchronized. If it is not, the CPU's setup and hold times are violated and internal metastability results. With synchronization, the delay required to generate KEN# will be at least three clocks. In the example shown, four clocks are required. In either case, the KEN# signal will not ...

Page 144: ...uses other effects that impact the memory subsystem design. Perhaps the most obvious of these is the effect on bus traffic. The fact that the internal cache uses the write-through policy dramatically increases the proportion of write bus cycles. Figure 5-3 illustrates this effect. The chart on the left shows the bus cycle mix for an application executed with the Intel386 DX CPU. The chart on the right show...

Page 145: ...de Cycle time and tCP (CAS Precharge time) would prevent zero-wait-state access at 33 MHz. 5.2.4.2 Write Posting. Analysis has shown that, in general, a 6% degradation in performance can be expected for every additional wait state added to write cycles. This analysis was performed by measuring the CPU clocks required to execute several applications. A technique called write posting can be used to improve wri...
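Read linearly, the 6% rule of thumb gives a quick estimate of what write wait states cost. The sketch rests on three assumptions:

- The helper name is hypothetical.
- The penalty is treated as additive per wait state.
- Write posting is assumed to let the CPU see zero-wait-state writes while the posting buffer has room.

The real figure depends on each application's write frequency.

```python
def write_wait_state_cost(wait_states, penalty_per_wait=0.06):
    """Relative performance (1.0 = zero-wait-state writes), linear estimate."""
    return max(0.0, 1.0 - penalty_per_wait * wait_states)


# Hypothetical design: DRAM writes need 2 wait states without posting;
# with posting, writes are assumed to complete at zero wait states.
print(f"without posting: {write_wait_state_cost(2):.2f}")  # 0.88
print(f"with posting:    {write_wait_state_cost(0):.2f}")  # 1.00
```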

Page 146: ...tes for use with the Intel486 processor. For single-CPU systems, the different architectures offer similar performance benefits in most cases. The reason they are so similar is the mechanism which improves performance. The primary benefit of the L2 cache is bus cycle latency reduction. In most systems that incorporate a single Intel486 processor, bus traffic from other bus masters is minimal. With most m...

Page 147: ...ubsystem as an option, both users' requirements can be met. A single system design can be manufactured for both customers. The operating system user can add the cache module. Users can choose the system configuration which meets their price/performance needs. ...

Page 149: ... 6.2 Cache Memory, 6-1; 6.3 Cache Trade-offs, 6-2; 6.4 Updating Main Memory, 6-11; 6.5 Non-Cacheable Memory Locations, 6-15; 6.6 Cache and DMA Operations, 6-16; 6.7 Cache for Single Versus Multiple Processor Systems, 6-16; 6.8 An Intel486 Processor System Example, 6-18 ...

Page 151: ...method, maintain cache consistency, and the impact on external bus utilization. Cache consistency issues that arise when a DMA occurs while the Intel486 processor's cache is enabled. Methods that ensure cache and main memory consistency during cache accesses. Cache used in single- versus multiple-CPU systems. 6.2 CACHE MEMORY. Cache memory is high-speed memory that is placed between microprocessors and ma...

Page 152: ...he subsystem in an embedded Intel486 processor design. These considerations include the performance expectations, operating system used, DRAM cycle speed, possible future upgrades to the initial application, and system costs. Although the Intel486 processor-based personal computer often required a 256-Kbyte to 512-Kbyte L2 cache for optimal performance, embedded applications have a wide variety of performance ...

Page 153: ... Speed: Not all caches return data to the CPU as quickly as possible. It is less expensive and less complex to use slower cache memories and cache logic. 6.3.1 Cache Size and Performance. Hit rates for various first-level cache configurations are shown in Table 6-1. These statistics are conservative, because they illustrate the lowest hit rates generated by analyzing several mainframe traces. The hit rates ar...

Page 154: ...cache system access time as a fraction of main memory access time; Cm = cache memory access time as compared to main memory cycle time. If the cache always misses, then M = 1 and Cm = 1, and the main memory access is equal to the effective access time of the cache. If the cache is infinitely fast, then Cm = 0 and the cache system access time is equal to the miss rate. Because the cache access time is finite, the cache system access time approache...
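The equation these definitions belong to lies outside this excerpt. As an assumption, the sketch below uses the common weighted-average form T = (1 − M)·Cm + M. That form agrees with both limiting cases in the text: a cache that always misses gives 1 (main memory speed), and an infinitely fast cache (Cm = 0) gives the miss rate. The function name is hypothetical.

```python
def effective_access_time(miss_rate, cm):
    """Cache system access time as a fraction of main memory access time.

    Assumed model: a hit costs cm (cache time relative to main memory),
    a miss costs one main memory access (1.0).
    """
    return (1.0 - miss_rate) * cm + miss_rate


print(effective_access_time(1.0, 0.2))  # 1.0: every access goes to memory
print(effective_access_time(0.1, 0.0))  # 0.1: infinitely fast cache
```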

Page 155: ... by the processor, regardless of the distance between the words in main memory. The size of a block in the cache is also known as the line size, and corresponds to the width of a cache word. For example, a block can be eight bytes for a 32-bit processor, in which case two doublewords are accessed each time the cache line is filled. In the example shown in Figure 6-1, the block size is one doubleword. Figu...

Page 156: ...rts: a cache index field, which specifies the block's location in the cache, and a tag field, which distinguishes blocks within a particular cache location. For example, consider a 64-Kbyte direct-mapped cache that contains 16K 32-bit locations and caches 16 Mbytes of memory. The cache index field must include 14 bits to select one of the 16K blocks in the cache, plus two bits to decode one of the fou...

Page 157: ...th the value 02 corresponding to the index 0004, and the data D would be replaced by the word at location 020004. If the processor accesses locations that have the same index bits, the cache would have to be updated constantly. This type of program behavior is infrequent, however, so a direct-mapped ... [figure: 32-bit processor address divided into DRAM select (bits 31–24), tag (bits 23–16), and index (bits 15–0); cache; 16-Mbyte DRAM, 24 bits; main memory] ...
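The direct-mapped example can be checked by splitting addresses into fields. The sketch below assumes byte addresses in a 24-bit (16-Mbyte) address space:

- 2 bits select a byte within a 32-bit location.
- 14 bits select one of the 16K locations.
- The remaining 8 bits form the tag.

The index field 0004 in the example is the byte address A15–A0, which is location 1 once the two byte-select bits are removed. The constant and function names are hypothetical.

```python
OFFSET_BITS = 2    # selects one of the four bytes in a 32-bit location
INDEX_BITS = 14    # selects one of 16K locations in a 64-Kbyte cache
ADDRESS_BITS = 24  # 16 Mbytes of cacheable main memory
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS  # 8 bits


def split_address(addr):
    """Split a 24-bit byte address into (tag, location, byte offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    location = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, location, offset


# 000004 and 020004 share index bits 0004 (location 1) but differ in tag,
# so in a direct-mapped cache one evicts the other.
print([hex(x) for x in split_address(0x000004)])  # ['0x0', '0x1', '0x0']
print([hex(x) for x in split_address(0x020004)])  # ['0x2', '0x1', '0x0']
```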

Page 158: ...ch cache index, there are several block locations allowed, and the block can be placed in any set or retrieved from any set. Figure 6-3 shows a two-way set-associative cache memory. Figure 6-3. Two-Way Set-Associative Cache Organization [diagram: 32-bit processor address divided into tag (bits 23–15) and index (bits 14–0); 16-Mbyte DRAM, 24 bits; 2 x 32K SRAM; 64-Kbyte cache] ...
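A two-way organization removes the conflict in the direct-mapped example. The sketch below makes these assumptions:

- The geometry follows the figure: two 32-Kbyte ways with 32-bit blocks, giving 13 index bits and a 9-bit tag on a 24-bit address.
- Each set keeps one LRU bit.

The class is hypothetical and tracks only tags, not data. Addresses 000004 and 020004 can now be resident together.

```python
OFFSET_BITS = 2   # 4-byte (32-bit) blocks
INDEX_BITS = 13   # 8K sets: 32 Kbytes per way / 4 bytes per block
WAYS = 2


class TwoWayCache:
    """Tag-only model of a two-way set-associative cache with LRU."""

    def __init__(self):
        sets = 1 << INDEX_BITS
        self.tags = [[None] * WAYS for _ in range(sets)]
        self.lru = [0] * sets  # way to replace next in each set

    def access(self, addr):
        """Return True on a hit; on a miss, load the tag into the LRU way."""
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        ways = self.tags[index]
        if tag in ways:
            way, hit = ways.index(tag), True
        else:
            way, hit = self.lru[index], False
            ways[way] = tag
        self.lru[index] = 1 - way  # the other way is now least recently used
        return hit


cache = TwoWayCache()
print(cache.access(0x000004), cache.access(0x020004), cache.access(0x000004))
# False False True: both words stay resident in set 1
```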

Page 159: ... is in the cache but the block is not, then the block is fetched from the main memory. Sector buffering has its own trade-offs associated with miss ratios and bus utilization. Having smaller blocks increases the miss ratio but reduces the number of external bus accesses. Conversely, having a large number of blocks increases the hit ratio but also increases the external bus utilization. Figure 6-4 shows ...

Page 160: ... increases, additional words are fetched with the requested word. Because of program locality, the additional words are less likely to be needed by the processor. When a cache is refilled with four dwords or eight words on a miss, the performance is dramatically better than that of a cache that employs single-word refills. Those extra words that are read into the cache, because they are subsequent words a...

Page 161: ...pdate occurs. In the pseudo-LRU method, the set that was assumed to be the least recently accessed is overwritten. In the FIFO method, the cache overwrites the block that has been resident for the longest time. In the random method, the cache arbitrarily replaces a block. The performance of the algorithms depends on the program behavior. The LRU method is preferred because it provides the best hit rate. 6.4 UPDA...
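Pseudo-LRU approximates LRU with fewer state bits. The sketch below shows a common tree-based scheme for one four-way set that uses three bits. The Intel486 on-chip cache uses a three-bit pseudo-LRU of this general kind, but the bit conventions here are an assumption, not a transcription of the processor's. The class name is hypothetical. The usage lines show where the scheme departs from true LRU.

```python
class TreePLRU4:
    """Tree pseudo-LRU state for one four-way set (bits B0-B2)."""

    def __init__(self):
        self.b0 = 0  # 1: most recent access was to way 0 or 1
        self.b1 = 0  # 1: most recent access within ways 0/1 was to way 0
        self.b2 = 0  # 1: most recent access within ways 2/3 was to way 2

    def touch(self, way):
        """Update the tree bits after an access (hit or fill) to way."""
        if way in (0, 1):
            self.b0 = 1
            self.b1 = 1 if way == 0 else 0
        else:
            self.b0 = 0
            self.b2 = 1 if way == 2 else 0

    def victim(self):
        """Way to replace: follow the bits away from recent accesses."""
        if self.b0:  # ways 0/1 used more recently, so replace in 2/3
            return 3 if self.b2 else 2
        return 1 if self.b1 else 0


plru = TreePLRU4()
for w in (0, 1, 2, 3):
    plru.touch(w)
print(plru.victim())  # 0, which is also the true LRU way
plru.touch(0)
print(plru.victim())  # 2, although true LRU would pick way 1
```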

Page 162: ...te cycle increases the bus traffic on a slower memory bus. This can create contention for use of the memory bus by other bus masters. Even in a buffered write-through scheme, each write eventually goes to memory. Thus, bus utilization for write cycles is not reduced by using a write-through or buffered write-through cache. [figure: CPU, cache, and main memory; 1. Processor reads data into cache from main memory. 2. The d...

Page 163: ...ache accesses memory less often than a write-through cache, because the number of times that the main memory must be updated with altered cache locations is usually lower than the number of write accesses. This results in reduced traffic on the main memory bus. A write-back cache can offer higher performance than a write-through cache if writes to main memory are slow. The primary use of a write-b...
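The difference in memory traffic can be counted on a toy trace. The function below is a hypothetical sketch. It assumes a small direct-mapped cache that allocates on every miss, and it counts each line write-back as one memory update. Repeated writes to one address show the write-back advantage. Two addresses that keep evicting each other show the case where the advantage disappears, which is why the reduction is only "usually" achieved.

```python
def memory_updates(trace, policy, num_lines=4, line_size=16):
    """Count main memory updates for a trace of ('r' | 'w', address)."""
    if policy not in ("write-through", "write-back"):
        raise ValueError("policy must be 'write-through' or 'write-back'")
    tags = [None] * num_lines
    dirty = [False] * num_lines
    updates = 0
    for op, addr in trace:
        if policy == "write-through":
            if op == "w":
                updates += 1  # every write goes to main memory
            continue
        line = (addr // line_size) % num_lines
        tag = addr // (line_size * num_lines)
        if tags[line] != tag:  # miss: allocate, evicting the old line
            if dirty[line]:
                updates += 1  # write back the modified line
            tags[line], dirty[line] = tag, False
        if op == "w":
            dirty[line] = True
    if policy == "write-back":
        updates += sum(dirty)  # final flush of modified lines
    return updates


same_word = [("w", 0x100)] * 100
print(memory_updates(same_word, "write-through"))  # 100
print(memory_updates(same_word, "write-back"))     # 1
```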

Page 164: ... all the other caches, so that all copies are updated. This is accomplished by routing the accesses of all devices to main memory through the same cache. Another method is to copy all cache writes to main memory and to all of the caches that share main memory. A hardware-transparent system is shown in Figure 6-8. Figure 6-8. Hardware Transparency. In non-cacheable memory systems, all shared memory locati...

Page 165: ...n memory locations must not be cached. The PC architecture has several special memory areas which may not be cached. If ROM locations on add-in cards are cached, for example, write operations to the ROM can alter the cache while main memory contents remain the same. Further, if the mode of a video RAM subsystem is switched, it can produce altered versions of the original data when a read back is perform...

Page 166: ...7.1 Cache in Single-Processor Systems. In single-CPU systems, a write-through cache is an ideal cache solution. A write-through cache solves consistency issues, may be designed as a plug-in option, and is less expensive. The main drawback of a write-through cache is its inability to reduce main memory utilization for write cycles. However, this is not as critical a consideration in single-CPU designs. 6.7.2...

Page 167: ...10. Intel486 Processor System Arbitration. Memory bus utilization in multiple-CPU systems may be the most important performance consideration. In this type of system, a cache should have a very high hit rate for both reads and writes. Accesses to main shared memory must be minimized. A write-back cache is best suited for ... (Figure labels: Arbitration Logic; DMA; Intel486 Processor 1; Intel486 Processor 0; BREQ 2; BACK 2; BREQ...)

Page 168: ...Cache consistency must be maintained whenever main memory accesses occur during DMA operations. Bus snooping and invalidation logic can monitor the bus to detect memory writes that may be initiated by other bus masters. If such writes are detected, portions of the processor's on-chip cache and the L2 cache may have to be invalidated. The Intel486 processor has mechanisms that can invalidate cache entries; the L2 cache ...

Page 169: ...equent three reads in a burst cycle. System performance degrades if main memory accesses are required. However, with the on-chip L1 cache and the external L2 cache, the number of main memory read accesses is reduced considerably. Figure 6-12 shows the memory hierarchy in a typical Intel486 processor system. Figure 6-12. Intel486 Processor System Memory Hierarchy. Because the Intel486 processor internal c...

Page 171: ...cessor Bus Interface, 7-1; 7.2 Basic Peripheral Subsystem, 7-17; 7.3 I/O Cycles, 7-29; 7.4 Differences Between the Intel486 DX Processor Family and Intel386 Processors, 7-33; 7.5 Interfacing to x86 Peripherals, 7-34; 7.6 Intel486 Processor LAN Controller Interface, 7-38 ...

Page 173: ...s for I/O-mapped devices or by memory operand instructions for memory-mapped devices. In addition, the Intel486 processor always synchronizes I/O instruction execution with external bus activity. All previous instructions are completed before an I/O operation begins. In particular, all writes pending in the write buffers are completed before an I/O read or write is performed. These functions are descri...
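The ordering rule above can be modeled with a small queue: pending buffered writes complete before an I/O read or write. This is a hypothetical sketch, not a description of the processor's internal logic. `BusModel` and its methods are invented names. The four-entry depth matches the write buffer described for the bus interface unit. The model only drains writes when forced to, whereas real hardware also drains them whenever the bus is free.

```python
from collections import deque


class BusModel:
    """Toy model: memory writes are posted; I/O waits for them (sketch)."""

    def __init__(self, depth=4):
        self.buffer = deque()   # posted memory writes, oldest first
        self.depth = depth
        self.bus_log = []       # order in which cycles reach the external bus

    def _drain_one(self):
        self.bus_log.append(("mem_write", self.buffer.popleft()))

    def mem_write(self, addr):
        if len(self.buffer) == self.depth:   # buffer full: oldest write goes out
            self._drain_one()
        self.buffer.append(addr)

    def io_access(self, kind, port):
        """I/O read or write: every pending memory write completes first."""
        while self.buffer:
            self._drain_one()
        self.bus_log.append((kind, port))
```

For example, two posted memory writes followed by `io_access("io_read", 0x60)` reach the bus in program order, with the I/O read last.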

Page 174: ...s of ports which add up to less than 64 Kbytes. The 64 Kbytes of I/O address space refers to physical memory because I/O instructions do not utilize the segmentation or paging hardware and are directly addressable using the DX register. Memory-mapped devices can be accessed using the Intel486 processor's instructions, so that I/O-to-memory, memory-to-I/O, and I/O-to-I/O transfers, as well as compare and te...

Page 175: ...s width is determined during each bus cycle to accommodate data transfers to or from 32-bit, 16-bit, or 8-bit devices. The decoding circuitry can assert BS16# for 16-bit devices or BS8# for 8-bit devices for each bus cycle. For addressing 32-bit devices, both BS16# and BS8# are deasserted. If both BS16# and BS8# are asserted, an 8-bit bus width is assumed. Appropriate selection of BS16# and BS8# drives the Intel4...
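The BS16#/BS8# rules just listed reduce to a small priority decode. The sketch below takes the two pin levels and returns the bus width the processor uses for that cycle. Both inputs are active low, so 0 means asserted. The function name is hypothetical.

```python
def bus_width(bs16_n: int, bs8_n: int) -> int:
    """Return the data bus width in bits for one cycle (sketch).

    bs16_n, bs8_n: pin levels of BS16# and BS8# (0 = asserted).
    """
    if bs8_n == 0:      # BS8# asserted, alone or together with BS16#
        return 8
    if bs16_n == 0:     # only BS16# asserted
        return 16
    return 32           # both deasserted: 32-bit device
```

BS8# is checked first because the "both asserted" case must fall back to an 8-bit bus.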

Page 176: ...a on the appropriate data bus pins. Table 7-2 shows the data bus lines where the Intel486 processor expects valid data to be returned for each valid combination of byte enables and bus sizing options. Valid data is driven only on data bus pins which correspond to byte enable signals that are active during write cycles. Other data pins are also driven, but they do not contain valid data. Unlike the Int...

Page 177: ...where I/O instructions are separate, I/O addresses are shorter than memory addresses. Typically, processors with a 16-bit address bus use an 8-bit address for I/O. One technique for decoding memory-mapped I/O addresses is to map the entire I/O space of the Intel486 processor into a 64-Kbyte region of the memory space. The address decoding logic can be reconfigured so that each I/O device responds to a ...

Page 178: ...ce to I/O Devices. To access 8-bit devices, the byte enable signals must be decoded to generate A0 and A1. Because A0 and BLE# are the same, the same generation logic can be used. For 32-bit memory-mapped devices, A31–A2 can be used in conjunction with BE3#–BE0#. This logic is shown in Figure 7-3. (Figure labels: Address Bus A31–A2, BE3#–BE0#; 32-Bit I/O Devices; 16-Bit I/O Devices; 8-Bit I/O Devices; A31–A2, BE3#–BE0#; Byte Sele...)
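The A1, A0/BLE#, and BHE# generation can be sketched as combinational logic. The equations below are derived here from two byte-enable rules: the processor moves the lowest enabled bytes first, and a narrow device sees one 16-bit half at a time. They are not copied from Figure 7-3. They assume valid, contiguous BE3#–BE0# patterns only. The function name is hypothetical.

```python
def decode_byte_enables(be_n: int) -> tuple:
    """Derive (A1, A0/BLE#, BHE#) pin levels from BE3#-BE0# (sketch).

    be_n: 4-bit value, bit i = pin level of BEi# (0 = byte i enabled).
    Only contiguous byte-enable patterns are meaningful.
    """
    be0, be1, be2, be3 = ((be_n >> i) & 1 for i in range(4))
    a1 = be0 & be1                          # upper word only if bytes 0 and 1 idle
    a0 = (be0 & be2) | (be0 & (1 - be1))    # doubles as BLE# for 16-bit devices
    bhe_n = be1 & be3                       # high byte of the selected word
    return a1, a0, bhe_n
```

For example, with only BE3# asserted (`be_n = 0b0111`) the logic yields A1 = 1, A0 = 1, and BHE# = 0. That selects byte 3 on an 8-bit device and the high byte of the upper word on a 16-bit device.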

Page 179: ...combinations (Table 7-2), byte-swapping logic for 32-to-8-bit conversions can be implemented in various ways. This section discusses an example in which BE3#–BE0# are low and D7–D0 are used when BS8# is enabled. Figure 7-4 shows the interfacing of an Intel486 processor to an 8-bit device. This implementation requires seven 8-bit bidirectional data buffers. Logic (pin levels, 1 = deasserted): A1 = BE0# AND BE1#; BHE# = BE1# AND BE3#; BLE# (or A0) = (BE0# AND BE2#) OR (BE0# AND NOT BE1#) ...

Page 180: ...or Byte 1, Buffers 2 and 4 are enabled (BE1# and BEN8H). For Byte 2, Buffers 1 and 5 are enabled (BE2# and BEN8UL). For Byte 3, Buffers 0 and 6 are enabled (BE3# and BEN8UH). Table 7-5 shows the truth table for the 8-bit I/O interface to the Intel486 processor. The table also contains the values of the control signals used to enable the second set of buffers. The PLD equations used to implement these signals are shown i...
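The byte-to-buffer assignments quoted above can be captured in a lookup table. This sketch makes two assumptions:

* The byte that an 8-bit cycle moves is the lowest-numbered enabled byte.
* Byte 0 passes through buffer 3 alone because it already sits on D7–D0. The text above does not state this case.

`STEERING` and `steer_8bit` are hypothetical names. The actual design generates these enables in a PLD, not in software.

```python
# Buffer sets and swap enables for each byte lane (bytes 1-3 from the text;
# byte 0 is an assumption: it needs no swap buffer).
STEERING = {
    0: ({3}, None),
    1: ({2, 4}, "BEN8H"),
    2: ({1, 5}, "BEN8UL"),
    3: ({0, 6}, "BEN8UH"),
}


def steer_8bit(be_n: int):
    """Return (enabled_buffers, swap_enable) for one 8-bit cycle (sketch).

    be_n: 4-bit value, bit i = pin level of BEi# (0 = enabled). The byte
    moved is the lowest-numbered enabled byte; the processor runs further
    cycles for any remaining bytes.
    """
    enabled = [i for i in range(4) if not (be_n >> i) & 1]
    if not enabled:
        raise ValueError("no byte enabled")
    return STEERING[min(enabled)]
```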

Page 181: ...terface
BEN8UL (swapping third byte for 8-bit interface), terms: ADS#, BE2#, BE1#, BE0#, BS8#; ADS#, BEN8UL
BEN8UH (swapping fourth byte for 8-bit interface), terms: ADS#, BE3#, BE2#, BE1#, BE0#, BS8#; ADS#, BEN8UH
Table 7-5. 32-Bit to 8-Bit Steering (Sheet 1 of 2)
Intel486 Processor (3): BE3# BE2# BE1# BE0# | 8-Bit Interface (1): BEN16 BEN8UH BEN8UL BEN8H | BHE# (2) A1 A0
0 0 0 0 | 1 1 1 1 | X 0 0
1 0 0 0 | 1 1 1 1 | X 0 0
0 1 0 0 | 1 1 1 1 | X X X
1 1 0 0 | 1 1 1 1 | X 0 0
0 ...

Page 182: ... and 5
Rows (continued): 0 1 1 0 1 1 1 1 X X X 1 1 1 0 1 1 1 1 X 0 0 0 0 0 1 1 1 1 1 X 0 1 1 0 0 1 1 1 1 0 X 0 1 0 1 0 1 1 1 1 0 X X X 1 1 0 1 1 1 1 0 X 0 1 0 0 1 1 1 1 0 0 X 1 0 1 0 1 1 1 1 0 1 X 1 0 0 1 1 1 1 0 1 1 X 1 1 1 1 1 1 1 1 1 1 X X X
Table 7-5. 32-Bit to 8-Bit Steering (Sheet 2 of 2). Intel486 Processor (3) inputs: BE3# BE2# BE1# BE0#. 8-Bit Interface (1) outputs: BEN16 BEN8UH BEN8UL BEN8H BHE# (2) A1 A0.
NOTES: 1. X implies...

Page 183: ...9 shows the truth table for 32-to-16-bit bus swapping logic and A0, A1, and BHE# generation. The PLD equation used to implement 32-bit to 16-bit byte swap logic is shown in Tables 7-6 and 7-7. (Figure labels: BUFF 0–BUFF 5; BE3#, BE2#, BE1#, BE0#; 8-bit data paths; 16-Bit; BEN16...)

Page 184: ...ation
BEN16 (swapping upper 16 bits), terms: ADS#, BE2#, BE1#, BE0#, BS16#, BS8#; ADS#, BE3#, BE1#, BE0#, BS16#, BS8#; ADS#, BEN16
Table 7-9. 32-Bit to 16-Bit Bus Swapping Logic Truth Table (Sheet 1 of 2). Intel486 Processor (3) inputs: BE3# BE2# BE1# BE0#. 8-Bit Interface (1) outputs: BEN16 BEN8UH BEN8UL BEN8H BHE# (2) A1 A0.
Rows: 0 0 0 0 1 1 1 1 1 0 1 1 0 0 0 1 1 1 1 1 0 1 0 1 0 0 1 1 1 1 X X X 1 1 0 0 1 1 1 1 1 0 1 0 0 1 0 0 1 1 1 0 X 0 1 0 1 0 0 1 1 1 X X 0 ...
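The BEN16 term list above can be read together with the 16-bit sizing rules. That reading suggests the upper-word swap is enabled under three conditions:

* the lower two byte enables are idle,
* at least one upper byte is enabled, and
* only BS16# is asserted.

The sketch below encodes that interpretation as an assumption. It omits the ADS# qualification and the hold term that keep the PLD output stable for the rest of the cycle. `ben16_asserted` is a hypothetical name.

```python
def ben16_asserted(be_n: int, bs16_n: int, bs8_n: int) -> bool:
    """True when the upper 16 bits should be steered onto D15-D0 (sketch).

    be_n: 4-bit BE3#-BE0# pin levels (0 = enabled).
    bs16_n, bs8_n: BS16# and BS8# pin levels (0 = asserted).
    """
    lower_idle = (be_n & 0b0011) == 0b0011     # BE1# and BE0# both deasserted
    upper_active = (be_n & 0b1100) != 0b1100   # BE3# or BE2# asserted
    sixteen_bit = bs16_n == 0 and bs8_n == 1   # BS16# only
    return lower_idle and upper_active and sixteen_bit
```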

Page 185: ...wn in Figure 7-6.
Rows (continued): 0 1 1 0 0 1 1 1 X X X 1 1 1 0 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1 0 1 0 1 0 1 1 1 1 1 X X X 1 1 0 1 1 1 1 1 1 0 1 0 0 1 1 0 1 1 1 0 1 0 1 0 1 1 0 1 1 1 1 1 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 X X X
Table 7-9. 32-Bit to 16-Bit Bus Swapping Logic Truth Table (Sheet 2 of 2). Intel486 Processor (3) inputs: BE3# BE2# BE1# BE0#. 8-Bit Interface (1) outputs: BEN16 BEN8UH BEN8UL BEN8H BHE# (2) A1 A0...

Page 186: ...le uses only four 8-bit-wide bidirectional buffers, which are enabled by BE3#–BE0#. Table 7-2 provides different combinations of BE3#–BE0#. To provide greater flexibility in I/O interface implementation, the design should include interfaces for 32-, 16-, and 8-bit devices. The truth table for a 32-to-32-bit interface is shown in Table 7-10. (Figure labels: PLD; BS8#, BS16#; BS0, BS1, BS2, BS3; ADS#; From 8-Bit; From 16-Bit; BEN16, BEN8UH, B...)

Page 187: ...7-15 PERIPHERAL SUBSYSTEM. Figure 7-7. 32-Bit I/O Interface. (Figure labels: BUFF 0, BUFF 1, BUFF 2, BUFF 3 enabled by BE3#, BE2#, BE1#, BE0#; 8-bit data paths; Intel486 Processor Data Bus; 32-Bit I/O Device) ...

Page 188: ...1 1 1 X X X
BE3# BE2# BE1# BE0# (Inputs) | Outputs
1 0 1 0 | 1 1 1 1 | X X X
0 1 1 0 | 1 1 1 1 | X X X
1 1 1 0 | 1 1 1 1 | X X X
0 0 0 1 | 1 1 1 1 | X X X
1 0 0 1 | 1 1 1 1 | X X X
0 1 0 1 | 1 1 1 1 | X X X
1 1 0 1 | 1 1 1 1 | X X X
0 0 1 1 | 1 1 1 1 | X X X
1 0 1 1 | 1 1 1 1 | X X X
0 1 1 1 | 1 1 1 1 | X X X
1 1 1 1 | 1 1 1 1 | X X X
NOTES:
1. X implies do not care (either 0 or 1).
2. BHE# (byte high enable) is not needed in 8-bit interface.
3. Indicates a non-occurring pat...

Page 189: ...Figure 7-8. System Block Diagram. An embedded Intel486 processor system may consist of several subsystems. The heart of the system is the processor. The memory subsystem is also important and must be efficient and optimized to provide peak system-level performance. As described in Chapter 5, Memory ... (Figure labels: Intel486 Processor; LAN Controller; Cache Subsystem; Memory Subsystems; DMAC; Memory Bus; Bus Translator; I/O B...)

Page 190: ...mance Intel486 processor-based system requires an efficient peripheral subsystem. This section describes the elements of this system, including the I/O devices on the expansion bus, the memory bus, and the local I/O bus. In a typical system, a number of slave I/O devices can be controlled through the same local bus interface. Complex peripheral devices, which can act as bus masters, may require a more comp...

Page 191: ...(Figure labels: Intel486 CPU with ADS#, M/IO#, D/C#, W/R#, RDY#; Bus Control and Ready with ADS#, M/IO#, D/C#, W/R#, RDY#, IOCYC, EN; Address Decoder with CS1, CS0, INTA, RECOV, IOR, IOW; Data Transceiver with OE, DIR; I/O 1 and I/O 2 (32-bit) with CS0/CS1, RD, WR, A2; bus widths 32, 32, 32, 4; To Interrupt Controller; Data Bus; Addr Bus; BE3#–BE0#) ...

Page 192: ...nal, such as a memory read command (EPRD). The command forces the selected memory device to output data. Chapter 8, System Bus Design, provides further explanation. 2. Generate the IOCYC signal, which indicates to the address decoder that a valid I/O cycle is taking place. As a result, the relevant chip select (CS) signal should be enabled for the I/O device. Once IOCYC is generated, the IOR and IOW signals are a...

Page 193: ...ndicates that the current bus cycle is complete. It also indicates that the I/O device has returned valid data to the Intel486 processor's data pins following an I/O read cycle. For the Intel486 processor, RDY# is ignored when the bus is idle and at the end of the first clock of the bus cycle. The signal is utilized in wait state generation, which is covered in the next section. CLK (Clock Input Signal) ...

Page 194: ...pose and functionality of a wait state generator are described in the next section. C0, C1, C2 (Counter Outputs 0, 1, and 2): These outputs are internally decoded to generate a RDY# signal, and they represent the number of wait states implemented by the bus control logic. The wait state generation logic is used to patch timing differences between the peripheral device and the Intel486 processor. The next sect...
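A wait state generator built from a small counter can be modeled clock by clock. The manual's logic uses counter outputs C0–C2, but its decode equations are not reproduced here. In this sketch the counter starts when ADS# is sampled active in T1, and RDY# is asserted once the count reaches one clock for T2 plus the programmed number of wait states. The class name, the limit of six wait states, and the idle handling are assumptions.

```python
class WaitStateGenerator:
    """Toy 3-bit counter that decodes RDY# for a fixed wait-state count (sketch)."""

    def __init__(self, wait_states: int):
        if not 0 <= wait_states <= 6:
            raise ValueError("3-bit counter model supports 0-6 wait states")
        self.wait_states = wait_states
        self.count = None               # None: bus idle, counter cleared

    def clock(self, ads_n: int) -> int:
        """Advance one CLK; return the RDY# level for this clock (0 = asserted)."""
        if ads_n == 0:                  # T1: ADS# starts a new cycle
            self.count = 0
            return 1                    # RDY# is ignored in the first clock
        if self.count is None:
            return 1                    # idle bus: RDY# stays deasserted
        self.count += 1
        if self.count == 1 + self.wait_states:
            self.count = None           # cycle ends; counter returns to idle
            return 0
        return 1
```

With two wait states, RDY# is returned in the fourth clock (T1, T2, and two extra T2 states). With zero wait states it is returned in the second clock.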

Page 195: ...ct signals for each system device. The address space is divided into blocks, and the address select signals indicate whether the address on the address bus is within the predetermined range. The block size usually represents the amount of address space that can be accessed within a particular device, and the address select signal is asserted for any address within that range. Inputs: ADS#, M/IO#, D/C#, W/R#, SE...

Page 196: ...nge of addresses for each address select signal is much smaller than the address space of the memory-mapped devices. The minimum block size is determined according to the number of addresses being used by the peripheral device. A typical address decoding circuit for a basic I/O interface implementation is shown in Figure 7-11. It uses a 74S138. Only one output is asserted at a time. The signal correspond...

Page 197: ...0, 9, 7: Y0 Y1 Y2 Y3 Y4 Y5 Y6 Y7 (Data Outputs); pins 6, 4, 5: G1, G2A, G2B (Enable Inputs); pins 1, 2, 3: A, B, C (Select Inputs)
Function Table (G2 = G2A + G2B)
Enable: G1 G2 | Select: C B A | Outputs: Y0 Y1 Y2 Y3 Y4 Y5 Y6 Y7
X 1 | X X X | 1 1 1 1 1 1 1 1
0 X | X X X | 1 1 1 1 1 1 1 1
1 0 | 0 0 0 | 0 1 1 1 1 1 1 1
1 0 | 0 0 1 | 1 0 1 1 1 1 1 1
1 0 | 0 1 0 | 1 1 0 1 1 1 1 1
1 0 | 0 1 1 | 1 1 1 0 1 1 1 1
1 0 | 1 0 0 | 1 1 1 1 0 1 1 1
1 0 | 1 0 1 | 1 1 1 1 1 0 1 1
1 0 | 1 1 0 | 1 ...
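The function table above translates directly into a behavioral model of the '138. In this sketch outputs are returned as pin levels, so the selected output reads 0. The function name is hypothetical. In an address decoder, C, B, and A would typically be driven by address lines and one enable by a qualifier such as IOCYC. That wiring is an assumption, not taken from Figure 7-11.

```python
def decode_74138(g1: int, g2a_n: int, g2b_n: int, c: int, b: int, a: int):
    """Return output levels [Y0..Y7] of a 3-to-8 '138 decoder (sketch).

    g1 is active high; g2a_n and g2b_n are active low. Outputs are active low.
    """
    outputs = [1] * 8
    if g1 == 1 and g2a_n == 0 and g2b_n == 0:   # all enables active
        outputs[(c << 2) | (b << 1) | a] = 0     # only the selected output goes low
    return outputs
```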

Page 198: ...s are needed if the widest device has 16 data bits and if the I/O device addresses are connected only to the lower byte of the data bus. The 74S245 transceiver is controlled through two input signals. Data Transmit/Receive (DT/R): The transceiver is set for write cycles when this signal is high and for read cycles when it is low. This signal is simply a latched version of the Intel486 proces...

Page 199: ...tel486 processor updates all memory locations before reading status from an I/O device. The Intel486 processor never buffers single I/O writes. When processing an I/O write instruction (OUT, OUTS), internal execution stops until the I/O write actually completes on the external bus. This allows time for the external system to drive an invalidate into the Intel486 processor or to mask interrupts before th...

Page 200: ... Consistency. Some peripheral devices can write to cacheable main memory. If this is the case, cache consistency must be maintained to prevent stale data from being left in the on-chip cache. Cache consistency is maintained by adding bus snooping logic to the system and invalidating any line in the on-chip cache that another bus master writes to. Cache line invalidations are usually performed by asse...
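A toy model shows the effect of the snooping logic. When another bus master writes to an address that the on-chip cache holds, the whole line containing it is invalidated, and the next processor access misses and rereads memory. The sketch assumes the Intel486's 16-byte line size and ignores set and way structure. `OnChipCache` and `snoop_write` are hypothetical names standing in for the EADS#-driven invalidation hardware.

```python
class OnChipCache:
    """Toy cache that tracks only which 16-byte lines are valid (sketch)."""

    LINE_BYTES = 16

    def __init__(self):
        self.valid_lines = set()

    def fill(self, addr):
        """Processor line fill: the line containing addr becomes valid."""
        self.valid_lines.add(addr // self.LINE_BYTES)

    def hit(self, addr):
        return addr // self.LINE_BYTES in self.valid_lines

    def snoop_write(self, addr):
        """Another master wrote addr: invalidate the matching line, if present."""
        self.valid_lines.discard(addr // self.LINE_BYTES)
```

The invalidation works on whole lines. A DMA write to any byte of a line therefore discards the entire cached line, while other lines are unaffected.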

Page 201: ... The I/O read signal (IOR) is not asserted until RECOV is deasserted. Data becomes valid after IOR is asserted, with the timing dependent on the number of wait states implemented. In the example, two wait states are required for the slowest I/O device to do a read, and the bus control logic keeps IOR active to meet the minimum active time requirement. The worst-case timing values are calculated by ...
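A rough wait-state estimate can precede the worst-case timing analysis. Divide the total time a device needs by the bus clock period, remembering that a non-burst cycle always includes T1 and T2. The sketch below does this arithmetic. The access and overhead figures in the example are invented for illustration, and the function name is hypothetical. A real budget must use datasheet parameters and the worst-case method the manual describes.

```python
import math


def wait_states_needed(access_ns: float, overhead_ns: float, clk_mhz: float) -> int:
    """Estimate wait states for a non-burst cycle (back-of-envelope sketch).

    access_ns: device access time; overhead_ns: assumed decode, buffer, and
    setup delays; clk_mhz: processor bus clock frequency.
    """
    period_ns = 1000.0 / clk_mhz
    clocks = math.ceil((access_ns + overhead_ns) / period_ns)
    return max(0, clocks - 2)   # T1 and T2 are always present
```

Take a hypothetical device that needs 100 ns plus an assumed 20 ns of decode and buffer delay, on a 33-MHz bus (about 30.3 ns per clock). It needs four clocks, that is, two wait states.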

Page 202: Figure 7-14. I/O Read Timings (waveform labels: CLK, ADS#, M/IO#, D/C#, A31–A2, W/R#, IOCYC, IOR, CS, DATA (to CPU), RDY#; timing parameters TDSU, TDHD, TRVD; bus states T1, T2) ...

Page 203: ...imilar to that of I/O read cycle timings. The processor outputs data in T2. The I/O write signal (IOW) may be asserted one or two clocks after the chip select. The exact delay between the chip select and IOW varies according to the write requirements of the I/O device. Data is written into the I/O device on the rising edge of IOW, and the processor stops driving data once RDY# is sampled active ...

Page 204: ...s the processor operation and the write cycle to the peripheral device can continue simultaneously. This is illustrated in Figure 7-18. The write cycle appears to be only two clocks long from ADS# to RDY# because the actual write overlaps other CPU bus cycles. Figure 7-17. Posted Write Circuit. TWVD (Write Signal Valid Delay): TWVD = TPLDpd = 10 ns. TDVD (Write Data Valid Delay): TDVD = TVD + TBUFpd = 19 + 9 = 28 ns. TDFD (Wri...

Page 205: ...rnal memory to the on-chip cache using only five clock cycles. The Intel386 DX processor requires at least eight clock cycles to transfer the same amount of data. The Intel486 processor has a BREQ output, which supports multiprocessor environments. The Intel486 processor's bus is significantly faster than the Intel386 processor's bus. New features include a 1X clock, parity support, burst cycles, cacheab...

Page 206: ...6 and TR7 have been added for testing of the on-chip cache. TLB testability has been enhanced. The prefetch queue has been increased from 16 bytes to 32 bytes. A jump must always execute after code modification to ensure proper execution of the new instruction. After reset, the ID in the upper byte of the DX register is 04. The contents of the base register, including the floating-point registers, may be ...

Page 207: ...errupt source. Once the interrupt service routine is executed, the previous processor state is restored and program execution resumes. The Intel486 processor can handle up to 256 interrupts/exceptions. Refer to the Embedded Intel486 Processor Family Developer's Manual for the interrupt table. The interrupt-driven environment increases system throughput and allows more tasks to be accomplished by the ...

Page 208: ... timings are as follows: Each interrupt acknowledge cycle must be extended by at least one wait state, which is implemented by the wait state generator logic described in Section 7.2, Basic Peripheral Subsystem. Four idle cycles must be inserted between two interrupt acknowledge cycles. (Figure labels: DATA; W/R#; Intel486 Processor; Clock Generator; Master Mode; VCC; IRQ7, IRQ6, IRQ1, IRQ0; CAS0, CAS1, CAS2; INTA; WR; RD; A0; CS; 8-Bit...)

Page 209: ...ve devices. Figure 7-20. Cascaded Interrupt Controller. The function of each slave controller is to identify the priorities among eight interrupt requests and generate a single interrupt request for the master controller. The master controller must identify the priorities among eight slave controllers and transmit a single interrupt request to the Intel486 processor. (Figure labels: Master 82C59A Programmable Interru...)
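The two-level priority resolution can be sketched in a few lines. This model assumes the 82C59A's default fully nested mode, in which input 0 has the highest priority. It ignores masking, the in-service register, and rotation modes. The function names and the `slaves` wiring parameter (which master inputs carry slave controllers) are hypothetical.

```python
def highest_request(requests):
    """Fully nested priority: the lowest-numbered active input (IR0) wins."""
    for line, active in enumerate(requests):
        if active:
            return line
    return None


def resolve_cascade(master_requests, slaves):
    """Return (master_input, slave_input or None) for the request sent to the CPU.

    master_requests: 8 booleans for devices wired directly to the master.
    slaves: {master_input: 8 booleans} for slave controllers on those inputs.
    """
    inputs = list(master_requests)
    for line, slave_requests in slaves.items():
        # A slave drives its master input whenever any of its inputs is active.
        inputs[line] = highest_request(slave_requests) is not None
    winner = highest_request(inputs)
    if winner is None:
        return None
    if winner in slaves:
        return winner, highest_request(slaves[winner])
    return winner, None
```

Consider a slave on master input 2 with a request on its own input 3. It outranks a device wired directly to master input 5, so the result identifies both levels: master input 2, slave input 3.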

Page 210: ...outine vector to the Intel486 processor. The service routine must include commands to poll the third level to determine the source of the interrupt request. The additional hardware required to implement this configuration includes additional 82C59A devices and the chip select logic. 7.6 Intel486 PROCESSOR LAN CONTROLLER INTERFACE. This section describes two LAN interface solutions using Intel control...

Page 211: ...e spacing for back-to-back frame transmission and reception; 80/106 Mbytes/second bus transfer rate (burst) at 25/33 MHz; 50/66 Mbytes/second bus transfer rate (non-burst) at 25/33 MHz. Figure 7-21 is a block diagram of the 82596 coprocessor. A serial subsystem interfaces to the physical layer device for the network. This subsystem performs CSMA/CD media access control and channel interface functions. It su...
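The quoted 82596 transfer rates can be checked with simple arithmetic, assuming the same bus timing as the Intel486 processor:

* A burst moves a 16-byte line in five clocks (the 2-1-1-1 pattern, matching the five-clock line transfer described for the Intel486 bus).
* A non-burst transfer moves one 32-bit dword in two clocks.

The helper name is hypothetical.

```python
def transfer_rate_mbytes(bytes_moved: int, clocks: int, clk_mhz: float) -> float:
    """Peak rate in Mbytes/s when bytes_moved bytes take `clocks` bus clocks."""
    return bytes_moved * clk_mhz / clocks

burst_25 = transfer_rate_mbytes(16, 5, 25)    # 16-byte burst, 2-1-1-1
burst_33 = transfer_rate_mbytes(16, 5, 33)
single_25 = transfer_rate_mbytes(4, 2, 25)    # one dword, two clocks
single_33 = transfer_rate_mbytes(4, 2, 33)
```

The results are 80 and 105.6 Mbytes/s for bursts and 50 and 66 Mbytes/s for non-burst transfers. These reproduce the 80/106 and 50/66 figures in the feature list.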

Page 212: ... network management functions, including internal and external loopback, exception condition tallies, channel activity indicators, optional capture of all frames (promiscuous mode), optional capture of erroneous or collided frames, and time domain reflectometry for locating fault points on the network cable. The 32-bit statistical counters monitor CRC errors, alignment errors, overrun errors, resource errors...

Page 213: ...bus parallel interface and the network serial interface, as shown in Figure 7-22. The signals for both interfaces are listed in Table 7-12. The coprocessor's bus cycles, including burst cycles, bus interface timing, bus arbitration method, and signal definitions are compatible with the Intel486 processor. When the coprocessor is not holding the bus, its bus interface signals are floated. The state machine...

Page 214: ...HK#  O  Parity error
Cycle Definition and Control:
ADS#  O  Address status
W/R#  O  Write or read
PORT#  I  Port access
RDY#  I  Non-burst data ready
BRDY#  I  Burst data ready
BLAST#  O  Last burst cycle
Bus Control:
CLK  I  Clock
RESET  I  Reset
INT/INT#  O  Interrupt
BREQ  I  Bus request
HOLD  O  Bus hold request
HLDA  I  Bus hold acknowledgment
AHOLD  I  Address hold request
BOFF#  I  Bus backoff
LOCK#  O  Bus lock
CA  I  Channel attentio...

Page 215: ...d BREQ from the processor can trigger the coprocessor's bus throttle timers when needed, as shown in Figure 7-23.
Network Serial Interface:
TxD  O  Transmit data
TxC  O  Transmit clock
LPBK  O  Loopback
RxD  I  Receive data
RxC  I  Receive clock
RTS  O  Request to send
CTS  I  Clear to send
CRS  I  Carrier sense
CDT  I  Collision detect
Table 7-12. 82596 Signals (Sheet 2 of 2). Columns: Signal, Type, Description. Signals marked with...

Page 216: ...he processor bus as either a bus master or a slave (port access mode). In normal operation, it is a bus master, which moves data between the system memory and the coprocessor's control registers or internal FIFOs. The coprocessor can use the same burst cycles, bus hold, address hold, bus backoff, and bus lock operations that the Intel486 processor uses. The coprocessor and the processor communicate through ...

Page 217: ...processor intervention. The processor becomes involved only after a command sequence has finished executing or after a sequence of frames has been received and stored, ready for processing. In addition to this normal operating mode, the processor can initiate a port access in the coprocessor. This mode may be entered whenever the coprocessor is not actively driving the bus. It allows the processor to w...

Page 218: ...ent. Linear Mode: Uses 32-bit addresses with no restrictions on the placement of any shared memory structure. Big-endian and little-endian byte ordering schemes are supported. For compatibility with the Intel486 processor, the little-endian scheme should be used. 7.6.1.4 Media Access. The 82596 coprocessor accesses the cable media (network) through the serial subsystem. This subsystem performs the full set...

Page 219: ...ructures. The frame structure stored is the same as that for frames to be transmitted. The data contained in the buffers is transferred by means of the on-chip DMA controller. This allows bidirectional autonomous transfer of data blocks partitioned as buffers or chained into frames. Buffers which contain errors are recovered automatically without processor intervention. The coprocessor monitors the fr...

Page 220: ...cessor. Drive the M/IO# and D/C# processor bus signals when the coprocessor is bus master. The coprocessor's RESET input is referred to in Figure 7-26 and the text below as 596RESET, to distinguish it from the processor's RESET. To assert the CA or 596RESET signals, the processor drives a memory-mapped I/O cycle. During such a cycle, address decode is done while monitoring CLK, ADS#, HLDA, and D0 to distingui...

Page 221: ...once by the processor for every update of this area made by the coprocessor. The processor gains no advantage from caching locations which are used only once. Also, each time a cached memory location is written to by the coprocessor, a cache invalidation cycle must be performed. For systems in which caching is obligatory, external logic must monitor ADS# and W/R# and drive the EADS# cache invalidation inpu...

Page 222: ...low cost through its high integration. It contains a 32-bit PCI bus master interface to fully utilize the high bandwidth (up to 132 Mbytes per second) available to masters on the PCI bus. The bus master interface can eliminate the intermediate copy step in Receive (RCV) and Transmit (XMT) frame copies, resulting in faster processing of these frames. The 82557 maintains a similar memory structure to the 82...

Page 223: ... dynamic transmit chaining for enhanced performance; programmable transmit threshold for improved bus utilization; early receive interrupt for concurrent processing of receive data; FLASH support up to 1 Mbyte; large on-chip receive and transmit FIFOs (3 Kbytes each); on-chip counters for network management; back-to-back transmit at 100 Mbps; EEPROM support; support for both 10 Mbps and 100 Mbps networks; In...

Page 224: ...ta or deposit received data. In both cases, the 82557, as a bus master device, initiates memory cycles via the PCI bus to fetch/deposit the required data. In order to perform these actions, the 82557 is controlled and examined by the CPU via its control and status structures and registers. Some of these control and status structures reside on-chip and some reside in system memory. For access to its Cont...

Page 225: ...ting the CU causes the 82557 to begin executing the CBL. When execution is complete, the 82557 updates the SCB with the CU status, then interrupts the CPU if configured to do so. Activating the RU causes the 82557 to access the RFA and go into the READY state for frame reception. When a frame is received, the RU updates the SCB with the RU status and interrupts the CPU. It also automatically advances to...

Page 227: ...8 System Bus Design. Chapter Contents: 8.1 Introduction, 8-1; 8.2 System Bus Interface, 8-1; 8.3 EISA Bus System Design Example, 8-2; 8.4 PCI Bus System Design Example, 8-19 ...

Page 229: ...bsystem, memory, cache, and I/O control. Each of these subsystems has been described in detail in the previous chapters. The system bus is the vehicle by which the Intel486 processor communicates with other processing subsystems that perform operations simultaneously on their own local buses. A major concern when designing a system with various subsystems is how to divide the allocated resources. A de...

Page 230: ... 32-bit addressability and with 8-, 16-, or 32-bit data. 32-bit DMA devices can transfer data at 33 Mbytes/sec using burst cycles. EISA-based computers support a bus master architecture for intelligent peripherals. The bus master provides a high-speed channel with data rates up to 33 Mbytes/sec. The bus master provides localized intelligence with a dedicated I/O processor and local memory to relieve the ...
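The 33 Mbytes/sec figure follows from the EISA bus clock. Assume the standard BCLK of about 8.33 MHz and one 32-bit (4-byte) transfer per BCLK during burst cycles; wait states and arbitration overhead are ignored. Then:

```latex
\text{peak rate} = 4~\text{bytes} \times 8.33 \times 10^{6}~\text{transfers/s} \approx 33.3~\text{Mbytes/s}
```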

Page 231: ...h-performance system with an Intel486 processor residing on the host bus. Three EISA support devices, an EISA bus controller (EBC), an integrated system peripheral (ISP), and EISA bus buffers (EBB), interface the host bus to the EISA bus. The three devices also communicate with each other ...

Page 232: ...(Figure labels: ...eady Logic; Page Hit Director; DRAM Control; Host Bus; ISP; 32/16-Bit Masters; 32/16/8-Bit I/O Slaves; EBC; EBB; 32/16/8-Bit Memory; EBB Address Buffers; Select Logic; 82077; Low-Power SRAM; BIOS; AEN Decoder; 8042; Real-Time Clock; Buffer; Data, Address, Control; XBUS Data, Address, Control; Memory Address Decode; Write Data Buffer; Data Mux; Addr MUX; DRAM; EISA Bus) ...

Page 233: ...32-bit DMA transfers and interacts with the DMA controller. It provides byte assembly and disassembly for 8-, 16-, and 32-bit data transfers. The EBC generates the appropriate data conversion and assembly control signals to facilitate transfers of various data widths between the host and ISA and EISA buses. The EBC posts processor-to-EISA/ISA write cycles to improve system performance and provides I/O r...

Page 234: ...(EBC signal labels: ...t; I/O Recovery; Testing; SDCPYUP; SDHDLE3–0; SDOE2–0; HDSDLE1; HDOE1–0; HALAOE; LAHAOE; LASAOE; SALAOE; HALE; SALE; HBE3–0; HADS1–0; HNA; HD/C; HW/R; HM/IO; HLOCK; HRDYI; HRDYO; HERDYO; HHOLD; HHLDA; HLOCMEM; HLOCIO; HGT16M; HSTRETCH; HSSTRB; RDE; RST; RSTCPU; RST385; RSTAR; SPWROK; CPU3–0; SDCPYEN; HCLKCPU; BCLK; CLKKB; BCLKIN; BALE; SA1–0; SBHE; IORC; IOWC; MRDC; MWTC; SMRDC; SMWTC; IO16; M16; NOWS; CHRDY; REFRESH; MASTER16; BE3–0; M/IO; W/R; LOC...)

Page 235: ...interrupts (NMI) are also supported. The ISP has five counter/timers that can provide system timer interrupts for a time-of-day clock, a diskette timeout, DRAM refresh requests, and other system timing operations. The DMA controller is integrated in the ISP, and it has the necessary logic to set up, initiate, and complete DMA transfers. Various types of DMA transfers are provided for, including single transfer, bl...

Page 236: ...n various modes to support data and address interfaces. It has a 32-bit mode without parity and a 32-bit data mode with parity support. (Figure labels: Bus Interface; 15-Level Interrupt Control; IRQ1, IRQ3–7, IRQ8, IRQ9–15; INT; BCLK; NMI; IOCHK; PARITY; SPKR; SLOWH; OSC; Timer 2 Counter 2; Timer 2 Counter 0; Timer 1 Counter 2; Timer 1 Counter 1; Timer 1 Counter 0; DREQ; DACK; MREQ; MACK; REFRESH; DHLDA; CPUMISS; DHOLD; EMSTR16; EXMASTER; System Arbi...)

Page 237: ...ectional signals that indicate valid bytes during an operation. They are inputs during host bus master cycles and are outputs during EISA bus master cycles, as well as when the ISP is performing DMA or refresh cycles. Host Byte High Enable (HBHE) is a bidirectional signal. When asserted, it indicates that the upper byte of the 16-bit host bus is involved in the transfer. It is an input during host bus ma...

Page 238: ...he HM/IO signal. If this signal is asserted on host bus master I/O cycles, it prevents the EISA bus cycle from initiating. This signal is used to determine if the I/O device is being accessed on the host bus during EISA/ISA master I/O cycles. Host bus stretch (HSTRETCH) is an input used by host bus slaves during EISA/ISA master cycles to run zero-EISA-wait-state cycles. This input can be used during DMA cyc...

Page 239: ...global AEN that is decoded with the LA bus address bit to generate the AENx signals. This is shown in Table 8-1. The following is a brief functional description of the interface signals between the EISA/ISA bus and the EBC. 8.3.4.1 EBC and EISA Bus Interface Signals. Byte enables (BE3#–BE0#) are bidirectional signals that indicate which bytes are involved in the current cycle. They are outputs during host...

Page 240: ...in control. It is asserted if the host memory is accessed and has asserted HSLBURST. EISA 32-bit device (EX32) is a bidirectional, open-collector signal that is asserted by 32-bit EISA slaves to indicate 32-bit bus size. The signal is used to determine matched or unmatched data sizes on masters and slaves. Once the sizes are determined, the EBC assembles and disassembles data and performs multiple EISA ...

Page 241: ...he data bus. It is asserted during CPU, DMA, or EISA/ISA master write cycles to 16-bit or 8-bit ISA memory slaves when the address range is less than one megabyte. Channel ready (CHRDY) is a bidirectional, open-collector signal which is used by the ISA slaves to insert wait states. It is an output during ISA master cycles that access host bus slaves or EISA slaves. No wait states (NOWS) is an input asserted...

Page 242: ...SP is not a bus master. The four signals function as the address strobe for the ISP, the memory or I/O cycle indicator, the interrupt acknowledge cycle indicator, and the EISA bus master cycle indicator, respectively. EISA master (EXMASTER) is an input signal to the EBC which indicates that a 16-bit or 32-bit EISA master has control of the EISA bus. It is used with the MASTER16 signal to differentiate between 32-b...

Page 243: ...o copy the lower bytes to the higher bytes and vice versa. System (EISA) to host data latch enables (SDHDLE3–SDHDLE0) are outputs that control the latching of data from the EISA bus to the host bus. System (EISA) data output enables (SDOE2–SDOE0) are output enables to data buffers on the EISA bus. Host data to system (EISA) data latch enables (HDSDLE1–HDSDLE0) are outputs that control the latching of data from t...

Page 244: ...ss bus. It can be asserted during EISA master, CPU, regular DMA, and DMA burst cycles. 8.3.6.1 Functions of the ISP. The ISP provides system arbitration, DMA control, interrupt control, and counting by using interval timer/counters. The system arbiter on the ISP evaluates requests from several sources, including DMA channels, EISA devices, refresh requesters, and the host CPU. DREQ is generated by 8-, 16-, or 32-bi...

Page 245: ...output during master mode. It is sent to the EBC, which propagates the appropriate read/write signals to the EISA bus. Upon reset, this signal is 3-stated and configured as an input. Slow down host CPU (SLOWH) is an output from CPU slowdown timer 2, which is used to slow down the host CPU. CPU cache miss (CPUMISS) is an input signal from the host CPU or the cache controller subsystem which indicates that a...

Page 246: ...ates that a chain buffer has expired and that a new chain buffer must be programmed. Interrupt requests (IRQ15–IRQ3, IRQ1) are interrupt inputs to the ISP. Byte enables (BE3#–BE0#) are the EISA bus byte enables. BE3#–BE1# are bidirectional and BE0# is output only. In master mode, the ISP drives these lines. In slave mode, BE3#–BE1# are inputs to the ISP and are used to access the internal registers. BE0# remains an o...

Page 247: ...2.1 requires that all devices support operation down to 0 MHz. Revision 2.2 adds support for a 66-MHz implementation, requiring that all devices operate from 0 MHz to 66 MHz. PCI-based computers support a bus initiator/target architecture for intelligent peripherals. All transactions on the PCI bus are in burst mode. The initiator starts by driving an address on the PCI address/data bus and by driving the ...
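For comparison, the peak bandwidth of a 32-bit PCI bus follows from one 4-byte data phase per clock in a burst, ignoring the address phase and any wait states:

```latex
\begin{aligned}
33~\text{MHz:} &\quad 4~\text{bytes} \times 33 \times 10^{6}~\text{s}^{-1} = 132~\text{Mbytes/s} \\
66~\text{MHz:} &\quad 4~\text{bytes} \times 66 \times 10^{6}~\text{s}^{-1} = 264~\text{Mbytes/s}
\end{aligned}
```

The 33-MHz result matches the 132 Mbytes per second quoted for the 82557's PCI bus master interface.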

Page 248: ...performance IDE. A block diagram of a system that uses this type of PCI chip set is shown in Figure 8-5. Figure 8-5. Example System Block Diagram. (Figure labels: Power Management; Interrupt; DMA; Timer; ISA Bus (5 Slots); 82091AA AIP; KBC; BIOS; RTC; X-Bus; PCI Graphics Device; IDE Interface; PCI Bus; Main Memory; System Controller; ISA Bridge; System Interface; L2 Cache; Host Bus; Intel486 Processor Family and Upgrades; HA17–2; Data Opt...)

Page 249: ... DRAM controller interfaces main memory to the Host bus and the PCI bus. The system controller supports a two-way interleaved DRAM organization for optimum performance. Up to ten single-sided SIMMs, or four double-sided and two single-sided SIMMs, provide a maximum of 128 Mbytes of main memory. The system controller provides memory write posting to PCI for enhanced CPU-to-PCI memory write performance. I...

Page 250: ...RESET, INIT, HD31–HD0, HDP3–HDP0, BE3–BE0, M/IO, D/C, W/R, PCD, CACHE, ADS, RDY, BRDY, BLAST, HOLD, HLDA, AHOLD, KEN, EADS, HITM, SMI, SMIACT, CI3E, CI302, CWE1, CWE0, COE1, COE0, TWE, TAG8–TAG0, PCLKIN, HCLKIN, CLK2IN, CPURST, KBDRST, FRAME, TRDY, IRDY, LOCK, STOP, PAR, SERR, DEVSEL, C/BE3–C/BE0, AD22–AD16, AD31 or IDE1CS, AD30 or IDE3CS, AD29 or DIR, AD28 or IORDY, AD27 AD25 or AD24 or IOR, AD23 or IOW, AD15 AD0 or LBIDE, CMDV, SIDLE, LREQ, LGNT, PRE...

Page 251: ...EQ, LGNT, SMI, STPCLK, EXTSMI, CLK2IN, CLK2OUT, HCLKOUT2, HCLKOUT1, SYSCLK, PCICLK2, PCICLK1, XBUSTR, XBUSOE, BIOSCS, KBCCS, RTCCS, RTCALE, FERR, IGNNE, OSC, SPKR, IOCS16, MEMCS16, ZEROWS, MEMR, MEMW, SMEMR, SMEMW, IOCHRDY, BALE, IOR, SERR, DREQ7–DREQ5, DACK7–DACK5, TC, REFRESH, IRQ8, IRQ 15 14 11 9 7 3 1, IRQ12/M, INTR, PIRQ0, PIRQ1, TESTIN, CPURST, RSTDRV, PCIRST, PWROK, SRESET, IOW, LA23–LA17, SA19–SA0, SD15–SD0, SBHE, HCLKIN, AEN, IOCHK, NMI, DREQ3, D...

Page 252: ...ades. 8.4.3.1 Host Bus Slave Device. The PCI chipset can be configured via the HOST Device Control register to support an Intel486 Host bus slave device (for example, a graphics device). Two special signals, HDEV and HRDY, as defined by the VL-bus specification, are used in the interface to the Host bus slave. The system controller can be configured to monitor HDEV for all memory and I/O ranges that are no...

Page 253: ...ignal is active along with the first ADS until the first RDY or BRDY. For line fills, the functionality of the CACHE signal is identical to that of the PCD signal. During write-back cycles, CACHE is always asserted at the beginning of the line write-back. The beginning of a write-back cycle is uniquely identified by active ADS, W/R, and CACHE. The beginning of the snoop write-back is identified by the ADS, W/R...

Page 254: ...em controller is a PCI bus master for Host-to-PCI accesses and a target for PCI-to-main-memory accesses or accesses that are forwarded to the ISA bus. The Host can read or write configuration spaces, PCI memory space, and PCI I/O space. 8.4.3.4 PCI Bus Cycles Support. When the host initiates a bus cycle to a PCI device, the system controller becomes a PCI bus master and translates the CPU cycle into th...

Page 255: ...data is then driven as a single dword cycle on the PCI bus. Byte merging is performed in the compatible VGA range only. 8.4.3.6 Exclusive Cycles. The system controller, as a PCI master, never performs LOCKed cycles. The CPU does not return active HLDA while it is performing a LOCKed sequence. Also, the CPU is the only active master as long as HLDA is inactive. Thus, the system controller does not need to d...

Page 256: ...ated after a hard reset. Initiator Ready (IRDY) is an output when the system controller is a PCI master and an input when the system controller is a PCI slave. IRDY indicates that the initiator of the cycle is ready. This signal is 3-stated after a hard reset. LOCK indicates an exclusive bus operation that may require multiple transactions to complete. The system controller supports a bus type of LOCK onl...

Page 257: ...nd the ISA bridge. Four sideband signals synchronize data flow and bus ownership: Link Request (LREQ), Link Grant (LGNT), Command Valid (CMDV), and Slave Idle (SIDLE). LREQ and LGNT are used by the ISA bridge to arbitrate for link mastership; only the ISA bridge drives LREQ, while only the system controller drives LGNT. CMDV is driven by the current link master, whereas SIDLE is driven by the current link slave. Comma...

Page 258: ...SYSCLKs between back-to-back 8- and 16-bit I/O cycles to the ISA bus. This delay is measured from the rising edge of the I/O command (IOR or IOW) to the falling edge of the next I/O command. If a delay of greater than 3.5 SYSCLKs is required, the ISA I/O Recovery Timer register can be programmed to increase the delay in increments of SYSCLKs. No additional delay is inserted for back-to-back I/O sub cyc...

Page 259: ...llowing cycles if ZEROWS is sampled asserted (low): during ISA bridge master cycles (not including DMA) to 8-bit and 16-bit ISA memory, and during ISA bridge master cycles (not including DMA) to 8-bit ISA I/O only. For ISA master cycles targeted for the ISA bridge's internal registers or main memory, the ISA bridge does not assert ZEROWS. When IOCHRDY and ZEROWS are sampled low at the same time, IOCHRDY takes pre...

Page 260: ...t a parity or an uncorrectable error has occurred for a device or memory on the ISA bus. If IOCHK is asserted and NMIs are enabled, an NMI is generated to the CPU. I/O Read (IOR), when asserted, indicates to an ISA I/O slave device that the slave may drive data on the ISA data bus (SD15–SD0). The I/O slave device must hold the data valid until after IOR is deasserted. IOR is an output when the ISA bridge own...

Page 261: ... MEMW. This signal is deasserted after a hard reset. Zero Wait States (ZEROWS) is asserted by an ISA slave, after its address and command signals have been decoded, to indicate that the current cycle can be shortened. If IOCHRDY is deasserted and ZEROWS is asserted during the same clock, then ZEROWS is ignored and wait states are added as a function of IOCHRDY (i.e., IOCHRDY has precedence over ZEROWS). System ...

Page 262: ...gure 8-8. Internal DMA Controller. 8.4.6.1 DMA Status and Control Interface. DMA Request lines (DREQ3–DREQ0, DREQ7–DREQ5) are used to request DMA service from the ISA bridge's DMA controller or for a 16-bit master to gain control of the ISA expansion bus. The active level (high or low) is programmed via the DMA Command register. All inactive-to-active edges of DREQ are assumed to be asynchronous. The request...

Page 263: ...n Performance, 9-2; 9.3 Internal Cache Performance Issues, 9-4; 9.4 On-Chip Write Buffers, 9-7; 9.5 External Memory Considerations, 9-8; 9.6 Second-Level Cache Performance Considerations, 9-11; 9.7 DRAM Design Techniques, 9-14; 9.8 Extended Data Output RAM (EDO RAM), 9-14; 9.9 Floating-Point Performance, 9-16 ...

Page 265: ...ractical for almost all applications, since they would require huge amounts of 15-ns memory to run at 33 MHz. Practical systems use DRAM with access times of 60 to 100 ns. The Intel486 processor is designed to use DRAM effectively. This chapter examines memory system design using DRAM. There are many different performance options in the design of the memory subsystem for the Intel486 processor. The CPU clock s...

Page 266: ...pared to earlier microprocessors. It also explains how memory bandwidth and latency affect performance. 9.2.1 Intel486 Processor Execution Times. The Intel486 processor uses several techniques to execute many frequently used instructions in a single clock. The processor has an on-chip code/data cache and a five-stage pipelined execution unit. The Intel486 processor decodes many simple instructions directly in...

Page 267: ... R 6.9 1 0.069; Push R 6.1 1 0.061; Move R,R 5.7 1 0.057; Move R,I 5.5 1 0.055; JCC taken 4.6 3.4 0.156; JCC fail 4.5 1 0.045; ALU2 R,R 4.3 1 0.043; POP R 4.0 1.16 0.046; JMP M 2.9 3.4 0.099; ALU2 R,M 2.9 2.16 0.063; ALU2 M,I 2.9 3.16 0.092; Call 2.8 3.4 0.095; Shift R 2.8 2 0.056; ALU2 R,I 2.8 1 0.028; RET 2.7 5.56 0.028; String 2.6 3.16 0.150; ALU1 R 1.2 1 0.082; LDS 1.4 12 0.020; ALU2 M,R 1.3 3.16 0.168; ALU1 M 1...
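The flattened rows above appear to list, for each instruction type, its frequency (%), its average clocks, and its weighted contribution; on rows such as Push R (6.1 × 1 = 0.061) and Call (2.8 × 3.4 = 0.095) the last column equals frequency/100 × clocks. A minimal sketch under that reading, using a hypothetical subset of the rows and hypothetical function names, recomputes the contributions:

```python
# Hypothetical subset of the instruction-mix rows above, read as
# (instruction type, frequency in percent, average clocks).
MIX = [
    ("Push R", 6.1, 1.0),
    ("Move R,R", 5.7, 1.0),
    ("JCC taken", 4.6, 3.4),
    ("POP R", 4.0, 1.16),
    ("Call", 2.8, 3.4),
    ("Shift R", 2.8, 2.0),
]


def weighted_clocks(freq_percent: float, clocks: float) -> float:
    """Contribution of one instruction type to the average clocks per instruction."""
    return freq_percent / 100.0 * clocks


def mix_contribution(rows) -> float:
    """Sum of the weighted contributions over the given rows."""
    return sum(weighted_clocks(freq, clocks) for _, freq, clocks in rows)


if __name__ == "__main__":
    for name, freq, clocks in MIX:
        print(f"{name:10s} {weighted_clocks(freq, clocks):.3f}")
    print(f"subset total: {mix_contribution(MIX):.3f}")
```

Summing every row of the full table would give the mix's average clocks per instruction; the total for this subset is only a partial sum.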

Page 268: ...byte on the IntelDX4 processor sets. Each set contains 128 lines (256 lines on the IntelDX4 processor). Cache lines are 16 bytes long. Lines in the cache are either valid or not valid; there is no provision for partially valid lines. Read requests are generated either by program flow (a data request) or by an instruction prefetch (a code request). Most of the time, these requests are satisfied b...
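The cache geometry above can be checked with a small address-splitting sketch. It assumes the Intel486 processor's four-way organization (the number of sets is cut off in this excerpt) and calls each 128-line bank a "way" where the text calls it a "set"; the function names are hypothetical.

```python
# Sketch: split a 32-bit physical address for a four-way set-associative
# cache with 16-byte lines and 128 lines per way (256 on the IntelDX4).
LINE_BYTES = 16
WAYS = 4  # assumed four-way organization


def split_address(addr: int, dx4: bool = False) -> tuple[int, int, int]:
    """Return (tag, line index, byte offset) for a 32-bit address."""
    lines_per_way = 256 if dx4 else 128
    offset_bits = (LINE_BYTES - 1).bit_length()    # 4 bits for 16-byte lines
    index_bits = (lines_per_way - 1).bit_length()  # 7 bits (8 on IntelDX4)
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> offset_bits) & (lines_per_way - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


def capacity(dx4: bool = False) -> int:
    """Total cache size in bytes implied by the geometry."""
    lines_per_way = 256 if dx4 else 128
    return WAYS * lines_per_way * LINE_BYTES


if __name__ == "__main__":
    print(capacity(), capacity(dx4=True))
    print([hex(v) for v in split_address(0x12345678)])
```

Because lines are never partially valid, a lookup only needs the tag and index; the offset merely selects bytes within the 16-byte line.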

Page 269: ...to execute many common instructions in one clock. 2. System bus utilization decreases. Because a high percentage of reads are satisfied by the cache, the Intel486 processor bus is idle a large percentage of the time. Additional bus masters can reside in the system without bus saturation and the resulting performance degradation. 3. The ratio of writes to reads on the external bus increases. The num...

Page 270: ...reads because of this mix of bus cycles. With the Intel486 processor's on-chip cache, however, the high hit rate reduces the number of external reads. Because the on-chip cache implements a write-through policy, the number of writes to the bus is not reduced. As a result, external bus read cycles are now a minor portion of the overall... Table 9-2. Programs Used (columns: Name, Description): A FRAME, Desktop publishing packag...

Page 271: ...write, then the on-chip cache is updated immediately. Writes are normally executed on the external bus in the same order in which they are received by the write buffers, as in a FIFO. Under certain conditions, a memory read can take priority and the sequence of external bus cycles can be reordered, even though the writes occurred earlier in program execution. A memory read will only be reordered before ...
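The FIFO ordering described above can be modeled as a bounded queue. This is a hypothetical sketch: the four-entry depth comes from the bus interface unit description earlier in the manual, and because the condition for letting a read pass buffered writes is truncated here, reordering is deliberately not modeled.

```python
from collections import deque


class WriteBufferModel:
    """Hypothetical FIFO model of the on-chip write buffers (no read reordering)."""

    def __init__(self, depth: int = 4):
        self.depth = depth
        self.pending = deque()

    def post(self, addr: int, data: int) -> bool:
        """Queue a write; return False when full (the core would have to wait)."""
        if len(self.pending) >= self.depth:
            return False
        self.pending.append((addr, data))
        return True

    def drain_one(self):
        """Run the oldest buffered write on the bus (FIFO order), or None if empty."""
        return self.pending.popleft() if self.pending else None


if __name__ == "__main__":
    wb = WriteBufferModel()
    for i in range(5):
        print(i, wb.post(0x100 + 4 * i, i))  # the fifth post finds the buffer full
    print(wb.drain_one())
```

The fifth write is refused until a buffered write drains, which is the point at which the real processor would stall on a write.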

Page 272: ...ade-off can be balanced by partitioning functions and using a combination of fast and slow memories, with the most frequently used functions placed in faster memory. A common use of faster memory devices is the implementation of an external cache built of fast SRAM devices. Fast SRAM devices have high enough bandwidth to achieve optimum performance. An external cache (also called an L2 cache) can als...

Page 273: ...active in the second clock of the cycle, indicating that it is able to perform a burst cycle. The external system indicates that it will initiate a burst cycle by asserting BRDY. If BRDY is not asserted at the second clock, wait states are inserted. If a system executes non-burst reads in two clocks, burst reads in one clock, and writes in three clocks, it is called a 2-1-3 system. Because of the on-chip c...
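The X-Y-Z timing notation lends itself to a quick line-fill estimate. A sketch, assuming a 32-bit bus (four transfers per 16-byte line) and that the first transfer of a burst takes the non-burst read time, so a 2-1-3 system fills a line in 2-1-1-1 clocks; the function names are hypothetical.

```python
# Sketch: clocks to fill one 16-byte cache line in an X-Y-Z system, where
# X = non-burst read clocks, Y = clocks per subsequent burst transfer,
# Z = write clocks (not needed for a line fill).
LINE_BYTES = 16
TRANSFERS = 4  # assumed 32-bit bus: 16-byte line / 4 bytes per transfer


def burst_fill_clocks(nonburst_read: int, burst_read: int) -> int:
    """First transfer at the non-burst time, the remaining ones at the burst time."""
    return nonburst_read + burst_read * (TRANSFERS - 1)


def nonburst_fill_clocks(nonburst_read: int) -> int:
    """Every transfer run as a separate non-burst read."""
    return nonburst_read * TRANSFERS


def bytes_per_clock(clocks: int) -> float:
    return LINE_BYTES / clocks


if __name__ == "__main__":
    x, y = 2, 1  # read timings of the "2-1-3" system
    burst, single = burst_fill_clocks(x, y), nonburst_fill_clocks(x)
    print(f"burst fill: {burst} clocks, {bytes_per_clock(burst):.1f} bytes/clock")
    print(f"non-burst fill: {single} clocks, {bytes_per_clock(single):.1f} bytes/clock")
```

For the 2-1-3 system this gives 5 clocks per line fill versus 8 clocks with non-burst reads. The Ultra-Low Power Intel486 GX processor's 16-bit bus would need eight transfers per line, so `TRANSFERS` would change.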

Page 274: ...rites, write latency due to slower external memory should impact overall performance more than read latency. However, the on-chip write buffers reduce the dependence on write latency. 9.5.4 Bus Utilization and Wait States. Figure 9-4 shows external bus utilization for systems with different wait-state configurations. The percentage figures were calculated by dividing the number of bus cycles...

Page 275: ...that miss the internal cache result in external read bus cycles. For best system performance, an external L2 cache reduces wait states for these read cycles. This section discusses the use of an L2 cache. Different applications and operating environments obtain varying performance benefits from an L2 cache. Hit rates for L2 caches depend on the application being executed a...

Page 276: ...uent three doublewords. This gives the fastest read cycle time for cache hits on the 485Turbocache Module. For cache misses, the data is fetched from main memory and then sent to both the Intel486 processor and the 485Turbocache Module. On write operations, the 485Turbocache Module operates like the Intel486 processor's cache, updating on write hits and not updating on write misses. The main memory ...

Page 277: ...ter devices like DMA or LAN controllers. Systems with multiple CPUs are sensitive to the amount of bus bandwidth used by each CPU. Note that with a write-through cache, the minimum bus bandwidth is the number of writes performed. Chart (scale 0.0 to 1.0; configurations labeled 3-1-3 Page Hit, 5-1-4, 4-2-4 Page Hit, 7-1-5 Page Miss, and 7-2-5 Page Miss): L2 Cache Performance Data with One Write Buffer, Intel486 CPU ...

Page 278: ...igns, because EDO RAM handles sequential reads better than Fast Page Mode (FPM) RAM. Extended Data Out page-mode read accesses are similar to FPM read accesses, except that when CAS is driven high, the data outputs are not disabled, and the data latch is used to guarantee that valid data is held until CAS goes low again. With EDO RAM, the data latch is controlled during page-mode accesses by CAS. Data ...

Page 279: ...les, so that as much as 77 percent of the external bus cycles are write cycles. In program execution, writes occur in strings of two about 60 to 70 percent of the time, and in strings of three 40 to 50 percent of the time. The DRAM subsystem must be optimized for write strings; one method is to support posted writes with write buffers. Posting writes means that RDY is returned to the CPU before the write transacti...

Page 280: ... instruction execution. Within the Intel486 processor, the floating-point instructions share the microcode ROM with integer instructions. However, floating-point operations do not use the microcode ROM after the operation has been prepared for execution. For example, only the first three clocks of the floating-point add, multiply, and divide instructions use the microcode ROM. After the third clock, the...

Page 281: ...ly, the store instruction takes 7 clocks. Because the Intel486 processor provides higher performance not only for floating-point loads and stores but also for floating-point compute operations, a 3x to 4x performance boost is realized for numerics-intensive routines. A large portion of the performance improvement is attributed to the fact that synchronous floating-point transfers occur on chip. 9.9...

Page 283: ...1; 10.2 Power Dissipation and Distribution, 10-1; 10.3 High-Frequency Design Considerations, 10-9; 10.4 Latch-Up, 10-30; 10.5 Clock Considerations, 10-30; 10.6 Thermal Characteristics, 10-33; 10.7 Derating Curve and Its Effects, 10-36; 10.8 Building and Debugging the Intel486 Processor-Based System, 10-37 ...

Page 285: ...signers who are responsible for providing suitable interconnections at the system level. The interconnections in a circuit behave like transmission lines, which degrade the system's overall speed and distort output waveforms. In laying out a conventional printed circuit board, there is freedom in defining the length, shape, and sequence of interconnections. But with devices such as the Intel486 process...

Page 286: ... in the Intel486 processor family datasheets. The Intel486 processor's output valid delays increase if these loadings are exceeded. The addressing pattern of the software can affect I/O buffer power dissipation by changing the effective frequency at the address pins. The frequency variations at the data pins tend to be smaller; a varying data pattern should not cause a significant change in the total...

Page 287: ... First, it provides a constant characteristic impedance to signal interconnections. Second, it provides a low-impedance path for ground currents on the VCC supply. A power plane also reduces EMI: when adjacent signal lines are switching, EMI may occur, and a power plane separating adjacent layers of signal lines reduces it. All power and ground pins must be connect...

Page 288: ...imize inductance. Figure 10-2 shows methods for reducing the inductive effects of PCB traces. This power and ground trace layout has low inductance because the loop area between the integrated circuits (ICs) and the decoupling capacitors is small and the power and ground traces are physically close. This results in lower characteristic impedance, which in turn reduces the line voltage drop ...

Page 289: ...ESIGN AND SYSTEM DEBUGGING. Figure 10-2. Typical Power and Ground Trace Layout for Double-Layer Boards (labels: decoupling capacitors, IC packages, VCC and GND traces). Typical values should range between 0.01 µF and 0.1 µF ...

Page 290: ...us technique but produces similar results. This arrangement is shown in Figure 10-3. These techniques reduce electromagnetic interference (EMI), which is discussed in Section 10.3.3.1, "Electromagnetic Interference (EMI)." Figure 10-3. Decoupling Capacitors (labels: VCC trace, VCC return or GND trace, GND). Typical values should range between 0.01 µF and 0.1 µF ...

Page 291: ... on the same board or other boards in a multi-board system. It is necessary to match the supply's impedance to that of the components to lessen the potential for voltage drops that can be caused by IC edge rates, ground or signal level shifting, noise-induced currents, or voltage reflections. This mismatch can be minimized by using suitable high-frequency capacitors for bulk decoupling of major ...

Page 292: ... some cases, it might be helpful to add a 1-µF tantalum capacitor at major supply trace branches, particularly on large PCBs. Surface-mount chip capacitors are preferable for decoupling the Intel486 processor because they exhibit lower inductance and require less total board space. They should be connected as shown in Figure 10-5. These capacitors reduce the inductance, which keeps the voltage spikes to...

Page 293: ... Intel486 CPU designs require the identification of the transmission lines (backplane wiring, printed circuit board traces, etc.). Once this task is accomplished, the designer's next concern should be to deal with three major problems associated with electromagnetic propagation: impedance control, propagation delay, and coupling (electromagnetic interference). The following sections discuss th...

Page 294: ... inductance and a capacitance, which combine to produce the characteristic impedance Z0. The value of Z0 depends on physical attributes such as the cross-sectional area, the distance between the conductors and other ground or signal conductors, and the dielectric constant of the material between them. Because the characteristic impedance is reactive, its effect increases with frequency. 10.3.1.1 Transmission...

Page 295: ...dielectric only. This is calculated as follows: $t_{pd} = 1.017\sqrt{0.475\,\varepsilon_r + 0.67}$ ns/ft. For G-10 fiberglass epoxy boards ($\varepsilon_r = 5.0$), the propagation delay of micro-strip is calculated to be 1.77 ns/ft. Figure 10-7. Micro-Strip Lines. 10.3.1.3 Strip Lines. A strip line is a flat conductor centered in a dielectric medium between two voltage planes. The characteristic impedance is given theoretically by the equation below: Z0 60 ln 5 98...
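The micro-strip delay formula above, together with a micro-strip impedance estimate, can be written as two small functions. The impedance expression is the widely used IPC approximation and is an assumption here, since this excerpt does not show the manual's own micro-strip Z0 equation; the function names are hypothetical.

```python
import math


def microstrip_tpd_ns_per_ft(er: float) -> float:
    """Micro-strip propagation delay, tpd = 1.017 * sqrt(0.475*er + 0.67) ns/ft."""
    return 1.017 * math.sqrt(0.475 * er + 0.67)


def microstrip_z0(er: float, h: float, w: float, t: float) -> float:
    """Micro-strip characteristic impedance in ohms (h, w, t in the same units).

    Assumed approximation: Z0 = 87/sqrt(er + 1.41) * ln(5.98h / (0.8w + t)),
    generally quoted as valid for roughly 0.1 < w/h < 2.0.
    """
    return 87.0 / math.sqrt(er + 1.41) * math.log(5.98 * h / (0.8 * w + t))


if __name__ == "__main__":
    print(f"tpd (G-10, er = 5.0): {microstrip_tpd_ns_per_ft(5.0):.2f} ns/ft")
    print(f"Z0: {microstrip_z0(5.0, h=0.008, w=0.01, t=0.0015):.1f} ohms")
```

With $\varepsilon_r = 5.0$ the delay function returns about 1.77 ns/ft, matching the text; for the H = 0.008, W = 0.01, T = 0.0015 geometry of the later worked example, the impedance estimate comes out at roughly 55 to 56 ohms.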

Page 296: ...eed Digital Design: A Handbook of Black Magic, by Howard W. Johnson and Martin Graham (publisher: Prentice Hall, Inc.). 10.3.2 Impedance Mismatch. As mentioned earlier, the impedance of a transmission line is a function of the geometry of the line, its distance from the ground plane, and the loads along the line. Any discontinuity in the impedance causes reflections. Impedance mismatch occurs between the transm...

Page 297: ...ershoot and Undershoot Effects. Figure 10-10. Loaded Transmission Line. Overshoot is caused by poor matching; it occurs when the voltage level exceeds the maximum upper limit of the output voltage. Undershoot occurs when the level falls below the minimum lower limit. These conditions can cause excess current on the input gates, which can permanently damage the device. Voltage Undershoot ...

Page 298: ... delayed by tpd: $V_B = V_A(t - x/v)\,H(t - x/v)$, where x is the distance along the transmission line from point A and H(t) is the unit step function. The waveform encounters the load ZL, which may cause reflection. The reflected wave enters the transmission line at B and appears at point A after time delay tpd: $V_{r1} = \rho_L V_B$. This phenomenon continues indefinitely, but it is negligible after 3 or 4 reflections. Hence $V_{r2} = \rho_S V_{r1}$...

Page 299: ...7 28 6 H 5 17 28 6 59 sin 1 04 π 43 sin 2 52 π 08 sin 1 56 π. The lattice diagram is a convenient visual tool for calculating the total voltage due to reflections, as described by the previous equations. Two vertical lines are drawn to represent points A and B along the horizontal dimension x; the vertical dimension represents time. A waveform travels back and forth between points A and B of the transmiss...

Page 300: ...in Figure 10-10 be assumed. Assume the following: VS = 3.70·H(t) V, Z0 = 75 ohms, ZS = 30 ohms, ZL = 100 ohms. The appropriate reflection coefficients can be calculated as follows: source, $\rho_S = (30 - 75)/(30 + 75) = -0.42857$; load, $\rho_L = (100 - 75)/(100 + 75) = 0.14286$. Lattice diagram (points A and B along x; time marks t = 0, tpd, 2tpd, ..., 6tpd): $V_{r1} = \rho_L v_A$, $V_{r2} = \rho_S \rho_L v_A$, $V_{r3} = \rho_S \rho_L^2 v_A$, $V_{r4} = \rho_S^2 \rho_L^2 v_A$, $V_{r5} = \rho_S^2 \rho_L^3 v_A$ ...

Page 301: ...10-12. Lattice Diagram Example. Impedance discontinuity problems are managed by imposing limits and controls during the routing phase of the design. Design rules must be observed to control trace geometry, including specification of the trace width and spacing for each layer. This is very important because it ensures that the traces are smooth and constant, without sharp turns. Values annotated in Figure 10-12: 5tpd, 2.847 V; 3tpd, 2.835 V; tpd...
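The lattice-diagram bookkeeping for this example can be sketched in a few lines. Each wave reaching B adds itself plus its reflection, (1 + ρL) times its amplitude, and the next wave to reach B is smaller by a factor of ρS·ρL. The launched amplitude vA = VS·Z0/(ZS + Z0) is a voltage-divider assumption not spelled out in this excerpt, and the function names are hypothetical.

```python
def reflection(z: float, z0: float) -> float:
    """Reflection coefficient of termination z on a line of impedance z0."""
    return (z - z0) / (z + z0)


def lattice_vb(vs: float, zs: float, z0: float, zl: float, n: int):
    """Voltage at the load end B just after tpd, 3tpd, 5tpd, ... (n entries)."""
    rho_s, rho_l = reflection(zs, z0), reflection(zl, z0)
    incident = vs * z0 / (zs + z0)  # assumed first wave launched at A
    vb, out = 0.0, []
    for k in range(n):
        vb += incident * (1 + rho_l)  # incident wave plus its reflection at B
        out.append((2 * k + 1, vb))   # time in units of tpd
        incident *= rho_s * rho_l     # next wave reaching B after a round trip
    return out


if __name__ == "__main__":
    for t, v in lattice_vb(3.70, 30.0, 75.0, 100.0, 4):
        print(f"{t}tpd: {v:.3f} V")
```

With VS = 3.70 V, ZS = 30 ohms, Z0 = 75 ohms, and ZL = 100 ohms, the voltage at B is about 3.020 V at tpd, 2.835 V at 3tpd, and 2.847 V at 5tpd, matching the values annotated in Figure 10-12, and it settles toward VS·ZL/(ZS + ZL) ≈ 2.846 V.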

Page 302: ... the conductors, or when using twisted pairs or coaxial cable in place of printed circuit traces, the characteristic impedance of a backplane may change. Backplane impedance is also affected by the number of boards plugged into the backplane. Need for Termination. The transmission line should be terminated when tpd exceeds one third of the rise time tr. If tpd < tr/3, the line can be left u...

Page 303: ... the end of the series-terminated connection. However, the drop in voltage across a series terminating resistor limits loading to a maximum of 10. Parallel Terminated Lines. Parallel termination is achieved by placing a resistor of an appropriate value between the input of the loading device and ground, as shown in Figure 10-14. To determine an appropriate value, the currents required by all inputs and ...

Page 304: ... and backplane wiring, where the characteristic impedance is not exactly defined. If the designer closely approximates the characteristic impedance, the reflection coefficient is very small, which results in minimum overshoot and ringing. Parallel termination is not recommended for characteristic impedances of less than 100 ohms because of large DC current requirements. Thevenin's Equivalent Termination. This ...
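Thevenin's equivalent termination is commonly built as one resistor to VCC and one to ground, whose parallel combination is matched to Z0. Because the description of this termination is cut off here, the following is a standard circuit-theory sketch with hypothetical function names and example values, not the manual's own procedure.

```python
def thevenin(r_up: float, r_down: float, vcc: float) -> tuple[float, float]:
    """Equivalent (resistance, voltage) of r_up to VCC and r_down to ground."""
    r_th = r_up * r_down / (r_up + r_down)
    v_th = vcc * r_down / (r_up + r_down)
    return r_th, v_th


def thevenin_resistors(z0: float, v_th: float, vcc: float) -> tuple[float, float]:
    """Choose r_up, r_down so that r_up || r_down = z0 and the divider sits at v_th."""
    if not 0.0 < v_th < vcc:
        raise ValueError("v_th must lie between 0 and vcc")
    r_up = z0 * vcc / v_th
    r_down = z0 * vcc / (vcc - v_th)
    return r_up, r_down


if __name__ == "__main__":
    print(thevenin(220.0, 330.0, 5.0))
    print(thevenin_resistors(132.0, 3.0, 5.0))
```

For example, 220 ohms to a 5-V supply and 330 ohms to ground look like a 132-ohm termination returned to 3.0 V.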

Page 305: ...does not introduce any additional impedance from the signal to ground. The main advantage of the series termination technique, apart from its reduced power consumption, is its flexibility: the received signal amplitude can be adjusted to match the switching threshold of the receiver simply by changing the value of the terminating resistor. This is a very useful technique for interconnecting the l...
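The amplitude adjustment described above follows from the voltage divider at the driver. A sketch, assuming the usual rule that the series resistor plus the driver's output resistance equals Z0, and that the receiver is high-impedance (reflection coefficient +1); the values and function names are hypothetical.

```python
def series_resistor(z0: float, r_out: float) -> float:
    """Series resistor such that driver output resistance + resistor = Z0."""
    if r_out >= z0:
        raise ValueError("driver output resistance already meets or exceeds Z0")
    return z0 - r_out


def received_amplitude(vs: float, r_out: float, r_series: float, z0: float) -> float:
    """First step amplitude at an unterminated (high-impedance) receiver.

    The launched wave is vs * z0 / (r_out + r_series + z0); an open end
    reflects it with coefficient +1, doubling it at the receiver.
    """
    launched = vs * z0 / (r_out + r_series + z0)
    return 2.0 * launched


if __name__ == "__main__":
    rs = series_resistor(65.0, 15.0)
    print(rs, received_amplitude(3.3, 15.0, rs, 65.0))
    print(received_amplitude(3.3, 15.0, 80.0, 65.0))  # larger resistor, smaller swing
```

With Z0 = 65 ohms and a 15-ohm driver, a 50-ohm resistor delivers the full source swing at the receiver; raising the resistor lowers the received amplitude, which is the adjustment the text describes.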

Page 306: ...dance variations, and this tolerance is valuable when three-state drivers are connected to backplane buses. However, the terminations are costly, and the signals produced are not as clean as with other terminations. A common solution is to place active terminations at both ends of the bus. This helps maintain uniform drive levels along the entire length of the bus, and it reduces EMI and ring...

Page 307: ...oad rise time = 3 ns (normalized to 0 to 100%); l = length of interconnection = 9-inch trace (micro-strip); εr = dielectric constant = 5.0; H = 0.008; W = 0.01; T = 0.0015 (1-oz Cu thickness); v 6 ns. The interconnection acts as a transmission line if (as was shown in Section 10.3.1, "Transmission Line Effects") l tr x v 8 3 x6 8 3. The value of l is 9; thus the interconnection acts like a transmission line. The impedance of the transmission line...
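The one-third-rise-time rule from "Need for Termination" can be applied to this 9-inch trace as follows. The example's own v value and critical length are garbled in this copy, so the sketch assumes the 1.77 ns/ft micro-strip delay quoted for G-10 boards; the function and constant names are hypothetical.

```python
# Assumed per-inch delay: the 1.77 ns/ft G-10 micro-strip figure.
TPD_NS_PER_IN = 1.77 / 12.0


def needs_termination(length_in: float, tr_ns: float, tpd_ns_per_in: float) -> bool:
    """True when the one-way delay exceeds one third of the rise time."""
    return length_in * tpd_ns_per_in > tr_ns / 3.0


def critical_length_in(tr_ns: float, tpd_ns_per_in: float) -> float:
    """Longest trace (inches) that can be left unterminated under the tr/3 rule."""
    return tr_ns / (3.0 * tpd_ns_per_in)


if __name__ == "__main__":
    print(f"critical length: {critical_length_in(3.0, TPD_NS_PER_IN):.1f} in")
    print("9-inch trace needs termination:", needs_termination(9.0, 3.0, TPD_NS_PER_IN))
```

Under that assumption the critical length is about 6.8 inches, so the 9-inch trace exceeds it and should be treated as a transmission line, which agrees with the conclusion above.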

Page 308: ... source. In such cases, a separate termination is required for each branch. To eliminate these T-connections, high-frequency designs are routed as daisy chains. Along the chain, each gate provides its own impedance load, so it is necessary to distribute these loads evenly along the length of the chain. The impedance along the chain then changes in a series of steps and is easier to match. The overall...

Page 309: ...1. Variation of current and voltage in the lines causes frequency interference; this interference increases with frequency. 2. Coupling occurs when conductors are in close proximity. Two types of interference are observed in high-frequency circuits: (1) electromagnetic interference (EMI) and (2) electrostatic interference (ESI). 10.3.3.1 Electromagnetic Interference (EMI). Electromagnetic interference (EMI) is a prob...

Page 310: ...nd the receiver. 10.3.3.2 Minimizing Electromagnetic Interference. When laying out a board for an Intel486 processor-based system, several guidelines should be followed to minimize EMI. One source of EMI is the presence of a common impedance path. Figure 10-22 shows a typical layout, which does not have the same earth ground or signal ground. Figure 10-22. Typical Layout. To reduce EMI, it is necessary ...

Page 311: ...tween VCC and ground. This technique is similar to the general technique discussed earlier, whose goal was to maintain correct logic levels. The design of effective coupling and bypass schemes centers on maximizing the charge stored in the circuit bypass loops while minimizing the inductances in these loops. Some other precautions that can minimize EMI are as follows: Runnin...

Page 312: ...t parallel wires. 10.3.3.3 Electrostatic Interference. We have discussed two types of coupling, namely inductive and radiative coupling, which are responsible for creating electromagnetic interference. A third, known as capacitive coupling, occurs when two parallel traces are separated by a dielectric and act as a capacitor. According to the standard capacitor equation, the electric field between the two...

Page 313: ...ion line, which is a function of the dielectric constant. Also, the printed circuit interconnection adds to the propagation delay of every signal on the wire. These interconnections not only decrease the operating speed of the circuits but also cause reflections, which produce undershoot and overshoot. When the propagation delays in the circuit are significant, the design must compensate for the signal ...

Page 314: ...ounted for. Every digital output has a maximum load that it can drive while maintaining proper logic levels. DC loading is the constant current required by an input in either the high or the low state; it limits the ability of a device driving the bus to maintain proper logic levels. For an Intel486 processor-based system, a careful analysis must be performed to ensure that, in a worst-case s...

Page 315: ...as 11 ns at 25 MHz and 5 ns at 33 MHz. The typical clock timings are shown in Figure 10-24. Figure 10-24. Typical Clock Timings. 10.5.2 Routing. Achieving proper clock routing on a 25- to 33-MHz (or higher) printed circuit board is delicate, because problems can arise if certain design guidelines are not followed. For example, fast clock edges cause reflections from high-impedance terminations. These r...

Page 316: ...e used for distributed loads. Figure 10-25. Clock Routing. A less desirable method is the star connection layout, in which the clock traces branch to the load as closely as possible (Figure 10-26). In this layout, the stubs should be kept as short as possible. The maximum allowable length of the traces depends on the frequency and the total fanout, but the length of all of the traces in the star connectio...

Page 317: ...s. This section explains how to perform these calculations, thereby making designing with the Intel486 processor more straightforward. The thermal specifications for the Intel486 processor are designed to ensure a tolerable temperature at the surface of the Intel486 chip. This temperature, called the junction temperature (Tj), can be determined from external measurements using the known thermal characteristi...

Page 318: ...tiple parallel devices may be helpful in reducing θsa, because if the heat input to the heat sink is dispersed rather than concentrated, the effective thermal impedance is lower. To approximate the case temperature for varying environments, the two equations discussed earlier should be combined by setting the junction temperature equal in both, resulting in the following equation: $T_a = T_c - (\theta_{ja} - \theta_{jc})\,P_d$. R...
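The combined equation follows from setting the two junction-temperature expressions equal. A sketch, assuming those expressions are the usual $T_j = T_c + P_d\,\theta_{jc}$ and $T_j = T_a + P_d\,\theta_{ja}$; the function names and example values are hypothetical, not datasheet figures.

```python
def junction_temp_from_ambient(t_ambient: float, p_d: float, theta_ja: float) -> float:
    """Tj = Ta + Pd * theta_ja (degrees C, W, degrees C/W)."""
    return t_ambient + p_d * theta_ja


def junction_temp_from_case(t_case: float, p_d: float, theta_jc: float) -> float:
    """Tj = Tc + Pd * theta_jc."""
    return t_case + p_d * theta_jc


def ambient_for_case(t_case: float, p_d: float, theta_ja: float, theta_jc: float) -> float:
    """Ta = Tc - (theta_ja - theta_jc) * Pd, from equating the two Tj expressions."""
    return t_case - (theta_ja - theta_jc) * p_d


if __name__ == "__main__":
    ta = ambient_for_case(85.0, 1.5, 20.0, 3.0)  # hypothetical values
    print(ta, junction_temp_from_ambient(ta, 1.5, 20.0), junction_temp_from_case(85.0, 1.5, 3.0))
```

For instance, a case limit of 85 °C, Pd = 1.5 W, θja = 20 °C/W, and θjc = 3 °C/W give a maximum ambient of 59.5 °C, and both expressions then yield the same Tj of 89.5 °C.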

Page 319: ...Figure 10-27. Typical Heat Sinks (labels: spring, heat sink, PGA, frame; add conductive grease or a thermal pad) ...

Page 320: ...esistance values chosen for the output buffers are at the highest specified temperature and are rising worst-case values. The value of the capacitors centers around the AC timing values for the chip; for 25 MHz and above, this is 50 pF. Because the AC timing specifications are measured for a signal reaching 1.5 V, the output buffer delay is the time it takes for a signal to rise from 0 V to 1.5 V. A...

Page 321: ... BASED SYSTEM. Although the designer of an Intel486 processor-based system should plan the entire system, it is necessary to begin building different elements of the core and testing them before building the final system. If a printed circuit board layout has to be done, the whole system may be simulated before generating the netlist for the layout vendor. It is advisable to work with a preliminary la...

Page 322: ...designed as shown in Chapter 4, "Bus Operation." This circuitry is used to generate the RESET signal for the Intel486 processor. The system should be checked during reset for all of the timings; the clock continues to run during these tests. 3. The INT and HOLD pins should be held low (deasserted). The READY pin is held high to add additional delays (wait states) to the first cycle. At this instant, the Intel48...

Page 323: ...r IOPL-sensitive, but INTn is IOPL-sensitive in Protected Mode and Virtual-8086 Mode. 10.8.3 Single-Step Trap. The Intel486 processor supports the x86-compatible single-step feature. If the single-step flag (bit 8, TF) is set to 1 in the EFLAGS register, a single-step exception occurs. This exception is auto-vectored to exception 1 and occurs immediately after completion of the next instruction. Typically, a ...

Page 324: ...nnot occur unless the debug registers are programmed. It is possible to specify up to four breakpoint addresses by writing into debug registers. The debug registers are shown in Figure 10-31. The addresses specified are 32-bit linear addresses. The processor hardware continuously compares the linear breakpoint addresses in DR3–DR0 with the linear addresses generated by executing software. When the pag...

Page 325: ...d Do not define. LENi encoding, breakpoint field width, and usage of the least significant bits in breakpoint address register i (i = 0–3): 00, 1 byte: all 32 bits used to specify a single-byte breakpoint field. 01, 2 bytes: A31–A1 used to specify a two-byte, word-aligned breakpoint field; A0 in the breakpoint address register is not used. 10, undefined: do not use this encoding. 11, 4 bytes: A31–A2 used to specify a four-byte, dwor...

Page 326: ...read/write data breakpoints. A data breakpoint can be set up by writing the linear address into DRi. For data breakpoints, RWi can be 01 (write only) or 11 (read/write), and LENi can be 00, 01, or 11. An instruction execution breakpoint can be set up by writing the address of the beginning of the instruction into DRi; RWi must equal 00 and LENi must equal 00 for instruction execution breakpoints. If the instruction beginning...
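The RWi and LENi encodings above can be packed into a DR7 value. This sketch assumes the standard x86 DR7 layout, which is not shown in this excerpt: enable bits Li and Gi at bits 2i and 2i+1, RWi at bits 16+4i, and LENi at bits 18+4i. The helper names are hypothetical, and the code treats the 10 encoding as unsupported for both fields.

```python
# Encodings as given in the text; 10 is treated as unsupported for both fields.
LEN_BYTES = {0b00: 1, 0b01: 2, 0b11: 4}
RW_KINDS = {0b00: "instruction execution", 0b01: "data write", 0b11: "data read/write"}


def dr7_bits(slot: int, rw: int, length: int, local: bool = True, global_: bool = False) -> int:
    """DR7 bits for breakpoint register DR<slot> (assumed x86 DR7 layout)."""
    if slot not in range(4):
        raise ValueError("slot must be 0-3")
    if rw not in RW_KINDS or length not in LEN_BYTES:
        raise ValueError("undefined RW or LEN encoding")
    if rw == 0b00 and length != 0b00:
        raise ValueError("instruction breakpoints require LEN = 00")
    value = 0
    if local:
        value |= 1 << (2 * slot)          # Li enable
    if global_:
        value |= 1 << (2 * slot + 1)      # Gi enable
    value |= rw << (16 + 4 * slot)        # RWi field
    value |= length << (18 + 4 * slot)    # LENi field
    return value


def is_aligned(address: int, length: int) -> bool:
    """True if no low address bits would be ignored for this LEN encoding."""
    return (address & (LEN_BYTES[length] - 1)) == 0


if __name__ == "__main__":
    print(hex(dr7_bits(0, 0b00, 0b00)))  # local instruction breakpoint in DR0
    print(hex(dr7_bits(1, 0b01, 0b11)))  # local 4-byte write breakpoint in DR1
```

For example, a local instruction breakpoint in DR0 needs DR7 = 0x1, and a local 4-byte write breakpoint in DR1 needs 0xD00004. Because the hardware ignores A1–A0 for a 4-byte field, `is_aligned` flags addresses whose low bits would otherwise be silently dropped.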

Page 327: ...re traps. 10.8.6 Debugging Overview. Once the Intel486 processor-based system is designed and the printed circuit board is fabricated and stuffed, the next step is to debug the hardware in increments. The design of a microprocessor-based system can be subdivided into several phases. The design starts with preparation of the system specification, followed by conceptual representation in the form of block...

Page 328: ...ssor and its connections to the system. The JTAG specifications with which this unit complies are documented in Standard 1149.1-1990, IEEE Standard Test Access Port and Boundary-Scan Architecture, and its supplement, Standard 1149.1a-1993. You can also refer to the Boundary Scan section of the individual Intel486 processor datasheets for more information on using the JTAG unit ...

Page 329: ...ion, 10-22; Address bus interface to I/O devices, 7-6; Address decoding, 7-23; for I/O devices, 7-5; Address signals, 4-1; ALU, 3-14; Applications of the Intel486 processor, 2-11; Assert, defined, 1-4. B: Block diagrams: 82420EX PCIset, 8-20; 82557 LAN Controller, 7-52; Intel486 SX processor, 3-3; IntelDX2 and IntelDX4 processors, 3-2; peripheral subsystem example, 7-17; ULP Intel486 SX and ULP Intel486 GX processors, 3-4; Bloc...

Page 330: ...al, 5-2 to 5-4; Cache transparency, 6-16; Cache unit, 3-10; Cacheable cycles, 4-21, 5-2 to 5-4; Chapter summaries, 1-1; Chip capacitors, decoupling, 10-8; CHMOS IV process, 10-1; Clear, defined, 1-4; Clock (CLK) signal skew, 10-30; Clock considerations, 10-30 to 10-32; Clock routing, 10-32; Clock timings, 10-31; Control registers, debug, 10-42; Control unit, 3-14; Controllers, embedded, 2-12; Cross talk, 10-25; Customer service, 1-5. D: D...

Page 331: ...erface, 3-7 to 3-8; cache, 3-10; control, 3-14; datapath, 3-14; floating-point, 3-15; instruction decode, 3-14; instruction prefetch, 3-13; integer datapath, 3-14; memory management, 3-5; paging, 3-16; segmentation, 3-15. G: General-purpose registers, 3-14; Ground planes, 10-2 to 10-3; double-layer boards, 10-3 to 10-5. H: HALT cycle, 4-41; Hardware transparency with cache, 6-14; Heatsink, 10-34 to 10-36. I: I/O cycles, 7-27; I/O devic...

Page 332: ...terface with EBC, 8-13. K: KEN, 5-2. L: L2 cache, see Second-level cache; LAN controller, 82596CA, 7-38; Latches, 7-32; Latch-up, 10-30; Lattice diagram, 10-16; Leaded capacitors, decoupling, 10-9; Level 1 cache (see also Cache), hit rates, 6-3; Line size in cache, 6-10; Literature, 1-6; Literature ordering, 1-6, 1-7; Locked cycles, 3-9, 4-31; Loosely coupled multiprocessor system, 2-9; LRU cache replacement, 3-12. M: Machine status regi...

Page 333: ... 14; machine status, 3-12, 4-47; notational conventions, 1-4; Related documents, 1-6; Restart cycles, 4-43. S: Second-level cache, 2-10, 5-6; memory hierarchy, 6-19; overview, 6-16 to 6-18; see also Cache; Sector buffering, cache, 6-9; Segmentation unit, 3-5, 3-15; Segmentation overview, 2-5; Series termination, 10-18; Set-associative cache, 6-8; Set, defined, 1-4; Shutdown indication cycle, 4-41; Signals, 82596CA coprocessor, 7-42; ad...

Page 334: ... 2-7. V: Vias, 10-25; Virtual-8086 mode, 2-5. W: Wait states, inserting, 4-17; logic, 7-22; performance considerations, 9-9; signals, 7-22; World Wide Web, 1-5; Write buffers, 3-8; in I/O cycles, 7-27; on-chip, 9-7; Write bursting, 2-3; Write cycles, overlapping, 5-5; timings, 7-31 to 7-33; Write posting, 5-5; Write-back cache, 2-3, 6-13; Write-through cache, 3-12, 6-12 ...
