Price: 6325 + VAT
Duration: 4 days

Prerequisite:

• Knowledge of the ARMv7 architecture, particularly the Large Physical Address Extension (LPAE)

Skills Gained:
• This course highlights the new features introduced by the ARMv8 architecture
• It is intended for engineers developing low-level software
• It starts with an overview of the Cortex-A53, highlighting the differences between a Cortex-A15/Cortex-A7 platform based on the CCI-400 interconnect and a Cortex-A57/Cortex-A53 platform based on the CCN-504 interconnect
• The new exception mechanism is described
• The enhancements to the LPAE are detailed
• The new A64 instructions are explained through practical examples
• The AAPCS64 is also covered
• The course also details the new ARMv8 debug features
• The Cortex-A53 hardware implementation is explained, particularly the low-power states

Lab Description:

• Labs are based on the Linaro AArch64 (ARMv8) environment

Course Outline:

1. OVERVIEW OF CORTEX-A53 [1-hour]
• Block diagram
• Memory interface implementing either ACE or CHI
• Optional ACP (Accelerator Coherency Port) to connect a coherent DMA
• Coherent interface, with examples of hardware coherency within a cluster and between clusters
• SoC architecture based on the CCN-504 interconnect
• Implementation options

2. INTRODUCTION TO ARM ARCHITECTURE V8 [1-hour]
• Enhancements with regard to ARMv7
• General-purpose register file and the stack pointer in AArch64 state
• Register mapping between A32/T32 and A64
• Mapping of AArch64 System registers to the AArch32 System registers
• Process state, PSTATE
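As an illustration of the PSTATE topic, the minimal C sketch below decodes the SPSR_ELx value that holds the saved PSTATE when an exception is taken from AArch64. It runs on any host; the bit positions follow the ARMv8-A layout (NZCV in bits [31:28], DAIF in bits [9:6], M[4] selecting AArch32, M[3:2] the exception level, M[0] the stack pointer selection), while the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SPSR_ELx fields when the exception was taken from AArch64 (ARMv8-A layout) */
#define SPSR_N   (1u << 31)
#define SPSR_Z   (1u << 30)
#define SPSR_C   (1u << 29)
#define SPSR_V   (1u << 28)
#define SPSR_D   (1u << 9)   /* debug exceptions masked */
#define SPSR_A   (1u << 8)   /* SError masked */
#define SPSR_I   (1u << 7)   /* IRQ masked */
#define SPSR_F   (1u << 6)   /* FIQ masked */
#define SPSR_M4  (1u << 4)   /* 0 = AArch64, 1 = AArch32 */

/* Hypothetical helper: M[3:0] for ELxt (SP_EL0) or ELxh (SP_ELx) */
static uint32_t spsr_mode(unsigned el, bool use_sp_elx)
{
    assert(el <= 3);
    assert(!(el == 0 && use_sp_elx)); /* EL0 can only use SP_EL0 */
    return (el << 2) | (use_sp_elx ? 1u : 0u);
}

/* Exception level recorded in M[3:2] */
static unsigned spsr_el(uint32_t spsr) { return (spsr >> 2) & 3u; }

/* True when the saved state belongs to AArch32 */
static bool spsr_is_aarch32(uint32_t spsr) { return (spsr & SPSR_M4) != 0; }
```

For example, `spsr_mode(1, true) | SPSR_D | SPSR_A | SPSR_I | SPSR_F` gives 0x3C5, the EL1h value with all exceptions masked that boot code commonly writes to SPSR_EL2 before an ERET into EL1.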

3. ARMv8 EXCEPTIONS [2-hour]
• Four exception levels
• Definition of a precise exception
• Definitions of synchronous and asynchronous exceptions
• Exception Link Registers
• Current Exception Level register
• Register banking by exception level based on a new exception model
• Nesting at the same exception level
• Exception type and exception origin
• SPSR_EL1 to SPSR_EL3 when the exception is taken from AArch32 or AArch64
• Syndrome registers providing status information to the exception handler
• Status information when an Abort occurs
• Clarifying the differences between AArch32 and AArch64
• Exception return instruction
• Reset, requesting a warm reset
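The vector table layout and the syndrome register can be sketched in C as below, assuming the ARMv8-A AArch64 layout: VBAR_ELx points to 16 entries of 0x80 bytes, grouped by exception origin, and ESR_ELx carries the Exception Class (EC) in bits [31:26], the instruction length bit (IL) in bit 25 and the syndrome (ISS) in bits [24:0]. Only a few EC values are listed, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum exc_type { EXC_SYNC = 0, EXC_IRQ = 1, EXC_FIQ = 2, EXC_SERROR = 3 };

enum exc_origin {
    FROM_CURRENT_EL_SP0 = 0, /* current EL, using SP_EL0 */
    FROM_CURRENT_EL_SPX = 1, /* current EL, using SP_ELx */
    FROM_LOWER_EL_A64   = 2, /* lower EL running AArch64 */
    FROM_LOWER_EL_A32   = 3  /* lower EL running AArch32 */
};

/* Offset from VBAR_ELx: four 0x200-byte groups of four 0x80-byte entries */
static uint32_t vector_offset(enum exc_origin origin, enum exc_type type)
{
    assert(origin <= FROM_LOWER_EL_A32 && type <= EXC_SERROR);
    return (uint32_t)origin * 0x200u + (uint32_t)type * 0x80u;
}

/* ESR_ELx fields */
static unsigned esr_ec(uint64_t esr)  { return (unsigned)((esr >> 26) & 0x3Fu); }
static bool     esr_il(uint64_t esr)  { return ((esr >> 25) & 1u) != 0; }
static uint32_t esr_iss(uint64_t esr) { return (uint32_t)(esr & 0x1FFFFFFu); }

/* A few Exception Class values */
#define EC_SVC_A64         0x15u /* SVC executed in AArch64 */
#define EC_DABT_LOWER_EL   0x24u /* Data Abort from a lower EL */
#define EC_DABT_CURRENT_EL 0x25u /* Data Abort without a change in EL */
```

With this model, an `SVC #42` executed at EL0 in AArch64 and taken to EL1 enters at VBAR_EL1 + 0x400 (lower EL, AArch64, synchronous), with EC 0x15 and the immediate 42 in the ISS.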

4. THE ARMv8-A SECURITY MODEL [1-hour]
• TrustZone implementation
• Security model when EL3 is using AArch64
• Trapping to EL3 using AArch64
• Disabling EL3, EL2, and EL1 execution of SMC instructions
• Cortex-A53 registers affected by CP15SDISABLE
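The SCR_EL3 controls discussed in this chapter can be illustrated with a short C sketch that composes an SCR_EL3 value, assuming the ARMv8.0 bit assignments (NS bit 0, IRQ/FIQ/EA bits 1 to 3, RES1 bits [5:4], SMD bit 7, HCE bit 8, RW bit 10). Setting SMD makes SMC UNDEFINED at EL1 and above. The helper is hypothetical; firmware would write the result with `MSR SCR_EL3, Xn` before an ERET to a lower exception level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SCR_EL3 bits, ARMv8.0 layout */
#define SCR_NS   (1u << 0)  /* EL0/EL1/EL2 are Non-secure */
#define SCR_IRQ  (1u << 1)  /* route physical IRQs to EL3 */
#define SCR_FIQ  (1u << 2)  /* route physical FIQs to EL3 */
#define SCR_EA   (1u << 3)  /* route external aborts and SError to EL3 */
#define SCR_RES1 (3u << 4)  /* bits [5:4] are RES1 in ARMv8.0 */
#define SCR_SMD  (1u << 7)  /* SMC is UNDEFINED at EL1 and above */
#define SCR_HCE  (1u << 8)  /* HVC instruction enabled */
#define SCR_RW   (1u << 10) /* next lower EL uses AArch64 */

/* Hypothetical helper: SCR_EL3 value before dropping to Non-secure EL2 */
static uint32_t scr_el3_for_nonsecure(bool disable_smc, bool lower_el_aarch64)
{
    uint32_t scr = SCR_RES1 | SCR_NS | SCR_HCE;
    if (disable_smc)
        scr |= SCR_SMD;
    if (lower_el_aarch64)
        scr |= SCR_RW;
    return scr;
}
```

With SMC disabled and an AArch64 lower exception level, the sketch produces 0x5B1.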

5. INTERPROCESSING [1-hour]
• Managing 64-bit and 32-bit processes, with the execution state switching only on exception entry or return
• Non-secure space organization
• Saving the A32/T32 state when taking an exception leading to A64

6. VIRTUALIZATION [1-hour]
• Implementation similar to ARMv7-A with the Virtualization Extensions
• The effect of implementing EL2 on the exception model
• Virtual interrupts
• Trapping to EL2 using AArch64
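A simplified C model of the HCR_EL2 bits behind EL2 routing and virtual interrupts is sketched below, assuming the ARMv8.0 bit positions. It deliberately ignores other conditions that affect delivery (HCR_EL2.TGE, the current exception level, and PSTATE.I at EL1/EL0), so it only shows how a hypervisor sets and clears a pending virtual IRQ; the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HCR_EL2 bits, ARMv8.0 layout */
#define HCR_VM  (1ull << 0)  /* enable stage 2 translation */
#define HCR_FMO (1ull << 3)  /* route physical FIQs to EL2, enable virtual FIQs */
#define HCR_IMO (1ull << 4)  /* route physical IRQs to EL2, enable virtual IRQs */
#define HCR_AMO (1ull << 5)  /* route SError to EL2, enable virtual SError */
#define HCR_VF  (1ull << 6)  /* virtual FIQ pending */
#define HCR_VI  (1ull << 7)  /* virtual IRQ pending */
#define HCR_TSC (1ull << 19) /* trap SMC from EL1 to EL2 */
#define HCR_RW  (1ull << 31) /* EL1 uses AArch64 */

/* Hypothetical base configuration for an AArch64 guest */
static uint64_t hcr_guest_base(void)
{
    return HCR_RW | HCR_TSC | HCR_AMO | HCR_IMO | HCR_FMO | HCR_VM;
}

/* Set or clear the pending virtual IRQ */
static uint64_t hcr_set_virq(uint64_t hcr, bool pending)
{
    return pending ? (hcr | HCR_VI) : (hcr & ~HCR_VI);
}

/* Simplified: VI only has an effect when IMO is also set */
static bool virq_signalled(uint64_t hcr)
{
    return (hcr & HCR_IMO) && (hcr & HCR_VI);
}
```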

7. INSTRUCTION PIPELINE [1-hour]
• In-order pipeline
• Predicted and non-predicted instructions
• Branch Target Instruction Cache
• Branch Target Address Cache
• Branch Predictor
• Return stack
• Branch accelerator invalidation and context switches
• ISB instruction

8. MULTICORE [1-hour]
• Atomicity in the ARM architecture
• Synchronization and semaphores
• Shareability memory attributes
• Operation of the global monitor
• Load acquire / Store release instruction pair
• Use of WFE and SEV instructions by spin-locks
• Send Event Local (SEVL) instruction
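The acquire/release pairing used by spinlocks can be sketched with C11 atomics as below. This is a minimal illustration, not a production lock: when compiling for ARMv8.0 AArch64, compilers typically implement the acquire exchange with an LDAXR/STXR exclusive loop and the release store with STLR, whereas a real AArch64 lock would normally wait with WFE (often primed by SEVL) inside the spin loop instead of busy polling. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_bool locked; } simple_spinlock;

static void spin_init(simple_spinlock *l) { atomic_init(&l->locked, false); }

/* Acquire: accesses after a successful lock cannot be observed before it */
static bool spin_trylock(simple_spinlock *l)
{
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static void spin_lock(simple_spinlock *l)
{
    while (!spin_trylock(l)) {
        /* Spin on plain loads until the lock looks free;
           an AArch64 implementation would typically execute WFE here. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
    }
}

/* Release: accesses inside the critical section are visible before the unlock */
static void spin_unlock(simple_spinlock *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```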

9. MEMORY ACCESSES [1-hour]
• Mixed-endian support
• Program counter and stack pointer alignment
• Observability and completion
• Ordering requirements
• Memory types: Normal or Device
• Device memory attributes: Gathering, Reordering, and the Early Write Acknowledgement hint
• Mismatched memory attributes
• Shareability and access limitations on the data barrier operations
• Memory barriers
• Cacheability, cache allocation hints, and cache transient hints
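MAIR_ELx holds eight 8-bit memory attribute encodings, and each translation table descriptor selects one of them through its AttrIndx field. The C sketch below, assuming the ARMv8-A Attr<n> encodings for the Device and Normal memory types listed above, builds and queries such a value; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Attr<n> encodings for MAIR_ELx */
#define MAIR_DEVICE_nGnRnE 0x00u /* no Gathering, no Reordering, no Early write ack */
#define MAIR_DEVICE_nGnRE  0x04u /* Early write acknowledgement allowed */
#define MAIR_DEVICE_GRE    0x0Cu /* Gathering, Reordering, Early write ack */
#define MAIR_NORMAL_NC     0x44u /* Normal, Inner and Outer Non-cacheable */
#define MAIR_NORMAL_WB     0xFFu /* Normal, Write-Back, Read/Write-Allocate, non-transient */

/* Place an attribute in slot index (0..7) */
static uint64_t mair_set(uint64_t mair, unsigned index, uint8_t attr)
{
    assert(index < 8);
    mair &= ~(0xFFull << (index * 8));
    return mair | ((uint64_t)attr << (index * 8));
}

/* Attribute selected by a descriptor's AttrIndx */
static uint8_t mair_get(uint64_t mair, unsigned attr_indx)
{
    assert(attr_indx < 8);
    return (uint8_t)(mair >> (attr_indx * 8));
}
```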

10. ARMv8 MMU SUPPORT [4-hour]
• LPAE enhancements to adapt to AArch64
• Supporting up to 48 bits of VA per TTBR
• Upper 8 bits of the address configurable as a tag (tagged pointers)
• Access permission checking
• Hierarchical control of data access permissions
• Determining the valid address range
• Supporting up to 48 bits of IPA and PA spaces
• VMSAv8-64 address translation system
• Controlling address translation stages
• Translation granule size: 4 KB, 16 KB, or 64 KB
• Descriptor page table organization, descriptor format
• Selection between TTBR0 and TTBR1
• Security state of translation table lookups
• Hierarchical control of Secure or Non-secure memory accesses
• Combining the stage 1 and stage 2 attributes
• TLB preload instructions
• TLB maintenance instructions in A64
• Cortex-A53 TLB implementation
• Detail of L1 and L2 TLBs of Cortex-A53
• Address translation instructions in A64, performing stage 1 and stage 2 translations as defined for EL0 to EL3
• MMU faults
• Cortex-A53 Intermediate table walk caches
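For the common stage 1 configuration of a 4 KB granule with 48-bit virtual addresses (TCR_ELx.T0SZ = T1SZ = 16), the C sketch below shows how a VA is split into four 9-bit table indices and a 12-bit page offset, how TTBR0 or TTBR1 is selected from the upper address bits, and how Top Byte Ignore (TBI) lets bits [63:56] carry a tag. It is a host-side model only; 16 KB and 64 KB granules split the address differently, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Level 0..3 indices (9 bits each) and page offset (12 bits) */
typedef struct { unsigned l0, l1, l2, l3, offset; } va_walk;

static va_walk split_va_4k(uint64_t va)
{
    va_walk w;
    w.l0 = (unsigned)((va >> 39) & 0x1FF);
    w.l1 = (unsigned)((va >> 30) & 0x1FF);
    w.l2 = (unsigned)((va >> 21) & 0x1FF);
    w.l3 = (unsigned)((va >> 12) & 0x1FF);
    w.offset = (unsigned)(va & 0xFFF);
    return w;
}

/* With TBI, the tag byte is ignored: replicate bit 55 into bits [63:56] */
static uint64_t strip_tag(uint64_t va)
{
    return (va & (1ull << 55)) ? (va | 0xFF00000000000000ull)
                               : (va & 0x00FFFFFFFFFFFFFFull);
}

/* 0 = TTBR0, 1 = TTBR1, -1 = outside both ranges (translation fault) */
static int select_ttbr(uint64_t va, bool tbi)
{
    if (tbi)
        va = strip_tag(va);
    uint64_t top = va >> 48; /* bits [63:48] must be all 0s or all 1s */
    if (top == 0)
        return 0;
    if (top == 0xFFFF)
        return 1;
    return -1;
}
```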

11. CACHES [2-hour]
• Cache ID registers
• Cache hierarchy, Point of Unification, Point of Coherency
• Cache preload instructions
• Load Non-temporal Pair (LDNP) instruction
• Instruction and Data cache maintenance instructions in A64
• Data cache zero instruction
• Cortex-A53 L1 and L2 memory system
• Cache organization, replacement policies
• VIPT instruction cache
• Allocate policies
• Non-cacheable streaming enhancement
• L2 cache strictly-enforced inclusion property with L1 data caches
• L2 cache prefetcher
• Cache coherency, snoop filter support

12. A64 NEW INSTRUCTION SET [3-hour]
• A64 assembly language, regular bit encoding structure
• Instruction aliases
• Condition flags usage
• Instructions that operate directly on various PSTATE elements
• New instructions to support 64-bit operands
• X30 general-purpose register used as the procedure call link register
• Branches, function call and return
• Conditional select instructions, avoiding branches
• Load/store instructions, addressing modes
• Arithmetic and logical instructions, CRC calculation instructions
• Accessing global data with the ADRP instruction, without a literal pool
• Exception generating instructions
• Instructions for accessing AArch32 Execution environment registers
• System instruction encodings for System register accesses

13. ARM ARCHITECTURE PROCEDURE CALL STANDARD 64-BIT (AAPCS64) [1-hour]
• Half-precision floating-point format
• General register usage convention
• Stack pointer and frame pointer
• Stack pointer alignment on a 16-byte boundary, with alignment checking configurable at EL1
• NEON/VFP register usage convention

14. NEON, VFP AND CRYPTOGRAPHIC UNITS [3-hour]
• New register banking for NEON and VFP
• Mapping of the SIMD and floating-point registers between the Execution states
• Vector formats in AArch64 state
• Scalar floating-point, FPCR/FPSR registers
• Floating-point minimum and maximum instructions
• New SIMD instructions
• Cryptography software support through a new family of instructions
• Hardware acceleration of AES encryption and decryption, and of SHA-1 and SHA-256 hashing
• Large polynomial multiplies
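The large polynomial multiplies mentioned above compute products over GF(2), where partial products are combined with XOR instead of addition. The portable C sketch below computes the same 128-bit result that PMULL produces for a pair of 64-bit lanes (the building block of GHASH in AES-GCM and of CRC folding); it is a reference model for checking results, not an optimized implementation, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit result as two 64-bit halves */
typedef struct { uint64_t lo, hi; } u128_pair;

/* Carry-less multiply of two 64-bit polynomials over GF(2) */
static u128_pair clmul64(uint64_t a, uint64_t b)
{
    u128_pair r = {0, 0};
    for (unsigned i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;              /* low 64 bits of a * x^i */
            if (i != 0)
                r.hi ^= a >> (64 - i);   /* bits shifted out of the low word */
        }
    }
    return r;
}
```

For instance, (x + 1)(x + 1) = x^2 + 1 over GF(2), so `clmul64(3, 3)` yields 5 rather than the integer product 9.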

15. GICv3 [2-hour]
• System register interface to the GIC CPU interface
• Generic Interrupt Controller CPU interface registers
• GIC CPU interface that implements an AXI4-Stream interface
• Interrupt virtualization
• Interrupt sources
• Interrupt priority levels
• Interrupt handling to support nesting

16. GENERIC TIMER [1-hour]
• System counter clock frequency
• Physical and virtual timer count registers
• Physical up-count comparison, down-count value and timer control registers
• Virtual up-count comparison, down-count value and timer control registers
• Event streams
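The generic timer arithmetic can be modelled in C as below, assuming the ARMv8-A semantics: the counter frequency is read from CNTFRQ_EL0, writing the 32-bit signed TVAL register sets CVAL to the current count plus the sign-extended TVAL, reading TVAL returns CVAL minus the count, and the timer condition is met once the count reaches CVAL. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Counter ticks for a delay in microseconds (freq_hz as read from CNTFRQ_EL0) */
static uint64_t timer_ticks_us(uint64_t freq_hz, uint64_t us)
{
    assert(freq_hz != 0);
    return freq_hz * us / 1000000u;
}

/* Writing TVAL: CVAL = count + SignExtend(TVAL) */
static uint64_t cval_from_tval(uint64_t cnt, int32_t tval)
{
    return cnt + (uint64_t)(int64_t)tval;
}

/* Reading TVAL: low 32 bits of (CVAL - count), as a signed value
   (assumes two's-complement conversion, as on mainstream compilers) */
static int32_t tval_from_cval(uint64_t cval, uint64_t cnt)
{
    return (int32_t)(uint32_t)(cval - cnt);
}

/* Timer condition: the count has reached the compare value */
static bool timer_condition_met(uint64_t cnt, uint64_t cval)
{
    return cnt >= cval;
}
```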

17. LOW-POWER STATES [2-hour]
• Wait for Interrupt and Wait for Event
• WFE wake-up events in AArch64 state
• Cortex-A53 low power modes
• L2 Wait for Interrupt
• Processor dynamic retention
• Support for power management with multiple power domains
• Individual processor powerdown
• Separate power domain for Advanced SIMD and Floating-Point
• Multiprocessor powerdown without system driven L2 flush
• Dormant mode
• Debug powerdown

18. ARMv8 DEBUG [2-hour]
• Self-hosted debug
• Debug related instructions: BKPT, DBG, HLT
• Debug state instructions
• Debug related exceptions, PSTATE debug mask bit
• Linked comparisons for Breakpoint/Watchpoint exception generation
• Software Step exceptions
• Routing debug exceptions
• Debug Communications Channel
• External debug, cross-triggering
• External access permissions
• Embedded Trace Macrocell architecture

19. PERFORMANCE MONITOR [1-hour]
• 64-bit cycle counter
• Per-function performance monitoring at EL0
• Effect of EL3 and EL2 on Performance Monitor
• Event filtering
• Recommended memory-mapped interface to the Performance Monitors
• Event number and mnemonic
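Event filtering can be illustrated by composing a PMEVTYPER<n>_EL0 value in C, assuming the ARMv8.0 layout: event number in bits [9:0], and the P (bit 31) and U (bit 30) bits that exclude EL1 and EL0 respectively (PMCCFILTR_EL0 uses the same P and U bits for the cycle counter). The event numbers listed are common architectural events; the helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PMEVTYPER<n>_EL0 filter bits (ARMv8.0) */
#define PMU_EXCLUDE_EL1 (1u << 31) /* P */
#define PMU_EXCLUDE_EL0 (1u << 30) /* U */

/* Common architectural event numbers */
#define EV_L1D_CACHE_REFILL 0x03u
#define EV_L1D_CACHE        0x04u
#define EV_INST_RETIRED     0x08u
#define EV_CPU_CYCLES       0x11u

static uint32_t pmevtyper(uint16_t event, bool count_el0, bool count_el1)
{
    assert(event <= 0x3FF); /* evtCount is 10 bits wide in ARMv8.0 */
    uint32_t v = event;
    if (!count_el0)
        v |= PMU_EXCLUDE_EL0;
    if (!count_el1)
        v |= PMU_EXCLUDE_EL1;
    return v;
}

/* Elapsed count between two reads of a 64-bit counter;
   unsigned subtraction handles wrap-around */
static uint64_t counter_delta(uint64_t before, uint64_t after)
{
    return after - before;
}
```

For example, counting retired instructions (0x08) at EL0 only gives 0x80000008.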

20. CORTEX-A53 HARDWARE IMPLEMENTATION [1-hour]
• Clocking
• Resets, WARMRSTREQ and DBGRSTREQ
• Configuration of the ACE and CHI coherent interconnect interfaces through input signals that change the interface behavior
