Understanding the Linux Kernel, Chapter 6: Timing Measurements

Posted by A2023 on 2024-03-27

Clock and Timer Circuits

Real Time Clock(RTC)

All PCs include a clock called Real Time Clock (RTC), which is independent of the CPU and all other chips.

Time Stamp Counter(TSC)

All 80×86 microprocessors include a CLK input pin, which receives the clock signal of an external oscillator. Starting with the Pentium, the processors also include a counter that is increased at each clock signal. This counter is accessible through the 64-bit Time Stamp Counter (TSC) register, which can be read by means of the rdtsc assembly language instruction.
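As an aside, the TSC can also be read from user space. A minimal sketch using GCC/Clang inline assembly (x86/x86-64 only; this helper is illustrative and not part of the kernel):

```c
#include <stdint.h>

/* Read the 64-bit TSC: rdtsc places the low half of the counter in EAX
 * and the high half in EDX. Works on x86/x86-64 with GCC or Clang. */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
```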

Programmable Interval Timer(PIT)

Besides the Real Time Clock and the Time Stamp Counter, IBM-compatible PCs include another type of time-measuring device called the Programmable Interval Timer (PIT). Much like a microwave oven's alarm makes the user aware that the cooking time interval has elapsed, the PIT issues a special interrupt, the timer interrupt, at a fixed, programmed frequency to notify the kernel that one more time interval has elapsed.

tick

The time interval between two consecutive timer interrupts is called a tick, and its length in nanoseconds is stored in the tick_nsec variable. On a PC, tick_nsec is initialized to 999,848 nanoseconds (yielding a clock signal frequency of about 1000.15 Hz), but its value may be automatically adjusted by the kernel if the computer is synchronized with an external clock (via the adjtimex() system call).
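The relation between tick length and timer frequency is simply a reciprocal; a tiny sketch (the helper name is illustrative):

```c
#include <stdint.h>

/* Clock signal frequency (in Hz) implied by a tick length in
 * nanoseconds. With tick_nsec = 999848 this yields about 1000.15 Hz. */
static double tick_frequency_hz(uint64_t tick_nsec)
{
    return 1e9 / (double)tick_nsec;
}
```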

CPU Local Timer

The local APIC present in recent 80 × 86 microprocessors provides yet another time-measuring device: the CPU local timer.

Differences between the CPU local timer and the PIT

  • The APIC’s timer counter is 32 bits long, while the PIT’s timer counter is 16 bits long; therefore, the local timer can be programmed to issue interrupts at very low frequencies (the counter stores the number of ticks that must elapse before the interrupt is issued).
  • The local APIC timer sends an interrupt only to its processor, while the PIT raises a global interrupt, which may be handled by any CPU in the system.

High Precision Event Timer(HPET)

The HPET provides a number of hardware timers that can be exploited by the kernel. Basically, the chip includes up to eight 32-bit or 64-bit independent counters. Each counter is driven by its own clock signal, whose frequency must be at least 10 MHz; therefore, the counter is increased at least once in 100 nanoseconds. Any counter is associated with at most 32 timers, each of which is composed by a comparator and a match register. The comparator is a circuit that checks the value in the counter against the value in the match register, and raises a hardware interrupt if a
match is found. Some of the timers can be enabled to generate a periodic interrupt.
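The comparator/match mechanism can be modeled in a few lines of user-space C. The struct and function names below are invented for illustration; they are not the HPET driver's API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one HPET timer: a comparator plus a match
 * register, driven by a free-running counter. */
struct hpet_timer {
    uint64_t match;     /* match register */
    bool     periodic;  /* re-arm automatically after firing? */
    uint64_t period;    /* counter increments between periodic interrupts */
};

/* The comparator "raises an interrupt" (returns true) when the counter
 * reaches the match register; a periodic timer then advances its match
 * register by one period. */
static bool hpet_compare(struct hpet_timer *t, uint64_t counter)
{
    if (counter != t->match)
        return false;
    if (t->periodic)
        t->match += t->period;
    return true;
}
```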

ACPI Power Management Timer

The Linux Timekeeping Architecture

Linux’s timekeeping architecture is the set of kernel data structures and functions related to the flow of time.

In a uniprocessor system, all timekeeping activities are triggered by interrupts raised by the global timer. In a multiprocessor system, the general activities (such as handling of software timers) are triggered by the interrupts raised by the global timer, while CPU-specific activities (such as monitoring the execution time of the currently running process) are triggered by the interrupts raised by the local APIC timers.

Data Structures of the Timekeeping Architecture

The timer object

timer_opts

In order to handle the possible timer sources in a uniform way, the kernel makes use of a “timer object,” which is a descriptor of type timer_opts consisting of the timer name and of four standard methods.

The most important methods of the timer object are mark_offset and get_offset. The mark_offset method is invoked by the timer interrupt handler, and records in a suitable data structure the exact time at which the tick occurred. Using the saved value, the get_offset method computes the time in microseconds elapsed since the last timer interrupt (tick). Thanks to these two methods, Linux timekeeping architecture achieves a sub-tick resolution—that is, the kernel is able to determine the current time with a precision much higher than the tick duration. This operation is called
time interpolation.
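A rough user-space sketch of the idea behind mark_offset/get_offset for a TSC-based timer source (the struct and names are illustrative, not the kernel's actual implementation):

```c
#include <stdint.h>

struct tsc_timer {
    uint64_t tsc_at_last_tick;   /* recorded by the mark_offset step */
    uint64_t cycles_per_usec;    /* CPU frequency in MHz */
};

/* mark_offset: record the TSC value at the moment the tick occurred. */
static void mark_offset(struct tsc_timer *t, uint64_t tsc_now)
{
    t->tsc_at_last_tick = tsc_now;
}

/* get_offset: microseconds elapsed since the last recorded tick. */
static uint64_t get_offset_usec(const struct tsc_timer *t, uint64_t tsc_now)
{
    return (tsc_now - t->tsc_at_last_tick) / t->cycles_per_usec;
}
```

The current time is then the time saved at the last tick plus this sub-tick offset, which is how the kernel obtains a precision much higher than one tick.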

cur_timer

The cur_timer variable stores the address of the timer object corresponding to the “best” timer source available in the system. During kernel initialization, the select_timer() function sets cur_timer to the address of the appropriate timer object.

The jiffies variable

The jiffies variable is a counter that stores the number of elapsed ticks since the system was started. It is increased by one when a timer interrupt occurs.

You might wonder why jiffies has not been directly declared as a 64-bit unsigned long long integer on the 80 × 86 architecture. The answer is that accesses to 64-bit variables in 32-bit architectures cannot be done atomically. Therefore, every read operation on the whole 64 bits requires some synchronization technique to ensure that the counter is not updated while the two 32-bit half-counters are read; as a consequence, every 64-bit read operation is significantly slower than a 32-bit read operation.

get_jiffies_64()

```c
unsigned long long get_jiffies_64(void)
{
    unsigned long seq;
    unsigned long long ret;
    do {
        seq = read_seqbegin(&xtime_lock);
        ret = jiffies_64;
    } while (read_seqretry(&xtime_lock, seq));
    return ret;
}
```

The whole 64-bit read is protected by the xtime_lock seqlock.

Conversely, the critical region that increases the jiffies_64 variable must be protected by means of write_seqlock(&xtime_lock) and write_sequnlock(&xtime_lock). Notice that the ++jiffies_64 instruction also increases the 32-bit jiffies variable, because the latter corresponds to the lower half of jiffies_64.
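The seqlock protocol can be modeled in a few lines of single-threaded user-space C. This is only a sketch: the real primitives also use memory barriers and spin while the sequence is odd:

```c
#include <stdint.h>

static unsigned xtime_seq;          /* even: no writer; odd: writer inside */
static uint64_t jiffies_64_model;

static unsigned read_seqbegin(void)    { return xtime_seq & ~1u; }
static int read_seqretry(unsigned s)   { return xtime_seq != s; }

/* Reader: retry until a stable snapshot of the 64-bit counter is read. */
static uint64_t get_jiffies_64_model(void)
{
    unsigned s;
    uint64_t ret;
    do {
        s = read_seqbegin();
        ret = jiffies_64_model;
    } while (read_seqretry(s));
    return ret;
}

/* Writer: the sequence is odd while the counter is being updated, so a
 * concurrent reader would detect the update and retry. */
static void increment_jiffies_64(void)
{
    xtime_seq++;                     /* write_seqlock */
    jiffies_64_model++;
    xtime_seq++;                     /* write_sequnlock */
}
```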

The xtime variable

The xtime variable stores the current time and date; it is a structure of type timespec having two fields:

  • tv_sec
    Stores the number of seconds that have elapsed since midnight of January 1, 1970 (UTC).
  • tv_nsec
    Stores the number of nanoseconds that have elapsed within the last second.

The xtime_lock seqlock avoids the race conditions that could occur due to concurrent accesses to the xtime variable. Remember that xtime_lock also protects the jiffies_64 variable; in general, this seqlock is used to define several critical regions of the timekeeping architecture.

Timekeeping Architecture in Uniprocessor Systems

Initialization phase

time_init()

  • Initializes the xtime variable.
  • Initializes the wall_to_monotonic variable. This variable is of the same type timespec as xtime, and it essentially stores the number of seconds and nanoseconds to be added to xtime in order to get a monotonic (ever increasing) flow of time.
  • Initializes the HPET chip, if the kernel supports it and the chip is present; otherwise, the PIT is used as the system timer.
  • Invokes select_timer() to select the best timer source available in the system, and sets the cur_timer variable to the address of the corresponding timer object.
  • Invokes setup_irq(0,&irq0) to set up the interrupt gate corresponding to IRQ0—the line associated with the system timer interrupt source (PIT or HPET).

The timer interrupt handler

The timer_interrupt() function is the interrupt service routine (ISR) of the PIT or of the HPET; it performs the following steps:

  • Protects the time-related kernel variables by issuing a write_seqlock() on the xtime_lock seqlock.
  • Executes the mark_offset method of the cur_timer timer object.
  • Invokes the do_timer_interrupt() function, which in turn performs the following actions:
    • Increases by one the value of jiffies_64. Notice that this can be done safely, because the kernel control path still holds the xtime_lock seqlock for writing.
    • Invokes the update_times() function to update the system date and time and to compute the current system load;
    • Invokes the update_process_times() function to perform several time-related accounting operations for the local CPU.
    • Invokes the profile_tick() function.
    • If the system clock is synchronized with an external clock (the adjtimex() system call has been invoked), invokes the set_rtc_mmss() function once every 660 seconds (every 11 minutes) to adjust the Real Time Clock.
  • Releases the xtime_lock seqlock by invoking write_sequnlock().
  • Returns the value 1 to notify that the interrupt has been effectively handled.
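The skeleton of the handler above can be modeled in user space. All functions below are simplified stand-ins (the "seqlock" is just a sequence counter, and update_times() is a stub):

```c
#include <stdint.h>

static unsigned seq;              /* even: unlocked, odd: writer inside */
static uint64_t jiffies_64;
static uint64_t wall_time_ticks;

static void write_seqlock(void)   { seq++; }
static void write_sequnlock(void) { seq++; }

/* Stand-in for update_times(): refresh the date/time and system load. */
static void update_times(void)    { wall_time_ticks = jiffies_64; }

static int timer_interrupt(void)
{
    write_seqlock();        /* protect jiffies_64 and xtime for writing */
    jiffies_64++;           /* safe: the seqlock is held for writing */
    update_times();
    write_sequnlock();
    return 1;               /* interrupt effectively handled */
}
```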

Timekeeping Architecture in Multiprocessor Systems

Two types of timers: the global timer and the local timer

In Linux 2.6, global timer interrupts—raised by the PIT or the HPET—signal activities not related to a specific CPU, such as handling of software timers and keeping the system time up-to-date. Conversely, a CPU local timer interrupt signals timekeeping activities related to the local CPU, such as monitoring how long the current process has been running and updating the resource usage statistics.

Initialization phase

The global timer is initialized by time_init(), exactly as in uniprocessor systems.

Initializing the local timer:

1. The Linux kernel reserves the interrupt vector 239 (0xef) for local timer interrupts.
2. The calibrate_APIC_clock() function computes how many bus clock signals are received by the local APIC of the booting CPU during a tick (1 ms). This value is then used to program the local APICs so that they generate one local timer interrupt every tick. All local APIC timers are synchronized because they are based on the common bus clock signal; therefore, the value computed by calibrate_APIC_clock() for the boot CPU is also good for the other CPUs in the system.
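The calibration arithmetic itself is a simple division; a sketch with illustrative numbers (a 200 MHz bus clock and HZ = 1000 are example values, not measured ones):

```c
#include <stdint.h>

/* Bus clock signals received by the local APIC during one tick: the
 * local APIC timer counter is programmed with this value so that it
 * underflows (and raises an interrupt) once per tick. */
static uint64_t apic_counts_per_tick(uint64_t bus_hz, uint64_t hz)
{
    return bus_hz / hz;
}
```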

The global timer interrupt handler

differ from uniprocessor:

  • It writes into a port of the I/O APIC chip to acknowledge the timer IRQ (because the interrupt is delivered through the I/O APIC).
  • The update_process_times() and profile_tick() functions are not invoked, because they perform actions related to a specific CPU.

The local timer interrupt handler

It profiles the kernel code and checks how long the current process has been running on a given CPU.

smp_apic_timer_interrupt()

The low-level apic_timer_interrupt() handler invokes smp_apic_timer_interrupt(), which performs the following steps:

  • Gets the CPU logical number (say, n).
  • Increases the apic_timer_irqs [Number of occurrences of local APIC timer interrupts] field of the nth entry of the irq_stat array.
  • Acknowledges the interrupt on the local APIC.
  • Calls the irq_enter() function.
  • Invokes the smp_local_timer_interrupt() function.
  • Calls the irq_exit() function.

smp_local_timer_interrupt()

  • Invokes the profile_tick() function.
  • Invokes the update_process_times() function to check how long the current process has been running and to update some local CPU statistics.

Updating the Time and Date

The update_times( ) function, which is invoked by the global timer interrupt handler, updates the value of the xtime variable as follows:

update_times()

```c
void update_times(void)
{
    unsigned long ticks;
    ticks = jiffies - wall_jiffies;
    if (ticks) {
        wall_jiffies += ticks;
        update_wall_time(ticks);
    }
    calc_load(ticks);
}
```

Timer interrupts can be lost, for instance when interrupts remain disabled for a long period of time; in other words, the kernel does not necessarily update the xtime variable at every tick. However, no tick is definitively lost, and in the long run, xtime stores the correct system time. The check for lost timer interrupts is done in the mark_offset method of cur_timer.

Updating System Statistics

Updating Local CPU Statistics

performs the following steps:

  • Checks how long the current process has been running. Depending on whether the current process was running in User Mode or in Kernel Mode when the timer interrupt occurred, invokes either account_user_time( ) or account_system_time( ).
    • Updates either the utime field (ticks spent in User Mode) or the stime field (ticks spent in Kernel Mode) of the current process descriptor.
    • Checks whether the total CPU time limit has been reached; if so, sends SIGXCPU and SIGKILL signals to current.
    • Invokes account_it_virt() and account_it_prof() to check the process timers.
    • Updates some kernel statistics stored in the kstat per-CPU variable.
  • Invokes raise_softirq( ) to activate the TIMER_SOFTIRQ tasklet on the local CPU.
  • If some old version of an RCU-protected data structure has to be reclaimed, checks whether the local CPU has gone through a quiescent state and invokes tasklet_schedule( ) to activate the rcu_tasklet tasklet of the local CPU.
  • Invokes the scheduler_tick( ) function, which decreases the time slice counter of the current process, and checks whether its quantum is exhausted.
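The first step above, charging the tick to utime or stime, can be sketched as follows. The field names mirror the text, but this is a simplified user-space model, not the kernel's accounting code:

```c
#include <stdint.h>

struct task_times {
    uint64_t utime;    /* ticks spent in User Mode */
    uint64_t stime;    /* ticks spent in Kernel Mode */
};

/* Charge one tick to the current process, depending on the mode the
 * CPU was in when the timer interrupt occurred. */
static void account_tick(struct task_times *t, int user_mode)
{
    if (user_mode)
        t->utime++;    /* what account_user_time() does, in essence */
    else
        t->stime++;    /* what account_system_time() does, in essence */
}
```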

Keeping Track of System Load

At every tick, update_times() invokes the calc_load() function, which counts the number of processes in the TASK_RUNNING or TASK_UNINTERRUPTIBLE state and uses this number to update the average system load (the familiar "CPU load average").
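The load average is an exponentially decayed average of the run-queue length, computed in fixed-point arithmetic. A sketch of the 1-minute case, following the shape of the kernel's CALC_LOAD macro (FSHIFT = 11 and EXP_1 = 1884 are the kernel's constants):

```c
/* Fixed-point load average, after the kernel's CALC_LOAD macro. */
#define FSHIFT  11
#define FIXED_1 (1UL << FSHIFT)   /* 1.0 in fixed point */
#define EXP_1   1884UL            /* 2048 / exp(5 sec / 1 min) */

static unsigned long calc_load_1min(unsigned long avg, unsigned long nr_active)
{
    unsigned long n = nr_active * FIXED_1;     /* count in fixed point */
    avg = avg * EXP_1 + n * (FIXED_1 - EXP_1); /* decayed average */
    return avg >> FSHIFT;
}
```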

Profiling the Kernel Code

Linux includes a minimalist code profiler called readprofile, used by Linux developers to discover where the kernel spends its time in Kernel Mode. The **readprofile** command uses the /proc/profile information to print ASCII data on standard output.

The profile_tick() function collects the data for the code profiler. To enable the code profiler, the Linux kernel must be booted with the profile=N parameter, where 2^N denotes the size of the code fragments to be profiled. The collected data can be read from the /proc/profile file.

The Linux 2.6 kernel includes yet another profiler called oprofile. Besides being more flexible and customizable than readprofile, oprofile can be used to discover hot spots in kernel code, User Mode applications, and system libraries. When oprofile is being used, profile_tick() invokes the timer_notify() function to collect the data used by
this new profiler.

Checking the NMI Watchdogs

In multiprocessor systems, Linux offers yet another feature to kernel developers: a watchdog system, which might be quite useful to detect kernel bugs that cause a system freeze. To activate such a watchdog, the kernel must be booted with the nmi_watchdog parameter.

The watchdog is based on a clever hardware feature of local and I/O APICs: they can generate periodic NMI interrupts on every CPU. Because NMI interrupts are not masked by the cli assembly language instruction, the watchdog can detect deadlocks even when interrupts are disabled.

Software Timers and Delay Functions

Linux considers two types of timers called dynamic timers and interval timers. The first type is used by the kernel, while interval timers may be created by processes in User Mode.

Since checking for decayed timers is always done by deferrable functions that may be executed a long time after they have been activated, the kernel cannot ensure that timer functions will start right at their expiration times. It can only ensure that they are executed either at the proper time or with a delay of up to a few hundred milliseconds. For this reason, timers are not appropriate for real-time applications in which expiration times must be strictly enforced.

Besides software timers, the kernel also makes use of delay functions, which execute a
tight instruction loop until a given time interval elapses.

Dynamic Timers

Dynamic timers may be dynamically created and destroyed. No limit is placed on the number of currently active dynamic timers. A dynamic timer is stored in the following timer_list structure:

timer_list

```c
struct timer_list {
    struct list_head entry;
    unsigned long expires;
    spinlock_t lock;
    unsigned long magic;
    void (*function)(unsigned long);
    unsigned long data;
    tvec_base_t *base;
};
```

The entry field is used to insert the software timer into one of the doubly linked circular lists that group together the timers according to the value of their expires field.

To create and activate a dynamic timer, the kernel must:

  • Create, if necessary, a new timer_list object.
  • Initialize the object by invoking the init_timer(&t) function. This essentially sets the t.base pointer field to NULL and sets the t.lock spin lock to “open.”
  • Load the function field with the address of the function to be activated when the timer decays. If required, load the data field with a parameter value to be passed to the function.
  • If the dynamic timer is not already inserted in a list, assign a proper value to the expires field and invoke the add_timer(&t) function to insert the t element in the proper list.
  • Otherwise, if the dynamic timer is already inserted in a list, update the expires field by invoking the mod_timer( ) function, which also takes care of moving the object into the proper list.
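The steps above can be modeled in user space. The functions below are simplified stand-ins for the kernel API, with a single forward-linked pending list instead of the per-CPU tv1 through tv5 vectors:

```c
#include <stddef.h>

/* Simplified timer_list: one forward pointer stands in for the
 * kernel's doubly linked entry field. */
struct timer_list {
    struct timer_list *next;
    unsigned long expires;
    void (*function)(unsigned long);
    unsigned long data;
};

static struct timer_list *pending;   /* one pending list, for the model */

static void init_timer(struct timer_list *t)
{
    t->next = NULL;
    t->function = NULL;
}

static void add_timer(struct timer_list *t)
{
    t->next = pending;               /* insert into the pending list */
    pending = t;
}

static void mod_timer(struct timer_list *t, unsigned long expires)
{
    t->expires = expires;            /* the kernel additionally re-links
                                        t into the list matching the new
                                        expiration time */
}
```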

In Linux 2.6, a dynamic timer is bound to the CPU that activated it—that is, the timer function will always run on the same CPU that first executed the add_timer() or later the mod_timer() function. The del_timer() and companion functions, however, can deactivate every dynamic timer, even if it is not bound to the local CPU.

Dynamic timers and race conditions

Case 1

A timer function that acts on a discardable resource might run after, or concurrently with, the release of that resource.

  • On uniprocessor systems, invoke del_timer() before releasing the resource.
  • On multiprocessor systems, use del_timer_sync().

del_timer_sync()

del_timer_sync removes the timer from the list, and then it checks whether the timer function is executed on another CPU; in such a case, del_timer_sync() waits until the timer function terminates.

The del_timer_sync() function is rather complex and slow, because it has to carefully take into consideration the case in which the timer function reactivates itself. If the kernel developer knows that the timer function never reactivates the timer, she can use the simpler and faster del_singleshot_timer_sync() function to deactivate a timer and wait until the timer function terminates.

Case 2

The implementation of the timer functions is made SMP-safe by means of the lock spin lock included in every timer_list object, which guards against concurrent accesses to the same timer (e.g., two simultaneous mod_timer() invocations).

Data structures for dynamic timers

The main data structure for dynamic timers is a per-CPU variable named tvec_bases: it includes NR_CPUS elements,
one for each CPU in the system. Each element is a tvec_base_t structure, which includes all data needed to handle the dynamic timers bound to the corresponding CPU:

tvec_base_t

```c
typedef struct tvec_t_base_s {
    spinlock_t lock;
    unsigned long timer_jiffies;
    struct timer_list *running_timer;
    tvec_root_t tv1;
    tvec_t tv2;
    tvec_t tv3;
    tvec_t tv4;
    tvec_t tv5;
} tvec_base_t;
```

tvec_root_t and tvec_t

The tv1 field is a structure of type tvec_root_t, which includes a vec array of 256 list_head elements—that is, lists of dynamic timers.

running_timer

In multiprocessor systems, the running_timer field points to the timer_list structure of the dynamic timer that is currently handled by the local CPU.

timer_jiffies

After run_timer_softirq() completes, timer_jiffies equals the value that jiffies had at that time, plus one.

The timer_jiffies field represents the earliest expiration time of the dynamic timers yet to be checked: if it coincides with the value of jiffies, no backlog of deferrable functions has accumulated; if it is smaller than jiffies, then lists of dynamic timers that refer to previous ticks must be dealt with. The field is set to jiffies at system startup and is increased only by the run_timer_softirq() function.

Dynamic timer handling

TIMER_SOFTIRQ softirq is used to handle software timers.

run_timer_softirq

The run_timer_softirq() function is the deferrable function associated with the TIMER_SOFTIRQ softirq.

  • Stores in the base local variable the address of the tvec_base_t data structure associated with the local CPU.
  • Acquires the base->lock spin lock and disables local interrupts.
  • Starts a while loop, which ends when base->timer_jiffies becomes greater than the value of jiffies.
    • Computes the index of the list in base->tv1 that holds the next timers to be handled (the eight least significant bits of base->timer_jiffies).
    • If the index is zero, all lists of base->tv1 have been checked, so the timers in the higher-order lists (tv2 through tv5) are cascaded down to replenish tv1.
    • Increases by one base->timer_jiffies.
    • execute timers:
      • Removes t from the base->tv1’s list.
      • In multiprocessor systems, sets base->running_timer to &t.
      • Sets t.base to NULL.
      • Releases the base->lock spin lock, and enables local interrupts.
      • Executes the timer function t.function passing as argument t.data.
      • Acquires the base->lock spin lock, and disables local interrupts.
      • Continues with the next timer in the list, if any.
    • All timers in the list have been handled. Continues with the next iteration of the outermost while cycle.
  • The outermost while cycle is terminated, which means that all decayed timers have been handled. In multiprocessor systems, sets base->running_timer to NULL.
  • Releases the base->lock spin lock and enables local interrupts.
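The core of the loop can be modeled in user space. Locking and the tv2 through tv5 cascading are omitted, and the names are illustrative:

```c
#include <stddef.h>

struct sw_timer {
    struct sw_timer *next;
    unsigned long expires;
    void (*function)(unsigned long);
    unsigned long data;
};

struct timer_base {
    unsigned long timer_jiffies;   /* earliest tick not yet handled */
    struct sw_timer *tv1[256];     /* one timer list per low-8-bit index */
};

/* Demo timer function: records the data value it was invoked with. */
static unsigned long last_fired;
static void demo_fn(unsigned long data) { last_fired = data; }

/* Advance timer_jiffies up to jiffies, running each expired list. */
static void run_timers(struct timer_base *base, unsigned long jiffies)
{
    while (base->timer_jiffies <= jiffies) {
        unsigned idx = base->timer_jiffies & 255;  /* low 8 bits */
        struct sw_timer *t = base->tv1[idx];
        base->tv1[idx] = NULL;                     /* detach whole list */
        base->timer_jiffies++;
        while (t) {
            struct sw_timer *next = t->next;
            t->function(t->data);                  /* run timer function */
            t = next;
        }
    }
}
```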

if a timer interrupt occurs while run_timer_softirq( ) is being executed, dynamic timers that decay at this tick occurrence are also considered, because the jiffies variable is asynchronously increased by the global timer interrupt handler.

An Application of Dynamic Timers: the nanosleep() System Call

(See p. 250 of the book for the details.)

Delay Functions

Software timers are useless when the kernel must wait for a short time interval. For instance, a device driver often has to wait for a predefined number of microseconds until the hardware completes some operation. Because a dynamic timer has a significant setup overhead and a rather large minimum wait time (1 millisecond), the device driver cannot conveniently use it.

In these cases, the kernel makes use of the udelay() and ndelay() functions: the former receives as its parameter a time interval in microseconds and returns after the specified delay has elapsed; the latter is similar, but the argument specifies the delay in nanoseconds.

udelay() and ndelay()

```c
void udelay(unsigned long usecs)
{
    unsigned long loops;
    loops = (usecs * HZ * current_cpu_data.loops_per_jiffy) / 1000000;
    cur_timer->delay(loops);
}

void ndelay(unsigned long nsecs)
{
    unsigned long loops;
    loops = (nsecs * HZ * current_cpu_data.loops_per_jiffy) / 1000000000;
    cur_timer->delay(loops);
}
```

Both functions rely on the delay method of the cur_timer timer object.

one "loop" of delay()

  • If cur_timer points to the timer_hpet, timer_pmtmr, or timer_tsc object, one "loop" corresponds to one CPU cycle (the time interval between two consecutive CPU clock signals).
  • If cur_timer points to the timer_none or timer_pit objects, one “loop” corresponds to the time duration of a single iteration of a tight instruction loop.

loops_per_jiffy

The loops_per_jiffy variable records how many "loops" fit in a tick; it is set by calibrate_delay() during the kernel initialization phase.
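Putting loops_per_jiffy together with the udelay() arithmetic above, the conversion from microseconds to loops can be sketched as follows (the helper name is illustrative; the example values assume HZ = 1000 and 4,000,000 loops per jiffy):

```c
#include <stdint.h>

/* Loops needed for a delay of usecs microseconds, given HZ ticks per
 * second and loops_per_jiffy "loops" per tick. */
static uint64_t usecs_to_loops(uint64_t usecs, uint64_t hz,
                               uint64_t loops_per_jiffy)
{
    return usecs * hz * loops_per_jiffy / 1000000;
}
```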

System Calls Related to Timing Measurements

Several system calls allow User Mode processes to read and modify the time and date, and to create timers.

The time( ) and gettimeofday( ) System Calls

  • Acquires the xtime_lock seqlock for reading.
  • Determines the exact number of microseconds elapsed since the last timer interrupt by invoking the get_offset method of the cur_timer timer object (that is, by reading the underlying hardware timer circuit).

The adjtimex() System Call

The setitimer() and alarm() System Calls

System Calls for POSIX Timers
