Exam 1 study guide

The one-hour study guide for exam 1

Paul Krzyzanowski

Latest update: Tue May 5 15:51:06 EDT 2015

Disclaimer: This study guide attempts to touch upon the most important topics that may be covered on the exam but does not claim to necessarily cover everything that one needs to know for the exam. Finally, don't take the one hour time window in the title literally.

Introduction

Go here for lecture notes

Goal: Understand the basic goal of an operating system, some of its history, and the design principle of separating mechanism from policy.

Some introductory terms:

  • Batch systems: operating systems that would run a program to completion and then load and run the next program.

  • Multiprogramming: keep several programs in memory at once and switch between them. The goal is throughput and not interactivity.

  • Time-sharing: multiprogramming with preemption, where the system may stop one process from executing and start another.

  • Portable operating system: an operating system that is not tied to a specific hardware platform. UNIX was one of the early portable operating systems. Windows NT was also written to be architecture-independent.

  • Microkernel operating system: an operating system that provides just the bare essential mechanisms that are necessary to interact with the hardware and manage threads and memory. Higher-level operating system functions, such as managing file systems, the network stack, or scheduling policies, are delegated to user-level processes that communicate with the microkernel via interprocess communication mechanisms (usually messages).

A mechanism is the presentation of a software abstraction: the functional interface. It defines how to do something. A policy defines how that mechanism behaves (e.g., enforce permissions, time limits, or goals). Good design dictates that we keep mechanisms and policies separate.

Booting an operating system

Go here for lecture notes

Goal: How does the operating system get loaded into the computer’s memory so it can run?

The boot loader is a small program that is run at boot time that loads the operating system. Boot loaders are sometimes broken into several stages. A multi-stage boot loader starts off with a simple boot loader that then loads a more sophisticated boot loader that then loads the operating system. This cascaded boot sequence is called chain loading.

On a classic PC architecture, the computer starts running code in the system’s BIOS on startup. This BIOS code performs a power-on test, identifies and initializes some hardware, and then loads the first block of the disk, which is known as the master boot record, or MBR. The MBR contains a table identifying disk partitions and code that will run and load the Volume Boot Record (VBR), which is the first disk block of the designated boot partition. The VBR contains code that reads consecutive disk blocks to load the second-stage boot loader. This boot loader may present the user with a choice of operating systems to load and has the intelligence to read those files from the file system.

The Unified Extensible Firmware Interface, UEFI, is a successor to the BIOS and contains a boot manager that can directly boot files that are placed into a UEFI boot partition. It has the intelligence to parse several file system types so that it can load a file that does not necessarily occupy consecutive disk blocks. The files that are loaded from the UEFI partition are often boot loaders for individual operating systems that then load in the file or files that comprise the operating system.

Operating Systems: essential concepts

Go here for lecture notes

Goal: Enable the operating system to run with special privileges that normal processes do not have by switching between user mode and kernel mode. Create a system call mechanism to enable a user process to request operations from the operating system. Create timer interrupts to ensure the operating system can regain control periodically. Identify how an operating system interacts with devices.

An operating system does a number of things. It is a program that loads and runs other programs. It provides programs with a level of abstraction so they don’t have to deal with the details of accessing hardware. It also manages access to resources, including the CPU (via the scheduler), memory (via the memory management unit), persistent files (via the file system), a communications network (via sockets and IP drivers), and devices (via device drivers).

The operating system runs with the processor in kernel mode (also called privileged, supervisor, or system mode). In this mode, the processor can execute privileged instructions that define interrupt vectors, enable or disable system interrupts, interact with I/O ports, set timers, and manipulate memory mappings. Programs other than the kernel run with the processor in user mode and do not have privileges to execute these instructions. A processor running in user mode can switch to kernel mode by executing a trap instruction, also known as a software interrupt. Some processors also offer explicit syscall instructions. This is a faster mechanism since it does not need to read the branch address from an interrupt vector table that is stored in memory but keeps that address in a CPU register. Either of these approaches switches the processor to kernel mode, saves the current program counter on the stack, and transfers execution to a predefined address for that trap. Control is switched back to user mode and the location right after the trap by executing a return from exception instruction.

Programs interact with operating systems via system calls. A system call uses the trap mechanism to switch control to operating system code running in kernel mode. The operating system can either service the request immediately or may need to put the process in a waiting state because it requires data that isn’t ready (such as a disk block or network message). If an operating system decides that a process has been executing long enough, it may preempt that process to let another process run.
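
To make this concrete, here is a minimal sketch, assuming Linux and glibc, that issues a system call explicitly through the syscall() wrapper. Ordinary library functions such as read() or getpid() perform the same trap into kernel mode on the caller’s behalf.

    /* Sketch: invoke a system call directly via the Linux syscall(2) wrapper. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    int main(void)
    {
        /* SYS_getpid traps into the kernel, which runs in kernel mode,
           looks up the calling process ID, and returns to user mode */
        long pid = syscall(SYS_getpid);
        printf("my pid = %ld\n", pid);
        return 0;
    }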

To allow the operating system to get control at regular intervals, a programmable interval timer can be configured to generate periodic hardware interrupts, for example every 10 milliseconds. When the timer generates the interrupt, control is transferred to an interrupt handler in the kernel and the processor is switched to kernel mode. When the operating system is ready to return, it issues a return from exception instruction. An interrupt or trap results in a mode switch, where execution is transferred from user mode to kernel mode. If the operating system decides it is time to replace the currently executing process with another process, it will save the current process’ context and restore the saved context of another process. The context includes the values of a processor’s registers, program counter, stack pointer, and memory mappings. Saving the context of one process and restoring that of another is called a context switch.

The operating system is responsible for managing system devices. The three categories of devices are:

  • character devices: any device whose data can be thought of as a byte stream. This includes keyboard input, mouse movements, printer output, camera inputs, etc.

  • block devices: any device that has persistent storage that is randomly addressable and read or written in fixed-size chunks (blocks). These devices include disks and flash memory. Because the data is persistent and addressable, it can be cached by the operating system so that future requests for cached content may be satisfied from system memory instead of accessing the device again. This cache of frequently-used blocks of data is called the buffer cache. Basically, any device that can be used to hold a file system is a block device.

  • network devices: packet-based communications networks.

Devices can be controlled by mapping their control and data registers onto memory. This is called memory-mapped I/O. The processor can get device status notifications (e.g., data is ready) either via a hardware interrupt or by periodically polling the hardware (e.g., when the interval timer goes off). Data can be transferred between the device and system via software that reads/writes the device’s memory. This is called programmed I/O (PIO). Alternatively, the device may have access to the system’s memory bus and can use direct memory access (DMA) to transfer data to/from system memory without using the processor.

Processes

Go here for lecture notes

Goal: What is a running program and what states can it take? How does the operating system allow user programs to manage processes?

A process is a program in execution. It comprises the state of the processor’s registers and the memory map for the program. The memory map contains several regions that the operating system allocated and initialized when the process was first created. These regions, illustrated by the sketch after this list, are:

  • text: the machine instructions

  • data: initialized static and global data

  • bss: uninitialized static data that was defined in the program (e.g., global uninitialized strings, numbers, structures)

  • heap: dynamically allocated memory (obtained through memory allocation requests)

  • stack: the call stack, which holds not just return addresses but also local variables, temporary data, and saved registers
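
A quick way to see these regions is to print one address from each. The sketch below, for which exact addresses and layout vary by system, prints the address of a function (text), an initialized global (data), an uninitialized global (bss), a block obtained from malloc (heap), and a local variable (stack).

    /* Sketch: one address from each region of the process memory map. */
    #include <stdio.h>
    #include <stdlib.h>

    int initialized_global = 42;   /* data: initialized global */
    int uninitialized_global;      /* bss: uninitialized global */

    int main(void)                 /* text: machine instructions */
    {
        int local = 1;                          /* stack: local variable */
        int *dynamic = malloc(sizeof(int));     /* heap: dynamically allocated */

        printf("text:  %p\n", (void *)main);
        printf("data:  %p\n", (void *)&initialized_global);
        printf("bss:   %p\n", (void *)&uninitialized_global);
        printf("heap:  %p\n", (void *)dynamic);
        printf("stack: %p\n", (void *)&local);

        free(dynamic);
        return 0;
    }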

Text and data are present in the stored program. The size of the bss segment is also fixed and known in the stored program. The heap and stack are dynamically allocated. A process may be in one of three states:

  • running: the process is currently executing code on the CPU
  • ready: the process is not currently executing code but is ready to do so if the operating system would give it the chance
  • blocked (or waiting): the process is waiting for some event to occur (such as a requested I/O operation to complete) and will not be ready to execute instructions until this event occurs.

The operating system’s process scheduler is responsible for moving processes between the ready and running states. A system where the operating system can save the context of a running program and restore the context of a ready program to give it a chance to run is called a preemptive multitasking system. Just about every operating system uses this today. Systems that allow a program to run until it terminates or blocks on I/O are called non-preemptive or cooperative systems.

The operating system keeps track of all processes via a list. Each element in that list is a pointer to a data structure that contains information about one process. This data structure is called a process control block (PCB). The process control block stores all pertinent information about the process, including its state, saved registers, memory map, owner ID, parent process, child processes, and open files. Each process is uniquely identified with a process ID (PID).

The basic process manipulation capabilities that POSIX systems (Unix, OS X, Linux, *BSD, etc.) offer include:

fork
fork creates a new process. The new process is a copy of the parent. It is running the same program and has the same open files. It is, however, a copy and not a reference to the parent. Any memory modifications or changes to open file status will be invisible to the parent and vice versa. After a call to fork, both the parent and child continue execution at the same place in the program. They can identify their role by the return value from fork. The parent gets the process ID of the child while the child gets a return value of 0.
execve
execve does not create a new process. It loads a new program from the file system to overwrite the current process, reinitializing the process’ memory map. It is often used immediately after fork to allow the child process created by fork to load and run a new program (see the sketch after this list).
exit
exit tells the operating system to terminate the current process.
wait
wait allows a parent to wait for and detect the termination of child processes. Once a process exits, the operating system can reclaim the memory that it used. However, its PCB entry in the process list is not deleted so that the parent process can read that process’ exit code and execution statistics, which are stored in the PCB. Once the parent gets this data via a call to wait, this remaining process state can be cleaned up. A process in this state (exited but with an active PCB entry) is called a zombie.
signal
signal allows a process to detect signals, which include software exceptions, user-defined signals, and death of child signals.
kill
kill sends a signal to a specified process. The signal can usually be detected with signal. It does not kill the process unless the signal is a kill (or terminate) signal.
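
The sketch below ties fork, execve, and wait together. It assumes a POSIX system with /bin/ls installed and keeps error handling minimal.

    /* Sketch: create a child with fork, load a new program with execve,
       and reap the child with wait. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        pid_t pid = fork();                 /* copy of the current process */

        if (pid == 0) {
            /* child: overwrite this process image with a new program */
            char *argv[] = { "ls", "-l", NULL };
            char *envp[] = { NULL };
            execve("/bin/ls", argv, envp);
            perror("execve");               /* reached only if execve failed */
            exit(1);
        } else if (pid > 0) {
            int status;
            wait(&status);                  /* parent: wait for (and reap) the child */
            printf("child %d exited with status %d\n",
                   (int)pid, WEXITSTATUS(status));
        } else {
            perror("fork");
        }
        return 0;
    }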

The operating system starts one process initially. On UNIX/Linux systems, it is called init. This process reads configuration files and forks other processes to run other programs. If a parent exits without detecting its child’s termination via a wait call, that child process is inherited by init, which performs the wait on its behalf.

Threads

Go here for lecture notes

Goal: How can an operating system manage multiple threads of execution within one program?

A thread can be thought of as the part of a process that is related to the execution flow. A process may have one or more threads of execution. A process with more than one thread is said to be multithreaded. A thread-aware operating system keeps track of processes like we mentioned above: via a list of process control blocks. Each PCB now keeps a list of threads for that process (a process will have to have at least one thread). Thread-specific information (registers, program counter, stack pointer, priority) is stored within a thread control block (TCB). All other information that is shared among threads in a process is stored within the common PCB. Note that even though each thread gets its own stack, the stacks all reside in memory that is accessible to all threads within the process.

One advantage of threads is that creating threads and switching among threads in a process is more efficient than creating processes and context switching among processes. Threading also makes certain types of programming easier since all memory is shared among threads. Moreover, on multi-core or multiprocessor systems, the operating system can take advantage of the hardware architecture and schedule threads onto multiple processor cores to allow the threads to run in parallel.

Threads can be implemented in, and managed by, the operating system. These are known as kernel-level threads. It is also possible to create a thread library that will create and manage threads within a process. The library will save and restore registers and manage switching stacks among threads. It may ask the operating system for periodic software interrupts to enable it to preempt threads. To the operating system, this appears as one single-threaded process. These are called user-level threads. One caveat with user-level threads is that if one thread blocks on a system call then the entire process is blocked and no threads within that process get to run. Most operating systems offer non-blocking versions of system calls that a threading library can use to simulate blocking system calls.

The advantage of user-level threads is that they can be even more efficient than kernel-level threads since there is no need to even do a mode switch into the kernel. The threading library can also have its own thread scheduling algorithm that is optimized to a specific job. A disadvantage is that user-level threads cannot take advantage of multiprocessors. Threading models can be combined and user-level thread libraries can be used with operating systems that offer kernel threads. In the general case, hybrid threading maps N user-level threads onto M kernel-level threads.

The operations for creating and handling threads are conceptually similar to those for processes. Since threads share the same program and memory map, there is no concept of an execve operation. When a new thread is created, it starts execution at a given function in the program. When it returns from that function, the thread is destroyed. A thread can also exit explicitly via a system call. Instead of a wait operation, where a parent process waits for the termination of a child process, threads support a join operation, where any one thread within a process can wait for any other thread to terminate. There is no parent-child relationship in threads.
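
As a minimal sketch of these operations, the POSIX threads (pthreads) example below starts two threads at a given function and joins them from the main thread. Compile with -pthread.

    /* Sketch: creating and joining threads with pthreads. */
    #include <stdio.h>
    #include <pthread.h>

    void *worker(void *arg)            /* the thread starts executing here */
    {
        int id = *(int *)arg;
        printf("thread %d running\n", id);
        return NULL;                   /* returning ends the thread */
    }

    int main(void)
    {
        pthread_t t[2];
        int ids[2] = { 0, 1 };

        for (int i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, worker, &ids[i]);

        for (int i = 0; i < 2; i++)
            pthread_join(t[i], NULL);  /* any thread may wait for another */

        return 0;
    }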

In Linux, there is no distinction between processes and threads. Linux calls them all tasks. Linux provides a clone system call that allows one to implement the behavior of threads. Clone operates like fork in that it creates a running copy of the current process. However, it allows the caller to specify what, if any, data should remain in common with the parent. For example, memory, open file descriptors, and signals could be shared to provide thread-like behavior.

Because threads access the same memory, the potential for conflicts is greater than it is for processes, which do not share memory unless they explicitly request a shared memory segment. Access to shared data needs to be properly controlled to ensure mutual exclusion.

Synchronization

Go here for lecture notes

Goal: How do we ensure that one thread can stay out of the way of another thread?

Threads (or processes) are concurrent if they exist at the same time. They are asynchronous if they need to synchronize with each other occasionally. They are independent if they have no dependence on each other: it does not affect a thread whether another thread exists or not. Threads are synchronous if they synchronize frequently so that their relative order of execution is guaranteed.

A race condition is a bug where the outcome of concurrent threads is dependent on the precise sequence of execution (particularly, the interleaving of executing two threads). A race condition typically occurs when two or more threads try to read, write, and possibly make decisions based on the contents of memory that they are accessing concurrently. The regions of a program that try to access shared resources and cause race conditions are called critical sections. To avoid race conditions, we want to make sure that only one thread at a time can execute within a critical section. Enforcing this is called mutual exclusion. Deadlock is a condition where there is a circular dependency for resources among threads and each one is waiting on the other to release a lock (a simple case of deadlock is, “I need B before I release my lock on A” while “you need A before you release your lock on B”). Starvation is the case where the scheduler never gives a thread a chance to run. If a thread doesn’t get to run, it may not get a chance to release any resources that it is holding.

Mutual exclusion can be achieved in a number of ways. Disabling interrupts when entering a critical section ensures that other threads will never get a chance to preempt you and run at all. This is a drastic solution. It also requires that you are running in kernel mode and have the privileges to disable interrupts. Moreover, doing so on a multiprocessor system will usually disable interrupts only on a single processor, allowing threads to run on other processors and access the critical section. Test and set locks (if (!locked) lock = 1) implemented in software are a simple approach but a buggy one because they introduce a race condition: a thread can be preempted between testing the value of a lock and setting it (grabbing it). This can allow multiple threads to get the same lock.

Processor manufacturers have introduced atomic instructions that perform operations such as test-and-set as one indivisible operation, meaning that another thread cannot preempt the sequence of operations: they all execute as a single, indivisible unit. Some of these instructions are listed below (a sketch that builds locks from them follows the list):

test-and-set
Set a memory location to 1 and return the previous value of the location. If the returned value is a 1 that means that somebody already grabbed the lock. If the value is 0 then you know that there was no lock before you executed the instruction and that you grabbed it while running the instruction.
compare-and-swap
Compare the value of a memory location with an old value that is passed to the instruction. If the two values match then write the new value into that memory location. With this instruction, we set a memory location to a specific value provided that nobody changed the contents since we last read them. If they did, our new value will not be set.
fetch-and-increment
Increment a memory location and return the previous value of that location. This is analogous to the deli counter number dispensing machine that gives out incrementally increasing numbers. You grab a ticket (fetch-and-increment) and wait until it’s your turn. When you’re done, you increment turn so that the thread that grabbed the next ticket gets to go.
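
The sketch below uses C11 atomics to stand in for these instructions: atomic_flag_test_and_set plays the role of test-and-set in a spinlock, and atomic_fetch_add plays the role of fetch-and-increment in a ticket (“deli counter”) lock. Both are still busy-waiting locks, which the next paragraph explains is undesirable. Compile with -pthread.

    /* Sketch: locks built from C11 atomic operations. */
    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    /* test-and-set spinlock */
    atomic_flag lock_flag = ATOMIC_FLAG_INIT;

    void spin_lock(void)
    {
        while (atomic_flag_test_and_set(&lock_flag))
            ;                           /* busy wait while someone holds the lock */
    }

    void spin_unlock(void)
    {
        atomic_flag_clear(&lock_flag);
    }

    /* ticket lock built from fetch-and-increment */
    atomic_uint next_ticket = 0, now_serving = 0;

    void ticket_lock(void)
    {
        unsigned my_ticket = atomic_fetch_add(&next_ticket, 1);  /* take a number */
        while (atomic_load(&now_serving) != my_ticket)
            ;                           /* spin until it is your turn */
    }

    void ticket_unlock(void)
    {
        atomic_fetch_add(&now_serving, 1);   /* let the next ticket holder go */
    }

    long counter = 0;

    void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            spin_lock();                /* or ticket_lock() */
            counter++;                  /* critical section */
            spin_unlock();              /* or ticket_unlock() */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %ld\n", counter);  /* 200000 with correct locking */
        return 0;
    }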

These atomic-instruction-based mechanisms all require looping in software to wait until the lock is released. This is called busy waiting or a spinlock. It is undesirable because it keeps the threads in a ready to run state even though they have no useful work to do except loop and wait for a value to change. Moreover, if a thread that is looping on a spinlock is a high priority one, it is always ready to run. With a priority scheduler, it may always get scheduled to run, starving a low priority thread that may be the one that has the lock that needs to be released. This situation is known as priority inversion: the low priority thread needs to be given a high priority so it can exit the critical section as quickly as possible and let the high priority thread get it.

A more desirable approach than spinlocks is to have the operating system provide user processes with system calls that we can use for mutual exclusion and put processes that are waiting for a critical section into a waiting state. This way, a waiting process will not be running until it is allowed entry into the critical section.

Semaphores
A semaphore counts the number of wake-ups that are saved for future use. Each time you call down on a semaphore, you decrement its value. If the value is 0 and you call down then your thread goes to sleep. When a thread calls up on a semaphore, the operating system will wake up one of the threads that is sleeping on that semaphore. If no threads are sleeping on a semaphore then up will just increment the value of the semaphore. The most basic use of a semaphore is to initialize it to 1. When a thread wants to enter a critical section, it calls down and enters the section. When another thread tries to do the same thing, the operating system will put it to sleep because the value of the semaphore is already 0 due to the previous call to down. When the first thread is finished with the critical section, it calls up, which wakes up the other thread that’s waiting to enter (a sketch of this follows the list). In general, the semaphore is initialized to the number of concurrent threads that you want to enter a section before they block.
Event counters
An event counter works similarly to the way you would use a fetch-and-increment instruction but without the spinlock, since the thread can go to sleep waiting for a value. Three operations are permitted: you can read the current value of an event counter, increment it, and wait until the event counter reaches a certain value. This last operation is a blocking one.
Condition variables (also known as monitors)
A condition variable supports two operations: wait until a condition variable is signaled and signal a condition variable. Signaling a condition variable will wake up one thread that is waiting on the condition variable.
Message passing
Message passing supports two operations: send and receive. The most common behavior of message passing is where a send operation does not block and a receive operation does block. We can use the blocking aspect of receive to coordinate mutual exclusion by sending a null message ahead of time. The first thread that does a receive will get the message and then enter the critical section. The second thread will block on the receive. A message passing system where the send blocks until the data is received via receive is called rendezvous. Rendezvous is efficient when messaging is done within the machine (versus a network) in that it does not require the message to be copied to an intermediate buffer. However, it forces the senders and receivers to be tightly synchronized.
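
Here is a minimal sketch of the semaphore usage described above, using POSIX semaphores on Linux: sem_wait corresponds to down and sem_post corresponds to up, with the semaphore initialized to 1 to act as a mutex. Compile with -pthread.

    /* Sketch: mutual exclusion with a POSIX semaphore initialized to 1. */
    #include <stdio.h>
    #include <pthread.h>
    #include <semaphore.h>

    sem_t mutex;
    long counter = 0;

    void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            sem_wait(&mutex);   /* "down": sleeps if the value is already 0 */
            counter++;          /* critical section */
            sem_post(&mutex);   /* "up": wakes a sleeping thread, if any */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        sem_init(&mutex, 0, 1); /* value 1: one thread in the section at a time */

        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        printf("counter = %ld\n", counter);  /* 200000 with mutual exclusion */
        sem_destroy(&mutex);
        return 0;
    }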

Conventional message passing requires that the sender knows the identity of the thread/process that will be receiving the message. This is known as direct addressing.

An alternate form of messaging is via mailboxes using indirect addressing. Mailboxes allow any number of threads to send to an intermediate queue, called a mailbox. Any number of threads can read from this queue. Hence, it is easy to support multiple readers and writers, and neither the writers nor the readers have to know how to address each other.
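
POSIX message queues are one mailbox-style mechanism that uses indirect addressing: senders and receivers share only a queue name, not each other’s identities. The sketch below assumes Linux; the name /mailbox is illustrative, and older systems may need -lrt when linking.

    /* Sketch: a POSIX message queue acting as a mailbox. */
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <mqueue.h>

    int main(void)
    {
        struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 64 };
        mqd_t mq = mq_open("/mailbox", O_CREAT | O_RDWR, 0600, &attr);

        /* any sender can deposit a message without knowing who will read it */
        mq_send(mq, "hello", strlen("hello") + 1, 0);

        /* any receiver can pick it up; mq_receive blocks if the queue is empty */
        char buf[64];
        mq_receive(mq, buf, sizeof(buf), NULL);
        printf("received: %s\n", buf);

        mq_close(mq);
        mq_unlink("/mailbox");
        return 0;
    }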

Scheduling

Go here for lecture notes

Goal: How does the operating system decide what process should run next? How does it decide to stop running one process and start running another one?

Processes execute for a while, then block on I/O, then execute some more, then block, etc. Each period of execution is called a CPU burst. Overall throughput is increased if another process can run when a process is in a blocked state. A scheduler is responsible for deciding what process should get to run next and, if the current process did not finish its CPU burst, whether the current process has run too long and needs to be preempted with another process. If a scheduler can preempt a process and context switch another process in then it is a preemptive scheduler. Otherwise it is a non-preemptive, or cooperative scheduler.

A time slice (also known as a quantum) is the maximum time that a process will be allowed to run before it gets preempted and another process gets a chance to run. Short time slices are good for maximizing interactive performance. Long time slices reduce the overhead of context switching but hurt response time. The scheduling algorithm determines whether a process should be preempted and which process gets to run next. The dispatcher is the component of the scheduler that is responsible for restarting the chosen process. It does so by restoring the process’ context (loading saved registers and memory mapping) onto the CPU and executing a return from interrupt instruction that will cause the process to execute from the location that was saved on the stack at the time that the program stopped running — either via an interrupt or a system call.

First-come, first served scheduling

This is a non-preemptive scheduler that uses a simple FIFO queue. New processes are added to the end of the queue. When the running process blocks on I/O (or exits), the scheduler takes the earliest (first in) process out of the queue and runs it. When a process is no longer blocking and is ready to run, it goes to the end of the queue. The drawback of FCFS is that it is non-preemptive and processes with long CPU bursts will delay other processes.

Round robin scheduling

This is a preemptive version of first-come, first served. When a quantum expires for a running process, the process is preempted and brought to the rear of the queue. The ready process at the head of the queue gets to run next. It’s a simple scheduler but giving every process an equal share of the CPU is not necessarily a good idea since it does not schedule highly interactive (I/O-intensive) processes more frequently.

Shortest remaining time first scheduling

This scheduler picks the process with the shortest estimated CPU burst time to run next. The idea is to minimize average response time and let I/O bound processes run first and, hopefully, quickly issue their I/O requests and block. It also maximizes I/O utilization since we give priority to processes that we expect to block quickly. The trick to this scheduler is that you need to estimate the next CPU burst time. A weighted exponential average of previous CPU bursts is used to do this (see the sketch below). The weight may be adjusted to weigh recent CPU bursts more or less than historic averages.
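
As a sketch, the weighted exponential average can be computed as estimate = alpha * (last burst) + (1 - alpha) * (previous estimate), where alpha between 0 and 1 controls how heavily the most recent burst is weighted. The alpha value and burst lengths below are illustrative.

    /* Sketch: estimating the next CPU burst with a weighted exponential average. */
    #include <stdio.h>

    double next_estimate(double prev_estimate, double last_burst, double alpha)
    {
        return alpha * last_burst + (1.0 - alpha) * prev_estimate;
    }

    int main(void)
    {
        double est = 10.0;                        /* initial guess, in ms */
        double bursts[] = { 8.0, 6.0, 20.0, 5.0 };
        for (int i = 0; i < 4; i++) {
            est = next_estimate(est, bursts[i], 0.5);
            printf("after burst %.0f ms: estimate %.2f ms\n", bursts[i], est);
        }
        return 0;
    }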

Priority scheduling

A priority scheduler requires that each process be assigned a priority. It simply picks the highest priority process from the list of ready processes. Priorities may be internal or external. Internal priorities are determined by the system. External priorities are assigned by the administrator. Priorities may also be static or dynamic. Dynamic priorities are priorities that are adjusted by the system as a process executes. Static priorities remain fixed for the duration of execution. A danger with priority scheduling is starvation. Starvation occurs if there is always a higher priority process that is ready to run. The lower-priority process never gets scheduled. We saw an example of this in synchronization when we looked at priority inversion. Starvation can be handled by using dynamic priorities and temporarily boosting the priority of processes that have not been scheduled in a while (this is known as process aging) or, conversely, by penalizing processes that do get scheduled and use up their quantum.

Multilevel queues

Instead of a pure priority scheduler that expects processes to have unique priorities, we can set up priority classes. Each class has its own unique priority and within each class we have a queue of processes that are associated with that priority class. We may choose to give high-priority classes short time slices to ensure good interactive performance and low-priority classes longer time slices to run longer but less frequently. Each priority class can have its own scheduling algorithm to determine how to pick processes from that class. For example, some classes may use round robin scheduling while others may use shortest remaining time first scheduling.

Multilevel feedback queues

Multilevel feedback queues are multilevel queues with dynamic priorities thrown in. The two basic rules in multilevel feedback queues are:

  • A new process gets placed in the highest priority queue.

  • If a process does not finish its quantum (that is, it blocks on I/O), it stays at the same priority level (round robin); otherwise, it moves to the next lower priority level.

This enables a process with short CPU bursts to remain at a high priority level while processes that use the CPU for longer stretches of time get penalized.

High priority queues often have a short time slice associated with them. If a process reaches the end of its time slice rather than blocking, the scheduler demotes it to the next lower-priority queue. Lower priority queues are then assigned increasingly longer time slices. The process will get knocked to a lower priority level each time it runs for the full quantum. The goal is to keep interactive processes with short CPU bursts at high priorities and lower the priorities of CPU-bound processes while giving them longer stretches of time to run (but less often).

A drawback is that if an interactive process has a stretch of computation, it will get knocked to a low level and never get pushed up. Some systems deal with this by raising a process’ priority whenever it blocks on I/O. Another approach is to raise the priority of a process that has not had a chance to run for a while. This is called process aging. Aging also ensures that starvation does not occur.

Another problem with pure multilevel feedback queues is having a programmer game the system. If a programmer writes software that always blocks on a low-latency I/O operation just before a quantum expires, the process can remain at a high priority level even though it regularly uses up most of the quantum. A way of dealing with this is to not use quantum expiration to determine whether to demote a process’ priority but rather to count the amount of CPU time that a process used over several time slices. If a process regularly uses up most of its time slice, it is deemed to be a CPU hog and gets demoted to a lower priority level. This is the general approach that the last couple of Linux schedulers have used. Because priority levels are no longer related to the expiration of a time slice, Linux gives interactive tasks a high priority and a longer time slice, with the expectation that they would block before using up the time slice.

Lottery scheduling

The goal of lottery scheduling, a proportional-share scheduler, is to grant processes a fixed share of the CPU. For example, one process may be allotted 75% of the CPU, another 15%, and another 10%. Each process is allocated a range of “tickets”. Tickets are numbered sequentially 0…N. At the expiration of each time slice, the scheduler picks a number in that range. The process holding the winning ticket is scheduled to run. The more tickets a process has, the higher its chance of getting scheduled. If a process is given 75% of the tickets it will, on average, be scheduled to run 75% of the time.
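
Here is a sketch of one lottery drawing using the ticket allocation from the example above; the linear walk over the process list is the traversal mentioned below.

    /* Sketch: pick the process holding the winning ticket. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    struct proc {
        const char *name;
        int tickets;               /* this process' share, in tickets */
    };

    int draw(struct proc *p, int n)
    {
        int total = 0;
        for (int i = 0; i < n; i++)
            total += p[i].tickets;

        int winner = rand() % total;   /* winning ticket in [0, total) */
        for (int i = 0; i < n; i++) {
            if (winner < p[i].tickets)
                return i;              /* this process holds the winning ticket */
            winner -= p[i].tickets;
        }
        return n - 1;                  /* not reached */
    }

    int main(void)
    {
        struct proc procs[] = { { "A", 75 }, { "B", 15 }, { "C", 10 } };
        srand(time(NULL));
        for (int slice = 0; slice < 5; slice++)
            printf("time slice %d: run %s\n", slice, procs[draw(procs, 3)].name);
        return 0;
    }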

There are several difficulties in lottery scheduling. Tickets have to be reallocated as processes come and go (a system will often have hundreds or thousands of processes, not three as in the example above). Figuring out how to assign tickets is difficult. The scheduler will generally have to traverse the list of processes to find the one whose ticket range contains the winning number: a data structure that keeps processes in the run queue sorted by ticket range would help. Proportional-share schedulers are not used in general-purpose systems but have found use in applications such as scheduling multiple virtual machines. For example, you might have a server running eight instances of Linux and one instance of Windows. Each of these looks like a process to the native operating system and the scheduler might be configured to give each Linux virtual machine 10% of the CPU and give Windows 20% of the CPU.

Multiprocessor scheduling

With multiple processors, the goals in scheduling are load balancing and processor affinity. Processor affinity is the desire to reschedule a process onto the same processor on which it was previously scheduled to take advantage of the fact that the processor may still have memory used by that process in its cache. Hard affinity is the case where the process will always be rescheduled onto the same processor. Soft affinity is the case where the process may be rescheduled onto another processor to keep the load balanced and ensure that all processors have processes to run.

Linux, and most operating systems, maintain a run queue per CPU. Whenever a process running on a specific processor blocks or its quantum expires, the scheduler checks the run queue for that CPU and decides which process should be scheduled next. This simplifies processor affinity and reduces contention among multiple CPUs accessing run queues at the same time.

Load balancing is the process of making sure that all processors have processes to run. Push migration is a periodic examination of the load (number of processes in the run queue) on each CPU. If one processor is more heavily loaded than another, some processes are moved to another CPU’s queue. Pull migration is the case where a scheduler has no ready processes to run on a CPU and has to get one from another CPU’s queue. Push and pull migration are often used together.

Scheduling across processors isn’t as simple as maintaining processor affinity. In reality, there are levels of affinity. An Intel hyperthreaded processor presents itself as two virtual CPUs that share all levels of cache. In many architectures, processor cores on the same chip share a common cache (typically L3) while faster caches (L1 and L2) are local to the core and not shared. Multiple CPU chips share no caches but may have equally efficient access to memory. A Non-Uniform Memory Access (NUMA) architecture gives all CPUs access to the same memory but each CPU has a region of that memory that it can access faster than other regions.

Linux uses scheduling domains, which allow it to model a hierarchy of processor groups. At the lowest level, each CPU is its own group. Two virtual CPUs on the same core form a group above that. Multiple cores on the same chip form a group above that, and so on. Each level in the scheduling domain hierarchy has a balancing policy associated with it that defines when and if the per-CPU queues need to be rebalanced.

Real-time scheduling

Go here for lecture notes

Goal: What are the needs of time-critical processes and how can schedulers accommodate them?

Real-time processes have to deal with providing guaranteed response. A start time is the time at which a process must start in response to some event, such as an interrupt from a sensor. A deadline, also known as the stop time, is the time at which the task must complete. A hard deadline is one in which there is no value if the deadline is missed. A soft deadline is one where there is decreasing value in the computation as the deadline slips. Processes may be terminating processes, which run and then exit. Their deadline is the maximum amount of time before they exit. Processes may also be nonterminating processes. These run for a long time but perform periodic operations, such as encoding and decoding audio and video. Their deadline is not the time to exit but rather the time by which results have to be ready for each period of execution (e.g., decode video at 30 frames per second).

Real-time systems have to be designed to be able to respond to events quickly. Systems that can guarantee response times are called hard real-time systems. Those that cannot are soft real-time systems. Real-time operating systems tend to use priority schedulers (possibly with variations), have a guaranteed maximum time to service interrupts, can ensure that processes are fully loaded into the system’s memory, provide efficient memory allocation, and support multi-threaded execution or preemptable system calls. You do not want your response to an event delayed because the operating system is busy doing something else.

Earliest deadline scheduling

Each process has a deadline. The scheduler simply picks the process that has the nearest deadline (i.e., the process in greatest danger of missing its deadline).

Least slack scheduling

For each process, we know its deadline and required amount of compute time. Slack is how much free time is left: the time until the deadline minus the compute time. This is the longest we can procrastinate before we must devote 100% of our resources to running the process. Least slack scheduling picks the process with the smallest slack: the one we can least procrastinate with.

When we compare the effects of earliest deadline with least slack, we see that all processes will complete on time if there is sufficient compute time available. If there isn’t, earliest deadline scheduling will likely complete some processes on time while others may be significantly late. Least slack scheduling will have the effect of having all processes be late by about the same amount.
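
The sketch below shows both selection rules side by side on two illustrative tasks: earliest deadline picks the task with the nearest deadline, while least slack picks the one with the smallest value of deadline minus remaining compute time.

    /* Sketch: earliest-deadline vs. least-slack selection (times in ms from now). */
    #include <stdio.h>

    struct task {
        const char *name;
        int deadline;        /* time by which the task must finish */
        int compute_time;    /* CPU time still required */
    };

    int edf_pick(struct task *t, int n)          /* smallest deadline wins */
    {
        int best = 0;
        for (int i = 1; i < n; i++)
            if (t[i].deadline < t[best].deadline)
                best = i;
        return best;
    }

    int least_slack_pick(struct task *t, int n)  /* smallest slack wins */
    {
        int best = 0;
        for (int i = 1; i < n; i++)
            if (t[i].deadline - t[i].compute_time <
                t[best].deadline - t[best].compute_time)
                best = i;
        return best;
    }

    int main(void)
    {
        struct task tasks[] = {
            { "A", 10, 2 },   /* slack 8 */
            { "B", 12, 9 },   /* slack 3 */
        };
        printf("Earliest deadline picks %s\n", tasks[edf_pick(tasks, 2)].name);
        printf("Least slack picks %s\n", tasks[least_slack_pick(tasks, 2)].name);
        return 0;
    }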

Rate monotonic analysis

Rate monotonic analysis is a way to assign priorities to periodic tasks. It simply gives the highest priority to the process with the shortest period (highest frequency).