AdministriviaProject 1 is due February 14, Sunday, by midnight. You can download the details from the class web site. The mid-term is March 12th. I don't have a room number, yet. I have typed up my lecture notes for the fourth lecture. They are available from the class web site. Readings for this weekSections 2.4, 3.1 This lecture covers Tom Anderson's lectures 11 and 12. |
The Scheduler's job is to decide who gets the CPU next.
The algorithm used by the Scheduler to decide who gets the CPU next.
The rules used by the Scheduler to decide who gets the CPU next.
How the Scheduler has implemented its Policy. Note that the mechanism can be parameterized (parameters that can change) or fixed.
Note that some of the goals listed above conflict. As batch jobs tend to be longer than interactive jobs, then minimizing the turnaround time for long batch jobs means that the response time for interactive users may increase. Also, it will more than likely reduce the number of jobs that are completed each hour.
The big questions then is...
Early batch systems ran each job to completion. There was little need for a scheduler, other than being able to detect when a job finished, so that the next job in the queue could be started.
The Scheduler interrupts jobs when they have had their share of the CPU. This is done today because the number of interactive jobs far exceed the number of batch (background) jobs. Most systems have a clock that interrupts the CPU 50 to 60 times a second (or less). Jobs typically receive from 10 to 100 milliseconds of CPU time before they are preempted. You don't want to interrupt the CPU too often as there is overhead associated with context switching.
CPU scheduling was a hot research topic in the 1970s. Because of this, all of the good scheduling algorithms are known (provably optimal). The algorithms typically make the following assumptions:
Let's look at some scheduling algorithms and see how well they meet the goals of the Scheduler.
FIFO was used in early batch systems of the 1960s. It runs a job until it is done. This algorithm is completely fair, as it is like standing in line. The job at the front of the line receives the CPU until it is finished. Then the next job gets the CPU.
For example, say we have 5 jobs. Job 1 takes 24 hours and jobs 2 through 4 take 10 minutes. We run the first job for 24 hours and then run the next 4 after that. The average time to complete all 5 jobs is 24 hours 20 minutes. Not very good!
Round Robin is a preemptive scheduling algorithm that uses a timer to interrupt the CPU after a certain amount of time. The Scheduler maintains a queue of runnable jobs. The first job in the queue is executed until preempted. It is put at the end of the queue and the next job is executed.
This is also a fair algorithm as jobs receive essentially the same amount of CPU time. However, this can be a very inefficient algorithm if you give every process the same amount of time.
Assume that we have 10 jobs, each of which takes 100 sec to execute. Let's compare the average turnaround time for the 10 jobs between FIFO and Round Robin.
| Job Completion Times | ||
| Job # | FIFO | Round Robin |
| 1 | 100 | 991 |
| 2 | 200 | 992 |
| 3 | 300 | 993 |
| 4 | 400 | 994 |
| 5 | 500 | 995 |
| 6 | 600 | 996 |
| 7 | 700 | 997 |
| 8 | 800 | 998 |
| 9 | 900 | 999 |
| 10 | 1000 | 1000 |
| Average turnaround time | 550 sec | 995 sec |
Round Robin is good for handling short jobs. With jobs that are all the same length and for long jobs, Round Robin is not good, as you end up with a lot of context switching and jobs don't complete for a long time because of all the context switching.
You can also see that when all jobs take the same length, FIFO is optimal. However, the chance of this happening is very small.
There are a number of variations to SJF. The most common are shortest time to complete first and shortest remaining time to complete first. The goal of this class of scheduling algorithms is to get the short jobs out of the computer as quickly as possible.
The algorithms are most efficient, but least fair. This is because longer jobs will not be executed if there is a shorter job. In the worst case, the longest jobs would never run. Definitely not a fair algorithm.
We have 3 jobs - A, B, and C. A and B are CPU bound jobs that take 1 week of computer time to execute. C is an I/O bound job that executes in a loop of 1 msec of CPU time and 10 msec of disk I/O. Essentially A and B require 100% CPU time and C requires 10% CPU time.
We run A for 1 week, B for 1 week and then C runs. So the person wanting C to execute won't see any results for more than 2 weeks.
We will use a 100 msec time slice for our scheduler. Since C only requires 1 msec of CPU time, the 3 jobs will each run about 5 times in 1 second (500 msec for A, 500 msec for B, and 5 msec for C - close enough to 1 second for us). Thus, there is only about 5% utilization of the disk (5 disk accesses for 10 msec each during 1000 msec).
C will definitely finish sooner than in the FIFO case, however, A and B will both finish within 100 msec of each other, somewhere around 2 weeks after the jobs were submitted. It would be nice to be able to get a higher utilization of the disk, however.
There is really no need to preempt C, since it is going to interrupt itself every 1 msec of CPU time. In addition, when a disk I/O for C is complete, we can force a hardware interrupt of the CPU, causing A or B to release the CPU. This is what SJF can do for us.
What happens is that when C is ready to run its CPU job, it preempts A or B, runs for 1 msec, posts an I/O request and goes to sleep. One of the long jobs gets the CPU for 10 msec until C's I/O request is fulfilled. This process repeats until C completes.
With this scheme, we are still keeping the CPU 100% busy and we also get 90% usage of our disk. Much better than either of the other methods.
From the previous example, SJF seems to be wonderful. However, there are some problems with a pure SJF scheduler.
IBM used a scheduler on its older mainframes that required you to estimate how long your job would take. It based its scheduling on the value you provided. If you underestimated, your job was dropped and had to start over from the beginning. This led to serious over-estimating to avoid having your job killed. Thus, the optimality was lost because the scheduling was not based on the real time required to complete a job, but on a over-estimated time.
The biggest advantage is that SJF is optimal for average response time and the number of jobs completed in an hour.
We can't always predict how long a job will take, but we can use the past history of the job's execution patterns to derive an estimate of the time required for the current job session.
The job time for the current session of the job is based on past performance. Jobs that only require short amounts of CPU time will tend to continue that pattern and jobs that are CPU bound tend to stay that way. Priority scheduling is based upon the estimates calculated.
Scheduling decisions are based on job prior activity in some fashion. Thus, the priority of a job can change over time.
Multilevel Feedback Queues were first implemented in CTSS (MIT). CTSS was the first real timesharing system. The problem with CTSS was that it could only have one process in memory at a time. This meant that a context switch required the entire running process to be swapped out to memory and the next process loaded in its entirety into memory. Because of this, the designers of CTSS needed to minimize context switching.
They developed an adaptive policy. They have multiple priority queues. Each priority queue used round robing scheduling within the queue. Jobs of lower-level priority would only run if there were no jobs of higher priority waiting to run.
By itself, this doesn't help the context switching problem. What the CTSS designers did that helped was to exponentially increase the time slice as the priority level dropped. The highest priority queue got a time slice of 1 unit. The next highest priority received a time slice quantum of 2. Each lower level priority queue had there time slice quantum doubled.
So how do we decide which priority queue to place jobs? All jobs start in priority level 1. If a job uses up all of its allotted time (gets preempted based on using all of its time slice), it is bumped down one level. Thus, the next time it runs, it gets twice the CPU time. For CPU-bound jobs, they will work their way down the priority levels getting longer and longer pieces of CPU time.
This worked as far as minimizing context switching, however, during busy times, CPU-bound jobs would not get executed. There was no way to force a job to run as long as there were higher priority jobs available.
Let's emphasize fairness over efficiency.
One way to be more fair is to increase the priority level of jobs the longer they take to complete. Thus, big jobs that might start out at a lower priority would have their priority increased over time, giving them a larger share of the CPU time.
The problem with this mechanism is how to determine how fast priorities should increase. If they increase too fast, you will give too much CPU time to long jobs, at the expense of the shorter jobs (like interactive ones). In addition, if the CPU is very busy, all of the jobs will have increased priority so that new jobs entering the computer will be at the bottom of the queue.
Let's give every job some number of lottery tickets (that we don't take away). At each time slice, the Scheduler randomly picks a winning ticket. The priority level of the job is determined by the number of lottery tickets a process has. On average, the amount of CPU time a process gets is proportional to the number of lottery tickets they started with.
How do we assign tickets to jobs?
We can approximate the SJF scheduling algorithm if we give short jobs more tickets than
long jobs. To avoid starvation, every job gets at least one ticket.
The advantages to this method is that as the load increases, long jobs are not delayed as much as in SJF (because they have at least one ticket). In addition, when jobs are added, or complete, the effect on the remaining jobs is spread equally. Everyone has their chance of executing reduced/increased proportionally.
Queueing theory helps to predict response time. Response time is based on the number of users, the speed of the CPU and the speed of the disk drives. You want to buy a faster computer when queueing theory can show you that you will improve response time.
How does response time vary anyway?
Response time in this case decreases linearly with the number of users and increases linearly with a faster CPU.
Response time does not change with increasing users until you reach 100% CPU utilization. Once that is reached, response gets linearly worse like in the worst case.
The goal of an OS is to keep processes and threads from hurting each other and from hurting the OS. Most OSes accomplish this through a protected kernel mode and an unprotected user mode. In kernel mode, you have access to all of physical memory. In user mode, you only have access to the memory allocated to your process. This is called your address space.
To better understand where modern OS protection schemes come from, we need to review some of the older techniques, some of which are still in use in desktop computer OSes.
The OS only allowed one process at a time in memory. Your program could only hurt itself. There was no need for protection (what fun is it to hack your own program, anyway?).
This is also called a Linker-Loader. When a program is copied into memory, all addresses are changed to reference addresses appropriate to where the program is loaded into memory. This takes time to convert all the addresses before you can start running your program. This must be done by software, as well.
To successfully implement a protection scheme, we need help from the computer hardware. If our solution was solely dependent on software, then some condition could arise where a user program could break through the protection and change the protection software. We will also take advantage of kernel mode vs. user mode.
We said earlier that a program's address space is all of the memory that a program has available to it. Protection involves restricting the memory that a program can access. We will use the concept of address translation in our protection scheme.
Address translation will help because there is no way for a program to address any memory outside of its address space. We will use an MMU (Memory Management Unit) that will sit between the CPU and physical memory. Any address requested by a user program will be "translated" by the MMU to a location in physical memory. The MMU will know what is a valid address for a user program and what is not. Any invalid request will cause a trap to the OS, generating an error, or some other response by the OS.
We will see in next week's lecture how address translation can be implemented.
In user mode, there is no access to physical memory. All addresses are translated by the MMU. In kernel mode, there is no translation. Kernel-mode programs (like the OS) have access to all physical memory. Nachos, Unix, and most OSes run in kernel mode.
When the OS executes a user program, it creates a thread to do the following:
How does a user program get back into the kernel so that it can do things like printing
and reading/writing files? There are 2 ways a user program can access kernel mode:
Voluntarily: The program makes a system call. System calls are special
instructions that jump to the OS. Essentially, your program is asking the OS to do something
for you.
Involuntarily: A hardware interrupt or program exceptions cause the kernel to activate.
The activities of a system call are mostly atomic (we generally don't want to interrupt them). The following things are fairly standard for system calls:
If we could interrupt a system call at the right time, we could have our user program run in kernel mode and really have a good time killing other processes and the OS.
When we switch between programs we need to save/restore all of the registers (you have no way of knowing what registers someone else used), save/restore the address translation table and jump back to the value of the program counter.
System calls use CPU registers and can also write to physical memory. User program addresses are translated through the MMU, but kernel addresses are not.
Programs do not share memory (no sharing of address spaces). Thus all communication between processes (inter-process communication) is done through system calls. Once you allow processes to communicate with each other, you allow for the possibility of bugs. Just like system calls must validate the input, when sharing information between processes/threads you must validate the input. Most bugs in commercial software arise because someone assumed the input data would have certain characteristics.
A shell in Unix is a user program that prompts users to enter commands. The shell executes a system call to perform the command entered by the user.
Running a program on Unix causes the following to happen:
OK, what's a fork and an exec in Unix, anyway? A fork (in Unix, anyway) creates an exact copy of the current process (including the data and variables already created); it does not change the program that is to be run. The process executing the fork is stopped and the new process is put on the ready list.
An exec, on the other hand, does not create a new process, it just causes the current process to run a new program.
So why do we care about this, anyway?
Because Unix does not have threads, there is no direct way to communicate between processes. Since we can't share address spaces, we need some other way to share information between processes. The Unix designers decided to split the fork and exec processes to simulate the sharing that can be done with threads. Since a fork creates an exact duplicate, all of the data available to the parent process is available to the child process created by the fork.
The disadvantage to this mechanism is that the child process has no easy way to give its results back to the parent. Since the information is copied, the parent cannot directly see what data was created by the child process. You can pipe the results to the parent (essentially like reading from a file), or physically store the information in a database or separate file.
Since Nachos does support threads, it doesn't need to perform the same trick as Unix does. Nachos combines the fork and exec into a single operation.