Processor Management Approaches in Scheduler Activation, Mach and Prospero Resource Manager.

This article talks about processor allocation strategies employed by 3 systems, namely, Scheduler Activation, Mach and Prospero Resource Manager.


When computing took birth, we had single processor machines running one application/program at a time. Sometime later came around the idea of multi-tasking which gave rise to a very important concept, which is at the center of multi-tasking systems, called scheduling. There are a variety of scheduling algorithms for a traditional multi-tasking system, but most can be broadly classified into two classes:

1) Time-sharing

2) Priority based

 

What multi-tasking gave us was now all the processes were progressing simultaneously but each of the process itself was not running any faster, in fact, could be running slower because now there are competing processes for CPU cycles.

 

After this, came multi-processor machine, which means a machine with multiple processors in it.

On top of multi-processor machines, if we employ old multi-tasking approach, we would get similar results as a uni-processor machine (each process would not run any faster), just that the total time would be divided by n as there are n processors (not exactly 1/n though, in fact it would be significantly higher because of overheads involved such as context-switches etc.).

 

To really run a process faster than it would run on a simple multi-tasking machine, we needed something called as multi-threading.

The concept of multi-threading is based on the fact that sometimes an individual process itself can have concurrency inside it. What we mean by that is there are sections in the execution instruction set which could be executed simultaneously on a multi-processor machine. The degree of such parallelism is called application concurrency. We employ multi-threading to achieve it and the amount we are able to achieve is called as application parallelism.

 

But before we delve into scheduling of threads, complications (how information is distributed and utilized) and choices we have to make there, we need to acknowledge division that exists in the multi-processor systems. Broadly, there are two types of multi-processor systems:

1) Shared Memory Multi-Processors (SMMP): mostly a single chip solution with memories and processors connected through a bus.

2) Distributed Memory Multi-Processor (DMMP): a network of independent pairs (each pair formed of a processor and a memory).

 

We are going to discuss 3 approaches to scheduling in these 2 different environments.

1) Mach

2) Scheduler Activation

3) Prospero Resource Management (PRM)

 

As the paper on Scheduler Activation suggests, threads can be supported either by the operating system kernel or by the user-level library code in the application address space. But neither approach has been fully satisfactory. Let.s briefly look at the reasons.

 

Kernel threads are inherently slower than the user-level threads for two main reasons:

1) First, when context switches happens between threads of same process, it.s expensive.

2) Second, kernel threads usage implies a system-wide scheduling policy would be used. But application is the best entity to take decisions about the scheduling of its own threads, because it has application specific information, and hence any system-wide scheduling policy used would not fare well in a generic environment.

 

Note: The term Process Level Scheduler and User Level Scheduler have been used interchangeably.

 

On the other hand user threads suffer from their own shortcomings. Although user threads would perform excellently well in an ideal world where there are no I/Os, page faults and system calls. But in reality such real world activities can make user level threads perform very poorly.

 

The research community realized this problem started working towards a solution which employs support from both user level and kernel level to achieve the best results. Through these efforts, systems such as Mach and Scheduler Activations came out.

 

Before further delving into details of each one of these, let.s look at the entities involved and information that they withhold which can be put to use while making scheduling decisions. There are three entities involved in the system:

1) Kernel

2) Process level scheduler

3) Application/Programmer/User

 

This is a layered architecture and Kernel being the lowest layer, Process Level Scheduler being the second (on top of Kernel) and Application/Programmer/User is the top most. Just like with any layered system, End-to-End argument would apply here. We would like to do things as high in the layer as possible unless we have enough evidence that we need a particular functionality for all possible implementations above a particular layer and hence we would incorporate that functionality in that particular layer.

 

Under the light of End-to-End argument, if all the scheduling decisions are being taken at the kernel level (as in the case of first generation multi-tasking schedulers), such approach will not work best with all the applications. This is because only Process Level Scheduler has information such as which of its thread has a particular lock or which of its thread is executing in critical region and hence needs more CPU cycles for now. Similarly, there are some cases where application specific or instance specific information is available only to programmer/user which can be extremely useful while making scheduling decisions, for example, applications requiring gang-scheduling.

 

Gang scheduling means simultaneous scheduling of an application.s components/threads.

 

Now let.s look at each of the 3 systems and how they extract and use this information.

Mach

Mach.s processor allocation facility achieves these goals by dividing the responsibility among three 3 entities:

1) Operating system or Kernel, performs the allocation mechanism

2) Server, implements allocation policy

3) Application, requests processors and can control their usage once assigned

 

These 3 entities very well correspond to the 3 entities we discussed before this section. In Mach, the ownership/full control remains with the Server entity. This is required to maintain fairness in the usage by multiple processes and avoid situations where a malicious process hogs up all the CPU resources.

 

Mach introduces 2 new objects, the processor and the processor set. Each application has threads which are associated to one or more processor sets. Each processor set could be assigned processors. When a processor set has processors assigned to it, it executes only those threads assigned to that processor set. From threads perspective, it can only see processor sets and can control the behavior of processor sets. So, if a threads wants to have full control, it can requests for as many processor sets as there are threads in the system and once processors are assigned to the processor sets, it can control their behavior. Having full control has some significant advantages which we will discuss a little later.

 

All the allocation requests go through Server and hence Server has the full control over processor allocation and can take away the processors from a processor set on need basis.

 

If we breakdown this approach to understand it, we would find that the broad level scheduling is done by Server while fine grain scheduling is done by the application. Hence Mach provides mechanisms for both Application and Server to affect the scheduling of processors. Server maintains fairness and assigns idle processors to needy processor sets, while Application, since it has application specific information, can control scheduling of processors once assigned.

 

In case of a pre-mature pre-emption, such as an I/O or a lock, an application can provide something called as Hint to the Mach Operating System Kernel. The hint is based on two pieces of information:

1) Thread can not make progress until synchronization/communication is complete with another thread

2) The thread id of the thread on which it is blocked

 

There are two classes of hints:

1) Discouragement hints: optimistic approach, giving up CPU in the hope that the thread holding the lock would run

2) Hand-off hints: Specifically asking Kernel to schedule the thread which is holding the lock.

 

Hand-off hints performed very well on experiments.

 

Note: Mach.s port mechanism offers location transparency and hence it can work in DMMP environment as well though our discussion was focused to SMMP only, but similar arguments would apply there too. There is one limitation here though, which we would discuss under the discussion of PRM.

Scheduler Activation

The approach taken by Scheduler Activation requires it to have only two entities in the system:

1) User/Process Level Scheduler

2) Kernel

 

While User Level Scheduler controls the overall scheduling behavior once it has been assigned a certain number of processors, the Kernel maintains the rights over processors so as to provide fairness to the overall system. This approach looks very similar to pure User Level Threads approach. But there is a catch. The problem with older User Level Threads approach was when any one of the threads of a process makes as I/O system call, the Kernel blocks the whole process and overall progress of the process gets halted. Such a scenario is avoided in Scheduler Activation by something called as scheduler activation.

 

In scheduler activation, when such an event (I/O blocking) happens, the Kernel instead of just pre-empting the whole process, pre-empts only that thread and makes an Up Call in to the user-level scheduler of the process informing which thread has just been blocked and why. Then the User Level Scheduler can take appropriate action and if there is any run-able thread, it can schedule that. Similarly, an application can inform the kernel when it needs extra processor (through a call like .Add more processors (#).) or if it has an extra processor which is not required (through calls such as .This Processor is idle ().).

 

Note: The 2 calls mentioned above provide ways for a programmer to affect the scheduling.

 

This overall collaboration between User Level Scheduler and Kernel was something which was missing in older User Level Thread mechanisms and it is provided in Scheduler Activation.

 

Such collaboration allows User Level Scheduler to control how many (and which) processors have been allocated to it and it can have complete control over which of its threads are running on those processors. Such control allows User Level Scheduler to explicitly manage cache locality which could be of significant advantage considering cache sizes of today.s processor have increased significantly.

 

Prospero Resource Manager

Some Terminologies:

Node: An independent system with computing capabilities

Job: A collection of inter-related tasks which will be communicating with each other while executing

Task: A collection of threads

 

PRM solves what other systems left out. That is, scalability. PRM employs three types of managers:

1) System Manager

2) Job Manager

3) Node Manager

 

 

Before we discuss System Manager, let discuss Job Manager and Node Manager. Job Manager corresponds to User Level Scheduler which we discussed in the above systems and Node Manager corresponds to Kernel.

 

The Job Manager role is to take care of all the requirements of a particular job. Job Manager acts as the single point of communication for requesting resources for the application. The Job Manager provides the abstraction of a single virtual system to the application executing. Job Managers defines the execution environment. The possible execution environment could be multi-processing only, or fault tolerant or real-time or a combination of one or more. When an application selects a Job Manager, it identifies the application.s requirement and, using Prospero Directory Service/Configuration file; it locates executing entities in the distributed system and schedules the tasks on them.

 

Node Manager on the other hand is responsible for a particular computing resource. It loads the application binaries and executes them once it has received message a authorized Job Manager. The Node Manager notifies the Job Manager about events such as completion or failures of tasks.

 

Now once we have a huge pool of such computing resources, conventional techniques for managing resources perform poorly. Systems, like Mach which support distributed computing through the implementation of location transparent Ports, will not scale beyond a certain number of computing nodes. Identifying the location, maintaining the availability information and providing fault-tolerant services becomes a bottleneck. To solve this problem, PRM introduced another manager called System Manager. System Manager is not a single box. It.s a hierarchical concept very similar to DNS. Every System Manager is responsible for a subset of resources (consisting of Node Managers). And then there are System Managers responsible for a set of such System Managers. This System Manager Layer flowing through large scale distributed systems blends the system so well that it is able to scale up to such numbers.

 

System Manager keeps track of all the management information about all the Node Managers and assists the Job Manager in locating available resources giving a fault-tolerant view to the applications.

 

Since Job Manager is a user level entity, there are a number of different Job Managers available to choose from and applications, depending upon their needs, can choose from this set. This approach gives a great level of flexibility to the applications and at the same time let them choose from a pre-programmed set of Job Managers. If they need a specific Job Manager, programmers have the freedom to code it.

 

To sum it up:

-         Scheduler Activation is a good single system solution.

-         Mach provides things such as gang scheduling, and although it is good for a single system but can scale up to few nodes, owned by a single organization.

-         PRM offers scalability of the order of hundreds of nodes (or more) and will be the choice in case the distributed system spans multiple organizations and ownership.

References:

All the papers I read as part of my cs-555 course: Advanced Operating Systems. List will be available soon.

The University of Southern California does not screen or control the content on this website and thus does not guarantee the accuracy, integrity, or quality of such content. All content on this website is provided by and is the sole responsibility of the person from which such content originated, and such content does not necessarily reflect the opinions of the University administration or the Board of Trustees