Chapter 7: Process Synchronization

1. Introduction to Process Synchronization

In modern operating systems, multiple processes often run concurrently, sharing system resources like CPU time, memory, files, and I/O devices. This concurrency can significantly improve system throughput and responsiveness. However, it also introduces complexities, especially when processes need to access shared data or resources.

Imagine two processes trying to update the same bank account balance simultaneously. If their operations are interleaved without control, the final balance could be incorrect. This is a classic example of a race condition.

To prevent such issues, we need process synchronization mechanisms. These mechanisms ensure that cooperating processes execute in an orderly manner, maintaining data consistency and resource integrity. A core concept in synchronization is the Critical Section: a segment of code where a process accesses shared resources. Synchronization techniques aim to ensure that at most one process is executing in its critical section (for a particular shared resource) at any given time.
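The bank-account scenario above can be made concrete with a small Python sketch. The lock marks the critical section; the function name and amounts are illustrative, not from any particular system:

```python
import threading

balance = 0
balance_lock = threading.Lock()  # guards the shared balance

def deposit(amount, times):
    global balance
    for _ in range(times):
        # Critical section: only one thread may update balance at a time.
        with balance_lock:
            balance += amount

t1 = threading.Thread(target=deposit, args=(1, 100_000))
t2 = threading.Thread(target=deposit, args=(1, 100_000))
t1.start(); t2.start()
t1.join(); t2.join()
print(balance)  # 200000 — without the lock, lost updates could make this smaller
```

Removing the `with balance_lock:` line reintroduces the race condition: the read-modify-write of `balance` can interleave between threads, losing updates.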

While synchronization tools like mutexes, semaphores, and monitors help manage access to critical sections, improper use or complex interactions can lead to serious problems. One of the most challenging problems arising from resource contention and synchronization is Deadlock.

2. Deadlock: The Problem

2.1. What is Deadlock?

A Deadlock is a state in a system where a set of two or more processes are permanently blocked, each waiting for a resource held by another process in the same set. None of the processes can proceed, release their resources, or be awakened. This results in a standstill, where the involved processes are effectively frozen, unable to make any progress.

Think of a narrow two-way street where two cars meet head-on. Neither car can move forward because the other is in the way, and neither is willing to back up. This is a physical analogy for a deadlock.

2.2. Deadlock vs. Starvation

It's crucial to distinguish deadlock from a related problem called Starvation (also known as indefinite postponement).

  • Deadlock: A set of processes are blocked for an infinite time. Each process in the set is waiting for an event (resource release) that can only be triggered by another blocked process in the set. There is no external way for them to proceed without intervention. It's a permanent, circular waiting condition.
  • Starvation: A process is blocked for an indefinite (but not necessarily infinite) time. The process is repeatedly denied access to a resource it needs, even though the resource may become available. This can happen due to unfair scheduling policies or resource allocation strategies that consistently favor other processes. The process *could* eventually run, but there's no guarantee when.

Core Distinction: In deadlock, the blocking is permanent and circular among a group. In starvation, a process might be perpetually unlucky in acquiring resources, but the system isn't necessarily in a circular wait. For example, a low-priority process might starve if high-priority processes keep arriving.

Consider a scenario: Process A holds Resource X and waits for Resource Y. Process B holds Resource Y and waits for Resource X. This is a deadlock.

Now, consider Process C, which needs Resource Z. Resource Z is frequently used by high-priority Processes D and E. Even if Z becomes free, D or E might get it before C. C is starving but not necessarily part of a deadlock involving D and E.

3. Necessary Conditions for Deadlock

A deadlock situation can arise if and only if four conditions hold simultaneously in a system. These are known as the Coffman conditions:

3.1. Mutual Exclusion

At least one resource must be held in a non-sharable mode. This means that only one process can use the resource at any given time. If another process requests that resource, the requesting process must be delayed until the resource has been released.

  • Core Idea: Resources like printers, CPU registers, or a file opened in write mode are inherently non-sharable. If resources were infinitely sharable, this condition wouldn't hold, and deadlocks over those resources wouldn't occur.
  • Example: A printer can only print one document at a time. If Process P1 is using the printer, Process P2 must wait.

3.2. Hold and Wait

A process must be holding at least one resource and waiting to acquire additional resources that are currently being held by other processes.

  • Core Idea: Processes don't grab all they need at once. They acquire some, start working, and then request more. This incremental acquisition while holding existing resources is key.
  • Example: Process P1 holds Resource R1 (e.g., a database lock) and is waiting for Resource R2 (e.g., a tape drive), which is currently held by Process P2.

3.3. No Preemption

Resources cannot be preempted. That is, a resource can be released only voluntarily by the process holding it, after that process has completed its task with that resource. No external entity (like the OS) can forcefully take a resource away from a process holding it.

  • Core Idea: Once a process gets a resource, it keeps it until it's done. You can't just "snatch" it away.
  • Example: If Process P1 is writing to a file, the OS typically won't take the file access away from P1 midway through its write operation, as this could lead to data corruption. P1 must release it.

3.4. Circular Wait

There must exist a set {P0, P1, ..., Pn} of waiting processes such that P0 is waiting for a resource held by P1, P1 is waiting for a resource held by P2, ..., Pn-1 is waiting for a resource held by Pn, and Pn is waiting for a resource held by P0.

  • Core Idea: This is the "circular dependency" that seals the deadlock. Each process in the chain holds something the next one needs.
  • Example:
    • Process P0 holds Resource R0 and waits for Resource R1.
    • Process P1 holds Resource R1 and waits for Resource R2.
    • Process P2 holds Resource R2 and waits for Resource R0.
    This forms a circular chain P0 → R1 → P1 → R2 → P2 → R0 → P0.

Crucial Point: All four conditions *must* be present for a deadlock to occur. If even one of these conditions is prevented, deadlock is impossible. This insight forms the basis for deadlock prevention strategies.

4. Resource Types & Resource Allocation Graph (RAG)

Understanding resources is key to understanding deadlocks. Resources can be categorized by the number of instances they have.

4.1. Resource Instances

  • Single Instance Resources: These are resources of which only one unit exists in the system.
    • Examples: A system might have only one physical printer, one tape drive, or a specific critical section of code protected by a single mutex.
    • Notation: Resource Type R_A with 1 instance.
  • Multiple Instance Resources: These are resources of which multiple identical units exist. Any one of these units can satisfy a request for the resource.
    • Examples: A system might have 3 identical CPUs (for scheduling), 5 identical tape drives, or a pool of 10 memory blocks of a certain size. A semaphore initialized to N > 1 can represent N instances of a generic resource.
    • Notation: Resource Type R_A with 5 instances.

4.2. Resource Allocation Graph (RAG)

A Resource Allocation Graph (RAG) is a directed graph used to visually represent the state of resource allocations and requests in a system. It helps in understanding and detecting deadlocks.

A RAG consists of a set of vertices V and a set of edges E.

  • Vertices (V):
    • Processes (P): Represented by circles (e.g., P1, P2).
    • Resource Types (R): Represented by rectangles (e.g., R_A, R_B). Dots inside the rectangle represent the number of instances of that resource type.
  • Edges (E):
    • Request Edge (Pi → Rj): A directed edge from process Pi to resource type Rj indicates that process Pi has requested an instance of resource type Rj and is currently waiting for it. Represented by an arrow from the process circle to the resource rectangle.
    • Assignment Edge (Rj → Pi): A directed edge from resource type Rj to process Pi indicates that an instance of resource type Rj has been allocated to process Pi. Represented by an arrow from a dot within the resource rectangle to the process circle.

Interpreting RAG for Deadlocks:

  • If the RAG contains no cycles: No deadlock exists in the system.
  • If the RAG contains a cycle:
    • If each resource type in the cycle has only a single instance: A deadlock definitely exists. The cycle is a sufficient condition for deadlock.
    • If resource types in the cycle have multiple instances: A cycle is a necessary but *not sufficient* condition for deadlock. A deadlock *may* exist. Further analysis (like a safety algorithm) is needed. A process might be waiting for a resource in the cycle, but there could be another instance of that resource available or held by a process not in the cycle that will eventually release it.

Example of a simple RAG leading to deadlock (single instances):

Process P1 holds an instance of R_A and requests R_B.
Process P2 holds an instance of R_B and requests R_A.

Visualized:
  (P1) --request--> [R_B]
    ^                 |
    | assigned        | assigned
  [R_A] <--request-- (P2)

Cycle: P1 → R_B (instance held by P2) → P2 → R_A (instance held by P1) → P1. This is a deadlock.
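For single-instance resources, a cycle check over the RAG decides deadlock directly. A minimal sketch, where the adjacency-list encoding of the graph is an assumption:

```python
def has_cycle(graph):
    """Detect a cycle in a directed graph given as {node: [successors]}."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / on current DFS path / done
    color = {n: WHITE for n in graph}

    def dfs(node):
        color[node] = GRAY
        for succ in graph.get(node, []):
            if color.get(succ, WHITE) == GRAY:    # back edge -> cycle
                return True
            if color.get(succ, WHITE) == WHITE and dfs(succ):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in list(graph))

# The RAG above: P1 -> R_B -> P2 -> R_A -> P1
rag = {"P1": ["R_B"], "R_B": ["P2"], "P2": ["R_A"], "R_A": ["P1"]}
print(has_cycle(rag))  # True: a cycle over single-instance resources means deadlock
```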
                        

5. Deadlock Handling Strategies: An Overview

Operating systems can deal with the deadlock problem in one of several ways:

  1. Deadlock Prevention or Avoidance: These are proactive approaches. The goal is to ensure that the system will *never* enter a deadlock state.
    • Prevention: Design the system to negate one or more of the four necessary conditions for deadlock. This is often restrictive.
    • Avoidance: Use information about future resource needs of processes to decide if a resource request can be granted safely. If granting a request might lead to a deadlock, the process is made to wait.
  2. Deadlock Detection and Recovery: This is a reactive approach. The system allows deadlocks to occur, then provides mechanisms to detect them and recover from them.
  3. Ignore the Problem (Ostrich Algorithm): Pretend that deadlocks never occur. This is a common approach used by many operating systems (like UNIX and Windows) for general user processes. It's based on the assumption that deadlocks are rare, and the overhead of prevention/avoidance/detection is higher than the cost of an occasional deadlock (which might require a system reboot or manual process termination). This is not suitable for critical systems.

We will now delve into Prevention, Avoidance, Detection, and Recovery in detail.

6. Deadlock Prevention

Deadlock prevention works by ensuring that at least one of the four necessary conditions for deadlock cannot hold. Let's examine how to negate each condition:

6.1. Negating Mutual Exclusion

Idea: Make resources sharable. If no resource requires exclusive access, then mutual exclusion is not required, and deadlock is impossible.

  • Feasibility: This is not generally possible for many types of resources. For example, a printer cannot be shared by multiple processes simultaneously printing different documents. Some resources, like read-only files, are inherently sharable. For others, like writeable data structures, making them "sharable" without proper synchronization leads to race conditions, which is what we were trying to solve in the first place.
  • Consequence: Generally impractical for most critical resources. Spooling (e.g., for printers) can make a dedicated resource appear sharable by queuing requests, but the underlying device is still used exclusively.

6.2. Negating Hold and Wait

To negate this condition, we must ensure that whenever a process requests a resource, it does not hold any other resources.

Two common protocols are:

  1. Request All Resources at Once (All-or-Nothing Allocation): A process must request all its required resources before it begins execution. The system grants all resources or none. If all are granted, the process can run. If not, it waits, holding no resources.
    • Problem (Hold): While this prevents the "wait" part of "hold and wait" *during the request phase*, once the process gets all resources, it *holds* them, potentially for a long time, even if some are needed only much later. This leads to poor resource utilization.
    • Problem (Starvation): A process needing many popular resources might wait indefinitely (starve) if at least one of its requested resources is always in use.
    • Problem (Prediction): Processes often don't know all the resources they'll need in advance.
  2. Release All Resources Before Requesting New Ones: A process must release all resources it currently holds before it can request any new resources.
    • Problem (Starvation/Inefficiency): This can be highly inefficient. A process might have completed significant work with a set of resources, then needs one more. It has to release everything, re-request all (including the new one), and potentially re-do work if intermediate states were lost. This also increases the window for starvation.
Overall: These techniques can prevent deadlock but often lead to low resource utilization and potential starvation.
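The all-or-nothing protocol can be sketched with a condition variable; the class, method names, and resource names below are illustrative, not a standard API:

```python
import threading

class AllOrNothingAllocator:
    """Grants a process either every resource it asks for, or none (it waits)."""
    def __init__(self, resources):
        self._free = set(resources)
        self._cond = threading.Condition()

    def acquire_all(self, wanted):
        with self._cond:
            # Wait until every wanted resource is free; hold nothing meanwhile.
            self._cond.wait_for(lambda: set(wanted) <= self._free)
            self._free -= set(wanted)

    def release_all(self, held):
        with self._cond:
            self._free |= set(held)
            self._cond.notify_all()   # wake waiters to re-check their predicate

alloc = AllOrNothingAllocator(["printer", "tape", "disk"])
alloc.acquire_all(["printer", "tape"])   # granted together, or not at all
alloc.release_all(["printer", "tape"])   # released together
```

Note how the sketch exhibits the stated drawbacks: a process wanting popular resources may wait indefinitely, and everything it acquires is held until released, even resources needed only late in its run.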

6.3. Negating No Preemption

Idea: If a process holding some resources requests another resource that cannot be immediately allocated to it, then all resources currently being held by this process are preempted (taken away).

Two approaches:

  1. Forceful Preemption: If a process P1 requests a resource held by P2, and P2 is waiting for another resource, the OS could preempt the resource from P2 and give it to P1 (or another high-priority process).
    • Consequence: Complex to implement. Preempting a resource can be problematic if the process was in the middle of an update (e.g., modifying a file). The state of the resource must be saved and restored, which adds significant overhead and risks data inconsistency if not handled carefully. Repeatedly preempting the same process's resources can also lead to starvation.
  2. Voluntary Preemption (Self-Preemption): If a process requests a resource and it's unavailable, the process voluntarily releases all resources it currently holds. It is then put back in the waiting queue for all its original resources plus the new one.
    • Consequence: Similar to the "Release All Resources" strategy under Hold and Wait: it can lead to starvation and inefficiency.

Overall: Preemption is often applied to resources whose state can be easily saved and restored (e.g., CPU registers, memory). It is rarely applied to resources like printers or files mid-operation.

6.4. Negating Circular Wait

Idea: Impose a total ordering of all resource types, and require that each process requests resources in an increasing (or decreasing) order of enumeration.

  • Mechanism (Resource Ordering): Assign a unique integer to each resource type (e.g., R1=1, R2=2, R3=3). A process can request resources only in increasing order of their numbers. That is, if a process holds resource R_i, it can only request resources R_j such that j > i.
  • Alternative: If a process currently holds R_i and needs R_k where k < i, it must first release R_i (and any other resources R_m where m > k), then request R_k, and then re-request R_i (and the others) in order.
  • Why it works: A circular wait implies P0 waits for R_a held by P1, P1 waits for R_b held by P2, ..., Pn waits for R_z held by P0. Under this protocol, each process may only wait for a resource numbered higher than any it holds, so a cycle would require f(R_a) < f(R_b) < ... < f(R_z) < f(R_a). This is a contradiction, so a cycle cannot form.
  • Example: Resources FDD=1, Tape Drive=2, Printer=3.
    • Process P can request FDD, then Tape Drive, then Printer. Valid.
    • If P holds Tape Drive (2) and Printer (3), and now needs FDD (1), it must release the Printer, then the Tape Drive, then request FDD, Tape Drive, and Printer in order.
  • Consequence: Resource ordering is a practical and widely used prevention technique, especially in database systems for lock ordering. However, it can be inconvenient for programmers and may not always be natural for the problem at hand, potentially reducing resource utilization if resources must be acquired earlier than needed just to maintain the order.
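The resource-ordering protocol maps directly onto lock ordering in code. A minimal sketch, assuming locks are identified by integer ranks (the rank numbers and helper names are illustrative):

```python
import threading

# Assign each lock a fixed rank; all code must acquire in increasing rank order.
locks = {1: threading.Lock(), 2: threading.Lock(), 3: threading.Lock()}

def acquire_in_order(needed_ranks):
    """Acquire the locks for `needed_ranks`, sorted so no wait cycle can form."""
    acquired = []
    for rank in sorted(needed_ranks):
        locks[rank].acquire()
        acquired.append(rank)
    return acquired

def release(ranks):
    for rank in reversed(ranks):   # release in reverse order of acquisition
        locks[rank].release()

held = acquire_in_order([3, 1])    # actually taken as 1, then 3
release(held)
```

Because every thread climbs the same ranking, no thread can hold a high-ranked lock while waiting for a lower-ranked one, so the f(R_a) < ... < f(R_a) contradiction above can never arise.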

7. Deadlock Avoidance

Deadlock avoidance requires that the operating system be given additional information in advance concerning which resources a process will request and use during its lifetime. With this information, the system can decide for each request whether or not the process should wait. It dynamically examines the resource-allocation state to ensure that a circular-wait condition can never exist.

The resource-allocation state is defined by the number of available and allocated resources, and the maximum demands of the processes.

7.1. Safe State vs. Unsafe State

  • Safe State: A state is safe if the system can allocate resources to each process (up to its maximum) in some order and still avoid a deadlock. More formally, a state is safe if there exists a safe sequence of processes such that for each Pi, the resources that Pi can still request can be satisfied by the currently available resources plus the resources held by all Pj, with j < i. If Pi's resource needs are not immediately available, then Pi can wait until all Pj (j < i) have finished and released their resources. When they do, Pi can obtain all of its needed resources, complete its designated task, return its allocated resources, and terminate.
  • Unsafe State: A state that is not safe. An unsafe state is not necessarily a deadlocked state. It's a state that *could* potentially lead to a deadlock if processes make unfortunate sequences of future requests. The OS cannot guarantee that deadlock will be avoided if the system is in an unsafe state.

Key Idea of Avoidance: The system must ensure it never enters an unsafe state. When a process requests resources, the system pretends to grant them, then checks if the resulting state is safe. If yes, grant. If no, the process must wait.

A point worth spelling out: for single-instance resources, if an allocation would create a cycle in the RAG, the resulting state is not merely unsafe, it *is* deadlocked. Avoidance for single-instance resources therefore amounts to checking, before granting a request, whether the new edge would create a cycle. For multiple-instance resources, an unsafe state means there is no sequence in which all processes are guaranteed to complete; it does not *immediately* mean a deadlock or a cycle (though a deadlock always implies an unsafe state). The Banker's Algorithm handles this case.

7.2. Banker's Algorithm

The Banker's Algorithm is a prominent deadlock avoidance algorithm that works for systems with multiple instances of each resource type. It's named so because it models how a banker might manage a limited amount of money to satisfy customer loan requests without going bankrupt.

Prerequisites:

  1. Each process must declare its maximum possible claim on each resource type. This cannot change during execution.
  2. When a process requests resources, it may have to wait.
  3. When a process gets all its requested resources, it must return them in a finite amount of time.

Data Structures for Banker's Algorithm:

Let 'n' be the number of processes and 'm' be the number of resource types.
  • Available (Vector of length m): Available[j] = k means there are 'k' instances of resource type Rj available.
  • Max (n x m matrix): Max[i][j] = k means process Pi may request at most 'k' instances of resource type Rj.
  • Allocation (n x m matrix): Allocation[i][j] = k means process Pi is currently allocated 'k' instances of resource type Rj.
  • Need (n x m matrix): Need[i][j] = k means process Pi may still need 'k' more instances of resource type Rj to complete its task.
    Need[i][j] = Max[i][j] - Allocation[i][j].

The Banker's Algorithm consists of two main parts:

  1. Safety Algorithm: This algorithm determines if the current system state is safe.
    1. Initialize Work = Available (vector of length m) and Finish[i] = false for all i (vector of length n).
    2. Find an index 'i' such that both:
      a. Finish[i] == false
      b. Need[i] <= Work (vector comparison: Need[i][j] <= Work[j] for all j)
      If no such 'i' exists, go to step 4.
    3. Work = Work + Allocation[i] (process Pi finishes and releases its resources)
      Finish[i] = true
      Go to step 2.
    4. If Finish[i] == true for all 'i', the system is in a safe state. Otherwise, it's unsafe.
  2. Resource-Request Algorithm (for process Pi requesting resources `Request_i`):
    1. If Request_i <= Need[i], go to step 2. Otherwise, raise an error (process exceeded its max claim).
    2. If Request_i <= Available, go to step 3. Otherwise, Pi must wait (resources not available).
    3. Pretend to allocate requested resources to Pi:
      Available = Available - Request_i
      Allocation[i] = Allocation[i] + Request_i
      Need[i] = Need[i] - Request_i
    4. Now, run the Safety Algorithm on this new (hypothetical) state.
      • If the state is safe, the resources are actually allocated to Pi.
      • If the state is unsafe, Pi must wait, and the old resource-allocation state is restored (the pretend allocation is rolled back).

Complexity: The Banker's algorithm is O(m * n^2) for the safety check, which can be significant if run frequently. Declaring maximum needs can also be difficult for processes.

8. Deadlock Detection

If a system does not employ either deadlock prevention or deadlock avoidance, then a deadlock situation may occur. In this environment, the system must provide:

  • An algorithm that examines the state of the system to determine whether a deadlock has occurred.
  • An algorithm to recover from the deadlock.

8.1. Symptoms & When to Detect Deadlock

The OS might suspect a deadlock if it observes certain symptoms:

  • Majority of processes are getting blocked: Many processes are in a wait state for extended periods.
  • Lower CPU utilization: If many processes are blocked, the CPU will be idle more often as fewer processes are ready to run.
  • System throughput drops significantly.

When should the detection algorithm be invoked?

  • Periodically: Run the detection algorithm at fixed time intervals (e.g., every hour or when CPU utilization drops below a threshold). This can be too late or too frequent.
  • On resource request: Run it whenever a resource request cannot be immediately granted. This is more targeted but can be computationally expensive if requests are frequent.

8.2. For Single Instance Resources: Wait-For Graph (WFG)

If all resources have only a single instance, we can use a variant of the Resource Allocation Graph called a Wait-For Graph (WFG).

  • Construction: The WFG is obtained from the RAG by removing the resource nodes and collapsing the corresponding edges.
    • Nodes are only processes.
    • An edge Pi → Pj exists in the WFG if process Pi is waiting for a resource currently held by process Pj.
  • Detection: A deadlock exists in the system if and only if there is a cycle in the wait-for graph. Standard graph algorithms (like Depth First Search) can be used to detect cycles.
    If P1 holds R1 and waits for R2,
    and P2 holds R2 and waits for R1:
    
    RAG:  (P1) --request--> [R2] --assigned--> (P2)
            ^                                    |
            '----assigned---- [R1] <--request----'
    
    Wait-For Graph (WFG):
    (P1) → (P2) and (P2) → (P1)  (Cycle: P1 waits for P2, P2 waits for P1)
    This means P1 is waiting for a resource held by P2, and P2 is waiting for a resource held by P1.
  • Complexity: If there are 'n' processes, cycle detection is typically O(n+e) where 'e' is the number of edges, or O(n^2) in the worst case for dense graphs.
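Collapsing the RAG into a WFG and running cycle detection can be sketched as follows; the dictionary encoding of holds and waits is an assumption for illustration:

```python
def wait_for_graph(holds, waits):
    """holds: {resource: process holding it}; waits: {process: [resources wanted]}.
    Returns WFG edges as {process: [processes it waits on]}."""
    wfg = {p: [] for p in waits}
    for p, wanted in waits.items():
        for r in wanted:
            if r in holds and holds[r] != p:
                wfg[p].append(holds[r])
    return wfg

def deadlocked(wfg):
    """True iff the wait-for graph contains a cycle (DFS with a path set)."""
    visited, on_path = set(), set()

    def dfs(p):
        visited.add(p)
        on_path.add(p)
        for q in wfg.get(p, []):
            if q in on_path or (q not in visited and dfs(q)):
                return True
        on_path.discard(p)
        return False

    return any(p not in visited and dfs(p) for p in wfg)

holds = {"R1": "P1", "R2": "P2"}
waits = {"P1": ["R2"], "P2": ["R1"]}
print(deadlocked(wait_for_graph(holds, waits)))  # True
```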

8.3. For Multiple Instance Resources: Safety Algorithm Variant

When resources have multiple instances, a cycle in the RAG is necessary but not sufficient for deadlock. A detection algorithm similar to the Banker's Algorithm's Safety Algorithm is used.

Data Structures (similar to Banker's, but `Request` instead of `Need`):

  • Available (Vector of length m): Number of available instances of each resource.
  • Allocation (n x m matrix): Number of instances of each resource currently allocated to each process.
  • Request (n x m matrix): The current request of each process. `Request[i][j] = k` means Pi is waiting for 'k' instances of Rj. Processes that are not waiting for anything have `Request[i]` as a zero vector.

Detection Algorithm Steps:

  1. Initialize Work = Available. For each process Pi: if Allocation[i] is all zeros (i.e., Allocation[i][j] == 0 for all j), set Finish[i] = true; otherwise set Finish[i] = false. A process holding no resources cannot be part of a deadlock, so it is marked finished immediately. (This initialization differs slightly from the Banker's safety algorithm because we are checking the current state, not future potential.)
  2. Find an index 'i' such that both:
    a. Finish[i] == false
    b. Request[i] <= Work (Pi's outstanding requests can be satisfied)
  3. If no such 'i' exists, go to step 5.
  4. Work = Work + Allocation[i] (Assume Pi gets its request, finishes, and releases all its resources)
    Finish[i] = true
    Go to step 2.
  5. If Finish[i] == false for any process 'i', then the system is in a deadlocked state. The processes 'Pi' for which Finish[i] == false are the deadlocked processes.

Complexity: O(m * n^2), similar to the Banker's safety algorithm.
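The detection steps above can be sketched in Python; the matrices in the usage example are illustrative values constructed to show a two-process deadlock:

```python
def detect_deadlock(available, allocation, request):
    """Return the list of deadlocked process indices ([] if none)."""
    n = len(allocation)
    work = list(available)
    # Step 1: a process holding nothing cannot be part of a deadlock.
    finish = [all(a == 0 for a in allocation[i]) for i in range(n)]
    progressed = True
    while progressed:            # steps 2-4: repeat while some Pi can proceed
        progressed = False
        for i in range(n):
            if not finish[i] and all(r <= w for r, w in zip(request[i], work)):
                # Assume Pi gets its request, finishes, and releases everything.
                work = [w + a for w, a in zip(work, allocation[i])]
                finish[i] = True
                progressed = True
    # Step 5: anyone still unfinished is deadlocked.
    return [i for i in range(n) if not finish[i]]

# P0 holds R_A and wants R_B; P1 holds R_B and wants R_A; nothing is free.
print(detect_deadlock([0, 0], [[1, 0], [0, 1]], [[0, 1], [1, 0]]))  # [0, 1]
```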

8.4. Safety Algorithm (in Detection) vs. Banker's Algorithm (in Avoidance)

The core distinction:

The Banker's Algorithm is a deadlock *avoidance* algorithm, used *before* allocating resources. When a process requests a set of resources, the algorithm checks whether granting that request would lead to an unsafe state; if it would, the request is denied and the process waits. It requires prior knowledge of the maximum resource needs (Max matrix) of all processes.

The safety algorithm used in deadlock *detection* is run *after* resources have been allocated (or requests have been made and processes are waiting) to see if the system is *currently* deadlocked. It does not require prior knowledge of maximum resource needs; it works only with the current allocation (Allocation matrix) and outstanding requests (Request matrix).

To further clarify:

  • Purpose:
    • Banker's (Avoidance): To decide if a *pending request* can be granted safely without risking future deadlock.
    • Safety for Detection: To determine if the *current system state* is already deadlocked.
  • Input Data:
    • Banker's: Uses `Max`, `Allocation`, `Available`, `Need` (derived from Max and Allocation).
    • Safety for Detection: Uses `Allocation`, `Available`, `Request` (current outstanding requests).
  • Outcome:
    • Banker's: "Yes, grant the request" or "No, process must wait." The goal is to stay in a safe state.
    • Safety for Detection: "System is deadlocked (and identifies which processes)" or "System is not deadlocked."

The "Safety Algorithm" is a common underlying logic pattern. In Banker's, it checks if a *hypothetical future state* (after granting a request) is safe based on *maximum future needs*. In detection, a similar logic checks if the *current state* allows all non-finished processes to complete based on their *current outstanding requests*.

9. Deadlock Recovery

Once a deadlock has been detected, the system must recover. There are two main approaches to recovery:

9.1. Process-Related Recovery: Process Termination

This involves aborting one or more processes to break the circular wait.

  1. Abort All Deadlocked Processes:
    • Description: This is the simplest and most drastic method. It will certainly break the deadlock cycle.
    • Consequence: Highly expensive, as all the computation done by these processes is lost. Processes may need to be restarted from scratch.
  2. Abort One Process at a Time Until the Deadlock Cycle is Eliminated:
    • Description: The system aborts one deadlocked process, reclaims its resources, and then re-runs the deadlock detection algorithm. This is repeated until the deadlock is resolved.
    • Challenge: Requires a rational basis for choosing which process to abort (the "victim"). Factors for selecting a victim include:
      • Process priority.
      • How long the process has computed and how much longer it needs.
      • Resources the process has used and is holding.
      • Number of processes that will need to be terminated.
      • Is the process interactive or batch?
    • Consequence: Less disruptive than aborting all, but still involves loss of work and potential overhead in repeatedly detecting deadlocks and selecting victims. There's also a risk of starvation if the same process is consistently chosen as a victim.

Note that "abort the currently running process" would be an ambiguous victim rule: a process that is actually running (holding the CPU, not blocked) is unlikely to be part of the deadlocked set, unless it is about to make a request that would complete the cycle. Typically, the victims are processes that are *blocked* and part of the identified deadlock.

9.2. Resource-Related Recovery: Resource Preemption

This involves taking resources away from some processes and giving them to others until the deadlock is broken.

  1. Resource Preemption (Forceful Snatching):
    • Description: Successively preempt some resources from processes and give these resources to other processes until the deadlock cycle is broken. This is "kind of forceful snatching so that all the process could run."
    • Issues to Address:
      • Victim Selection: Which process and which resource to preempt? Minimize cost. Factors are similar to process termination victim selection.
      • Rollback: If a resource is preempted from a process, that process cannot continue with its normal execution. It must be rolled back to some safe state before it acquired that resource, so it can restart from that point later. This can be difficult or impossible if the process has modified data that cannot be easily undone.
      • Starvation: How to ensure that a process doesn't starve from having its resources constantly preempted? A process might be chosen as a victim repeatedly. One solution is to include the number of times a process has been preempted in the cost factor for victim selection.
  2. Rollback (as a specific preemption strategy):
    • Description: This is a more systematic form of preemption. The system periodically creates checkpoints (snapshots of process states and resource allocations). When a deadlock is detected, one or more processes are rolled back to a previous checkpoint. This rollback frees up resources held by the rolled-back process since that checkpoint.
    • Iterative approach: keep preempting the most recently acquired resource, add it to the Available pool, and re-run a detection algorithm (the safety-algorithm variant or WFG cycle detection) after each preemption. This continues until the deadlock cycle is broken.
    • Consequence: Requires checkpointing overhead. Determining how far to roll back is crucial. Rolling back too little might not break the deadlock; rolling back too much loses excessive computation.

10. Conclusion

Deadlock is a complex and serious problem in concurrent systems where processes compete for exclusive access to a finite set of resources. Understanding the four necessary conditions—Mutual Exclusion, Hold and Wait, No Preemption, and Circular Wait—is fundamental to comprehending how deadlocks arise and how they can be handled.

Strategies for dealing with deadlocks range from prevention (designing the system so deadlocks are impossible), to avoidance (making judicious resource allocation decisions to sidestep deadlocks), to detection and recovery (letting deadlocks happen and then fixing them). Each strategy has its own trade-offs in terms of system overhead, resource utilization, and implementation complexity.

  • Prevention can be too restrictive, leading to underutilized resources or developer inconvenience.
  • Avoidance, exemplified by the Banker's Algorithm, requires prior knowledge of maximum resource needs, which isn't always available, and can be computationally intensive.
  • Detection and Recovery adds runtime overhead for detection and can be disruptive when recovery actions like process termination or resource preemption are invoked.
  • Many general-purpose operating systems opt to ignore the problem for most user applications, assuming deadlocks are rare enough that the cost of sophisticated handling outweighs the benefits. However, for critical systems or database environments, robust deadlock handling is essential.

A thorough grasp of these concepts allows system designers and programmers to build more robust and reliable concurrent applications, minimizing the risks and impacts of deadlocks.