Toshit Jain

Selftape AI - Audition Automation App

2025-06-29T00:00:00+00:00

SelfTape-AI: AI-Powered Self-Tape Recording App for Actors

SelfTape-AI is a next-generation cross-platform app built for actors, content creators, and filmmakers to streamline the self-taping process. Whether you’re rehearsing scenes or submitting auditions, SelfTape-AI enables a seamless, professional-grade experience—right from your phone or desktop.

Why SelfTape-AI?

Self-taping has become the standard in auditions, but traditional methods can be cumbersome. With SelfTape-AI, creators can now rehearse or record scenes effortlessly by performing their lines while AI handles the rest—making solo practice and self-tapes faster, more immersive, and production-ready.

Key Features

Smart Script Handling
Upload a script and instantly identify characters and dialogues. No manual setup required.
Custom Role Assignment
Choose which roles you’ll play and assign others to AI—ideal for solo rehearsals or audition prep.
Live Scene Playback
The AI delivers lines for its assigned characters in real time, while you perform yours—just like a real scene partner.
Camera Integration
Your performance is captured using the front camera and automatically saved to your device.
Interactive Script Panel
Scroll through and follow your script live, with a clean, distraction-free interface.
Cross-Platform Support
Built with Flutter, SelfTape-AI runs on Android, iOS, and desktop platforms with a consistent, intuitive experience.

What You Need to Get Started

A modern smartphone or desktop
Flutter installed (for contributors or testers)
Optional: a Deepgram API key if contributing to the backend

Project Overview

.
├── backend/
│   └── Core Python server handling script logic and AI integration
└── sceneapp/
    └── Flutter frontend with seamless UI and camera integration

CUDA Based Options Pricing using Monte Carlo Simulation

2025-06-20T00:00:00+00:00

Introduction & Motivation

Hi! The main drive behind this project was to learn parallel programming with CUDA. At the same time, I was learning finance from the book Paul Wilmott Introduces Quantitative Finance. After covering the basics, such as financial quantities and options, I came across Monte Carlo simulation as a method for pricing options. I had learned earlier that Monte Carlo methods parallelize naturally, because the simulated paths are independent of one another, and so the project began. (Later, a friend at IIT Guwahati who was also interested in the project joined me as a collaborator.)

The implementation of the whole project can be found here: GitHub Repository

CUDA Basics

For the CUDA basics, I referred to a few videos and resources available online (some lectures by IITM professors, some YouTube videos, etc.) and learned how to write CUDA kernels, what grids and blocks are, and how to move data between host and device with cudaMemcpy.
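To make the grid and block vocabulary concrete, here is a plain C++ sketch (no GPU needed) that emulates the one-dimensional launch used later in this post. It computes gridSize with the same ceiling division as the kernel wrapper and derives each thread's global index as blockIdx.x * blockDim.x + threadIdx.x. The function name emulateLaunch is hypothetical, and on a real GPU the blocks and threads run in parallel rather than in nested loops.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CPU emulation of a 1D CUDA launch <<<gridSize, blockSize>>>.
// Returns how many times each element index 0..n-1 was visited.
std::vector<int> emulateLaunch(int n, int blockSize) {
  assert(n >= 0 && blockSize > 0);
  // Ceiling division: enough blocks to cover all n elements.
  int gridSize = (n + blockSize - 1) / blockSize;
  std::vector<int> visits(n, 0);
  for (int blockIdx = 0; blockIdx < gridSize; ++blockIdx) {
    for (int threadIdx = 0; threadIdx < blockSize; ++threadIdx) {
      int idx = blockIdx * blockSize + threadIdx;  // global thread index
      if (idx >= n) continue;  // surplus threads in the last block do nothing
      ++visits[idx];
    }
  }
  return visits;
}
```

With n = 1000 and blockSize = 256, four blocks (1024 threads) are launched, the last 24 threads fail the bounds check, and every index is visited exactly once. That bounds check is why the kernel below starts with `if (idx >= paths) return;`.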

Project Overview

Our implementation included the following:

Writing CPU functions for European, Asian, Basket and American options (American options use LSM, so they are CPU only).
Writing GPU kernels for European, Asian and Basket options.
Making the code as modular as possible (templates parameterized by functor structs).
Benchmarking (an RAII-style Timer class).
Integrating a CLI (the third-party cxxopts library).

Payoffs Functor Structs

Following is the code for the payoff functor structs. I originally thought of using inheritance, but it turns out that a CUDA kernel cannot call a virtual function on an object that was created on the host, because the object's virtual-table pointer refers to host memory :(

#pragma once
#include <cmath> // std::fmax

struct CallPayoff {
  __device__ __host__ double operator()(double S, double K) const {
    return std::fmax(S - K, 0.0);
  }
};

struct PutPayoff {
  __device__ __host__ double operator()(double S, double K) const {
    return std::fmax(K - S, 0.0);
  }
};
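The reason functors work where virtual functions did not is that the payoff type becomes a template parameter: the call is resolved at compile time and the object carries no vtable pointer. The host-only sketch below drops the `__device__ __host__` qualifiers so it compiles with a plain C++ compiler; discountedPayoff is a hypothetical helper, not code from the repo.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <type_traits>

// Same shape as the CUDA structs above, minus __device__ __host__.
struct CallPayoff {
  double operator()(double S, double K) const { return std::max(S - K, 0.0); }
};
struct PutPayoff {
  double operator()(double S, double K) const { return std::max(K - S, 0.0); }
};

// Hypothetical helper: Payoff is a template parameter, so payoff(S, K) is a
// direct, inlinable call; no virtual dispatch is involved.
template <typename Payoff>
double discountedPayoff(double S, double K, double r, double T, Payoff payoff) {
  return std::exp(-r * T) * payoff(S, K);
}

// No virtual functions means no vtable pointer, so a plain by-value copy
// (which is how kernel arguments are passed) stays valid on the device.
static_assert(!std::is_polymorphic<CallPayoff>::value, "no vtable");
static_assert(std::is_trivially_copyable<CallPayoff>::value, "bytewise copyable");
```

Each instantiation, such as discountedPayoff<CallPayoff>, is a separate function specialized at compile time. This is the same mechanism that lets the kernels below take `Payoff payoff` by value.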

European Options

European Options Class and CPU Implementation

We used a class template parameterized by a payoff functor struct for the European option. It stores the option parameters (S0, K, r, sigma, T) and has two methods: a CPU implementation and a host-side wrapper that launches the GPU kernel.

template <typename Payoff> class EuropeanOption {
private:
  double S0, K, r, sigma, T;

public:
  EuropeanOption(double _S0, double _K, double _r, double _sigma, double _T)
      : S0(_S0), K(_K), r(_r), sigma(_sigma), T(_T) {}

  double europeanOptionCPU(int paths);
  double europeanOptionGPU(int paths);
};

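The CPU implementation samples the terminal price directly from the exact solution of the risk-neutral geometric Brownian motion assumed by the Black-Scholes model, S_T = S0 * exp((r - 0.5 * sigma^2) * T + sigma * sqrt(T) * Z), where Z is a standard normal draw. It averages the payoff over the requested number of paths and discounts the mean by exp(-r * T). A fixed seed (42) makes the results reproducible.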

template <typename Payoff>
double EuropeanOption<Payoff>::europeanOptionCPU(int paths) {

  std::mt19937_64 rng(42);
  std::normal_distribution<double> norm(0.0, 1.0);

  double payoff_sum = 0.0;
  Payoff payoff;

  for (int i = 0; i < paths; ++i) {
    double Z = norm(rng);
    double ST =
        S0 * std::exp((r - 0.5 * sigma * sigma) * T + sigma * std::sqrt(T) * Z);
    double curr_payoff = payoff(ST, K);
    payoff_sum += curr_payoff;
  }

  return std::exp(-r * T) * (payoff_sum / paths);
}
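A quick way to sanity-check this estimator is to compare it against the Black-Scholes closed form for a European call, C = S0 N(d1) - K exp(-rT) N(d2). The sketch below is a self-contained, non-template copy of the loop above specialized to a call (europeanCallMC, blackScholesCall and normCdf are hypothetical names). It computes the normal CDF with std::erfc.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// Standard normal CDF via the complementary error function.
double normCdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Black-Scholes closed-form price of a European call.
double blackScholesCall(double S0, double K, double r, double sigma, double T) {
  double d1 = (std::log(S0 / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * std::sqrt(T));
  double d2 = d1 - sigma * std::sqrt(T);
  return S0 * normCdf(d1) - K * std::exp(-r * T) * normCdf(d2);
}

// Monte Carlo estimate: same recipe as europeanOptionCPU, specialized to a call.
double europeanCallMC(double S0, double K, double r, double sigma, double T, int paths) {
  assert(paths > 0);
  std::mt19937_64 rng(42);  // fixed seed for reproducibility
  std::normal_distribution<double> norm(0.0, 1.0);
  double sum = 0.0;
  for (int i = 0; i < paths; ++i) {
    double ST = S0 * std::exp((r - 0.5 * sigma * sigma) * T + sigma * std::sqrt(T) * norm(rng));
    sum += std::max(ST - K, 0.0);
  }
  return std::exp(-r * T) * (sum / paths);
}
```

For S0 = K = 100, r = 0.05, sigma = 0.2 and T = 1, the closed form is about 10.4506. With 10^6 paths, the Monte Carlo estimate should agree to within a few hundredths, since the standard error shrinks like 1/sqrt(paths).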

European Options GPU Kernel

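We launch a single kernel with one thread per path (processing several paths per thread, i.e. in batches, would also have worked). Each thread initializes its own cuRAND state (seed 42, subsequence idx), draws one standard normal, and applies the same terminal-price formula as the CPU version. The host wrapper then copies all payoffs back, sums them, and discounts the mean. Following is the implementation: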

#include <curand_kernel.h> // curandState, curand_init, curand_normal_double
#include <vector>

template <typename Payoff>
__global__ void europeanOptionGPUKernel(double S0, double K, double r,
                                        double sigma, double T, int paths,
                                        double *results, Payoff payoff) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx >= paths)
    return;

  curandState state;
  curand_init(42ULL, idx, 0, &state);

  double Z = curand_normal_double(&state);
  double ST = S0 * exp((r - 0.5 * sigma * sigma) * T + sigma * sqrt(T) * Z);
  results[idx] = payoff(ST, K);
}

// Kernel Wrapper
template <typename Payoff>
double EuropeanOption<Payoff>::europeanOptionGPU(int paths) {
  double *d_results = nullptr;
  cudaMalloc(&d_results, paths * sizeof(double));

  int blockSize = 256;
  int gridSize = (paths + blockSize - 1) / blockSize;
  Payoff payoff;

  europeanOptionGPUKernel<<<gridSize, blockSize>>>(S0, K, r, sigma, T, paths,
                                                   d_results, payoff);
  cudaDeviceSynchronize();

  std::vector<double> h_results(paths);
  cudaMemcpy(h_results.data(), d_results, paths * sizeof(double),
             cudaMemcpyDeviceToHost);

  cudaFree(d_results);

  double sum = 0.0;
  for (double payoff_results : h_results) {
    sum += payoff_results;
  }

  return exp(-r * T) * (sum / static_cast<double>(paths));
}
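The post notes that paths could also be processed in batches. One common CUDA pattern for this is the grid-stride loop. Each thread handles paths idx, idx + stride, idx + 2 * stride, and so on, keeping a running sum. Only one partial sum per thread then needs reducing, rather than one result per path. The C++ sketch below emulates that pattern on the CPU. gridStrideSum is a hypothetical name, and the sequential loop over tid stands in for threads that would run concurrently on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU emulation of a grid-stride loop with per-thread partial sums.
// totalThreads plays the role of gridDim.x * blockDim.x in a real kernel.
double gridStrideSum(const std::vector<double>& values, int gridSize, int blockSize) {
  assert(gridSize > 0 && blockSize > 0);
  const std::size_t totalThreads = static_cast<std::size_t>(gridSize) * blockSize;
  std::vector<double> partial(totalThreads, 0.0);  // one accumulator per "thread"
  for (std::size_t tid = 0; tid < totalThreads; ++tid) {
    // Each thread strides through the data by the total thread count.
    for (std::size_t i = tid; i < values.size(); i += totalThreads) {
      partial[tid] += values[i];
    }
  }
  // Final reduction of the per-thread partial sums. On a GPU this would be a
  // block-level reduction, or a much smaller copy back to the host.
  double total = 0.0;
  for (double p : partial) total += p;
  return total;
}
```

The gain is in data movement. With 10^7 paths and, say, 4 x 256 threads, the host would copy back 1024 partial sums instead of 10^7 payoffs.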

Asian Options

For Asian options, the payoff uses the arithmetic mean of the simulated prices over the last X fixings. The implementation mirrors the European one, with one extra inner loop that steps each path forward in increments of dt = T/tradingDays.
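As an illustration of that extra inner loop, here is a CPU sketch of an arithmetic-average Asian call. Some assumptions to flag: asianCallCPU and numFixings are hypothetical names, tradingDays is the number of time steps, and the average is taken over the last numFixings simulated prices (our reading of "last X fixings"). The repo's version may differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical sketch: arithmetic-average Asian call priced by Monte Carlo.
// Each path is stepped forward with dt = T / tradingDays; the payoff uses the
// mean of the last numFixings prices on the path.
double asianCallCPU(double S0, double K, double r, double sigma, double T,
                    int tradingDays, int numFixings, int paths) {
  assert(tradingDays > 0 && paths > 0);
  assert(numFixings > 0 && numFixings <= tradingDays);
  const double dt = T / tradingDays;
  const double drift = (r - 0.5 * sigma * sigma) * dt;
  const double vol = sigma * std::sqrt(dt);
  std::mt19937_64 rng(42);
  std::normal_distribution<double> norm(0.0, 1.0);

  double payoffSum = 0.0;
  for (int p = 0; p < paths; ++p) {
    double S = S0, fixingSum = 0.0;
    for (int step = 1; step <= tradingDays; ++step) {
      S *= std::exp(drift + vol * norm(rng));               // one GBM step of size dt
      if (step > tradingDays - numFixings) fixingSum += S;  // inside the averaging window
    }
    payoffSum += std::max(fixingSum / numFixings - K, 0.0);
  }
  return std::exp(-r * T) * (payoffSum / paths);
}
```

Setting numFixings equal to tradingDays averages over the whole path. Averaging dampens volatility, so such an Asian call is noticeably cheaper than the European call with the same parameters.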

The CPU function, kernel wrapper and kernel are available in the GitHub repo.

Basket Options

This one was cool! I first came across basket options in the problem statement of a hackathon. A basket option's underlying is a portfolio of several assets, each with an assigned weight. The difficulty is that the assets can be correlated with each other, so we use the Cholesky decomposition. I do not really understand how it works and used it as a black box for this project. It takes the lower-triangular matrix L of the correlation matrix R (where R = L L^T). It then transforms a vector of independent standard normal random variables Y into a vector of correlated normal random variables Z = L Y.
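To open the black box slightly: if Y has independent standard normal components, its covariance is the identity I, so Z = L Y has covariance L I L^T = L L^T = R, exactly the required correlation. The sketch below computes L with the standard row-by-row Cholesky recurrence and then applies it to Y. The names cholesky and correlate are hypothetical, and the sketch assumes R is symmetric positive definite.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Cholesky factorization R = L * L^T for a symmetric positive-definite R.
// Returns the lower-triangular factor L.
Matrix cholesky(const Matrix& R) {
  const std::size_t n = R.size();
  Matrix L(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j <= i; ++j) {
      double sum = R[i][j];
      for (std::size_t k = 0; k < j; ++k) sum -= L[i][k] * L[j][k];
      if (i == j) {
        if (sum <= 0.0) throw std::runtime_error("matrix is not positive definite");
        L[i][i] = std::sqrt(sum);
      } else {
        L[i][j] = sum / L[j][j];
      }
    }
  }
  return L;
}

// Z = L * Y: turns independent standard normals Y into correlated normals Z.
std::vector<double> correlate(const Matrix& L, const std::vector<double>& Y) {
  std::vector<double> Z(L.size(), 0.0);
  for (std::size_t i = 0; i < L.size(); ++i)
    for (std::size_t j = 0; j <= i; ++j) Z[i] += L[i][j] * Y[j];
  return Z;
}
```

L depends only on R, so in a pricer it would be computed once up front. Each path then only pays for the cheap matrix-vector product in correlate.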

The CPU function, kernel wrapper and kernel are available in the GitHub repo.

American Options

The first method I considered was backward induction through the tree built in the binomial method. However, I later found out about the Least-Squares Monte Carlo (LSM) algorithm of Longstaff and Schwartz and read the original research paper; a couple of implementations on YouTube made it easier to follow. The core step is a least-squares regression of the discounted future cash flows on the current stock price, using a quadratic a0 + a1 * x + a2 * x^2. The coefficients are obtained by solving the 3x3 normal equations with Cramer's rule:

__host__ __device__ void quadraticRegression(double *X, double *Y, int n,
                                             double &a0, double &a1,
                                             double &a2) {
  double Sx = 0, Sx2 = 0, Sx3 = 0, Sx4 = 0;
  double Sy = 0, Sxy = 0, Sx2y = 0;

  for (int i = 0; i < n; ++i) {
    double x = X[i];
    double x2 = x * x;
    double y = Y[i];

    Sx += x;
    Sx2 += x2;
    Sx3 += x2 * x;
    Sx4 += x2 * x2;
    Sy += y;
    Sxy += x * y;
    Sx2y += x2 * y;
  }

  double D = n * (Sx2 * Sx4 - Sx3 * Sx3) - Sx * (Sx * Sx4 - Sx2 * Sx3) +
             Sx2 * (Sx * Sx3 - Sx2 * Sx2);

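  // The normal-equation matrix is positive semidefinite, so D >= 0 in exact
  // arithmetic. Test |D| for near-singularity, but keep D's sign for the
  // divisions below (flipping it would flip the signs of a0, a1, a2).
  if (fabs(D) < 1e-12) {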
    a0 = 0;
    a1 = 0;
    a2 = 0;
    return;
  }

  double D0 = Sy * (Sx2 * Sx4 - Sx3 * Sx3) - Sx * (Sxy * Sx4 - Sx3 * Sx2y) +
              Sx2 * (Sxy * Sx3 - Sx2 * Sx2y);

  double D1 = n * (Sxy * Sx4 - Sx3 * Sx2y) - Sy * (Sx * Sx4 - Sx2 * Sx3) +
              Sx2 * (Sx * Sx2y - Sxy * Sx2);

  double D2 = n * (Sx2 * Sx2y - Sxy * Sx3) - Sx * (Sx * Sx2y - Sxy * Sx2) +
              Sy * (Sx * Sx3 - Sx2 * Sx2);

  a0 = D0 / D;
  a1 = D1 / D;
  a2 = D2 / D;
}

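The CPU implementation is available in the GitHub repo. We did not write a GPU kernel for American options. LSM's backward induction is sequential across exercise dates, because the regression at each date needs the cash flows from the date after it. It therefore does not map onto the one-thread-per-path kernels used above. (Path generation and the per-date regression sums could still be parallelized.)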
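For readers who want to see where the regression fits, here is a compact, self-contained CPU sketch of Longstaff-Schwartz LSM for an American put. It is not the repo's implementation, and americanPutLSM, fitQuadratic and det3 are hypothetical names. fitQuadratic solves the same 3x3 normal equations by Cramer's rule as quadraticRegression above. The regressor is S/K to keep the sums well scaled, and, as in the original method, only in-the-money paths enter the regression.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Determinant of a 3x3 matrix.
static double det3(const double m[3][3]) {
  return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
         m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
         m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Least-squares fit y ~ a0 + a1 x + a2 x^2 via normal equations + Cramer's rule.
static void fitQuadratic(const std::vector<double>& x, const std::vector<double>& y, double a[3]) {
  double s[5] = {0, 0, 0, 0, 0};  // s[k] = sum of x^k
  double t[3] = {0, 0, 0};        // t[k] = sum of x^k * y
  for (std::size_t i = 0; i < x.size(); ++i) {
    double p = 1.0;
    for (int k = 0; k < 5; ++k) {
      s[k] += p;
      if (k < 3) t[k] += p * y[i];
      p *= x[i];
    }
  }
  double A[3][3] = {{s[0], s[1], s[2]}, {s[1], s[2], s[3]}, {s[2], s[3], s[4]}};
  double D = det3(A);
  if (std::fabs(D) < 1e-12) { a[0] = a[1] = a[2] = 0.0; return; }
  for (int c = 0; c < 3; ++c) {  // replace column c with t
    double M[3][3];
    for (int r = 0; r < 3; ++r)
      for (int k = 0; k < 3; ++k) M[r][k] = (k == c) ? t[r] : A[r][k];
    a[c] = det3(M) / D;
  }
}

// LSM price of an American put with `steps` equally spaced exercise dates.
double americanPutLSM(double S0, double K, double r, double sigma, double T,
                      int steps, int paths) {
  const double dt = T / steps, disc = std::exp(-r * dt);
  std::mt19937_64 rng(42);
  std::normal_distribution<double> norm(0.0, 1.0);
  // S[i][j]: price on path i at time (j + 1) * dt.
  std::vector<std::vector<double>> S(paths, std::vector<double>(steps));
  for (int i = 0; i < paths; ++i) {
    double s = S0;
    for (int j = 0; j < steps; ++j) {
      s *= std::exp((r - 0.5 * sigma * sigma) * dt + sigma * std::sqrt(dt) * norm(rng));
      S[i][j] = s;
    }
  }
  // cash[i]: cash flow of path i, valued at the date currently being processed.
  std::vector<double> cash(paths);
  for (int i = 0; i < paths; ++i) cash[i] = std::max(K - S[i][steps - 1], 0.0);
  for (int j = steps - 2; j >= 0; --j) {
    for (double& c : cash) c *= disc;  // discount one step back to date j
    std::vector<double> x, y;
    std::vector<int> itm;
    for (int i = 0; i < paths; ++i)
      if (K > S[i][j]) { itm.push_back(i); x.push_back(S[i][j] / K); y.push_back(cash[i]); }
    double a[3];
    fitQuadratic(x, y, a);
    for (std::size_t n = 0; n < itm.size(); ++n) {
      double continuation = a[0] + a[1] * x[n] + a[2] * x[n] * x[n];
      double exercise = K - S[itm[n]][j];
      if (exercise > continuation) cash[itm[n]] = exercise;  // exercise early
    }
  }
  double sum = 0.0;
  for (double c : cash) sum += c * disc;  // first exercise date back to t = 0
  return sum / paths;
}
```

For example, americanPutLSM(36.0, 40.0, 0.06, 0.2, 1.0, 50, 20000) uses the parameters of the first example in Longstaff and Schwartz's paper. The result should come out close to their value of roughly 4.47, clearly above the European put value of roughly 3.84; the difference is the early-exercise premium.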

Benchmarking RAII Class

RAII (Resource Acquisition Is Initialization): the timer starts in the Timer constructor and reports in its destructor, so timing a scope only requires creating a Timer at its start.

#include <chrono>
#include <iostream>
#include <string>

class Timer {
private:
  std::string label;
  // steady_clock is monotonic, so it is the safer choice for measuring intervals
  std::chrono::steady_clock::time_point start;

public:
  explicit Timer(const std::string& label = "")
      : label(label), start(std::chrono::steady_clock::now()) {}

  // Seconds elapsed since construction
  double getDuration() const {
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
  }

  // Runs automatically when the timer goes out of scope
  ~Timer() {
    std::cout << label << " took " << getDuration() << " seconds." << std::endl;
  }
};

// Sample Implementation
{
  Timer timer("CPU Function"); // Constructor called: timing starts
  // Call to CPU Function
} // Destructor called: elapsed time is printed

CLI Integration

We used cxxopts, a header-only C++ library, for command-line parsing. The header was copied from its GitHub repository.

Benchmark Results

The benchmark results are also included in the GitHub repo in Results.ipynb. The GPU speedups over the CPU implementations were:

European options: 2.815x on 10^7 paths
Asian options: 190.35x on 10^6 paths
Basket options: 88.28x on 10^7 paths

Conclusion and Learnings

This project was a great way to learn parallel programming with CUDA and apply it to finance. Along with CUDA, I learned about several option types and C++ techniques such as RAII and functor structs. Overall, a very positive learning outcome :)

Chained Unary Minus Resolution

2025-01-30T00:00:00+00:00

Introduction and Motivation

I discovered open source between July and October 2024 and was excited to make a contribution of my own. Being interested in C++, I landed on the Clad (Clang-based automatic differentiation) repository. In this blog, I explain the contribution I made between December 2024 and January 2025. You can view the PR here: PR Link. Clad repository: Link

Clang Based Automatic Differentiation

Clad is an automatic differentiation (AD) tool for C++ built on the LLVM compiler infrastructure. It works as a plugin for Clang and generates derivatives of C++ functions through source transformation. Information about automatic differentiation can be found here.

Issue and naive solution

Clad works on Clang's Abstract Syntax Tree (AST). The function to be differentiated is analysed at compile time, and Clad generates the code of its derivative, which is then compiled into the object file. The issue I worked on concerned chained unary minuses, for example:

Expression: -(-(-x))
Expected derivative: -1
Generated output, due to the mishandling: - - -1
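To check the expected value independently of Clad, here is a tiny forward-mode AD sketch using dual numbers. Note that this is a different technique from Clad's: it uses operator overloading at run time rather than source transformation at compile time. The Dual type is hypothetical and only shows that each unary minus negates the derivative, so three of them give -1.

```cpp
#include <cassert>

// Minimal forward-mode AD value: val is f(x), der is f'(x).
struct Dual {
  double val;
  double der;
};

// d/dx(-u) = -u', so unary minus negates both parts.
Dual operator-(Dual u) { return {-u.val, -u.der}; }

// Seed the input variable with derivative 1.
Dual variable(double x) { return {x, 1.0}; }

// f(x) = -(-(-x))
Dual chainedMinus(double x) {
  Dual d = variable(x);
  return -(-(-d));
}
```

chainedMinus(5.0) yields the value -5 and the derivative -1, which is the single literal we want Clad to emit instead of - - -1.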

My first, naive solution resolved the minuses at run time. As the maintainer rightly pointed out, this was the wrong approach, since the fix belonged in the code Clad generates at compile time. That is when I started learning Clang and LLVM, following the material the repository's maintainer recommended.

Resolution Function

The following function was added.

Expr* VisitorBase::ResolveUnaryMinus(Expr* E, SourceLocation OpLoc) {
  if (auto* UO = llvm::dyn_cast<clang::UnaryOperator>(E)) {
    if (UO->getOpcode() == clang::UO_Minus)
      return (UO->getSubExpr())->IgnoreParens();
  }
  Expr* E_LHS = E;
  while (auto* BO = llvm::dyn_cast<BinaryOperator>(E_LHS))
    E_LHS = BO->getLHS();
  if (auto* UO = llvm::dyn_cast<clang::UnaryOperator>(E_LHS->IgnoreCasts())) {
    if (UO->getOpcode() == clang::UO_Minus)
      E = m_Sema.ActOnParenExpr(E->getBeginLoc(), E->getEndLoc(), E).get();
  }
  return m_Sema.BuildUnaryOp(nullptr, OpLoc, clang::UO_Minus, E).get();
}

The function works in two steps:

If E is itself a unary minus, it removes that minus (and any parentheses beneath it) instead of adding another one. Negations therefore cancel as the derivative is built: for -(-(-x)) the expression evolves as x -> -x -> x -> -x.
Otherwise, it walks down the left-hand operands of any binary operators to the leftmost subexpression. If that subexpression (ignoring casts) is a unary minus, E is wrapped in parentheses before the new minus is applied. For example, negating -a+b gives -(-a+b) = a-b, not - -a+b, which reads as (-(-a))+b = a+b (wrong!).

Though small, this function handles the chained unary-minus cases this issue was about.
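The two steps can be mimicked on a toy expression tree. The sketch below uses hypothetical Node types, not Clang's AST classes. It ignores casts and only models variables, negation, parentheses and addition, but it reproduces both examples from the list above.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy expression tree (hypothetical; not Clang's AST classes).
struct Node;
using NodeP = std::shared_ptr<Node>;
struct Node {
  enum Kind { Var, Neg, Paren, Add } kind;
  std::string name;  // used by Var
  NodeP lhs, rhs;    // Neg/Paren use lhs; Add uses both
};

NodeP var(const std::string& n) { return std::make_shared<Node>(Node{Node::Var, n, nullptr, nullptr}); }
NodeP neg(NodeP e) { return std::make_shared<Node>(Node{Node::Neg, "", e, nullptr}); }
NodeP paren(NodeP e) { return std::make_shared<Node>(Node{Node::Paren, "", e, nullptr}); }
NodeP add(NodeP a, NodeP b) { return std::make_shared<Node>(Node{Node::Add, "", a, b}); }

NodeP ignoreParens(NodeP e) {
  while (e->kind == Node::Paren) e = e->lhs;
  return e;
}

// Mirrors the two steps of ResolveUnaryMinus on the toy tree.
NodeP resolveUnaryMinus(NodeP e) {
  // Step 1: negating -y yields y.
  if (e->kind == Node::Neg) return ignoreParens(e->lhs);
  // Step 2: find the leftmost operand of any chain of binary operators.
  NodeP left = e;
  while (left->kind == Node::Add) left = left->lhs;
  if (left->kind == Node::Neg) e = paren(e);  // avoid emitting "- -a + b"
  return neg(e);
}

std::string print(const NodeP& e) {
  switch (e->kind) {
    case Node::Var: return e->name;
    case Node::Neg: return "-" + print(e->lhs);
    case Node::Paren: return "(" + print(e->lhs) + ")";
    case Node::Add: return print(e->lhs) + " + " + print(e->rhs);
  }
  return "";
}
```

Applying resolveUnaryMinus three times to x prints -x. Applying it once to -a + b prints -(-a + b).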

Unit testing

The following unit tests were added. The first (f1) covers chained unary minuses in a standalone expression, while the second (f2) covers unary minuses applied to operands of binary operators.

// RUN: %cladclang %s -I%S/../../include -oUnaryMinus.out 2>&1 | %filecheck %s
// RUN: ./UnaryMinus.out | %filecheck_exec %s
// RUN: %cladclang -Xclang -plugin-arg-clad -Xclang -enable-tbr %s -I%S/../../include -oUnaryMinus.out
// RUN: ./UnaryMinus.out | %filecheck_exec %s

#include "clad/Differentiator/Differentiator.h"

#include "../TestUtils.h"

double f1(double x)
{
    return -(-(-1))*-(-(-x));
}

//CHECK: void f1_grad(double x, double *_d_x) {
//CHECK-NEXT:    *_d_x += -(-1 * 1);
//CHECK-NEXT: }

double f2(double x, double y)
{
    return -2*-(-(-x))*-y - 1*(-y)*(-(-x));
}

//CHECK: void f2_grad(double x, double y, double *_d_x, double *_d_y) {
//CHECK-NEXT:    {
//CHECK-NEXT:        *_d_x += -(-2 * 1 * -y);
//CHECK-NEXT:        *_d_y += -(-2 * -x * 1);
//CHECK-NEXT:        *_d_y += -1 * -1 * x;
//CHECK-NEXT:        *_d_x += 1 * -y * -1;
//CHECK-NEXT:    }
//CHECK-NEXT: }

double dx;
double arr[2] = {};
int main(){

    INIT_GRADIENT(f1);
    INIT_GRADIENT(f2);

    TEST_GRADIENT(f1, 1, 5, &dx); // CHECK-EXEC: 1.00
    TEST_GRADIENT(f2, 2, 3, 4, &arr[0], &arr[1]); // CHECK-EXEC: {-4.00, -3.00}
}

Learnings

This was my first contribution to a C++ project, and my first to Clad. I learned a lot about Clang and LLVM. Looking forward to contributing more to Clad in the future!

ThreadsafeQueueLib: Design and Motivation of Concurrent Queues in C++

2025-01-19T00:00:00+00:00

Project Overview

ThreadsafeQueueLib is a project developed under the Coding Club, IIT Guwahati, with the aim of designing and implementing a family of thread-safe, lock-free, and wait-free queue data structures in modern C++. The project is motivated by real-world concurrent systems where classical standard library containers such as std::queue fail under contention due to race conditions and lack of atomicity guarantees.

This document outlines the motivation, background, concurrency issues in existing queues, and the design goals of the ThreadsafeQueueLib project.

Introduction

With the latest advancements in the C++ language and its standard library, there is now substantial scope for developing efficient systems in the domains of multithreading and parallel processing. However, many classical containers in the C++ Standard Library—most notably std::queue—were not designed for true concurrent usage.

As a result, these containers often break under concurrent access, leading to race conditions and undefined behavior. Among these issues, data races are particularly dangerous and form the primary focus of this project.

In the context of queues, there are typically two participants:

Producers, which generate data and push it into the queue
Consumers, which retrieve and process that data

This producer–consumer abstraction appears across numerous domains such as:

Airport queues
Operating system process schedulers
High-frequency trading (HFT) systems

HFT systems, in particular, served as a major inspiration for this project.

The objective is to design and implement a family of lock-free and wait-free queues that can safely support multiple producers and multiple consumers operating concurrently on a shared structure, while avoiding data races and ensuring correctness.

Race Conditions and Data Races

Race Conditions

Consider a real-world example: buying movie tickets at a cinema with multiple cashiers. If two customers attempt to book the last few seats simultaneously, the final outcome depends on who completes the transaction first. This is a classic race condition—the result depends on the relative ordering of independent operations.

In concurrent programming, a race condition refers to any scenario where program behavior depends on the interleaving of operations across multiple threads.

Data Races

The C++ Standard defines a more severe form of race condition known as a data race.

A data race occurs when:

Two or more threads access the same memory location concurrently, and
At least one of those accesses is a write, and
At least one of them is not atomic, with no synchronization ordering one access before the other

A data race results in undefined behavior, making the program fundamentally unsafe and unpredictable.
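A minimal illustration of the definition is several threads incrementing a shared counter. With a plain int, the concurrent read-modify-write increments would form a data race: undefined behaviour, and in practice lost updates. Declaring the counter as std::atomic<int> removes the race. countIncrements is a hypothetical helper.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Each of numThreads threads increments a shared counter `iterations` times.
// Declaring the counter as a plain int instead would create a data race.
int countIncrements(int numThreads, int iterations) {
  std::atomic<int> counter{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < numThreads; ++t) {
    workers.emplace_back([&counter, iterations] {
      for (int i = 0; i < iterations; ++i)
        counter.fetch_add(1, std::memory_order_relaxed);  // atomic read-modify-write
    });
  }
  for (auto& w : workers) w.join();  // join() makes every increment visible here
  return counter.load();
}
```

Relaxed ordering is enough here because the threads only need the increments themselves to be atomic. The final read is ordered after all of them by join().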

In queue-based systems, data races typically occur when:

Multiple producers modify the queue tail concurrently
A consumer reads an element while a producer is still writing it
Internal pointers or indices are accessed without synchronization

Even a single unsynchronized read–write or write–write overlap can corrupt the logical structure of the queue. This explains why neither std::queue nor a simple linked-list queue can be shared across threads without external synchronization.

Race Conditions Specific to `std::queue`

The standard std::queue implementation suffers from multiple race conditions in concurrent environments because its interface provides no atomicity guarantees.

Race Between `empty()` and `front()`

A common consumer pattern is:

if (!q.empty()) {
    int value = q.front();
    q.pop();
    do_something(value);
}

In single-threaded code, this construct is perfectly safe: calling front() on an empty queue is undefined behavior, and the preceding empty() check ensures that this does not happen.

However, when multiple consumers operate on the same shared queue, this becomes a classic race condition. Another thread may call pop() in the small window between empty() and front(), removing the last element, so the first thread then calls front() on an empty queue. Guarding each individual call with a std::mutex does not help either. Each call becomes atomic on its own, but the check-then-act sequence does not, so empty(), front() and pop() must be performed as a single indivisible step under one lock.
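The usual fix is to fold the check, the read and the removal into one member function that holds a single lock for the whole sequence. A minimal sketch, assuming a mutex-based design (LockedQueue and try_pop are illustrative names, not the project's API):

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <utility>

template <typename T>
class LockedQueue {
 public:
  void push(T value) {
    std::lock_guard<std::mutex> lock(m_);
    q_.push(std::move(value));
  }

  // Check, read and remove under one lock: no other thread can pop in between.
  // Returns false instead of touching an empty queue.
  bool try_pop(T& out) {
    std::lock_guard<std::mutex> lock(m_);
    if (q_.empty()) return false;
    out = std::move(q_.front());
    q_.pop();
    return true;
  }

 private:
  std::mutex m_;
  std::queue<T> q_;
};
```

With this interface, the consumer pattern becomes `int value; if (q.try_pop(value)) do_something(value);`. Because front() and pop() now also run under the same lock, two consumers can no longer observe the same element.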

Race Between `front()` and `pop()`

A second race occurs when two consumer threads concurrently execute the front()-then-pop() part of the pattern above.

If both threads observe the same element at the front of the queue (because nothing modifies the queue between their calls to front()), and both subsequently invoke pop(), then the following issues arise:

One item is processed twice
Another item is discarded without ever being read

This violates the fundamental FIFO invariant of queues and clearly demonstrates why naïve concurrent use of std::queue is unsafe. These cases motivate the need for properly designed threadsafe queues which guarantee atomicity and correctness under contention.
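The bad interleaving described above can be replayed deterministically in a single thread, which shows the effect without relying on timing. The sketch below (replayFrontPopRace is a hypothetical name) runs consumer A's and consumer B's steps in the order front(A), front(B), pop(A), pop(B).

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Replays one problematic interleaving of two consumers, step by step.
// Returns the values the consumers "processed"; q keeps whatever remains.
std::vector<int> replayFrontPopRace(std::queue<int>& q) {
  assert(q.size() >= 2);
  std::vector<int> processed;
  int a = q.front();  // consumer A reads the front element
  int b = q.front();  // consumer B reads the same element before A pops
  q.pop();            // A removes the element it read
  q.pop();            // B removes the *next* element, which nobody read
  processed.push_back(a);
  processed.push_back(b);
  return processed;
}
```

Starting from a queue holding 1, 2, 3, both consumers process 1 and the element 2 disappears unread, leaving only 3.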

Targets of the Project

Having identified the limitations of std::queue in concurrent settings, our goal is to design a new data structure — the ThreadsafeQueue — that eliminates these issues while providing a flexible and performant interface for both developers and high-throughput systems.

Before diving into implementation details, we outline the features and design goals from a user’s point of view.

Bounded and Unbounded Queues

A robust concurrent queue should support both bounded and unbounded modes.

Bounded Mode

A bounded queue restricts the maximum number of elements it can hold. This is typically implemented using a circular buffer, which offers extremely fast index arithmetic and predictable memory usage.

Bounded queues are commonly used in:

Real-time systems
Embedded applications
Thread pools
High-frequency trading (HFT) pipelines

In such systems, memory must remain under strict control.

Unbounded Mode

An unbounded queue grows as needed, typically backed by a linked list or a dynamically growing container such as std::deque (the default underlying container of std::queue). This mode is more flexible and is useful when producers may temporarily outpace consumers and memory is abundant.

The user should be able to specify, at construction time:

Whether the queue is bounded or unbounded
If bounded, the maximum capacity

This configurability ensures that the ThreadsafeQueue can be deployed across a wide range of environments.

SPSC, MPSC, and MPMC Support

Different applications require different concurrency models. A well-designed concurrent queue must support all three primary configurations:

SPSC — Single Producer, Single Consumer

This is the simplest model and allows aggressive optimizations such as:

Avoiding compare-and-swap loops and full memory fences (acquire/release ordering on the two indices is sufficient)
Using cache-friendly ring buffers
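A common shape for such an SPSC queue is a fixed-size ring buffer with two atomic indices. Only the producer writes tail_ and only the consumer writes head_, and release/acquire ordering is enough to publish each element, so no compare-and-swap is needed. This is a sketch under those assumptions (SpscRing is a hypothetical name; one slot is kept empty to tell "full" from "empty"), not the project's implementation.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Bounded single-producer/single-consumer ring buffer (holds N - 1 elements).
template <typename T, std::size_t N>
class SpscRing {
  static_assert(N >= 2, "need at least one usable slot");

 public:
  // Producer thread only.
  bool try_push(const T& value) {
    const std::size_t tail = tail_.load(std::memory_order_relaxed);
    const std::size_t next = (tail + 1) % N;
    if (next == head_.load(std::memory_order_acquire)) return false;  // full
    buf_[tail] = value;
    tail_.store(next, std::memory_order_release);  // publish the element
    return true;
  }

  // Consumer thread only.
  bool try_pop(T& out) {
    const std::size_t head = head_.load(std::memory_order_relaxed);
    if (head == tail_.load(std::memory_order_acquire)) return false;  // empty
    out = buf_[head];
    head_.store((head + 1) % N, std::memory_order_release);  // free the slot
    return true;
  }

 private:
  std::array<T, N> buf_{};
  std::atomic<std::size_t> head_{0};  // next slot to read
  std::atomic<std::size_t> tail_{0};  // next slot to write
};
```

The release store on tail_ pairs with the consumer's acquire load, so the element written into buf_ is visible before the consumer reads it. The head_ pair plays the same role for slot reuse. A production version would typically also pad head_ and tail_ onto separate cache lines to avoid false sharing.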

MPSC — Multiple Producers, Single Consumer

This model is common in:

Logging systems
Event queues

It requires safe atomic coordination among multiple producer threads.

MPMC — Multiple Producers, Multiple Consumers

This is the most general and complex case. Ensuring correctness under high contention requires careful handling of:

ABA problems
Atomic pointer manipulation
Memory-ordering guarantees

Our goal is to provide a unified interface that works across all these models while maintaining correctness and high performance.

Blocking vs Lock-Free Implementations

The ThreadsafeQueue should allow users to choose between blocking and non-blocking designs.

Blocking Mode

Implemented using std::mutex and std::condition_variable, blocking queues are easier to implement and reason about. However, they may suffer from:

Context-switch overhead
Potential priority inversion
Poor scalability under contention
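A sketch of the blocking variant's core, assuming the standard mutex plus condition-variable pattern (BlockingQueue and wait_and_pop are illustrative names). The consumer sleeps until the queue is non-empty instead of spinning. The predicate overload of wait() re-checks the condition after every wakeup, which also handles spurious wakeups.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>  // std::thread, for running producers and consumers
#include <utility>

template <typename T>
class BlockingQueue {
 public:
  void push(T value) {
    {
      std::lock_guard<std::mutex> lock(m_);
      q_.push(std::move(value));
    }
    cv_.notify_one();  // wake one waiting consumer (outside the lock)
  }

  // Blocks until an element is available, then removes and returns it.
  T wait_and_pop() {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return !q_.empty(); });
    T value = std::move(q_.front());
    q_.pop();
    return value;
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  std::queue<T> q_;
};
```

Each sleeping consumer costs a context switch to wake, which is the overhead listed above. The sketch also has no shutdown mechanism, so a consumer waiting on a queue that is never refilled blocks forever.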

Lock-Free (Non-Blocking) Mode

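Lock-free (non-blocking) queues provide progress guarantees. Lock-freedom guarantees that some thread always completes its operation in a finite number of steps; wait-freedom, the stronger guarantee, requires that every thread does. These algorithms rely on: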

Atomic operations
Memory-ordering semantics
Careful ABA prevention strategies

Lock-free queues excel in high-performance systems because a stalled thread cannot block the others. Because they typically scale better on multicore machines, this project focuses primarily on lock-free implementations, with a bounded implementation as well for the sake of learning.

Template Metaprogramming for Compile-Time Optimisation

Rather than designing separate runtime classes for SPSC, MPSC, MPMC, bounded, and unbounded variants, which would lead to code duplication and runtime overhead, this project uses C++ template metaprogramming.

By selecting queue characteristics at compile time, the compiler can:

Remove unused branches and modes
Apply aggressive optimizations
Generate highly specialized data structures for each use case

The user specifies the mode, capacity, concurrency type, and blocking behavior through template parameters, enabling maximum performance without sacrificing flexibility.
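A minimal sketch of what compile-time selection can look like, assuming C++17. The capacity is a template parameter, and Capacity == 0 is taken to mean unbounded (a convention invented for this example). std::conditional_t picks the storage type, and if constexpr compiles away the branch the chosen mode does not need. ConfigQueue is hypothetical, and synchronization is deliberately left out to keep the focus on the compile-time mechanics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <type_traits>

// Hypothetical, single-threaded illustration of compile-time configuration.
template <typename T, std::size_t Capacity>
class ConfigQueue {
 public:
  static constexpr bool bounded = (Capacity != 0);

  bool push(const T& value) {
    if constexpr (bounded) {
      if (size_ == Capacity) return false;  // full: only compiled when bounded
      storage_[(head_ + size_) % Capacity] = value;
      ++size_;
    } else {
      storage_.push_back(value);  // only compiled when unbounded
    }
    return true;
  }

  bool pop(T& out) {
    if constexpr (bounded) {
      if (size_ == 0) return false;
      out = storage_[head_];
      head_ = (head_ + 1) % Capacity;
      --size_;
    } else {
      if (storage_.empty()) return false;
      out = storage_.front();
      storage_.pop_front();
    }
    return true;
  }

 private:
  // Ring buffer storage when bounded, growable deque when unbounded.
  std::conditional_t<bounded, std::array<T, Capacity>, std::deque<T>> storage_{};
  std::size_t head_ = 0, size_ = 0;  // used only by the bounded variant
};
```

ConfigQueue<int, 1024> becomes a fixed ring buffer with a capacity check, while ConfigQueue<int, 0> becomes a deque wrapper with no capacity check at all. The choice costs nothing at run time.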

Teams for the Project and Schedule

The project's GitHub repository (hyperlink to be added) is currently private. It will be made public once I have finalized a few implementation details.

Participants are divided into teams, each with an assigned Point of Contact (POC). The POCs exist only to build a sense of responsibility for the project.

Team Assignments

Team	Members	POC
Team A	Aryan Gupta, Naveen, Rishit, Keshav	Aryan Gupta
Team B	Ritesh, Abhiraj, Abhigyan, Ritwik, Prabhnoor	Ritesh
Team C	Tushar, Avanish, Mehul, Aryan Chakravorty	Aryan Chakravorty

Session Plans

Date	Time	Session Name	PDF Link	Video Link
22nd January 2026	9 PM to 10 PM	Concurrency, std::thread and more	Link	-
24th January 2026	3 PM to 4:30 PM	Shared data, race conditions and std::mutex	Link	Link
28th January 2026	10:05 PM to 11:30 PM	Synchronization and Condition variables	Link	Link
Till 4th Feb 2026	-	CppCon 2017 “C++ atomics, from basic to advanced. What do they really do?”	-	Link
21st Feb 2026	3:30 PM to 5:00 PM	C++ Memory Model and std::atomic - Part 1	Link	Link
TBD	9 PM to 10 PM	C++ Memory Model and std::atomic - Part 2	Link	Link
TBD	9 PM to 10 PM	Project Description	Link	Link
TBD	9 PM to 10 PM	Buffer	Link	Link
TBD	9 PM to 10 PM	Template Metaprogramming - Part 1 (later)	Link	Link
TBD	9 PM to 10 PM	Template Metaprogramming - Part 2 (later)	Link	Link
TBD	9 PM to 10 PM	Template Metaprogramming - Part 3 (later)	Link	Link

Weekly Targets

Week	Duration	Team Targets
Week 1	8th Dec ’25 – 14th Dec ’25	Complete CPP101 (All Teams)
Week 2	15th Dec ’25 – 21st Dec ’25	Complete CPP101 (All Teams)
Week 3	22nd Dec ’25 – 28th Dec ’25	Complete CPP101 (All Teams)
Week 4	29th Dec ’25 – 4th Jan ’26	Complete CPP101 (All Teams)
Week 5	5th Jan ’26 – 11th Jan ’26	Complete CPP101 (All Teams)
Week 6	12th Jan ’26 – 18th Jan ’26	Complete CPP101 (All Teams)
Week 7	19th Jan ’26 – 25th Jan ’26	Sessions
Week 8	26th Jan ’26 – 1st Feb ’26	Sessions

I’ll add weekly targets when we start the project.
