<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://toshit3q34.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://toshit3q34.github.io/" rel="alternate" type="text/html" /><updated>2026-02-23T15:12:04+00:00</updated><id>https://toshit3q34.github.io/feed.xml</id><title type="html">Toshit Jain</title><subtitle>An amazing website.</subtitle><author><name>Toshit Jain</name></author><entry><title type="html">Selftape AI - Audition Automation App</title><link href="https://toshit3q34.github.io/Selftape-AI/" rel="alternate" type="text/html" title="Selftape AI - Audition Automation App" /><published>2025-06-29T00:00:00+00:00</published><updated>2025-06-29T00:00:00+00:00</updated><id>https://toshit3q34.github.io/Selftape-AI</id><content type="html" xml:base="https://toshit3q34.github.io/Selftape-AI/"><![CDATA[<h2 id="selftape-ai-ai-powered-self-tape-recording-app-for-actors">SelfTape-AI: AI-Powered Self-Tape Recording App for Actors</h2>

<p><strong>SelfTape-AI</strong> is a next-generation cross-platform app built for actors, content creators, and filmmakers to streamline the self-taping process. Whether you’re rehearsing scenes or submitting auditions, SelfTape-AI enables a seamless, professional-grade experience—right from your phone or desktop.</p>

<hr />

<h2 id="why-selftape-ai">Why SelfTape-AI?</h2>

<p>Self-taping has become the standard in auditions, but traditional methods can be cumbersome. With <strong>SelfTape-AI</strong>, creators can now rehearse or record scenes effortlessly by performing their lines while AI handles the rest—making solo practice and self-tapes faster, more immersive, and production-ready.</p>

<hr />

<h2 id="key-features">Key Features</h2>

<ul>
  <li>
    <p><strong>Smart Script Handling</strong><br />
Upload a script and instantly identify characters and dialogues. No manual setup required.</p>
  </li>
  <li>
    <p><strong>Custom Role Assignment</strong><br />
Choose which roles you’ll play and assign others to AI—ideal for solo rehearsals or audition prep.</p>
  </li>
  <li>
    <p><strong>Live Scene Playback</strong><br />
The AI delivers lines for its assigned characters in real time, while you perform yours—just like a real scene partner.</p>
  </li>
  <li>
    <p><strong>Camera Integration</strong><br />
Your performance is captured using the front camera and automatically saved to your device.</p>
  </li>
  <li>
    <p><strong>Interactive Script Panel</strong><br />
Scroll through and follow your script live, with a clean, distraction-free interface.</p>
  </li>
  <li>
    <p><strong>Cross-Platform Support</strong><br />
Built with Flutter, SelfTape-AI runs on Android, iOS, and desktop platforms with a consistent, intuitive experience.</p>
  </li>
</ul>

<hr />

<h2 id="what-you-need-to-get-started">What You Need to Get Started</h2>

<ul>
  <li>A modern smartphone or desktop</li>
  <li>Flutter installed (for contributors or testers)</li>
  <li>Optional: a Deepgram API key if contributing to the backend</li>
</ul>

<hr />

<h2 id="project-overview">Project Overview</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">.</span>
├── backend/
│   └── Core Python server handling script logic and AI integration
└── sceneapp/
    └── Flutter frontend with seamless UI and camera integration
</code></pre></div></div>]]></content><author><name>Toshit Jain</name></author><category term="Flutter" /><category term="ML/AI" /><category term="Project" /><summary type="html"><![CDATA[SelfTape-AI: AI-Powered Self-Tape Recording App for Actors]]></summary></entry><entry><title type="html">CUDA Based Options Pricing using Monte Carlo Simulation</title><link href="https://toshit3q34.github.io/CUDA-Based-Options-Pricing/" rel="alternate" type="text/html" title="CUDA Based Options Pricing using Monte Carlo Simulation" /><published>2025-06-20T00:00:00+00:00</published><updated>2025-06-20T00:00:00+00:00</updated><id>https://toshit3q34.github.io/CUDA-Based-Options-Pricing</id><content type="html" xml:base="https://toshit3q34.github.io/CUDA-Based-Options-Pricing/"><![CDATA[<h2 id="introduction--motivation">Introduction &amp; Motivation</h2>

<p>Hi. The main drive behind this project was to learn parallel processing through CUDA. At the same time, I was learning finance from the book <em>Paul Wilmott Introduces Quantitative Finance</em>. After covering the basics of financial quantities and options, I came across a method for pricing
options: <em>Monte Carlo Simulation</em>. I had learned earlier that Monte Carlo algorithms are usually easy to parallelize, since the simulated paths are independent of each other, and so began my project. (Later, I decided to collaborate with one of my friends at IITG who was interested in the project as well.)</p>

<p>The full implementation can be found here: <a href="https://github.com/toshit3q34/CUDA-Based-Options-Pricing">GitHub Repository</a></p>

<h2 id="cuda-basics">CUDA Basics</h2>

<p>For the CUDA basics, I referred to a few videos and resources available online (some lectures from IITM professors, some from YouTube, etc.) and learned how to write CUDA kernels, what grids and blocks are, and how calls like cudaMemcpy move data between host and device.</p>

<h2 id="project-overview">Project Overview</h2>

<p>Our implementation included the following :</p>
<ul>
  <li>CPU functions for European, Asian, Basket and American options (American uses LSM, so CPU only).</li>
  <li>GPU kernels for European, Asian and Basket options.</li>
  <li>Code kept as modular as possible (functor structs passed as template parameters).</li>
  <li>A benchmarking class (an RAII-style Timer class).</li>
  <li>A CLI (using the third-party cxxopts library).</li>
</ul>

<h2 id="payoffs-functor-structs">Payoffs Functor Structs</h2>

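<p>Following is the code for the payoff functor structs. I originally thought of using inheritance with virtual functions, but that does not work here: an object constructed on the host carries a pointer to a virtual table that lives in host memory, so passing it to a CUDA kernel and calling its virtual functions there is not supported :(</p>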

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#pragma once
#include</span> <span class="cpf">&lt;cmath&gt;</span><span class="cp">
</span>
<span class="k">struct</span> <span class="nc">CallPayoff</span> <span class="p">{</span>
  <span class="n">__device__</span> <span class="n">__host__</span> <span class="kt">double</span> <span class="k">operator</span><span class="p">()(</span><span class="kt">double</span> <span class="n">S</span><span class="p">,</span> <span class="kt">double</span> <span class="n">K</span><span class="p">)</span> <span class="k">const</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">fmax</span><span class="p">(</span><span class="n">S</span> <span class="o">-</span> <span class="n">K</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">);</span>
  <span class="p">}</span>
<span class="p">};</span>

<span class="k">struct</span> <span class="nc">PutPayoff</span> <span class="p">{</span>
  <span class="n">__device__</span> <span class="n">__host__</span> <span class="kt">double</span> <span class="k">operator</span><span class="p">()(</span><span class="kt">double</span> <span class="n">S</span><span class="p">,</span> <span class="kt">double</span> <span class="n">K</span><span class="p">)</span> <span class="k">const</span> <span class="p">{</span>
    <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">fmax</span><span class="p">(</span><span class="n">K</span> <span class="o">-</span> <span class="n">S</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">);</span>
  <span class="p">}</span>
<span class="p">};</span>
</code></pre></div></div>
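
<p>As a minimal host-only sketch of why the functor approach works: the payoff type is a template parameter, so the call is resolved at compile time and no virtual table is involved. Here averagePayoff is a hypothetical helper that is not part of the repository, and the CUDA qualifiers are dropped so that the snippet compiles with a plain C++ compiler.</p>

```cpp
#include <cmath>
#include <vector>

// Same shape as the functors above, minus the CUDA qualifiers.
struct CallPayoff {
  double operator()(double S, double K) const { return std::fmax(S - K, 0.0); }
};

struct PutPayoff {
  double operator()(double S, double K) const { return std::fmax(K - S, 0.0); }
};

// Hypothetical helper: averages the payoff over a set of terminal prices.
// The payoff type is fixed at compile time, so no virtual call is needed.
template <typename Payoff>
double averagePayoff(const std::vector<double> &terminalPrices, double K) {
  if (terminalPrices.empty()) {
    return 0.0;
  }
  Payoff payoff;
  double sum = 0.0;
  for (double ST : terminalPrices) {
    sum += payoff(ST, K);
  }
  return sum / static_cast<double>(terminalPrices.size());
}
```

<p>Switching between a call and a put is then just a change of template argument, for example averagePayoff&lt;CallPayoff&gt;(prices, K) versus averagePayoff&lt;PutPayoff&gt;(prices, K).</p>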

<h2 id="european-options">European Options</h2>

<h3 id="european-options-class-and-cpu-implementation">European Options Class and CPU Implementation</h3>

<p>We used a class templated on the payoff functor to build the European option class. It stores the option parameters (S0, K, r, sigma, T) and exposes two methods: a CPU implementation and a host-side wrapper that launches the GPU kernel.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">Payoff</span><span class="p">&gt;</span> <span class="k">class</span> <span class="nc">EuropeanOption</span> <span class="p">{</span>
<span class="nl">private:</span>
  <span class="kt">double</span> <span class="n">S0</span><span class="p">,</span> <span class="n">K</span><span class="p">,</span> <span class="n">r</span><span class="p">,</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">T</span><span class="p">;</span>

<span class="nl">public:</span>
  <span class="n">EuropeanOption</span><span class="p">(</span><span class="kt">double</span> <span class="n">_S0</span><span class="p">,</span> <span class="kt">double</span> <span class="n">_K</span><span class="p">,</span> <span class="kt">double</span> <span class="n">_r</span><span class="p">,</span> <span class="kt">double</span> <span class="n">_sigma</span><span class="p">,</span> <span class="kt">double</span> <span class="n">_T</span><span class="p">)</span>
      <span class="o">:</span> <span class="n">S0</span><span class="p">(</span><span class="n">_S0</span><span class="p">),</span> <span class="n">K</span><span class="p">(</span><span class="n">_K</span><span class="p">),</span> <span class="n">r</span><span class="p">(</span><span class="n">_r</span><span class="p">),</span> <span class="n">sigma</span><span class="p">(</span><span class="n">_sigma</span><span class="p">),</span> <span class="n">T</span><span class="p">(</span><span class="n">_T</span><span class="p">)</span> <span class="p">{}</span>

  <span class="kt">double</span> <span class="n">europeanOptionCPU</span><span class="p">(</span><span class="kt">int</span> <span class="n">paths</span><span class="p">);</span>
  <span class="kt">double</span> <span class="n">europeanOptionGPU</span><span class="p">(</span><span class="kt">int</span> <span class="n">paths</span><span class="p">);</span>
<span class="p">};</span>
</code></pre></div></div>

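<p>The CPU implementation simulates the terminal price under the Black-Scholes model (geometric Brownian motion), using the closed-form expression for S<sub>T</sub>, so each path needs only a single normal draw. The number of paths is taken as input, and the discounted average payoff over all paths is returned. We used a fixed seed for the random number generator to get reproducible results.</p>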

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">Payoff</span><span class="p">&gt;</span>
<span class="kt">double</span> <span class="n">EuropeanOption</span><span class="o">&lt;</span><span class="n">Payoff</span><span class="o">&gt;::</span><span class="n">europeanOptionCPU</span><span class="p">(</span><span class="kt">int</span> <span class="n">paths</span><span class="p">)</span> <span class="p">{</span>

  <span class="n">std</span><span class="o">::</span><span class="n">mt19937_64</span> <span class="n">rng</span><span class="p">(</span><span class="mi">42</span><span class="p">);</span>
  <span class="n">std</span><span class="o">::</span><span class="n">normal_distribution</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span> <span class="n">norm</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">);</span>

  <span class="kt">double</span> <span class="n">payoff_sum</span> <span class="o">=</span> <span class="mf">0.0</span><span class="p">;</span>
  <span class="n">Payoff</span> <span class="n">payoff</span><span class="p">;</span>

  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">paths</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">double</span> <span class="n">Z</span> <span class="o">=</span> <span class="n">norm</span><span class="p">(</span><span class="n">rng</span><span class="p">);</span>
    <span class="kt">double</span> <span class="n">ST</span> <span class="o">=</span>
        <span class="n">S0</span> <span class="o">*</span> <span class="n">std</span><span class="o">::</span><span class="n">exp</span><span class="p">((</span><span class="n">r</span> <span class="o">-</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">sigma</span> <span class="o">*</span> <span class="n">sigma</span><span class="p">)</span> <span class="o">*</span> <span class="n">T</span> <span class="o">+</span> <span class="n">sigma</span> <span class="o">*</span> <span class="n">std</span><span class="o">::</span><span class="n">sqrt</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="o">*</span> <span class="n">Z</span><span class="p">);</span>
    <span class="kt">double</span> <span class="n">curr_payoff</span> <span class="o">=</span> <span class="n">payoff</span><span class="p">(</span><span class="n">ST</span><span class="p">,</span> <span class="n">K</span><span class="p">);</span>
    <span class="n">payoff_sum</span> <span class="o">+=</span> <span class="n">curr_payoff</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="n">r</span> <span class="o">*</span> <span class="n">T</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">payoff_sum</span> <span class="o">/</span> <span class="n">paths</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
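
<p>One way to sanity-check the Monte Carlo estimate is to compare it with the closed-form Black-Scholes price, which exists for European calls and puts on a non-dividend-paying asset. The sketch below is not from the repository; normalCdf, blackScholesCall and blackScholesPut are illustrative names.</p>

```cpp
#include <cmath>

// Standard normal CDF expressed through the complementary error function.
inline double normalCdf(double x) {
  return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Closed-form Black-Scholes price of a European call (no dividends).
inline double blackScholesCall(double S0, double K, double r, double sigma,
                               double T) {
  double d1 = (std::log(S0 / K) + (r + 0.5 * sigma * sigma) * T) /
              (sigma * std::sqrt(T));
  double d2 = d1 - sigma * std::sqrt(T);
  return S0 * normalCdf(d1) - K * std::exp(-r * T) * normalCdf(d2);
}

// Closed-form Black-Scholes price of a European put (no dividends).
inline double blackScholesPut(double S0, double K, double r, double sigma,
                              double T) {
  double d1 = (std::log(S0 / K) + (r + 0.5 * sigma * sigma) * T) /
              (sigma * std::sqrt(T));
  double d2 = d1 - sigma * std::sqrt(T);
  return K * std::exp(-r * T) * normalCdf(-d2) - S0 * normalCdf(-d1);
}
```

<p>With enough paths, the value returned by europeanOptionCPU should approach the matching closed-form price, with a Monte Carlo error that shrinks roughly like one over the square root of the number of paths.</p>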
<h3 id="european-options-gpu-kernel">European Options GPU Kernel</h3>

<p>The kernel uses one GPU thread per path (we could have let each thread handle a batch of paths as well, but never mind). Each thread simulates its path with the same Black-Scholes terminal-price formula as the CPU version.
Following is the implementation:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">Payoff</span><span class="p">&gt;</span>
<span class="n">__global__</span> <span class="kt">void</span> <span class="nf">europeanOptionGPUKernel</span><span class="p">(</span><span class="kt">double</span> <span class="n">S0</span><span class="p">,</span> <span class="kt">double</span> <span class="n">K</span><span class="p">,</span> <span class="kt">double</span> <span class="n">r</span><span class="p">,</span>
                                        <span class="kt">double</span> <span class="n">sigma</span><span class="p">,</span> <span class="kt">double</span> <span class="n">T</span><span class="p">,</span> <span class="kt">int</span> <span class="n">paths</span><span class="p">,</span>
                                        <span class="kt">double</span> <span class="o">*</span><span class="n">results</span><span class="p">,</span> <span class="n">Payoff</span> <span class="n">payoff</span><span class="p">)</span> <span class="p">{</span>
  <span class="kt">int</span> <span class="n">idx</span> <span class="o">=</span> <span class="n">blockIdx</span><span class="p">.</span><span class="n">x</span> <span class="o">*</span> <span class="n">blockDim</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="n">threadIdx</span><span class="p">.</span><span class="n">x</span><span class="p">;</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">idx</span> <span class="o">&gt;=</span> <span class="n">paths</span><span class="p">)</span>
    <span class="k">return</span><span class="p">;</span>

  <span class="n">curandState</span> <span class="n">state</span><span class="p">;</span>
  <span class="n">curand_init</span><span class="p">(</span><span class="mi">42ULL</span><span class="p">,</span> <span class="n">idx</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">state</span><span class="p">);</span>

  <span class="kt">double</span> <span class="n">Z</span> <span class="o">=</span> <span class="n">curand_normal_double</span><span class="p">(</span><span class="o">&amp;</span><span class="n">state</span><span class="p">);</span>
  <span class="kt">double</span> <span class="n">ST</span> <span class="o">=</span> <span class="n">S0</span> <span class="o">*</span> <span class="n">exp</span><span class="p">((</span><span class="n">r</span> <span class="o">-</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">sigma</span> <span class="o">*</span> <span class="n">sigma</span><span class="p">)</span> <span class="o">*</span> <span class="n">T</span> <span class="o">+</span> <span class="n">sigma</span> <span class="o">*</span> <span class="n">sqrt</span><span class="p">(</span><span class="n">T</span><span class="p">)</span> <span class="o">*</span> <span class="n">Z</span><span class="p">);</span>
  <span class="n">results</span><span class="p">[</span><span class="n">idx</span><span class="p">]</span> <span class="o">=</span> <span class="n">payoff</span><span class="p">(</span><span class="n">ST</span><span class="p">,</span> <span class="n">K</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Kernel Wrapper</span>
<span class="k">template</span> <span class="o">&lt;</span><span class="k">typename</span> <span class="nc">Payoff</span><span class="p">&gt;</span>
<span class="kt">double</span> <span class="n">EuropeanOption</span><span class="o">&lt;</span><span class="n">Payoff</span><span class="o">&gt;::</span><span class="n">europeanOptionGPU</span><span class="p">(</span><span class="kt">int</span> <span class="n">paths</span><span class="p">)</span> <span class="p">{</span>
  <span class="kt">double</span> <span class="o">*</span><span class="n">d_results</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
  <span class="n">cudaMalloc</span><span class="p">(</span><span class="o">&amp;</span><span class="n">d_results</span><span class="p">,</span> <span class="n">paths</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">double</span><span class="p">));</span>

  <span class="kt">int</span> <span class="n">blockSize</span> <span class="o">=</span> <span class="mi">256</span><span class="p">;</span>
  <span class="kt">int</span> <span class="n">gridSize</span> <span class="o">=</span> <span class="p">(</span><span class="n">paths</span> <span class="o">+</span> <span class="n">blockSize</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">/</span> <span class="n">blockSize</span><span class="p">;</span>
  <span class="n">Payoff</span> <span class="n">payoff</span><span class="p">;</span>

  <span class="n">europeanOptionGPUKernel</span><span class="o">&lt;&lt;&lt;</span><span class="n">gridSize</span><span class="p">,</span> <span class="n">blockSize</span><span class="o">&gt;&gt;&gt;</span><span class="p">(</span><span class="n">S0</span><span class="p">,</span> <span class="n">K</span><span class="p">,</span> <span class="n">r</span><span class="p">,</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">T</span><span class="p">,</span> <span class="n">paths</span><span class="p">,</span>
                                                   <span class="n">d_results</span><span class="p">,</span> <span class="n">payoff</span><span class="p">);</span>
  <span class="n">cudaDeviceSynchronize</span><span class="p">();</span>

  <span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span> <span class="n">h_results</span><span class="p">(</span><span class="n">paths</span><span class="p">);</span>
  <span class="n">cudaMemcpy</span><span class="p">(</span><span class="n">h_results</span><span class="p">.</span><span class="n">data</span><span class="p">(),</span> <span class="n">d_results</span><span class="p">,</span> <span class="n">paths</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">double</span><span class="p">),</span>
             <span class="n">cudaMemcpyDeviceToHost</span><span class="p">);</span>

  <span class="n">cudaFree</span><span class="p">(</span><span class="n">d_results</span><span class="p">);</span>

  <span class="kt">double</span> <span class="n">sum</span> <span class="o">=</span> <span class="mf">0.0</span><span class="p">;</span>
  <span class="k">for</span> <span class="p">(</span><span class="kt">double</span> <span class="n">payoff_results</span> <span class="o">:</span> <span class="n">h_results</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">sum</span> <span class="o">+=</span> <span class="n">payoff_results</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="k">return</span> <span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="n">r</span> <span class="o">*</span> <span class="n">T</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">sum</span> <span class="o">/</span> <span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span><span class="p">(</span><span class="n">paths</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>

<h2 id="asian-options">Asian Options</h2>

<p>For Asian options, we used the arithmetic mean over the last <em>X fixings</em> to compute the payoff. The implementation is similar to the European one, with one extra inner loop over the time steps of size <em>dt = T/tradingDays</em>.</p>

<p>The implementations of the CPU function, wrapper and kernel are in the GitHub repo.</p>
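
<p>The extra inner loop can be sketched on the CPU as follows. This is a simplified illustration under assumptions, not the repository code: asianCallCPU and steps are hypothetical names, the average is taken over every time step rather than only the last X fixings, and the payoff is hard-coded to a call.</p>

```cpp
#include <algorithm>
#include <cmath>
#include <random>

// Monte Carlo price of an arithmetic-average Asian call (illustrative only).
double asianCallCPU(double S0, double K, double r, double sigma, double T,
                    int steps, int paths) {
  std::mt19937_64 rng(42);  // fixed seed for reproducible results
  std::normal_distribution<double> norm(0.0, 1.0);

  const double dt = T / steps;
  const double drift = (r - 0.5 * sigma * sigma) * dt;
  const double vol = sigma * std::sqrt(dt);

  double payoffSum = 0.0;
  for (int i = 0; i < paths; ++i) {
    double S = S0;
    double runningSum = 0.0;
    // Inner loop: advance the price one step at a time and accumulate it.
    for (int j = 0; j < steps; ++j) {
      S *= std::exp(drift + vol * norm(rng));
      runningSum += S;
    }
    double average = runningSum / steps;
    payoffSum += std::max(average - K, 0.0);
  }
  return std::exp(-r * T) * (payoffSum / paths);
}
```

<p>Because the payoff depends on the average rather than on the final price alone, the single-draw shortcut from the European case no longer applies and every step of the path has to be simulated.</p>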

<h2 id="basket-options">Basket Options</h2>

<p>This was cool! I came to know about these options from the problem statement of a hackathon. A basket option has a portfolio of multiple underlying assets, with a weight assigned to each asset. The problem is that the assets can be correlated with each other, so we use the Cholesky decomposition. I do not really understand how this works and used it as a black box for this project. It takes the lower triangular matrix (L) of the correlation matrix (R), where R = LL<sup>T</sup>, and uses it to turn a vector of independent normal random variables (Y) into a vector of correlated normal random variables (Z = LY).</p>

<p>The implementations of the CPU function, wrapper and kernel are in the GitHub repo.</p>
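
<p>For what it is worth, the Cholesky step is small enough to sketch. The decomposition finds a lower triangular L with R = LL<sup>T</sup>; if Y has independent standard normal entries, then Z = LY has covariance LL<sup>T</sup> = R, which is the correlation we wanted. The code below is an illustrative host-only version with hypothetical names (cholesky, correlate), and it assumes R is symmetric positive definite.</p>

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Returns the lower triangular L such that R = L * L^T.
// Assumes R is symmetric positive definite (no checks are performed).
Matrix cholesky(const Matrix &R) {
  const std::size_t n = R.size();
  Matrix L(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j <= i; ++j) {
      double sum = R[i][j];
      for (std::size_t k = 0; k < j; ++k) {
        sum -= L[i][k] * L[j][k];
      }
      if (i == j) {
        L[i][j] = std::sqrt(sum);  // diagonal entry
      } else {
        L[i][j] = sum / L[j][j];   // below-diagonal entry
      }
    }
  }
  return L;
}

// Turns independent standard normals Y into correlated normals Z = L * Y.
std::vector<double> correlate(const Matrix &L, const std::vector<double> &Y) {
  const std::size_t n = L.size();
  std::vector<double> Z(n, 0.0);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j <= i; ++j) {
      Z[i] += L[i][j] * Y[j];
    }
  }
  return Z;
}
```

<p>Since R depends only on the assets and not on the path, L can be computed once on the host and reused for every simulated path.</p>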

<h2 id="american-options">American Options</h2>

<p>The initial method I thought of using was traversing back through the binomial tree built in the binomial method. However, I later found out about the Least-Squares Monte Carlo (LSM) algorithm and read about it in the original research paper. I also found a couple of implementations on YouTube, which made it easier. The main step was fitting the regression equation using the following function:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">__host__</span> <span class="n">__device__</span> <span class="kt">void</span> <span class="nf">quadraticRegression</span><span class="p">(</span><span class="kt">double</span> <span class="o">*</span><span class="n">X</span><span class="p">,</span> <span class="kt">double</span> <span class="o">*</span><span class="n">Y</span><span class="p">,</span> <span class="kt">int</span> <span class="n">n</span><span class="p">,</span>
                                             <span class="kt">double</span> <span class="o">&amp;</span><span class="n">a0</span><span class="p">,</span> <span class="kt">double</span> <span class="o">&amp;</span><span class="n">a1</span><span class="p">,</span>
                                             <span class="kt">double</span> <span class="o">&amp;</span><span class="n">a2</span><span class="p">)</span> <span class="p">{</span>
  <span class="kt">double</span> <span class="n">Sx</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">Sx2</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">Sx3</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">Sx4</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
  <span class="kt">double</span> <span class="n">Sy</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">Sxy</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">Sx2y</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

  <span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">n</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
    <span class="kt">double</span> <span class="n">x</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
    <span class="kt">double</span> <span class="n">x2</span> <span class="o">=</span> <span class="n">x</span> <span class="o">*</span> <span class="n">x</span><span class="p">;</span>
    <span class="kt">double</span> <span class="n">y</span> <span class="o">=</span> <span class="n">Y</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>

    <span class="n">Sx</span> <span class="o">+=</span> <span class="n">x</span><span class="p">;</span>
    <span class="n">Sx2</span> <span class="o">+=</span> <span class="n">x2</span><span class="p">;</span>
    <span class="n">Sx3</span> <span class="o">+=</span> <span class="n">x2</span> <span class="o">*</span> <span class="n">x</span><span class="p">;</span>
    <span class="n">Sx4</span> <span class="o">+=</span> <span class="n">x2</span> <span class="o">*</span> <span class="n">x2</span><span class="p">;</span>
    <span class="n">Sy</span> <span class="o">+=</span> <span class="n">y</span><span class="p">;</span>
    <span class="n">Sxy</span> <span class="o">+=</span> <span class="n">x</span> <span class="o">*</span> <span class="n">y</span><span class="p">;</span>
    <span class="n">Sx2y</span> <span class="o">+=</span> <span class="n">x2</span> <span class="o">*</span> <span class="n">y</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="kt">double</span> <span class="n">D</span> <span class="o">=</span> <span class="n">n</span> <span class="o">*</span> <span class="p">(</span><span class="n">Sx2</span> <span class="o">*</span> <span class="n">Sx4</span> <span class="o">-</span> <span class="n">Sx3</span> <span class="o">*</span> <span class="n">Sx3</span><span class="p">)</span> <span class="o">-</span> <span class="n">Sx</span> <span class="o">*</span> <span class="p">(</span><span class="n">Sx</span> <span class="o">*</span> <span class="n">Sx4</span> <span class="o">-</span> <span class="n">Sx2</span> <span class="o">*</span> <span class="n">Sx3</span><span class="p">)</span> <span class="o">+</span>
             <span class="n">Sx2</span> <span class="o">*</span> <span class="p">(</span><span class="n">Sx</span> <span class="o">*</span> <span class="n">Sx3</span> <span class="o">-</span> <span class="n">Sx2</span> <span class="o">*</span> <span class="n">Sx2</span><span class="p">);</span>

  <span class="c1">// Treat a near-zero determinant as singular. The sign of D is kept so that</span>
  <span class="c1">// the Cramer's rule ratios below stay correct.</span>
  <span class="k">if</span> <span class="p">(</span><span class="n">fabs</span><span class="p">(</span><span class="n">D</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mf">1e-12</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">a0</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">a1</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="n">a2</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">return</span><span class="p">;</span>
  <span class="p">}</span>

  <span class="kt">double</span> <span class="n">D0</span> <span class="o">=</span> <span class="n">Sy</span> <span class="o">*</span> <span class="p">(</span><span class="n">Sx2</span> <span class="o">*</span> <span class="n">Sx4</span> <span class="o">-</span> <span class="n">Sx3</span> <span class="o">*</span> <span class="n">Sx3</span><span class="p">)</span> <span class="o">-</span> <span class="n">Sx</span> <span class="o">*</span> <span class="p">(</span><span class="n">Sxy</span> <span class="o">*</span> <span class="n">Sx4</span> <span class="o">-</span> <span class="n">Sx3</span> <span class="o">*</span> <span class="n">Sx2y</span><span class="p">)</span> <span class="o">+</span>
              <span class="n">Sx2</span> <span class="o">*</span> <span class="p">(</span><span class="n">Sxy</span> <span class="o">*</span> <span class="n">Sx3</span> <span class="o">-</span> <span class="n">Sx2</span> <span class="o">*</span> <span class="n">Sx2y</span><span class="p">);</span>

  <span class="kt">double</span> <span class="n">D1</span> <span class="o">=</span> <span class="n">n</span> <span class="o">*</span> <span class="p">(</span><span class="n">Sxy</span> <span class="o">*</span> <span class="n">Sx4</span> <span class="o">-</span> <span class="n">Sx3</span> <span class="o">*</span> <span class="n">Sx2y</span><span class="p">)</span> <span class="o">-</span> <span class="n">Sy</span> <span class="o">*</span> <span class="p">(</span><span class="n">Sx</span> <span class="o">*</span> <span class="n">Sx4</span> <span class="o">-</span> <span class="n">Sx2</span> <span class="o">*</span> <span class="n">Sx3</span><span class="p">)</span> <span class="o">+</span>
              <span class="n">Sx2</span> <span class="o">*</span> <span class="p">(</span><span class="n">Sx</span> <span class="o">*</span> <span class="n">Sx2y</span> <span class="o">-</span> <span class="n">Sxy</span> <span class="o">*</span> <span class="n">Sx2</span><span class="p">);</span>

  <span class="kt">double</span> <span class="n">D2</span> <span class="o">=</span> <span class="n">n</span> <span class="o">*</span> <span class="p">(</span><span class="n">Sx2</span> <span class="o">*</span> <span class="n">Sx2y</span> <span class="o">-</span> <span class="n">Sxy</span> <span class="o">*</span> <span class="n">Sx3</span><span class="p">)</span> <span class="o">-</span> <span class="n">Sx</span> <span class="o">*</span> <span class="p">(</span><span class="n">Sx</span> <span class="o">*</span> <span class="n">Sx2y</span> <span class="o">-</span> <span class="n">Sxy</span> <span class="o">*</span> <span class="n">Sx2</span><span class="p">)</span> <span class="o">+</span>
              <span class="n">Sy</span> <span class="o">*</span> <span class="p">(</span><span class="n">Sx</span> <span class="o">*</span> <span class="n">Sx3</span> <span class="o">-</span> <span class="n">Sx2</span> <span class="o">*</span> <span class="n">Sx2</span><span class="p">);</span>

  <span class="n">a0</span> <span class="o">=</span> <span class="n">D0</span> <span class="o">/</span> <span class="n">D</span><span class="p">;</span>
  <span class="n">a1</span> <span class="o">=</span> <span class="n">D1</span> <span class="o">/</span> <span class="n">D</span><span class="p">;</span>
  <span class="n">a2</span> <span class="o">=</span> <span class="n">D2</span> <span class="o">/</span> <span class="n">D</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The CPU implementation is available in the GitHub repo. A GPU version is not used for this step, as the regression is inherently sequential.</p>

<h2 id="benchmarking-raii-class">Benchmarking RAII Class</h2>

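<p>RAII stands for Resource Acquisition Is Initialization. Here it is applied through the constructor and destructor of the <code class="language-plaintext highlighter-rouge">Timer</code> class: the constructor records the start time and the destructor prints the elapsed time.</p>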

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Timer</span> <span class="p">{</span>
<span class="nl">private:</span>
    <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">label</span><span class="p">;</span>
    <span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">high_resolution_clock</span><span class="o">::</span><span class="n">time_point</span> <span class="n">start</span><span class="p">;</span>
    
<span class="nl">public:</span>
    <span class="n">Timer</span><span class="p">(</span><span class="k">const</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&amp;</span> <span class="n">label</span> <span class="o">=</span> <span class="s">""</span><span class="p">)</span> <span class="o">:</span> <span class="n">label</span><span class="p">(</span><span class="n">label</span><span class="p">),</span> <span class="n">start</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">high_resolution_clock</span><span class="o">::</span><span class="n">now</span><span class="p">())</span> <span class="p">{}</span>

    <span class="kt">double</span> <span class="n">getDuration</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span>
      <span class="k">auto</span> <span class="n">end</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">high_resolution_clock</span><span class="o">::</span><span class="n">now</span><span class="p">();</span>
      <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">duration</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span><span class="p">(</span><span class="n">end</span> <span class="o">-</span> <span class="n">start</span><span class="p">).</span><span class="n">count</span><span class="p">();</span>
    <span class="p">}</span>

    <span class="o">~</span><span class="n">Timer</span><span class="p">()</span> <span class="p">{</span>
        <span class="k">auto</span> <span class="n">end</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">high_resolution_clock</span><span class="o">::</span><span class="n">now</span><span class="p">();</span>
        <span class="kt">double</span> <span class="n">duration</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="n">chrono</span><span class="o">::</span><span class="n">duration</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span><span class="p">(</span><span class="n">end</span> <span class="o">-</span> <span class="n">start</span><span class="p">).</span><span class="n">count</span><span class="p">();</span>
        <span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o">&lt;&lt;</span> <span class="n">label</span> <span class="o">&lt;&lt;</span> <span class="s">" took "</span> <span class="o">&lt;&lt;</span> <span class="n">duration</span> <span class="o">&lt;&lt;</span> <span class="s">" seconds."</span> <span class="o">&lt;&lt;</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
    <span class="p">}</span>
<span class="p">};</span>

<span class="c1">// Sample Implementation</span>
<span class="p">{</span>
    <span class="n">Timer</span> <span class="n">timer</span><span class="p">(</span><span class="s">"CPU Function"</span><span class="p">);</span> <span class="c1">// Constructor Called</span>
    <span class="c1">// Call to CPU Function</span>
<span class="p">}</span> <span class="c1">// Destructor Called</span>
</code></pre></div></div>

<h2 id="cli-integration">CLI Integration</h2>

<p>I used cxxopts for CLI integration. Its header comes from a clone of the library's <a href="https://github.com/jarro2783/cxxopts">GitHub repo</a>.</p>

<h2 id="benchmark-results">Benchmark Results</h2>

<p>The benchmark results are also included in the GitHub repo in Results.ipynb.
The measured speedups were:</p>
<ul>
  <li>European options: a speedup of <em>2.815x</em> on <em>10^7</em> paths.</li>
  <li>Asian options: a speedup of <em>190.35x</em> on <em>10^6</em> paths.</li>
  <li>Basket options: a speedup of <em>88.28x</em> on <em>10^7</em> paths.</li>
</ul>

<h2 id="conclusion-and-learnings">Conclusion and Learnings</h2>

<p>This project was a great one for learning Multithreading using CUDA and applying it to Finance. I learned not only about CUDA but also about some cool option types, things like RAII and Functor Structs. Overall a very positive learning outcome :)</p>]]></content><author><name>Toshit Jain</name></author><category term="C++" /><category term="CUDA" /><category term="Project" /><summary type="html"><![CDATA[Introduction &amp; Motivation]]></summary></entry><entry><title type="html">Chained Unary Minus Resolution</title><link href="https://toshit3q34.github.io/Clad-Unary-Minus/" rel="alternate" type="text/html" title="Chained Unary Minus Resolution" /><published>2025-01-30T00:00:00+00:00</published><updated>2025-01-30T00:00:00+00:00</updated><id>https://toshit3q34.github.io/Clad-Unary-Minus</id><content type="html" xml:base="https://toshit3q34.github.io/Clad-Unary-Minus/"><![CDATA[<h2 id="introduction-and-motivation">Introduction and Motivation</h2>

<p>I discovered open source between July and October 2024 and was excited to make a contribution of my own. Since I was interested in C++, I landed on the Clad (Clang-based automatic differentiation) repository. This post explains the contribution I made there between December 2024 and January 2025.<br />
You can view the PR here: <a href="https://github.com/vgvassilev/clad/pull/1180">PR Link</a><br />
Clad repository: <a href="https://github.com/vgvassilev/clad">Link</a></p>

<hr />

<h2 id="clang-based-automatic-differentiation">Clang Based Automatic Differentiation</h2>

<p>Clad is a plugin for Clang, built on the LLVM compiler infrastructure, that enables automatic differentiation of C++ functions. More information about automatic differentiation can be found <a href="https://en.wikipedia.org/wiki/Automatic_differentiation">here</a>.</p>

<hr />

<h2 id="issue-and-naive-solution">Issue and naive solution</h2>

<p>Clad works on Clang’s Abstract Syntax Tree (AST), using the Clang and LLVM libraries. The function to be differentiated is processed at compile time, and the generated derivative is added to the object file. The issue I worked on concerned repeated unary minuses: when minuses were chained, the generated derivative was printed incorrectly, for example:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">Expression</span> <span class="o">:</span> <span class="o">-</span><span class="p">(</span><span class="o">-</span><span class="p">(</span><span class="o">-</span><span class="n">x</span><span class="p">))</span>
<span class="n">The</span> <span class="n">differentiation</span> <span class="n">should</span> <span class="n">be</span> <span class="o">:</span> <span class="o">-</span><span class="mi">1</span>
<span class="n">However</span><span class="p">,</span> <span class="n">due</span> <span class="n">to</span> <span class="n">mishandling</span> <span class="n">it</span> <span class="n">displayed</span> <span class="o">-</span> <span class="o">-</span> <span class="o">-</span><span class="mi">1</span>
</code></pre></div></div>

<p>The naive solution I presented at first was to resolve it at run time, which the maintainer rightly pointed out was the wrong approach. That is when I came to know about Clang and LLVM, and I started learning the required material as recommended by the maintainer of the repo.</p>

<hr />

<h2 id="resolution-function">Resolution Function</h2>

<p>The following function was added.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">Expr</span><span class="o">*</span> <span class="n">VisitorBase</span><span class="o">::</span><span class="n">ResolveUnaryMinus</span><span class="p">(</span><span class="n">Expr</span><span class="o">*</span> <span class="n">E</span><span class="p">,</span> <span class="n">SourceLocation</span> <span class="n">OpLoc</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="k">auto</span><span class="o">*</span> <span class="n">UO</span> <span class="o">=</span> <span class="n">llvm</span><span class="o">::</span><span class="n">dyn_cast</span><span class="o">&lt;</span><span class="n">clang</span><span class="o">::</span><span class="n">UnaryOperator</span><span class="o">&gt;</span><span class="p">(</span><span class="n">E</span><span class="p">))</span> <span class="p">{</span>
      <span class="k">if</span> <span class="p">(</span><span class="n">UO</span><span class="o">-&gt;</span><span class="n">getOpcode</span><span class="p">()</span> <span class="o">==</span> <span class="n">clang</span><span class="o">::</span><span class="n">UO_Minus</span><span class="p">)</span>
        <span class="k">return</span> <span class="p">(</span><span class="n">UO</span><span class="o">-&gt;</span><span class="n">getSubExpr</span><span class="p">())</span><span class="o">-&gt;</span><span class="n">IgnoreParens</span><span class="p">();</span>
    <span class="p">}</span>
    <span class="n">Expr</span><span class="o">*</span> <span class="n">E_LHS</span> <span class="o">=</span> <span class="n">E</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="k">auto</span><span class="o">*</span> <span class="n">BO</span> <span class="o">=</span> <span class="n">llvm</span><span class="o">::</span><span class="n">dyn_cast</span><span class="o">&lt;</span><span class="n">BinaryOperator</span><span class="o">&gt;</span><span class="p">(</span><span class="n">E_LHS</span><span class="p">))</span>
      <span class="n">E_LHS</span> <span class="o">=</span> <span class="n">BO</span><span class="o">-&gt;</span><span class="n">getLHS</span><span class="p">();</span>
    <span class="k">if</span> <span class="p">(</span><span class="k">auto</span><span class="o">*</span> <span class="n">UO</span> <span class="o">=</span> <span class="n">llvm</span><span class="o">::</span><span class="n">dyn_cast</span><span class="o">&lt;</span><span class="n">clang</span><span class="o">::</span><span class="n">UnaryOperator</span><span class="o">&gt;</span><span class="p">(</span><span class="n">E_LHS</span><span class="o">-&gt;</span><span class="n">IgnoreCasts</span><span class="p">()))</span> <span class="p">{</span>
      <span class="k">if</span> <span class="p">(</span><span class="n">UO</span><span class="o">-&gt;</span><span class="n">getOpcode</span><span class="p">()</span> <span class="o">==</span> <span class="n">clang</span><span class="o">::</span><span class="n">UO_Minus</span><span class="p">)</span>
        <span class="n">E</span> <span class="o">=</span> <span class="n">m_Sema</span><span class="p">.</span><span class="n">ActOnParenExpr</span><span class="p">(</span><span class="n">E</span><span class="o">-&gt;</span><span class="n">getBeginLoc</span><span class="p">(),</span> <span class="n">E</span><span class="o">-&gt;</span><span class="n">getEndLoc</span><span class="p">(),</span> <span class="n">E</span><span class="p">).</span><span class="n">get</span><span class="p">();</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="n">m_Sema</span><span class="p">.</span><span class="n">BuildUnaryOp</span><span class="p">(</span><span class="nb">nullptr</span><span class="p">,</span> <span class="n">OpLoc</span><span class="p">,</span> <span class="n">clang</span><span class="o">::</span><span class="n">UO_Minus</span><span class="p">,</span> <span class="n">E</span><span class="p">).</span><span class="n">get</span><span class="p">();</span>
  <span class="p">}</span>
</code></pre></div></div>

<p>The function does the following:</p>
<ul>
  <li>It is applied step by step as the expression is traversed recursively.</li>
  <li>The first part removes a unary minus if the operand already has one; otherwise the function falls through and adds one, so minuses never pile up. Example: -(-(-x)) is built as x -&gt; -x -&gt; x -&gt; -x.</li>
  <li>The second part walks iteratively down the left-hand side of nested binary operators. If the leftmost operand is a unary minus, the whole expression is wrapped in parentheses before the new unary minus is added. Example: unary minus applied to -a+b gives -(-a+b), not - -a+b = a+b (wrong!).</li>
</ul>

<p>Though small, this function deals with all the edge cases regarding unary minus.</p>
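<p>The same idea can be sketched on a toy expression tree. This is only an illustration and does not use Clang's AST or Clad's API: <code class="language-plaintext highlighter-rouge">Node</code>, <code class="language-plaintext highlighter-rouge">negate</code> and <code class="language-plaintext highlighter-rouge">print</code> are hypothetical names, and the parenthesisation is simplified to "always wrap a negated sum".</p>

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy expression node: NOT Clang's AST, just enough to model the idea.
struct Node {
    enum Kind { Var, Neg, Add } kind;
    std::string name;                // used by Var only
    std::shared_ptr<Node> lhs, rhs;  // Neg uses lhs only
};
using NodePtr = std::shared_ptr<Node>;

NodePtr var(std::string n) {
    return std::make_shared<Node>(Node{Node::Var, std::move(n), nullptr, nullptr});
}

NodePtr add(NodePtr a, NodePtr b) {
    return std::make_shared<Node>(Node{Node::Add, "", std::move(a), std::move(b)});
}

// Negate `e`: drop an existing outer minus instead of stacking a second one.
NodePtr negate(NodePtr e) {
    if (e->kind == Node::Neg) return e->lhs;
    return std::make_shared<Node>(Node{Node::Neg, "", std::move(e), nullptr});
}

// Print with parentheses around a negated sum so -(-a + b) stays unambiguous.
std::string print(const NodePtr& e) {
    switch (e->kind) {
        case Node::Var: return e->name;
        case Node::Add: return print(e->lhs) + " + " + print(e->rhs);
        case Node::Neg:
            return e->lhs->kind == Node::Add ? "-(" + print(e->lhs) + ")"
                                             : "-" + print(e->lhs);
    }
    return "";
}
```

<p>Applying <code class="language-plaintext highlighter-rouge">negate</code> three times to a variable leaves a single minus, and negating a sum whose left operand is already negated keeps the parentheses.</p>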

<hr />

<h2 id="unit-testing">Unit testing</h2>

<p>The following unit tests were added. The first one deals with the incorrect parsing in independent expressions while the second deals with incorrect parsing for Unary Minus over expressions with Binary Operators.</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// RUN: %cladclang %s -I%S/../../include -oUnaryMinus.out 2&gt;&amp;1 | %filecheck %s</span>
<span class="c1">// RUN: ./UnaryMinus.out | %filecheck_exec %s</span>
<span class="c1">// RUN: %cladclang -Xclang -plugin-arg-clad -Xclang -enable-tbr %s -I%S/../../include -oUnaryMinus.out</span>
<span class="c1">// RUN: ./UnaryMinus.out | %filecheck_exec %s</span>

<span class="cp">#include</span> <span class="cpf">"clad/Differentiator/Differentiator.h"</span><span class="cp">
</span>
<span class="cp">#include</span> <span class="cpf">"../TestUtils.h"</span><span class="cp">
</span>
<span class="kt">double</span> <span class="nf">f1</span><span class="p">(</span><span class="kt">double</span> <span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="o">-</span><span class="p">(</span><span class="o">-</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">))</span><span class="o">*-</span><span class="p">(</span><span class="o">-</span><span class="p">(</span><span class="o">-</span><span class="n">x</span><span class="p">));</span>
<span class="p">}</span>

<span class="c1">//CHECK: void f1_grad(double x, double *_d_x) {</span>
<span class="c1">//CHECK-NEXT:    *_d_x += -(-1 * 1);</span>
<span class="c1">//CHECK-NEXT: }</span>

<span class="kt">double</span> <span class="n">f2</span><span class="p">(</span><span class="kt">double</span> <span class="n">x</span><span class="p">,</span> <span class="kt">double</span> <span class="n">y</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">return</span> <span class="o">-</span><span class="mi">2</span><span class="o">*-</span><span class="p">(</span><span class="o">-</span><span class="p">(</span><span class="o">-</span><span class="n">x</span><span class="p">))</span><span class="o">*-</span><span class="n">y</span> <span class="o">-</span> <span class="mi">1</span><span class="o">*</span><span class="p">(</span><span class="o">-</span><span class="n">y</span><span class="p">)</span><span class="o">*</span><span class="p">(</span><span class="o">-</span><span class="p">(</span><span class="o">-</span><span class="n">x</span><span class="p">));</span>
<span class="p">}</span>

<span class="c1">//CHECK: void f2_grad(double x, double y, double *_d_x, double *_d_y) {</span>
<span class="c1">//CHECK-NEXT:    {</span>
<span class="c1">//CHECK-NEXT:        *_d_x += -(-2 * 1 * -y);</span>
<span class="c1">//CHECK-NEXT:        *_d_y += -(-2 * -x * 1);</span>
<span class="c1">//CHECK-NEXT:        *_d_y += -1 * -1 * x;</span>
<span class="c1">//CHECK-NEXT:        *_d_x += 1 * -y * -1;</span>
<span class="c1">//CHECK-NEXT:    }</span>
<span class="c1">//CHECK-NEXT: }</span>

<span class="kt">double</span> <span class="n">dx</span><span class="p">;</span>
<span class="kt">double</span> <span class="n">arr</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">{};</span>
<span class="kt">int</span> <span class="n">main</span><span class="p">(){</span>

    <span class="n">INIT_GRADIENT</span><span class="p">(</span><span class="n">f1</span><span class="p">);</span>
    <span class="n">INIT_GRADIENT</span><span class="p">(</span><span class="n">f2</span><span class="p">);</span>

    <span class="n">TEST_GRADIENT</span><span class="p">(</span><span class="n">f1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">dx</span><span class="p">);</span> <span class="c1">// CHECK-EXEC: 1.00</span>
    <span class="n">TEST_GRADIENT</span><span class="p">(</span><span class="n">f2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">arr</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="o">&amp;</span><span class="n">arr</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span> <span class="c1">// CHECK-EXEC: {-4.00, -3.00}</span>
<span class="p">}</span>
</code></pre></div></div>

<h2 id="learnings">Learnings</h2>

<p>This contribution was my first ever to C++ as well as Clad. I learned a lot about Clang and LLVM. Looking forward to more contributions to Clad in the future !!</p>]]></content><author><name>Toshit Jain</name></author><category term="Open Source" /><category term="C++" /><summary type="html"><![CDATA[Introduction and Motivation]]></summary></entry><entry><title type="html">ThreadsafeQueueLib: Design and Motivation of Concurrent Queues in C++</title><link href="https://toshit3q34.github.io/Threadsafe-Queue-Lib/" rel="alternate" type="text/html" title="ThreadsafeQueueLib: Design and Motivation of Concurrent Queues in C++" /><published>2025-01-19T00:00:00+00:00</published><updated>2025-01-19T00:00:00+00:00</updated><id>https://toshit3q34.github.io/Threadsafe-Queue-Lib</id><content type="html" xml:base="https://toshit3q34.github.io/Threadsafe-Queue-Lib/"><![CDATA[<h2 id="project-overview">Project Overview</h2>

<p><strong>ThreadsafeQueueLib</strong> is a project developed under the Coding Club, IIT Guwahati, with the aim of designing and implementing a family of <strong>thread-safe, lock-free, and wait-free queue data structures</strong> in modern C++. The project is motivated by real-world concurrent systems where classical standard library containers such as <code class="language-plaintext highlighter-rouge">std::queue</code> fail under contention due to race conditions and lack of atomicity guarantees.</p>

<p>This document outlines the motivation, background, concurrency issues in existing queues, and the design goals of the ThreadsafeQueueLib project.</p>

<hr />

<h2 id="introduction">Introduction</h2>

<p>With the latest advancements in the C++ language and its standard library, there is now substantial scope for developing efficient systems in the domains of <strong>multithreading and parallel processing</strong>. However, many classical containers in the C++ Standard Library—most notably <code class="language-plaintext highlighter-rouge">std::queue</code>—were <strong>not designed for true concurrent usage</strong>.</p>

<p>As a result, these containers often <strong>break under concurrent access</strong>, leading to race conditions and undefined behavior. Among these issues, <strong>data races</strong> are particularly dangerous and form the primary focus of this project.</p>

<p>In the context of queues, there are typically two participants:</p>

<ul>
  <li><strong>Producers</strong>, which generate data and push it into the queue</li>
  <li><strong>Consumers</strong>, which retrieve and process that data</li>
</ul>

<p>This producer–consumer abstraction appears across numerous domains such as:</p>

<ul>
  <li>Airport queues</li>
  <li>Operating system process schedulers</li>
  <li>High-frequency trading (HFT) systems</li>
</ul>

<p>HFT systems, in particular, served as a major inspiration for this project.</p>

<p>The objective is to design and implement a family of <strong>lock-free and wait-free queues</strong> that can safely support <strong>multiple producers and multiple consumers</strong> operating concurrently on a shared structure, while avoiding data races and ensuring correctness.</p>

<hr />

<h2 id="race-conditions-and-data-races">Race Conditions and Data Races</h2>

<h3 id="race-conditions">Race Conditions</h3>

<p>Consider a real-world example: buying movie tickets at a cinema with multiple cashiers. If two customers attempt to book the last few seats simultaneously, the final outcome depends on <strong>who completes the transaction first</strong>. This is a classic <strong>race condition</strong>—the result depends on the relative ordering of independent operations.</p>

<p>In concurrent programming, a race condition refers to any scenario where program behavior depends on the <strong>interleaving of operations across multiple threads</strong>.</p>

<h3 id="data-races">Data Races</h3>

<p>The C++ Standard defines a more severe form of race condition known as a <strong>data race</strong>.</p>

<p>A <strong>data race</strong> occurs when:</p>
<ul>
  <li>Two or more threads access the same memory location concurrently, and</li>
  <li>At least one of those accesses is a write, and</li>
  <li>There is no proper synchronization</li>
</ul>

<p>A data race results in <strong>undefined behavior</strong>, making the program fundamentally unsafe and unpredictable.</p>
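<p>As a minimal sketch of the definition above, several threads can safely increment a shared counter when it is a <code class="language-plaintext highlighter-rouge">std::atomic</code>; with a plain <code class="language-plaintext highlighter-rouge">int</code> the same loop would meet all three conditions and be a data race. The function name <code class="language-plaintext highlighter-rouge">count_with_atomics</code> is an illustrative assumption, not part of the project.</p>

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Illustrative helper (hypothetical name): each of `num_threads` threads
// increments one shared counter `increments` times.
int count_with_atomics(int num_threads, int increments) {
    std::atomic<int> counter{0};  // a plain `int` here would be a data race
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments] {
            for (int i = 0; i < increments; ++i) {
                // Atomic read-modify-write: no increment is lost.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();  // join() makes all increments visible
    return counter.load();
}
```

<p>Because every increment is atomic, the result is always <code class="language-plaintext highlighter-rouge">num_threads * increments</code>, regardless of how the threads interleave.</p>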

<p>In queue-based systems, data races typically occur when:</p>
<ul>
  <li>Multiple producers modify the queue tail concurrently</li>
  <li>A consumer reads an element while a producer is still writing it</li>
  <li>Internal pointers or indices are accessed without synchronization</li>
</ul>

<p>Even a single unsynchronized read–write or write–write overlap can corrupt the logical structure of the queue. This explains why naïve implementations of <code class="language-plaintext highlighter-rouge">std::queue</code> or simple linked-list queues cannot be safely shared across threads.</p>

<hr />

<h2 id="race-conditions-specific-to-stdqueue">Race Conditions Specific to <code class="language-plaintext highlighter-rouge">std::queue</code></h2>

<p>The standard <code class="language-plaintext highlighter-rouge">std::queue</code> implementation suffers from multiple race conditions in concurrent environments because its interface provides <strong>no atomicity guarantees</strong>.</p>

<h3 id="race-between-empty-and-front">Race Between <code class="language-plaintext highlighter-rouge">empty()</code> and <code class="language-plaintext highlighter-rouge">front()</code></h3>

<p>A common consumer pattern is:</p>

<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">q</span><span class="p">.</span><span class="n">empty</span><span class="p">())</span> <span class="p">{</span>
    <span class="kt">int</span> <span class="n">value</span> <span class="o">=</span> <span class="n">q</span><span class="p">.</span><span class="n">front</span><span class="p">();</span>
    <span class="n">q</span><span class="p">.</span><span class="n">pop</span><span class="p">();</span>
    <span class="n">do_something</span><span class="p">(</span><span class="n">value</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>In single-threaded code, this construct is perfectly safe: calling <code class="language-plaintext highlighter-rouge">front()</code> on an empty queue is undefined behavior, and the preceding <code class="language-plaintext highlighter-rouge">empty()</code> check ensures that this does not happen.</p>

<p>However, when multiple consumers operate on the same shared queue, this becomes a <strong>classic race condition</strong>. Another thread may call <code class="language-plaintext highlighter-rouge">pop()</code> in the small window between <code class="language-plaintext highlighter-rouge">empty()</code> and <code class="language-plaintext highlighter-rouge">front()</code>, removing the last element and causing the first thread to call <code class="language-plaintext highlighter-rouge">front()</code> on an empty queue. Even locking a <code class="language-plaintext highlighter-rouge">std::mutex</code> inside each individual member function does not help, because the check and the access remain separate operations; they must be performed together as a single indivisible step.</p>
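<p>One common way to close this window is to merge the check, the read and the removal into a single operation that runs under one lock. The sketch below assumes a hypothetical wrapper named <code class="language-plaintext highlighter-rouge">LockedQueue</code> with a <code class="language-plaintext highlighter-rouge">try_pop()</code> member; it is not the project's actual interface.</p>

```cpp
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

// Hypothetical wrapper, shown only to illustrate the combined operation.
template <typename T>
class LockedQueue {
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(value));
    }

    // empty(), front() and pop() all happen under one lock, so no other
    // thread can drain the queue between the check and the access.
    std::optional<T> try_pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return std::nullopt;
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::queue<T> queue_;
};
```

<p>A consumer then calls <code class="language-plaintext highlighter-rouge">try_pop()</code> and checks whether the returned optional holds a value, instead of calling the three member functions separately.</p>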

<hr />

<h3 id="race-between-front-and-pop">Race Between <code class="language-plaintext highlighter-rouge">front()</code> and <code class="language-plaintext highlighter-rouge">pop()</code></h3>

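<p>A second race occurs when two consumer threads execute the same <code class="language-plaintext highlighter-rouge">front()</code>-then-<code class="language-plaintext highlighter-rouge">pop()</code> pattern concurrently.</p>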

<p>If both threads observe the same element at the front of the queue (because nothing modifies the queue between their calls to <code class="language-plaintext highlighter-rouge">front()</code>), and both subsequently invoke <code class="language-plaintext highlighter-rouge">pop()</code>, then the following issues arise:</p>

<ul>
  <li>One item is <strong>processed twice</strong></li>
  <li>Another item is <strong>discarded without ever being read</strong></li>
</ul>

<p>This violates the fundamental FIFO invariant of queues and clearly demonstrates why naïve concurrent use of <code class="language-plaintext highlighter-rouge">std::queue</code> is unsafe. These cases motivate the need for properly designed <strong>threadsafe queues</strong> which guarantee atomicity and correctness under contention.</p>

<hr />

<h2 id="targets-of-the-project">Targets of the Project</h2>

<p>Having identified the limitations of <code class="language-plaintext highlighter-rouge">std::queue</code> in concurrent settings, our goal is to design a new data structure — the <strong>ThreadsafeQueue</strong> — that eliminates these issues while providing a flexible and performant interface for both developers and high-throughput systems.</p>

<p>Before diving into implementation details, we outline the features and design goals from a user’s point of view.</p>

<hr />

<h2 id="bounded-and-unbounded-queues">Bounded and Unbounded Queues</h2>

<p>A robust concurrent queue should support both bounded and unbounded modes.</p>

<h3 id="bounded-mode">Bounded Mode</h3>

<p>A bounded queue restricts the maximum number of elements it can hold. This is typically implemented using a <strong>circular buffer</strong>, which offers extremely fast index arithmetic and predictable memory usage.</p>

<p>Bounded queues are commonly used in:</p>

<ul>
  <li>Real-time systems</li>
  <li>Embedded applications</li>
  <li>Thread pools</li>
  <li>High-frequency trading (HFT) pipelines</li>
</ul>

<p>In such systems, memory must remain under strict control.</p>

<h3 id="unbounded-mode">Unbounded Mode</h3>

<p>An unbounded queue grows as needed, often backed by a linked list or a dynamically resizing container such as <code class="language-plaintext highlighter-rouge">std::deque</code> (the default container underneath <code class="language-plaintext highlighter-rouge">std::queue</code>). This mode is more flexible and useful when throughput is high and memory is abundant.</p>

<p>The user should be able to specify, at construction time:</p>

<ul>
  <li>Whether the queue is bounded or unbounded</li>
  <li>If bounded, the maximum capacity</li>
</ul>

<p>This configurability ensures that the ThreadsafeQueue can be deployed across a wide range of environments.</p>

<hr />

<h2 id="spsc-mpsc-and-mpmc-support">SPSC, MPSC, and MPMC Support</h2>

<p>Different applications require different concurrency models. A well-designed concurrent queue must support all three primary configurations:</p>

<h3 id="spsc--single-producer-single-consumer">SPSC — Single Producer, Single Consumer</h3>

<p>This is the simplest model and allows aggressive optimizations such as:</p>

<ul>
  <li>Eliminating expensive memory fences</li>
  <li>Using cache-friendly ring buffers</li>
</ul>
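<p>A rough sketch of such an SPSC ring buffer follows. It is an illustration under stated assumptions, not the project's implementation: the class name <code class="language-plaintext highlighter-rouge">SpscRing</code> is hypothetical, the capacity is assumed to be a power of two, and acquire/release ordering is used in place of the default sequentially consistent operations.</p>

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Hypothetical SPSC ring buffer. Capacity must be a power of two so that
// wrapping an index is a single bitwise AND.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Called only by the single producer thread.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == Capacity) return false;  // full
        buffer_[tail & (Capacity - 1)] = value;
        tail_.store(tail + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Called only by the single consumer thread.
    std::optional<T> try_pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;  // empty
        T value = buffer_[head & (Capacity - 1)];
        head_.store(head + 1, std::memory_order_release);  // free the slot
        return value;
    }

private:
    std::array<T, Capacity> buffer_{};
    std::atomic<std::size_t> head_{0};  // next index to read
    std::atomic<std::size_t> tail_{0};  // next index to write
};
```

<p>Each index is written by exactly one thread, which is why no compare-and-swap loop is needed in this model.</p>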

<h3 id="mpsc--multiple-producers-single-consumer">MPSC — Multiple Producers, Single Consumer</h3>

<p>This model is common in:</p>

<ul>
  <li>Logging systems</li>
  <li>Event queues</li>
</ul>

<p>It requires safe atomic coordination among multiple producer threads.</p>

<h3 id="mpmc--multiple-producers-multiple-consumers">MPMC — Multiple Producers, Multiple Consumers</h3>

<p>This is the most general and complex case. Ensuring correctness under high contention requires careful handling of:</p>

<ul>
  <li>ABA problems</li>
  <li>Atomic pointer manipulation</li>
  <li>Memory-ordering guarantees</li>
</ul>

<p>Our goal is to provide a <strong>unified interface</strong> that works across all these models while maintaining correctness and high performance.</p>

<hr />

<h2 id="blocking-vs-lock-free-implementations">Blocking vs Lock-Free Implementations</h2>

<p>The ThreadsafeQueue should allow users to choose between blocking and non-blocking designs.</p>

<h3 id="blocking-mode">Blocking Mode</h3>

<p>Implemented using <code class="language-plaintext highlighter-rouge">std::mutex</code> and <code class="language-plaintext highlighter-rouge">std::condition_variable</code>, blocking queues are easier to implement and reason about. However, they may suffer from:</p>

<ul>
  <li>Context-switch overhead</li>
  <li>Potential priority inversion</li>
  <li>Poor scalability under contention</li>
</ul>
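<p>For reference, a blocking queue built on these two primitives can be sketched as below. The class name <code class="language-plaintext highlighter-rouge">BlockingQueue</code> and the member <code class="language-plaintext highlighter-rouge">wait_and_pop()</code> are assumptions made for illustration, not the project's API.</p>

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical unbounded blocking queue, for illustration only.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }
        not_empty_.notify_one();  // wake one waiting consumer
    }

    // Blocks until an element is available, then removes and returns it.
    T wait_and_pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate guards against spurious wake-ups.
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<T> queue_;
};
```

<p>A consumer that finds the queue empty sleeps inside <code class="language-plaintext highlighter-rouge">wait()</code>, which is where the context-switch overhead listed above comes from.</p>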

<h3 id="lock-free-non-blocking-mode">Lock-Free (Non-Blocking) Mode</h3>

<p>Lock-free queues provide progress guarantees such as <strong>lock-freedom</strong> or <strong>wait-freedom</strong>. These algorithms rely on:</p>

<ul>
  <li>Atomic operations</li>
  <li>Memory-ordering semantics</li>
  <li>Careful ABA prevention strategies</li>
</ul>

<p>Lock-free queues excel in high-performance systems and avoid blocking delays. Because they scale better and suit multicore environments, this project primarily focuses on <strong>lock-free implementations</strong>, with a <strong>bounded implementation</strong> included as well for the sake of learning.</p>
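<p>The basic building block of most lock-free algorithms is a compare-and-swap (CAS) retry loop. The sketch below shows one on a shared index; the helper <code class="language-plaintext highlighter-rouge">try_claim_slot</code> is a hypothetical example and not taken from the project.</p>

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical helper: atomically reserve the next slot index, or fail
// once `limit` slots have already been handed out.
bool try_claim_slot(std::atomic<std::size_t>& next, std::size_t limit,
                    std::size_t& claimed) {
    std::size_t current = next.load(std::memory_order_relaxed);
    while (current < limit) {
        // On failure, compare_exchange_weak reloads `current` with the
        // value now stored in `next`, and the loop retries.
        if (next.compare_exchange_weak(current, current + 1,
                                       std::memory_order_acq_rel,
                                       std::memory_order_relaxed)) {
            claimed = current;
            return true;
        }
    }
    return false;  // all slots taken
}
```

<p>The loop is lock-free in the usual sense: apart from spurious failures of the weak CAS, a failed attempt means another thread's CAS succeeded, so the system as a whole keeps making progress.</p>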

<hr />

<h2 id="template-metaprogramming-for-compile-time-optimisation">Template Metaprogramming for Compile-Time Optimisation</h2>

<p>Rather than designing separate runtime classes for SPSC, MPSC, MPMC, bounded, and unbounded variants, which would lead to code duplication and runtime overhead, this project uses <strong>C++ template metaprogramming</strong>.</p>

<p>By selecting queue characteristics at compile time, the compiler can:</p>

<ul>
  <li>Remove unused branches and modes</li>
  <li>Apply aggressive optimizations</li>
  <li>Generate highly specialized data structures for each use case</li>
</ul>

<p>The user specifies the mode, capacity, concurrency type, and blocking behavior through template parameters, enabling maximum performance without sacrificing flexibility.</p>
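<p>A rough sketch of what compile-time selection could look like is given below. Everything in it is a placeholder: the enums, the empty <code class="language-plaintext highlighter-rouge">*Impl</code> structs and the parameter order of the <code class="language-plaintext highlighter-rouge">ThreadsafeQueue</code> alias are assumptions made for illustration, not the project's final interface.</p>

```cpp
#include <cstddef>
#include <type_traits>

// All names below are hypothetical placeholders, not the project's API.
enum class Concurrency { SPSC, MPSC, MPMC };
enum class Blocking { Yes, No };

// Empty stand-ins for the real implementations.
template <typename T, std::size_t Capacity>
struct MutexQueueImpl { static constexpr bool is_lock_free = false; };

template <typename T, std::size_t Capacity>
struct SpscRingImpl { static constexpr bool is_lock_free = true; };

template <typename T, std::size_t Capacity>
struct MpmcRingImpl { static constexpr bool is_lock_free = true; };

// Picks exactly one implementation type from the template parameters.
// In this sketch MPSC falls back to the MPMC implementation, which is
// still correct for a single consumer, just not specialised for it.
template <typename T, std::size_t Capacity, Concurrency Model, Blocking Mode>
struct SelectImpl {
    using type = std::conditional_t<
        Mode == Blocking::Yes,
        MutexQueueImpl<T, Capacity>,
        std::conditional_t<Model == Concurrency::SPSC,
                           SpscRingImpl<T, Capacity>,
                           MpmcRingImpl<T, Capacity>>>;
};

// The single user-facing name; the choice is resolved entirely at compile time.
template <typename T, std::size_t Capacity,
          Concurrency Model = Concurrency::MPMC,
          Blocking Mode = Blocking::No>
using ThreadsafeQueue = typename SelectImpl<T, Capacity, Model, Mode>::type;

// Example instantiations.
using FastSpscQueue = ThreadsafeQueue<int, 64, Concurrency::SPSC>;
using SimpleQueue   = ThreadsafeQueue<int, 64, Concurrency::MPMC, Blocking::Yes>;

static_assert(std::is_same<FastSpscQueue, SpscRingImpl<int, 64>>::value,
              "SPSC + non-blocking selects the SPSC ring");
static_assert(!SimpleQueue::is_lock_free,
              "blocking mode selects the mutex-based queue");
```

<p>Because the alias names one concrete type, code for the modes that were not selected is never instantiated, so there is no runtime branch or virtual dispatch to pay for.</p>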

<hr />

<h2 id="teams-for-the-project-and-schedule">Teams for the Project and Schedule</h2>

<p>The GitHub repository for the project (hyperlink to be added later) is currently private and will be made public after <strong>TBD</strong>. (I am still working out some implementation details, so I’ll make it public once everything is final.)</p>

<p>Participants are divided into teams, each with an assigned Point of Contact (POC). The POC role exists solely to build a sense of responsibility for the project.</p>

<h3 id="team-assignments">Team Assignments</h3>

<table>
  <thead>
    <tr>
      <th>Team</th>
      <th>Members</th>
      <th>POC</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Team A</td>
      <td>Aryan Gupta, Naveen, Rishit, Keshav</td>
      <td>Aryan Gupta</td>
    </tr>
    <tr>
      <td>Team B</td>
      <td>Ritesh, Abhiraj, Abhigyan, Ritwik, Prabhnoor</td>
      <td>Ritesh</td>
    </tr>
    <tr>
      <td>Team C</td>
      <td>Tushar, Avanish, Mehul, Aryan Chakravorty</td>
      <td>Aryan Chakravorty</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="session-plans">Session Plans</h2>

<table>
  <thead>
    <tr>
      <th>Date</th>
      <th>Time</th>
      <th>Session Name</th>
      <th>PDF Link</th>
      <th>Video Link</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>22nd January 2026</td>
      <td>9 PM to 10 PM</td>
      <td>Concurrency, std::thread and more</td>
      <td><a href="https://drive.google.com/file/d/11gFZUBynv-fPv76cTk4gVV4K1qPlXqie/view?usp=sharing">Link</a></td>
      <td>-</td>
    </tr>
    <tr>
      <td>24th January 2026</td>
      <td>3 PM to 4:30 PM</td>
      <td>Shared data, race conditions and std::mutex</td>
      <td><a href="https://drive.google.com/file/d/1Z2fe-kMtei0qSuh0_Cl4pnsMBKS3FWSG/view?usp=sharing">Link</a></td>
      <td><a href="https://drive.google.com/file/d/1aWfwzUj9FXg3cLgam6bMdRSHNm6anTr_/view?usp=sharing">Link</a></td>
    </tr>
    <tr>
      <td>28th January 2026</td>
      <td>10:05 PM to 11:30 PM</td>
      <td>Synchronization and Condition variables</td>
      <td><a href="https://drive.google.com/file/d/1UfJikRmcpD7pZXdVaVLGIEW_3_KSDoPz/view?usp=sharing">Link</a></td>
      <td><a href="https://drive.google.com/file/d/1iWKUjQOzADzqEfxXT3rEqYes8lwHwSBA/view?usp=sharing">Link</a></td>
    </tr>
    <tr>
      <td>Till 4th Feb 2026</td>
      <td>-</td>
      <td>CppCon 2017 “C++ atomics, from basic to advanced. What do they really do?”</td>
      <td>-</td>
      <td><a href="https://www.youtube.com/watch?v=ZQFzMfHIxng">Link</a></td>
    </tr>
    <tr>
      <td>21st Feb 2026</td>
      <td>3:30 PM to 5:00 PM</td>
      <td>C++ Memory Model and std::atomic - Part 1</td>
      <td><a href="https://drive.google.com/file/d/1DXF0yfzV6Y7xqhvWU7t_eUg8v16Jl3Ji/view?usp=sharing">Link</a></td>
      <td><a href="https://drive.google.com/file/d/1N_qSr-fl-aBynR5yfkntmMPyqeRC_8Y2/view?usp=sharing">Link</a></td>
    </tr>
    <tr>
      <td>TBD</td>
      <td>9 PM to 10 PM</td>
      <td>C++ Memory Model and std::atomic - Part 2</td>
      <td>Link</td>
      <td>Link</td>
    </tr>
    <tr>
      <td>TBD</td>
      <td>9 PM to 10 PM</td>
      <td>Project Description</td>
      <td>Link</td>
      <td>Link</td>
    </tr>
    <tr>
      <td>TBD</td>
      <td>9 PM to 10 PM</td>
      <td>Buffer</td>
      <td>Link</td>
      <td>Link</td>
    </tr>
    <tr>
      <td>TBD</td>
      <td>9 PM to 10 PM</td>
      <td>Template Metaprogramming - Part 1 (later)</td>
      <td>Link</td>
      <td>Link</td>
    </tr>
    <tr>
      <td>TBD</td>
      <td>9 PM to 10 PM</td>
      <td>Template Metaprogramming - Part 2 (later)</td>
      <td>Link</td>
      <td>Link</td>
    </tr>
    <tr>
      <td>TBD</td>
      <td>9 PM to 10 PM</td>
      <td>Template Metaprogramming - Part 3 (later)</td>
      <td>Link</td>
      <td>Link</td>
    </tr>
  </tbody>
</table>

<h3 id="weekly-targets">Weekly Targets</h3>

<table>
  <thead>
    <tr>
      <th>Week</th>
      <th>Duration</th>
      <th>Team Targets</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Week 1</td>
      <td>8th Dec ’25 – 14th Dec ’25</td>
      <td>Complete CPP101 (All Teams)</td>
    </tr>
    <tr>
      <td>Week 2</td>
      <td>15th Dec ’25 – 21st Dec ’25</td>
      <td>Complete CPP101 (All Teams)</td>
    </tr>
    <tr>
      <td>Week 3</td>
      <td>22nd Dec ’25 – 28th Dec ’25</td>
      <td>Complete CPP101 (All Teams)</td>
    </tr>
    <tr>
      <td>Week 4</td>
      <td>29th Dec ’25 – 4th Jan ’26</td>
      <td>Complete CPP101 (All Teams)</td>
    </tr>
    <tr>
      <td>Week 5</td>
      <td>5th Jan ’26 – 11th Jan ’26</td>
      <td>Complete CPP101 (All Teams)</td>
    </tr>
    <tr>
      <td>Week 6</td>
      <td>12th Jan ’26 – 18th Jan ’26</td>
      <td>Complete CPP101 (All Teams)</td>
    </tr>
    <tr>
      <td>Week 7</td>
      <td>19th Jan ’26 – 25th Jan ’26</td>
      <td>Sessions</td>
    </tr>
    <tr>
      <td>Week 8</td>
      <td>26th Jan ’26 – 1st Feb ’26</td>
      <td>Sessions</td>
    </tr>
  </tbody>
</table>

<p>I’ll add weekly targets when we start the project.</p>

<hr />]]></content><author><name>Toshit Jain</name></author><category term="C++" /><category term="Multithreading" /><category term="Lock-Free" /><category term="Data Structures" /><category term="Systems" /><summary type="html"><![CDATA[Project Overview]]></summary></entry></feed>