Theoretical Parallel Algorithms and Data Structures

Ryuto Kitagawa

Committee: Michael Goodrich (Chair), Michael Dillencourt, Michael Shindler

University of California, Irvine

Ryuto Kitagawa UCI November December 1, 2025
Background
  • Goal: Increase efficiency with more threads
  • Binary Fork-Join Model
    • Assume each processor shares memory
    • Threads may perform a fork operation
    • Later performs a join operation for sync
Ryuto Kitagawa UCI November December 1, 2025
Background
center
Ryuto Kitagawa UCI November December 1, 2025
Background
center
Ryuto Kitagawa UCI November December 1, 2025
Background
center
Ryuto Kitagawa UCI November December 1, 2025
Background
center
Ryuto Kitagawa UCI November December 1, 2025
Background
center
Ryuto Kitagawa UCI November December 1, 2025
Background
center
Ryuto Kitagawa UCI November December 1, 2025
Background
center
Ryuto Kitagawa UCI November December 1, 2025
Background: Work-Span
  • Represent algorithm's computation as a DAG
    • Each node represents tasks, each edge is a dependency
  • Span: Longest path within the graph
  • Work: Total number of nodes in the graph
Ryuto Kitagawa UCI November December 1, 2025
Background: Work-Span
center
Ryuto Kitagawa UCI November December 1, 2025
Background: Work-Span
center
Ryuto Kitagawa UCI November December 1, 2025
Overview
  1. Parallel Peeling IBLT in One/Two Rounds
    1. IBLT
    2. Peeling Analysis
  2. Dynamic Accountable Storage
    1. Accountable Storage Protocol
    2. IBLT Tree
  3. Joinable B-Tree
    1. Fork-Join I/O Model
    2. Multi-way Join
Ryuto Kitagawa UCI November December 1, 2025

Parallel Peeling IBLT in One/Two Rounds

Ryuto Kitagawa UCI November December 1, 2025
Invertible Bloom Lookup Table (IBLT)

center

Ryuto Kitagawa UCI November December 1, 2025
Invertible Bloom Lookup Table (IBLT)
  • Applications:
    • Set Reconciliation of Distributed Computing
    • Block Chain Reconciliation
    • Network Reconciliation
Ryuto Kitagawa UCI November December 1, 2025
Invertible Bloom Lookup Table (IBLT)
  • Will be focusing on insertion and peeling
    • Find difference operation simulated by pure insertions
  • Peeling: Listing all elements from the IBLT destructively
Ryuto Kitagawa UCI November December 1, 2025
How IBLTs Work
center
Ryuto Kitagawa UCI November December 1, 2025
How IBLTs Work
center
Ryuto Kitagawa UCI November December 1, 2025
How IBLTs Work
center
Ryuto Kitagawa UCI November December 1, 2025
How IBLTs Work
center
Ryuto Kitagawa UCI November December 1, 2025
How IBLTs Work
center
Ryuto Kitagawa UCI November December 1, 2025
How IBLTs Work
center
Ryuto Kitagawa UCI November December 1, 2025
How IBLTs Work
center
Ryuto Kitagawa UCI November December 1, 2025
How IBLTs Work
center
Ryuto Kitagawa UCI November December 1, 2025
How IBLTs Work
center
Ryuto Kitagawa UCI November December 1, 2025
How IBLTs Work
center
Ryuto Kitagawa UCI November December 1, 2025
How IBLTs Work
center
Ryuto Kitagawa UCI November December 1, 2025
Peeling in Rounds
  • Analysis Jiang, Mitzenmacher, and Thaler
  • rounds with hash functions
  • Probability and round analysis relies on a constant
Ryuto Kitagawa UCI November December 1, 2025
Constant Round Peeling Setup
  • hash functions
  • size of IBLT
    • Partition IBLT into subtables
    • Prevents same element collision
Ryuto Kitagawa UCI November December 1, 2025
Constant Round Peeling Setup
center
Ryuto Kitagawa UCI November December 1, 2025
Probability of 1 Round Peeling

Theorem 1

For a table of cells using hash functions, where is a constant, an IBLT peels in one round with probability at least where

  • An element is unpeelable if all locations have collisions
    • Collisions occur in a subtable with probability at most
    • Across all subtables:
  • Union bound across all elements gives
Ryuto Kitagawa UCI November December 1, 2025
Probability of 2 Round Peeling

Theorem 2

For a table of cells using hash functions, where is a constant, an IBLT peels in two rounds with probability at least

  • Prior proof analyzed the probability of a cell being single
Ryuto Kitagawa UCI November December 1, 2025
Probability of 2 Round Peeling

center

  • with probability
  • Probability of not being peeled in round =
  • Probability of not peeled in round from subtable =
  • Dependencies on other subtables complicates the analysis
Ryuto Kitagawa UCI November December 1, 2025
Dependencies on Subtables

center

  • Round: 1, 2
  • Round: 3, 4
  • Round: 5
Ryuto Kitagawa UCI November December 1, 2025
Probability of Round Peeling

Theorem 3

For a table of cells using hash functions, where is a positive integer constant and is a constant, an IBLT peels in rounds with probability at least

Ryuto Kitagawa UCI November December 1, 2025

Dynamic Accountable Storage

Ryuto Kitagawa UCI November December 1, 2025
Motivation
  • Entrust our data to cloud storage providers
    • Amazon S3 advertises losing of one out of 100 billion objects per year
    • S3 currently stores hundreds of trillions of data
    • When data is lost or corrupted, we want a method of recovery
Ryuto Kitagawa UCI November December 1, 2025
Prior Work: Static Accountable Storage
  • Ateniese et al. developed a static protocol
  • Recover data with IBLT, charge for lost data
  • Homomorphic Tags
    • Ties blocks of data to tags
    • Users may verify server has all data they claim to have
  • Segment Tree
    • Quickly generate IBLT from server's stored data
Ryuto Kitagawa UCI November December 1, 2025
Accountable Storage Framework: Setup
center
Ryuto Kitagawa UCI November December 1, 2025
Accountable Storage Framework: Accountability Challenge
center
Ryuto Kitagawa UCI November December 1, 2025
Issues with Static Accountable Storage
  • Client data is often dynamic in the real world
  • Model assumes a malicious server
    • Unrealistic since users are already entrusting the service with their data
    • Simple, unavoidable method of thwarting the protocol
Ryuto Kitagawa UCI November December 1, 2025
Thwarting Static Accountable Storage
  • Static Accountable Storage charges server based on recovered data
    • If data unrecoverable, assume the worst and charge for everything lost
  • Malicious server avoids significant amounts of responsibility
    • Generate IBLT at the start
    • Server recovers any lost data
    • Server would only be punished when data is unrecoverable
Ryuto Kitagawa UCI November December 1, 2025
Contributions of Work
Work Cryptographic Model Client Space Server Overhead Supported Operations
Ateniese et al. (2017) Malicious Init, ClientAudit
Our Work Honest But Curious Init, Put, Delete, ClientAudit, ServerAudit
Ryuto Kitagawa UCI November December 1, 2025
IBLT Tree
  • IBLT Tree: BST Dynamic "Segment Tree"
  • Bucket leaves, saves significant amount of space
  • Let be the size of the bucket per leaf node
    • Leaf nodes contain an IBLT of elements
    • Internal nodes contain the union of their children
Ryuto Kitagawa UCI November December 1, 2025
IBLT Tree
center
Ryuto Kitagawa UCI November December 1, 2025
IBLT Tree Cost
  • Space:
  • Time cost depends on BST implementation
  • Protected Cartesian Tree as implementation
    • Initialize Tree:
    • Insertion/Deletion:
    • Construct IBLT:
Ryuto Kitagawa UCI November December 1, 2025
Dynamic Accountable Storage Framework
center
Ryuto Kitagawa UCI November December 1, 2025
Dynamic Accountable Storage Framework
center
Ryuto Kitagawa UCI November December 1, 2025
Dynamic Accountable Storage Framework
center
Ryuto Kitagawa UCI November December 1, 2025
Dynamic vs Static Accountable Storage Cost
Work Client Space Server Overhead Init Time Put Server Time Delete Server Time
Ateniese et al. (2017) N/A N/A
Our Work
Our Work
Ryuto Kitagawa UCI November December 1, 2025
Dynamic vs Static Accountable Storage Cost
Work Audit Client Time Audit Server Time Audit Proof Size
Ateniese et al.
(2017)
Our Work
Our Work
Ryuto Kitagawa UCI November December 1, 2025

Parallel Joinable B-Tree in Fork-Join I/O Model

Ryuto Kitagawa UCI November December 1, 2025
Parallel Computational Model for Cached Memory
  • Modern computers are typically bottlenecked by memory accesses
    • Captured sequentially by the External Memory Model
    • PRAM has the parallel equivalent, PEM
    • Binary Fork Join model does not
Ryuto Kitagawa UCI November December 1, 2025
Fork-Join I/O Model
  • Each fork and join spawns and syncs at most threads
  • Each thread may access a block of data in a single memory access
Ryuto Kitagawa UCI November December 1, 2025
Join Based Balanced BST
  • Join-based framework
    • Supports efficient set operations
    • Core methods are Join and Split
center
Ryuto Kitagawa UCI November December 1, 2025
B-Trees
  • Multi-way cache efficient self-balancing tree
  • Each node in the tree contains keys and children
  • Focusing on B-Way Join
    • Extended to the Multi-Way Join
Ryuto Kitagawa UCI November December 1, 2025
B-Trees
  • General steps for the algorithm
    • Grouping
    • Recurse and Subtree Replace
    • Concatenate and Rebalance
Ryuto Kitagawa UCI November December 1, 2025
B-Trees
center
Ryuto Kitagawa UCI November December 1, 2025
B-Trees

Theorem 4

Let be a set of B-trees, with the largest tree height and the shortest tree height , and be a set of separator keys, where . The join operation can be performed in I/O span and I/O work.

  • Leads to a span-inefficient solution
  • Union with above primitive leads to I/O Span
Ryuto Kitagawa UCI November December 1, 2025
B-Way Join Improved
  • Grouping
  • Fuse within Groups
  • Divide Large Nodes
  • Divide Smaller Nodes
  • Concatenate and Rebalance
Ryuto Kitagawa UCI November December 1, 2025
Group and Fuse
  • Recall the grouping of tall trees
    • Do not recurse into subgroups
    • Fuse keys and nodes at their heights directly
center
Ryuto Kitagawa UCI November December 1, 2025
Group and Fuse
  • Maintain list of pointers along the left and right spines
  • Finding the node a root merges with takes I/O span
Ryuto Kitagawa UCI November December 1, 2025
Dividing Large Nodes
  • Suppose a node contains keys
    • When create nodes
    • Push keys to parent node
  • After, each node contains at most keys
    • Each subsequent dividing of nodes will push up at most 1 key
Ryuto Kitagawa UCI November December 1, 2025
Dividing Smaller Nodes
center
Ryuto Kitagawa UCI November December 1, 2025
Multi-way Join I/O Span and Work

Theorem 5

We can join B-trees and keys together in parallel with I/O span and I/O work, where is in the Fork-Join I/O model.

Ryuto Kitagawa UCI November December 1, 2025
Multi-way Split I/O Span and Work

Theorem 6

We can split a B-tree by keys in parallel with I/O span and I/O work, where is in the Fork-Join I/O Model.

Ryuto Kitagawa UCI November December 1, 2025
Union Operation
  • Divide each tree into sections
    • Use the Multi-Split operation
  • Perform a union on each pair of subtrees
  • Use Multi-Join to combine all elements in the tree
    • Minor modifications may be made to Multi-Join to get Intersection and Difference operations
Ryuto Kitagawa UCI November December 1, 2025
Union Operation

Theorem 7

Given two B-trees with sizes and , there exists a parallel algorithm that returns a new B-tree containing the union, intersection, and set difference of the two input trees in and I/O work, I/O span, where is the block size.

Ryuto Kitagawa UCI November December 1, 2025
Future Work
  • Turning to
  • Removing factor entirely
  • Apply algorithm to adaptive sorting settings
  • Use Fork-Join I/O model on other algorithms
Ryuto Kitagawa UCI November December 1, 2025

Thank you!

Ryuto Kitagawa UCI November December 1, 2025

---

<!-- header: 'Introduction'

- Parallel Algorithms: 1950s

<img src="Attachments/examples-of-parallel.svg" alt="center" style="height: 512px;">

---

<!-- header: 'Background: Concurrent Memory Access'

<img src="Attachments/memory-conflicts.svg" alt="center" style="height: 32em;">

---

<!-- header: 'Background: Exclusive Read or Write'

<img src="Attachments/exclusive-memory.svg" alt="center" style="height: 32em;">

---

<!-- header: 'Background: Concurrent Read'

<img src="Attachments/concurrent-read.svg" alt="center" style="height: 32em;">

<!-- header: 'Example: Summation'

<img src="Attachments/prefix-sum-0.svg" alt="center" style="height: 32em;">

---

<!-- paginate: true

<!-- header: 'Example: Prefix Sum'

<img src="Attachments/prefix-sum-2.svg" alt="center" style="height:512px;">

---

<!-- paginate: hold

<img src="Attachments/prefix-sum-3.svg" alt="center" style="height:512px;">

---

<img src="Attachments/prefix-sum-4.svg" alt="center" style="height:512px;">

---

<img src="Attachments/prefix-sum-5.svg" alt="center" style="height:512px;">

---

<img src="Attachments/prefix-sum-6.svg" alt="center" style="height:512px;">

---

<!-- header: 'Parallel Model'

<!-- paginate: true

* Analyzing the round complexity of the operation

* Identify all the single elements in the IBLT

* Remove all single elements

---

<!-- paginate: hold

<img src="Attachments/iblt-explanation-08.svg" alt="center" style="max-width: 100%; width: 1024px;">