Parallel Joinable B-Tree in Fork-Join I/O Model

Michael T. Goodrich, Yan Gu, Ryuto Kitagawa, Yihan Sun

University of California, Irvine; Riverside

Goodrich, Gu, Kitagawa, Sun UCI May 30, 2025
Outline
  • Preliminaries
  • Results
    • Fork-Join Model
    • B-Way Join Algorithm
  • What's Left Out
  • Future Work
Goodrich, Gu, Kitagawa, Sun UCI May 30, 2025
Parallel Computational Model for Cached Memory
  • PRAM and Binary Fork Join model
  • Runtime is measured by the number of operations
    • Span: Number of operations in parallel
    • Work: Number of total operations
  • Modern computers are typically bottlenecked by memory accesses
    • Captured sequentially by the External Memory Model
    • PRAM has the parallel equivalent, PEM
    • Binary Fork Join model does not
Goodrich, Gu, Kitagawa, Sun UCI May 30, 2025
Join Based Balanced BST
  • Join-based framework
    • Supports efficient set operations on search trees
    • Core operations are the Join and Split operations

center

Goodrich, Gu, Kitagawa, Sun UCI May 30, 2025
B-Trees
  • Multi-way cache efficient self-balancing tree
  • Each node in the tree contains keys and subtrees
    • except the root
    • and
    • All leaves are on the same level of the tree
    • Each tree must be a valid tree
Goodrich, Gu, Kitagawa, Sun UCI May 30, 2025
Results
  • Introduce the Fork-Join I/O Model
  • Develop Multiway Join and Multiway Split
    • I/O Work:
    • I/O Span:
  • Develop a union (and intersection and difference) operation
    • I/O Work:
    • I/O Span:
Goodrich, Gu, Kitagawa, Sun UCI May 30, 2025
Fork-Join I/O Model
  • Two main primitives for the Fork Join model
    • Fork: Spawns threads
    • Join: Synchronizes threads
  • Each fork and join spawns and syncs at most threads
  • Each thread may access a block of data in a single memory access
  • All threads share external memory
Goodrich, Gu, Kitagawa, Sun UCI May 30, 2025
Multiway Join
  • I/O Span:
  • Goal: Create B-way join of span
  • Focus on joining just two trees

center

Goodrich, Gu, Kitagawa, Sun UCI May 30, 2025
Multiway Join
  • Maintain list of pointers along the left and right spines
    • Takes I/O span to generate
    • Use a B-Tree for just the nodes along the spine
  • Finding the node a root merges with takes I/O span
  • We now look to generalize this to merging trees
Goodrich, Gu, Kitagawa, Sun UCI May 30, 2025
General Algorithm
  • General steps for the algorithm
    • Grouping
    • Fusing Within Groups
    • Dividing Large Nodes
    • Dividing Smaller Nodes
    • Concatenate and Rebalance
Goodrich, Gu, Kitagawa, Sun UCI May 30, 2025
Grouping and Fusing
  • Find all the tallest trees, called a tall tree
  • All other trees are grouped with the closest tall tree to their left
  • Then fuse keys and nodes at the same height
  • I/O Span

center

Goodrich, Gu, Kitagawa, Sun UCI May 30, 2025
Dividing Large Nodes
  • Suppose a node contains keys
    • When create nodes
    • Push keys to parent node
  • All threads should be synchronized before pushing to avoid write conflicts
  • After, each node contains at most keys at the root
    • Each subsequent dividing of nodes will push up at most 1 node
Goodrich, Gu, Kitagawa, Sun UCI May 30, 2025
Dividing Smaller Nodes

center

Goodrich, Gu, Kitagawa, Sun UCI May 30, 2025
Concatenate and Rebalance
  • Final step, left with the tall trees that were fused with the smaller trees
  • They will then be concatenated together and rebalanced
    • The heights will differ by at most one
Goodrich, Gu, Kitagawa, Sun UCI May 30, 2025
What is Left Out
  • Details on a Multiway Split Algorithm
    • Simpler, but requires more background
  • Leveraging the Multiway Join Algorithm
    • Develop an I/O work optimal, I/O span efficient union operation
    • Can be expanded to the intersection and difference operations
Goodrich, Gu, Kitagawa, Sun UCI May 30, 2025
Future Work
  • Analyze and develop more algorithms with this model
    • Sorting
    • Buffer Tree
    • Prefix Sum
  • Removing the term entirely from the arbitrary way join
Goodrich, Gu, Kitagawa, Sun UCI May 30, 2025