# Rope Data Structure (C++ Implementation)
This library provides an efficient **rope** implementation in C++ for large, mutable text.
It supports fast insertion, deletion, concatenation, iteration, and regex-based search.
The design uses fixed-size leaf chunks and automatically maintains a balanced binary tree.
## Features
* Efficient handling of large text buffers
* Log-time insertion, deletion, and concatenation
* Optional tunable chunk sizes
* Line, leaf, and byte iterators
* Regex search using PCRE2 (DFA mode)
* Regex using `noose` can also be done but it is much slower for now, I am working on making it faster.
* Zero-copy leaf iteration (safe immutable reads)
* Fully heap-allocated, manual memory management
* Deterministic balancing strategy (AVL-style depth tracking)
## ⚡ Performance Benchmarks
Performance comparison between this Rope implementation and the standard C++ `std::string`.
**Test Context:**
* **Dataset:** ~1 GiB generated text (1,073,741,856 bytes).
* **Goal:** Simulating heavy text-editing operations.
### Key Highlights
When handling large datasets, the structural advantages of the Rope become massive:
* **Appending Large Text:** **2.4 Million x faster** (0.001ms vs 1831ms)
* **Splitting Text:** **162,000x faster**
* **Insert (Middle):** **87,000x faster**
* **Erase:** **15,000x faster**
### Numbers
| Operation | Rope Time | String Time | String to Rope Ratio | Winner |
| :--- | :--- | :--- | :--- | :--- |
| **Concat (Append Large)** | **0.001 ms** | 1831.666 ms | **2,429,265x** | ✅ **Rope** |
| **Split (Half)** | **0.008 ms** | 1353.086 ms | **162,240x** | ✅ **Rope** |
| **Insert (Middle)** | **0.011 ms** | 976.105 ms | **87,206x** | ✅ **Rope** |
| **Erase (5KB)** | **0.013 ms** | 207.760 ms | **15,953x** | ✅ **Rope** |
| **Iterate 1000 Lines** | **0.063 ms** | 5.723 ms | **90x** | ✅ **Rope** |
| **Concat (Append small)** | 0.002 ms | **0.000 ms** | 0.1 | ❌ String |
| **Load / Create** | 1512.449 ms | **871.330 ms** | 0.58x | ❌ String |
| **Free / Destruct** | 194.167 ms | **153.365 ms** | 0.79x | ❌ String |
| **Read / Substr (1KiB)** | 0.008 ms | **0.002 ms** | 0.24x | ❌ String |
| **Search (Regex)** | 6417.719 ms | **1.526 ms** | ~0x | ❌ String |
**Why is the Rope faster?**
Standard strings require contiguous memory. When you insert text into the middle of a 1GB `std::string`, the CPU must shift all subsequent bytes in memory ($O(n)$ complexity). This Rope implementation uses a tree structure, allowing insertions and deletions by simply modifying pointers ($O(\log n)$ complexity).
Also for concatenation ropes have amortized constant time vs strings with linear time.
So smaller concatenation strings are faster but for any larger concatenation the rope is best.
**Why is the String faster for Regex/Read?**
`std::string` is just a flat array of bytes, which is extremely cache-friendly for linear scanning and regex engines. Ropes require tree traversal to read sequential data, resulting in higher overhead for read-only operations.
I have also tested against `ropey` the rust library and rust strings.
Ropey has much faster load times but is slightly slower than my implementation in other operations.
I am not including the tests here as my test might not be in the most optimal condition and so have different results.
The tests included here are the ones that I have done myself with clang++ compiling both tests in the same binary. (see: `tests/benchmark.cpp`)
Also on a side note rust strings are faster than c++ strings apart from reading.
## Rope Node Structure
Each rope node (`Knot`) is either:
* an **internal node** with `left` and `right` children, or
* a **leaf node** containing text data of size `chunk_size`.
Metadata stored per node:
* `depth` – subtree height
* `chunk_size` – leaf capacity
* `line_count` – number of `\n` newline characters
* `char_count` – total byte length covered by the subtree
```c
typedef struct Knot {
Knot *left;
Knot *right;
uint8_t depth;
uint32_t chunk_size;
uint32_t line_count;
uint32_t char_count;
char data[];
} Knot;
```
## Chunk Size
The library supports arbitrary positive chunk sizes, but provides:
```c
uint32_t optimal_chunk_size(uint64_t length);
```
This suggests a chunk size based on the target input size.
Valid range is:
* **MIN_CHUNK_SIZE:** 64 bytes
* **MAX_CHUNK_SIZE:** 8192 bytes
You may choose any value; all ropes participating in concat operations **must use the same chunk size**.
# API Overview
## Construction and Loading
```c
load(char *str, uint32_t len, uint32_t chunk_size)
```
Builds a rope from a raw byte buffer.
`str` is not consumed and may be freed after loading.
## Structural Operations
```c
Knot *concat(Knot *left, Knot *right)
```
Concatenates two ropes (must share chunk size).
Both input roots are invalid after the call.
```c
Knot *insert(Knot *node, uint32_t offset, char *str, uint32_t len)
```
Inserts text at a byte offset.
Returns a new rope.
The original node becomes invalid.
```c
Knot *erase(Knot *node, uint32_t offset, uint32_t len)
```
Deletes a byte range.
Returns a new rope.
The original node becomes invalid.
```c
void split(Knot *node, uint32_t offset, Knot **left, Knot **right)
```
Splits a rope into two ropes at the given byte offset.
The original node becomes invalid.
```c
char *read(Knot *root, uint32_t offset, uint32_t len)
```
Extracts a substring.
Returns a null-terminated buffer; caller must free it.
## Line & Byte Mapping
```c
uint32_t byte_to_line(Knot *root, uint32_t offset)
```
Converts a byte offset into a line index.
```c
uint32_t line_to_byte(Knot *root, uint32_t line, uint32_t *out_len)
```
Returns the byte offset of the beginning of a line and outputs that line’s length.
## Iterators
### Line Iterator
```c
LineIterator *begin_l_iter(Knot *root, uint32_t start_line);
char *next_line(LineIterator *it); // caller frees result
```
### Leaf Iterator
```c
LeafIterator *begin_k_iter(Knot *root);
char *next_leaf(LeafIterator *it); // DO NOT free result
```
### Byte Iterator
```c
ByteIterator *begin_b_iter(Knot *root);
char next_byte(ByteIterator *it);
```
All iterators must be freed by the caller after use.
> For `ByteIterator`, `ByteIterator.it` must be freed before the iterator is freed.
## Searching
```cpp
std::vector> search_rope(Knot *root, const char *pattern)
```
Searches the rope using PCRE2 in DFA mode.
Returns a vector of `(start_offset, length)` pairs.
Only deterministic patterns are supported (no backtracking).
## Memory Management
```c
void free_rope(Knot *root)
```
Recursively frees all nodes in the rope.
Must be called once when the rope is no longer needed.
# Example Usage
```c
uint32_t chunk_size = optimal_chunk_size(strlen(input));
Knot *r = load(input, strlen(input), chunk_size);
r = insert(r, 5, "hello", 5);
r = erase(r, 20, 3);
char *sub = read(r, 10, 25);
printf("%s\n", sub);
free(sub);
free_rope(r);
```