# Rope Data Structure (C++ Implementation) This library provides an efficient **rope** implementation in C++ for large, mutable text. It supports fast insertion, deletion, concatenation, iteration, and regex-based search. The design uses fixed-size leaf chunks and automatically maintains a balanced binary tree. ## Features * Efficient handling of large text buffers * Log-time insertion, deletion, and concatenation * Optional tunable chunk sizes * Line, leaf, and byte iterators * Regex search using PCRE2 (DFA mode) * Regex using `noose` can also be done but it is much slower for now, I am working on making it faster. * Zero-copy leaf iteration (safe immutable reads) * Fully heap-allocated, manual memory management * Deterministic balancing strategy (AVL-style depth tracking) ## ⚡ Performance Benchmarks Performance comparison between this Rope implementation and the standard C++ `std::string`. **Test Context:** * **Dataset:** ~1 GiB generated text (1,073,741,856 bytes). * **Goal:** Simulating heavy text-editing operations. ### Key Highlights When handling large datasets, the structural advantages of the Rope become massive: * **Appending Large Text:** **2.4 Million x faster** (0.001ms vs 1831ms) * **Splitting Text:** **162,000x faster** * **Insert (Middle):** **87,000x faster** * **Erase:** **15,000x faster** ### Numbers | Operation | Rope Time | String Time | String to Rope Ratio | Winner | | :--- | :--- | :--- | :--- | :--- | | **Concat (Append Large)** | **0.001 ms** | 1831.666 ms | **2,429,265x** | ✅ **Rope** | | **Split (Half)** | **0.008 ms** | 1353.086 ms | **162,240x** | ✅ **Rope** | | **Insert (Middle)** | **0.011 ms** | 976.105 ms | **87,206x** | ✅ **Rope** | | **Erase (5KB)** | **0.013 ms** | 207.760 ms | **15,953x** | ✅ **Rope** | | **Iterate 1000 Lines** | **0.063 ms** | 5.723 ms | **90x** | ✅ **Rope** | | **Concat (Append small)** | 0.002 ms | **0.000 ms** | 0.1 | ❌ String | | **Load / Create** | 1512.449 ms | **871.330 ms** | 0.58x | ❌ String | | **Free / Destruct** | 194.167 ms | **153.365 ms** | 0.79x | ❌ String | | **Read / Substr (1KiB)** | 0.008 ms | **0.002 ms** | 0.24x | ❌ String | | **Search (Regex)** | 6417.719 ms | **1.526 ms** | ~0x | ❌ String | **Why is the Rope faster?**
Standard strings require contiguous memory. When you insert text into the middle of a 1GB `std::string`, the CPU must shift all subsequent bytes in memory ($O(n)$ complexity). This Rope implementation uses a tree structure, allowing insertions and deletions by simply modifying pointers ($O(\log n)$ complexity). Also for concatenation ropes have amortized constant time vs strings with linear time.
So smaller concatenation strings are faster but for any larger concatenation the rope is best.
**Why is the String faster for Regex/Read?**
`std::string` is just a flat array of bytes, which is extremely cache-friendly for linear scanning and regex engines. Ropes require tree traversal to read sequential data, resulting in higher overhead for read-only operations. I have also tested against `ropey` the rust library and rust strings.
Ropey has much faster load times but is slightly slower than my implementation in other operations.
I am not including the tests here as my test might not be in the most optimal condition and so have different results.
The tests included here are the ones that I have done myself with clang++ compiling both tests in the same binary. (see: `tests/benchmark.cpp`)
Also on a side note rust strings are faster than c++ strings apart from reading. ## Rope Node Structure Each rope node (`Knot`) is either: * an **internal node** with `left` and `right` children, or * a **leaf node** containing text data of size `chunk_size`. Metadata stored per node: * `depth` – subtree height * `chunk_size` – leaf capacity * `line_count` – number of `\n` newline characters * `char_count` – total byte length covered by the subtree ```c typedef struct Knot { Knot *left; Knot *right; uint8_t depth; uint32_t chunk_size; uint32_t line_count; uint32_t char_count; char data[]; } Knot; ``` ## Chunk Size The library supports arbitrary positive chunk sizes, but provides: ```c uint32_t optimal_chunk_size(uint64_t length); ``` This suggests a chunk size based on the target input size. Valid range is: * **MIN_CHUNK_SIZE:** 64 bytes * **MAX_CHUNK_SIZE:** 8192 bytes You may choose any value; all ropes participating in concat operations **must use the same chunk size**. # API Overview ## Construction and Loading ```c load(char *str, uint32_t len, uint32_t chunk_size) ``` Builds a rope from a raw byte buffer. `str` is not consumed and may be freed after loading. ## Structural Operations ```c Knot *concat(Knot *left, Knot *right) ``` Concatenates two ropes (must share chunk size). Both input roots are invalid after the call. ```c Knot *insert(Knot *node, uint32_t offset, char *str, uint32_t len) ``` Inserts text at a byte offset. Returns a new rope. The original node becomes invalid. ```c Knot *erase(Knot *node, uint32_t offset, uint32_t len) ``` Deletes a byte range. Returns a new rope. The original node becomes invalid. ```c void split(Knot *node, uint32_t offset, Knot **left, Knot **right) ``` Splits a rope into two ropes at the given byte offset. The original node becomes invalid. ```c char *read(Knot *root, uint32_t offset, uint32_t len) ``` Extracts a substring. Returns a null-terminated buffer; caller must free it. ## Line & Byte Mapping ```c uint32_t byte_to_line(Knot *root, uint32_t offset) ``` Converts a byte offset into a line index. ```c uint32_t line_to_byte(Knot *root, uint32_t line, uint32_t *out_len) ``` Returns the byte offset of the beginning of a line and outputs that line’s length. ## Iterators ### Line Iterator ```c LineIterator *begin_l_iter(Knot *root, uint32_t start_line); char *next_line(LineIterator *it); // caller frees result ``` ### Leaf Iterator ```c LeafIterator *begin_k_iter(Knot *root); char *next_leaf(LeafIterator *it); // DO NOT free result ``` ### Byte Iterator ```c ByteIterator *begin_b_iter(Knot *root); char next_byte(ByteIterator *it); ``` All iterators must be freed by the caller after use. > For `ByteIterator`, `ByteIterator.it` must be freed before the iterator is freed. ## Searching ```cpp std::vector> search_rope(Knot *root, const char *pattern) ``` Searches the rope using PCRE2 in DFA mode. Returns a vector of `(start_offset, length)` pairs. Only deterministic patterns are supported (no backtracking). ## Memory Management ```c void free_rope(Knot *root) ``` Recursively frees all nodes in the rope. Must be called once when the rope is no longer needed. # Example Usage ```c uint32_t chunk_size = optimal_chunk_size(strlen(input)); Knot *r = load(input, strlen(input), chunk_size); r = insert(r, 5, "hello", 5); r = erase(r, 20, 3); char *sub = read(r, 10, 25); printf("%s\n", sub); free(sub); free_rope(r); ```