diff --git a/README.md b/README.md
index af17d8e..769ce9d 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@ Copyright 2025 Syed Daanish
 Crib is a TUI based text editor built primaririly for personal use.
 Crib has a vim-style editor modes system but navigation and shortcuts are very different.
-It supports tree-sitter based text highlighting.
+It supports superfast incremental syntax highlighting.
 And LSP for auto-completion, diagnostics, hover docs etc.
 It aims to be complete general purpose IDE.
 (It is still very much a work in progress so a lot of things may seem incomplete)
 
@@ -16,7 +16,7 @@ For now it is just a single file editor. I plan to add a multi-file support with
 
 ### Get started
 
-Make sure the repo is cloned with submodules to get most of the dependencies.
+Make sure the repo is cloned with submodules to get `libgrapheme`.
 
 ```bash
 git clone --recurse-submodules https://git.syedm.dev/SyedM/crib.git
@@ -26,7 +26,7 @@ git clone --recurse-submodules https://git.syedm.dev/SyedM/crib.git
 
 #### System-wide libraries
 
-Make sure you have the following dependencies installed:
+Make sure you have the following dependencies installed (apart from the standard C++ libraries):
 
 * **[nlohmann/json](https://github.com/nlohmann/json)**
   Install it via your package manager. Once installed, the header should be available as:
@@ -34,31 +34,26 @@ Make sure you have the following dependencies installed:
   #include 
   ```
 
-* **libmagic**
-  Install it so that you can include it in your code:
-  ```cpp
-  #include 
-  ```
-
 * **[PCRE2](https://github.com/PCRE2Project/pcre2)**
   Install the library to use its headers:
   ```cpp
   #include 
   ```
 
-It also uses `xclip` at runtime for copying/pasting *(TODO: make it portable)*.
+* **libmagic**
+  Install it so that you can include it in your code (most *nix systems have it installed):
+  ```cpp
+  #include 
+  ```
+
+It also uses `xclip` at runtime for copying/pasting *(TODO: make it OS-portable)*.
 And any modern terminal should work fine - preferably `kitty` or `wezterm`.
 
 #### `./libs` folder
 
-Some other dependancies like `libgrapheme` and `tree-sitter*` and `unicode_width` are added as submodules or copied.
-`unicode_width` is compiled by the makefile so nothing to do there.
-`libgrapheme` needs to be compiled using `make` in it's folder.
-`tree-sitter` needs to be compiled using `make` in it's folder.
-For other tree-sitter grammars, run `make` in their folders except some for which `npm install` needs to be used (see their README.md)
-For any problems with `npm install` make sure to have older versions of node installed.
-For some even manual clang or gcc compilation may be required.
-*TODO: Make a detailed list of how to do compile each*
+Some other dependencies like `libgrapheme` and `unicode_width` are added as submodules or copied.
+ `unicode_width` is compiled by the makefile so nothing to do there.
+ `libgrapheme` needs to be compiled using `make` in its folder.
 
 #### LSPs
 
@@ -93,8 +88,7 @@ The following lsp's are supported and can be installed anywhere in your `$PATH`<
 #### Compiler
 
 `g++` and `clang++` should both work fine but `c++20+` is required.
-The makefile has been set to use g++ if made with `make test` and clang++ if made with `make release`
-This can be changed but I have found clang++ builds to be slightly faster - also test builds do not have the flags needed to be used system wide or any optimizations.
+The makefile uses `clang++` by default.
 Can remove `ccache` if you want from the makefile.
 
 #### Compliling
@@ -105,8 +99,8 @@ make release
 
 ### Running
 
-Preferably add `bin` folder to PATH or move `bin/crib` to somewhere in PATH.
-But make sure that `scripts/` and `grammar/` are at `../` relative to the binary or it will crash.
+Preferably add the `bin` folder to PATH or move `bin/crib` to somewhere in PATH.
+But make sure that `scripts/` is at `../` relative to the binary or it will crash.
 `scripts/init.sh` and `scripts/exit.sh` can be used to add hooks to the editor on startup and exit (Make sure to remove my `kitty` hooks from them if you want).
 For some LSP's to work properly `crib` needs to be run from the root folder of the project. *To be fixed*
 
@@ -256,37 +250,38 @@ Activated by `:` or `;`.
 - hooks jumping (bookmarking)
 - color hex code highlighting
 - current line highlighting
-- current word under cursor highlighting
+
 
-#### Tree-sitter syntax highlighting and filetype detection (using extention or libmagic) for:
-- bash
-- c/cpp (and headers)
-- css
-- fish
-- go/gomod
-- haskell
-- html/erb
-- javascript
-- typescript/tsx
-- json/jsonc
+#### Syntax highlighting and filetype detection (using extension or libmagic) for:
 - ruby
-- lua
-- python
-- rust
-- php
-- markdown
-- nginx
-- toml
-- yaml
-- sql
-- make
-- gdscript
-- man pages
-- diff/patch
-- gitattributes/gitignore
-- tree-sitter queries
-- regex
-- ini
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
 
 #### LSP-powered features:
 - diagnostics
diff --git a/TODO.md b/TODO.md
index acd0c2b..f518bdb 100644
--- a/TODO.md
+++ b/TODO.md
@@ -8,18 +8,17 @@ Copyright 2025 Syed Daanish
 * [ ] **LSP Bug:** Check why `fish-lsp` is behaving so off with completions filtering.
 * [ ] **File IO:** Normalize/validate unicode on file open (enforce UTF-8, handle other types gracefully).
-* [ ] **Critical Crash:** Fix bug where closing immediately while LSP is still loading hangs and then segfaults (especially on slow ones like fish-lsp).
-* [ ] **Navigation Bug:** Fix bug where `Alt+Up` at EOF adds an extra line.
-* [ ] **Modularize handle_events functions:** The function is over 700 lines with a lot of repeating blocks. Split into smaller functions.
-* [ ] **Editor Indentation Fix:**
+* [ ] **Critical Crash:** Fix bug where closing immediately while LSP is still loading hangs and then segfaults (especially on slow ones like fish-lsp where quick edits and exit can hang).
+* [ ] **Line move:** fix the move line functions to work without the calculations from folds as folds are removed.
+* [ ] **Modularize handle_events and renderer functions:** The function is over 700 lines with a lot of repeating blocks. Split into smaller functions.
+* [ ] **Editor Indentation Fix:**
   - Main : merger indentation with the parser for more accurate results.
   * [ ] Keep cache of language maps in engine to reduce lookup time.
   * [ ] In indents add function to support tab which indents if before any content and inserts a pure \t otherwise.
   * [ ] And backspace which undents if before any content.
   * [ ] Add block indentation support.
-  * [ ] Ignore comments/strings (maybe as-set by tree-sitter) when auto-indenting.
-  * [ ] Just use span cursor to avoid strings/comments.. And use another map for c-style single line block and add stuff like operators to it.
+  * [ ] Ignore comments/strings from parser when auto-indenting.
   * [ ] These will dedent when the block immediately after them is dedented
-  * [ ] Dont dedent is ending is valid starting is invalid but also empty
+  * [ ] Don't dedent if ending is valid starting is invalid but also empty
   * [ ] Just leave asis if starting is empty
 * [ ] **Readme:** Update readme to show indetation mechanics.
 * [ ] **LSP Bug:** Try to find out why emojis are breaking lsp edits. (check the ruby sample)
@@ -35,8 +34,9 @@ Copyright 2025 Syed Daanish
 * make it faster for line inserts/deletes too (treeify the vector)
 * Try to make all functions better now that folds have been purged
 * Cleanup syntax and renderer files
+* Fix ruby regexp not living across lines when edits are made
 
-### Core Editing Mechanics
+* for ruby regex use heuristic where if a space is seen after the / it is not a regexp
 
 * [ ] **Undo/Redo:** Add support for undo/redo history.
 
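The TODO's Ruby regexp heuristic says that a `/` followed by a space is not a regexp. A rough C++ sketch of that idea follows. `slash_opens_regexp` is a hypothetical helper, not code from crib. Two assumptions are added so that ordinary division such as `10/2` is not misread as a regexp. First, the `after_operand` flag, which the caller would have to track. Second, the rule that a `/` glued to the preceding operand is division.

```cpp
#include <cstdint>

// Hypothetical helper (not part of crib): should the '/' at text[i] open a
// Ruby regexp literal?
//
// Assumptions beyond the TODO note:
//  * the caller tracks whether the previous significant token was an operand
//    (identifier, number, closing bracket). If it was not (start of an
//    expression, after an operator), '/' always opens a regexp.
//  * after an operand, a '/' with no whitespace before it (`10/2`) is division.
// The TODO heuristic itself: after an operand and a space, whitespace (or the
// end of the line) right after the '/' means division; anything else opens a
// regexp.
bool slash_opens_regexp(const char *text, uint32_t len, uint32_t i,
                        bool after_operand) {
  if (i >= len || text[i] != '/')
    return false;
  if (!after_operand)
    return true;
  bool space_before = i > 0 && (text[i - 1] == ' ' || text[i - 1] == '\t');
  if (!space_before)
    return false; // `a/b`, `10/2`: plain division
  if (i + 1 >= len)
    return false; // '/' ends the line: treat as an operator
  char next = text[i + 1];
  return next != ' ' && next != '\t';
}
```

Under this rule, `puts /foo/` opens a regexp, while `a / b` and `10/2` stay division. Inputs like `a /b` remain ambiguous, and the sketch reads them as a regexp.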
diff --git a/include/syntax/decl.h b/include/syntax/decl.h index b3f28aa..bbf1c75 100644 --- a/include/syntax/decl.h +++ b/include/syntax/decl.h @@ -73,19 +73,76 @@ struct Highlight { uint8_t flags; }; -inline static const std::unordered_map highlight_map = { - {0, {0xFFFFFF, 0, 0}}, {1, {0xAAAAAA, 0, CF_ITALIC}}, - {2, {0xAAD94C, 0, 0}}, {3, {0xFFFFFF, 0, CF_ITALIC}}, - {4, {0xFF8F40, 0, 0}}, {5, {0xFFB454, 0, 0}}, - {6, {0xD2A6FF, 0, 0}}, {7, {0x95E6CB, 0, 0}}, - {8, {0xF07178, 0, 0}}, {9, {0xE6C08A, 0, 0}}, - {10, {0x7dcfff, 0, 0}}, +enum struct TokenKind : uint8_t { +#define ADD(name) name, +#include "syntax/tokens.def" +#undef ADD + Count }; +constexpr size_t TOKEN_KIND_COUNT = static_cast(TokenKind::Count); + +const std::unordered_map kind_map = { +#define ADD(name) {#name, TokenKind::name}, +#include "syntax/tokens.def" +#undef ADD +}; + +extern std::array highlights; + +inline void load_theme(std::string filename) { + uint32_t len = 0; + char *raw = load_file(filename.c_str(), &len); + if (!raw) + return; + std::string data(raw, len); + free(raw); + json j = json::parse(data); + Highlight default_hl = {0xFFFFFF, 0, 0}; + if (j.contains("Default")) { + auto def = j["Default"]; + if (def.contains("fg") && def["fg"].is_string()) + default_hl.fg = HEX(def["fg"]); + if (def.contains("bg") && def["bg"].is_string()) + default_hl.bg = HEX(def["bg"]); + if (def.contains("italic") && def["italic"].get()) + default_hl.flags |= CF_ITALIC; + if (def.contains("bold") && def["bold"].get()) + default_hl.flags |= CF_BOLD; + if (def.contains("underline") && def["underline"].get()) + default_hl.flags |= CF_UNDERLINE; + if (def.contains("strikethrough") && def["strikethrough"].get()) + default_hl.flags |= CF_STRIKETHROUGH; + } + for (auto &hl : highlights) + hl = default_hl; + for (auto &[key, value] : j.items()) { + if (key == "Default") + continue; + auto it = kind_map.find(key); + if (it == kind_map.end()) + continue; + Highlight hl = {0xFFFFFF, 0, 0}; + if 
(value.contains("fg") && value["fg"].is_string()) + hl.fg = HEX(value["fg"]); + if (value.contains("bg") && value["bg"].is_string()) + hl.bg = HEX(value["bg"]); + if (value.contains("italic") && value["italic"].get()) + hl.flags |= CF_ITALIC; + if (value.contains("bold") && value["bold"].get()) + hl.flags |= CF_BOLD; + if (value.contains("underline") && value["underline"].get()) + hl.flags |= CF_UNDERLINE; + if (value.contains("strikethrough") && value["strikethrough"].get()) + hl.flags |= CF_STRIKETHROUGH; + highlights[static_cast(it->second)] = hl; + } +} + struct Token { uint32_t start; uint32_t end; - uint8_t type; + TokenKind type; }; struct LineData { diff --git a/include/syntax/langs.h b/include/syntax/langs.h index 3ce5519..6e6b27a 100644 --- a/include/syntax/langs.h +++ b/include/syntax/langs.h @@ -10,9 +10,27 @@ bool name##_state_match(std::shared_ptr state_1, \ std::shared_ptr state_2); -#define LANG_A(name) {name##_parse, name##_state_match} +#define LANG_A(name) \ + { \ + #name, { name##_parse, name##_state_match } \ + } + +template +inline std::shared_ptr ensure_state(std::shared_ptr state) { + using U = typename T::full_state_type; + if (!state) + state = std::make_shared(); + if (!state.unique()) + state = std::make_shared(*state); + if (!state->full_state) + state->full_state = std::make_shared(); + else if (!state->full_state.unique()) + state->full_state = std::make_shared(*state->full_state); + return state; +} DEF_LANG(ruby); +DEF_LANG(bash); inline static const std::unordered_map< std::string, @@ -22,7 +40,8 @@ inline static const std::unordered_map< bool (*)(std::shared_ptr state_1, std::shared_ptr state_2)>> parsers = { - {"ruby", LANG_A(ruby)}, + LANG_A(ruby), + LANG_A(bash), }; #endif diff --git a/include/syntax/line_tree.h b/include/syntax/line_tree.h index f9076ff..cdc4810 100644 --- a/include/syntax/line_tree.h +++ b/include/syntax/line_tree.h @@ -1,212 +1,233 @@ -// #include "syntax/decl.h" -// -// struct LineTree { -// void clear() { 
-// clear_node(root); -// root = nullptr; -// stack_size = 0; -// } -// void build(uint32_t x) { root = build_node(x); } -// LineData *at(uint32_t x) { -// LineNode *n = root; -// while (n) { -// uint32_t left_size = n->left ? n->left->size : 0; -// if (x < left_size) { -// n = n->left; -// } else if (x < left_size + n->data.size()) { -// return &n->data[x - left_size]; -// } else { -// x -= left_size + n->data.size(); -// n = n->right; -// } -// } -// return nullptr; -// } -// LineData *start_iter(uint32_t x) { -// stack_size = 0; -// LineNode *n = root; -// while (n) { -// uint32_t left_size = n->left ? n->left->size : 0; -// if (x < left_size) { -// push(n, 0); -// n = n->left; -// } else if (x < left_size + n->data.size()) { -// push(n, x - left_size + 1); -// return &n->data[x - left_size]; -// } else { -// x -= left_size + n->data.size(); -// push(n, UINT32_MAX); -// n = n->right; -// } -// } -// return nullptr; -// } -// void end_iter() { stack_size = 0; } -// LineData *next() { -// while (stack_size) { -// auto &f = stack[stack_size - 1]; -// LineNode *n = f.node; -// if (f.index < n->data.size()) -// return &n->data[f.index++]; -// stack_size--; -// if (n->right) { -// n = n->right; -// while (n) { -// push(n, 0); -// if (!n->left) -// break; -// n = n->left; -// } -// return &stack[stack_size - 1].node->data[0]; -// } -// } -// return nullptr; -// } -// void insert(uint32_t x, uint32_t y) { root = insert_node(root, x, y); } -// void erase(uint32_t x, uint32_t y) { root = erase_node(root, x, y); } -// uint32_t count() { return subtree_size(root); } -// ~LineTree() { clear(); } -// -// private: -// struct LineNode { -// LineNode *left = nullptr; -// LineNode *right = nullptr; -// uint8_t depth = 1; -// uint32_t size = 0; -// std::vector data; -// }; -// struct Frame { -// LineNode *node; -// uint32_t index; -// }; -// void push(LineNode *n, uint32_t x) { -// stack[stack_size].node = n; -// stack[stack_size].index = x; -// stack_size++; -// } -// static void 
clear_node(LineNode *n) { -// if (!n) -// return; -// clear_node(n->left); -// clear_node(n->right); -// delete n; -// } -// LineNode *root = nullptr; -// Frame stack[32]; -// uint8_t stack_size = 0; -// static constexpr uint32_t LEAF_TARGET = 256; -// LineTree::LineNode *erase_node(LineNode *n, uint32_t x, uint32_t y) { -// if (!n) -// return nullptr; -// if (!n->left && !n->right) { -// n->data.erase(n->data.begin() + x, n->data.begin() + x + y); -// fix(n); -// return n; -// } -// uint32_t left_size = subtree_size(n->left); -// if (x < left_size) -// n->left = erase_node(n->left, x, y); -// else -// n->right = erase_node(n->right, x - left_size - n->data.size(), y); -// if (n->left && n->right && -// subtree_size(n->left) + subtree_size(n->right) < 256) { -// return merge(n->left, n->right); -// } -// return rebalance(n); -// } -// LineTree::LineNode *insert_node(LineNode *n, uint32_t x, uint32_t y) { -// if (!n) { -// auto *leaf = new LineNode(); -// leaf->data.resize(y); -// leaf->size = y; -// return leaf; -// } -// if (!n->left && !n->right) { -// n->data.insert(n->data.begin() + x, y, LineData{}); -// fix(n); -// if (n->data.size() > 512) -// return split_leaf(n); -// return n; -// } -// uint32_t left_size = subtree_size(n->left); -// if (x <= left_size) -// n->left = insert_node(n->left, x, y); -// else -// n->right = insert_node(n->right, x - left_size - n->data.size(), y); -// return rebalance(n); -// } -// LineNode *build_node(uint32_t count) { -// if (count <= LEAF_TARGET) { -// auto *n = new LineNode(); -// n->data.resize(count); -// n->size = count; -// return n; -// } -// uint32_t left_count = count / 2; -// uint32_t right_count = count - left_count; -// auto *n = new LineNode(); -// n->left = build_node(left_count); -// n->right = build_node(right_count); -// fix(n); -// return n; -// } -// static LineNode *split_leaf(LineNode *n) { -// auto *right = new LineNode(); -// size_t mid = n->data.size() / 2; -// right->data.assign(n->data.begin() + mid, 
n->data.end()); -// n->data.resize(mid); -// fix(n); -// fix(right); -// auto *parent = new LineNode(); -// parent->left = n; -// parent->right = right; -// fix(parent); -// return parent; -// } -// static LineNode *merge(LineNode *a, LineNode *b) { -// a->data.insert(a->data.end(), b->data.begin(), b->data.end()); -// delete b; -// fix(a); -// return a; -// } -// static void fix(LineNode *n) { -// n->depth = 1 + MAX(height(n->left), height(n->right)); -// n->size = subtree_size(n->left) + n->data.size() + -// subtree_size(n->right); -// } -// static LineNode *rotate_right(LineNode *y) { -// LineNode *x = y->left; -// LineNode *T2 = x->right; -// x->right = y; -// y->left = T2; -// fix(y); -// fix(x); -// return x; -// } -// static LineNode *rotate_left(LineNode *x) { -// LineNode *y = x->right; -// LineNode *T2 = y->left; -// y->left = x; -// x->right = T2; -// fix(x); -// fix(y); -// return y; -// } -// static LineNode *rebalance(LineNode *n) { -// fix(n); -// int balance = int(height(n->left)) - int(height(n->right)); -// if (balance > 1) { -// if (height(n->left->left) < height(n->left->right)) -// n->left = rotate_left(n->left); -// return rotate_right(n); -// } -// if (balance < -1) { -// if (height(n->right->right) < height(n->right->left)) -// n->right = rotate_right(n->right); -// return rotate_left(n); -// } -// return n; -// } -// static uint8_t height(LineNode *n) { return n ? n->depth : 0; } -// static uint32_t subtree_size(LineNode *n) { return n ? n->size : 0; } -// }; +#ifndef LINE_TREE_H +#define LINE_TREE_H + +#include "syntax/decl.h" + +struct LineTree { + void clear() { + std::unique_lock lock(mtx); + clear_node(root); + root = nullptr; + stack_size = 0; + } + void build(uint32_t x) { + std::unique_lock lock(mtx); + root = build_node(x); + } + LineData *at(uint32_t x) { + std::shared_lock lock(mtx); + LineNode *n = root; + while (n) { + uint32_t left_size = n->left ? 
n->left->size : 0; + if (x < left_size) { + n = n->left; + } else if (x < left_size + n->data.size()) { + return &n->data[x - left_size]; + } else { + x -= left_size + n->data.size(); + n = n->right; + } + } + return nullptr; + } + LineData *start_iter(uint32_t x) { + std::shared_lock lock(mtx); + stack_size = 0; + LineNode *n = root; + while (n) { + uint32_t left_size = n->left ? n->left->size : 0; + if (x < left_size) { + push(n, 0); + n = n->left; + } else if (x < left_size + n->data.size()) { + push(n, x - left_size + 1); + return &n->data[x - left_size]; + } else { + x -= left_size + n->data.size(); + push(n, UINT32_MAX); + n = n->right; + } + } + return nullptr; + } + void end_iter() { stack_size = 0; } + LineData *next() { + std::shared_lock lock(mtx); + while (stack_size) { + auto &f = stack[stack_size - 1]; + LineNode *n = f.node; + if (f.index < n->data.size()) + return &n->data[f.index++]; + stack_size--; + if (n->right) { + n = n->right; + while (n) { + push(n, 0); + if (!n->left) + break; + n = n->left; + } + return &stack[stack_size - 1].node->data[0]; + } + } + return nullptr; + } + void insert(uint32_t x, uint32_t y) { + std::unique_lock lock(mtx); + root = insert_node(root, x, y); + } + void erase(uint32_t x, uint32_t y) { + std::unique_lock lock(mtx); + root = erase_node(root, x, y); + } + uint32_t count() { + std::shared_lock lock(mtx); + return subtree_size(root); + } + ~LineTree() { clear(); } + +private: + struct LineNode { + LineNode *left = nullptr; + LineNode *right = nullptr; + uint8_t depth = 1; + uint32_t size = 0; + std::vector data; + }; + struct Frame { + LineNode *node; + uint32_t index; + }; + void push(LineNode *n, uint32_t x) { + stack[stack_size].node = n; + stack[stack_size].index = x; + stack_size++; + } + static void clear_node(LineNode *n) { + if (!n) + return; + clear_node(n->left); + clear_node(n->right); + delete n; + } + LineNode *root = nullptr; + Frame stack[32]; + std::atomic stack_size = 0; + std::shared_mutex mtx; + 
static constexpr uint32_t LEAF_TARGET = 256; + LineTree::LineNode *erase_node(LineNode *n, uint32_t x, uint32_t y) { + if (!n) + return nullptr; + if (!n->left && !n->right) { + n->data.erase(n->data.begin() + x, n->data.begin() + x + y); + fix(n); + return n; + } + uint32_t left_size = subtree_size(n->left); + if (x < left_size) + n->left = erase_node(n->left, x, y); + else + n->right = erase_node(n->right, x - left_size - n->data.size(), y); + if (n->left && n->right && + subtree_size(n->left) + subtree_size(n->right) < 256) { + return merge(n->left, n->right); + } + return rebalance(n); + } + LineTree::LineNode *insert_node(LineNode *n, uint32_t x, uint32_t y) { + if (!n) { + auto *leaf = new LineNode(); + leaf->data.resize(y); + leaf->size = y; + return leaf; + } + if (!n->left && !n->right) { + n->data.insert(n->data.begin() + x, y, LineData{}); + fix(n); + if (n->data.size() > 512) + return split_leaf(n); + return n; + } + uint32_t left_size = subtree_size(n->left); + if (x <= left_size) + n->left = insert_node(n->left, x, y); + else + n->right = insert_node(n->right, x - left_size - n->data.size(), y); + return rebalance(n); + } + LineNode *build_node(uint32_t count) { + if (count <= LEAF_TARGET) { + auto *n = new LineNode(); + n->data.resize(count); + n->size = count; + return n; + } + uint32_t left_count = count / 2; + uint32_t right_count = count - left_count; + auto *n = new LineNode(); + n->left = build_node(left_count); + n->right = build_node(right_count); + fix(n); + return n; + } + static LineNode *split_leaf(LineNode *n) { + auto *right = new LineNode(); + size_t mid = n->data.size() / 2; + right->data.assign(n->data.begin() + mid, n->data.end()); + n->data.resize(mid); + fix(n); + fix(right); + auto *parent = new LineNode(); + parent->left = n; + parent->right = right; + fix(parent); + return parent; + } + static LineNode *merge(LineNode *a, LineNode *b) { + a->data.insert(a->data.end(), b->data.begin(), b->data.end()); + delete b; + fix(a); + 
return a; + } + static void fix(LineNode *n) { + n->depth = 1 + MAX(height(n->left), height(n->right)); + n->size = subtree_size(n->left) + n->data.size() + subtree_size(n->right); + } + static LineNode *rotate_right(LineNode *y) { + LineNode *x = y->left; + LineNode *T2 = x->right; + x->right = y; + y->left = T2; + fix(y); + fix(x); + return x; + } + static LineNode *rotate_left(LineNode *x) { + LineNode *y = x->right; + LineNode *T2 = y->left; + y->left = x; + x->right = T2; + fix(x); + fix(y); + return y; + } + static LineNode *rebalance(LineNode *n) { + fix(n); + int balance = int(height(n->left)) - int(height(n->right)); + if (balance > 1) { + if (height(n->left->left) < height(n->left->right)) + n->left = rotate_left(n->left); + return rotate_right(n); + } + if (balance < -1) { + if (height(n->right->right) < height(n->right->left)) + n->right = rotate_right(n->right); + return rotate_left(n); + } + return n; + } + static uint8_t height(LineNode *n) { return n ? n->depth : 0; } + static uint32_t subtree_size(LineNode *n) { return n ? 
n->size : 0; } +}; + +#endif diff --git a/include/syntax/parser.h b/include/syntax/parser.h index bf5db3b..d30cd8a 100644 --- a/include/syntax/parser.h +++ b/include/syntax/parser.h @@ -1,4 +1,8 @@ +#ifndef SYNTAX_PARSER_H +#define SYNTAX_PARSER_H + #include "syntax/decl.h" +#include "syntax/line_tree.h" struct Parser { Knot *root; @@ -12,7 +16,7 @@ struct Parser { std::atomic scroll_max{UINT32_MAX - 2048}; std::mutex mutex; std::mutex data_mutex; - std::vector line_data; + LineTree line_tree; std::set dirty_lines; Parser(Knot *n_root, std::shared_mutex *n_knot_mutex, std::string n_lang, @@ -21,13 +25,6 @@ struct Parser { uint32_t new_end_line); void work(); void scroll(uint32_t line); - uint8_t get_type(Coord c) { - if (c.row >= line_data.size()) - return 0; - const LineData &line = line_data[c.row]; - for (const Token &t : line.tokens) - if (t.start <= c.col && c.col < t.end) - return t.type; - return 0; - } }; + +#endif diff --git a/include/syntax/tokens.def b/include/syntax/tokens.def new file mode 100644 index 0000000..9c143ff --- /dev/null +++ b/include/syntax/tokens.def @@ -0,0 +1,51 @@ +ADD(Data) +ADD(Comment) +ADD(String) +ADD(Escape) +ADD(Interpolation) +ADD(Regexp) +ADD(Number) +ADD(True) +ADD(False) +ADD(Char) +ADD(Keyword) +ADD(KeywordOperator) +ADD(Operator) +ADD(Function) +ADD(Type) +ADD(Constant) +ADD(VariableInstance) +ADD(VariableGlobal) +ADD(Annotation) +ADD(Directive) +ADD(Label) +ADD(Brace1) +ADD(Brace2) +ADD(Brace3) +ADD(Brace4) +ADD(Brace5) +ADD(Heading1) +ADD(Heading2) +ADD(Heading3) +ADD(Heading4) +ADD(Heading5) +ADD(Heading6) +ADD(Blockquote) +ADD(List) +ADD(ListItem) +ADD(Code) +ADD(LanguageName) +ADD(LinkLabel) +ADD(ImageLabel) +ADD(Link) +ADD(Table) +ADD(TableHeader) +ADD(Italic) +ADD(Bold) +ADD(Underline) +ADD(Strikethrough) +ADD(HorixontalRule) +ADD(Tag) +ADD(Attribute) +ADD(CheckDone) +ADD(CheckNotDone) diff --git a/include/utils/utils.h b/include/utils/utils.h index 9105be9..322c666 100644 --- a/include/utils/utils.h +++ 
b/include/utils/utils.h @@ -71,6 +71,14 @@ struct Language { #define UNUSED(x) (void)(x) #define USING(x) UNUSED(sizeof(x)) +inline uint32_t HEX(const std::string &s) { + if (s.empty()) + return 0xFFFFFF; + size_t start = (s.front() == '#') ? 1 : 0; + return static_cast(std::stoul(s.substr(start), nullptr, 16)); +} + +bool compare(const char *a, const char *b, size_t n); std::string clean_text(const std::string &input); std::string percent_encode(const std::string &s); std::string percent_decode(const std::string &s); diff --git a/samples/ruby.rb b/samples/ruby.rb index dcebc78..bbeefe8 100644 --- a/samples/ruby.rb +++ b/samples/ruby.rb @@ -22,13 +22,29 @@ cjk_samples = [ ] # Ruby regex with unicode -$unicode_regex = /[一-龯ぁ-んァ-ヶー々〆〤]/ +$unicode_regex = /[一-龯ぁ-ん#{0x3000}ァ +\-ヶー +s wow + +々〆〤]/ + +UNICORE = %r{ + + {#{}} + + } + +UNINITCORE = %{ + + {{#{}}} + + } # Unicode identifiers (valid in Ruby) 变量 = 0x5_4eddaee π = 3.14_159e+2, ?\u0234, ?\,, ?\x0A, ?s -挨拶 = -> { "こんにちは" } +挨拶 = -> { "こんに \n ちは" } # Method using unicode variable names def math_test @@ -53,7 +69,7 @@ multi = <lang.name != "unknown") editor->parser = new Parser(editor->root, &editor->knot_mtx, editor->lang.name, size.row + 5); - // if (len <= (1024 * 28)) - // request_add_to_lsp(editor->lang, editor); + if (len <= (1024 * 28)) + request_add_to_lsp(editor->lang, editor); editor->indents.compute_indent(editor); return editor; } diff --git a/src/editor/renderer.cc b/src/editor/renderer.cc index 8cdcf19..832a48b 100644 --- a/src/editor/renderer.cc +++ b/src/editor/renderer.cc @@ -1,5 +1,7 @@ #include "editor/editor.h" #include "main.h" +#include "syntax/decl.h" +#include "syntax/parser.h" void render_editor(Editor *editor) { uint32_t sel_start = 0, sel_end = 0; @@ -23,6 +25,15 @@ void render_editor(Editor *editor) { std::unique_lock lock; if (editor->parser) lock = std::unique_lock(editor->parser->mutex); + LineData *line_data = nullptr; + auto get_type = [&](uint32_t col) { + if (!line_data) + return 
0; + for (auto const &token : line_data->tokens) + if (token.start <= col && token.end > col) + return (int)token.type; + return 0; + }; std::shared_lock knot_lock(editor->knot_mtx); if (editor->selection_active) { Coord start, end; @@ -82,6 +93,10 @@ void render_editor(Editor *editor) { while (rendered_rows < editor->size.row) { uint32_t line_len; char *line = next_line(it, &line_len); + if (line_data) + line_data = editor->parser->line_tree.next(); + else + line_data = editor->parser->line_tree.start_iter(line_index); if (!line) break; if (line_len > 0 && line[line_len - 1] == '\n') @@ -140,9 +155,8 @@ void render_editor(Editor *editor) { uint32_t absolute_byte_pos = global_byte_offset + current_byte_offset + local_render_offset; const Highlight *hl = nullptr; - if (editor->parser && editor->parser->line_data.size() > line_index) - hl = &highlight_map.at(editor->parser->get_type( - {line_index, current_byte_offset + local_render_offset})); + if (editor->parser) + hl = &highlights[get_type(current_byte_offset + local_render_offset)]; uint32_t fg = hl ? hl->fg : 0xFFFFFF; uint32_t bg = hl ? hl->bg : 0; uint8_t fl = hl ? 
hl->flags : 0; diff --git a/src/lsp/process.cc b/src/lsp/process.cc index d93165e..bedc886 100644 --- a/src/lsp/process.cc +++ b/src/lsp/process.cc @@ -16,19 +16,11 @@ static bool init_lsp(std::shared_ptr lsp) { if (pid == 0) { dup2(in_pipe[0], STDIN_FILENO); dup2(out_pipe[1], STDOUT_FILENO); -#ifdef __clang__ int devnull = open("/dev/null", O_WRONLY); if (devnull >= 0) { dup2(devnull, STDERR_FILENO); close(devnull); } -#else - int log = open("/tmp/lsp.log", O_WRONLY | O_CREAT | O_TRUNC, 0644); - if (log >= 0) { - dup2(log, STDERR_FILENO); - close(log); - } -#endif close(in_pipe[0]); close(in_pipe[1]); close(out_pipe[0]); diff --git a/src/main.cc b/src/main.cc index a406bb0..a3bfc53 100644 --- a/src/main.cc +++ b/src/main.cc @@ -2,6 +2,7 @@ #include "editor/editor.h" #include "io/sysio.h" #include "lsp/lsp.h" +#include "syntax/decl.h" #include "ui/bar.h" #include "utils/utils.h" @@ -61,6 +62,8 @@ int main(int argc, char *argv[]) { system(("bash " + get_exe_dir() + "/../scripts/init.sh").c_str()); + load_theme(get_exe_dir() + "/../themes/default.json"); + Editor *editor = new_editor(filename, {0, 0}, {screen.row - 2, screen.col}); Bar bar(screen); diff --git a/src/syntax/bash.cc b/src/syntax/bash.cc new file mode 100644 index 0000000..e7ce9e3 --- /dev/null +++ b/src/syntax/bash.cc @@ -0,0 +1,73 @@ +#include "syntax/decl.h" +#include "syntax/langs.h" +#include "utils/utils.h" + +struct BashFullState { + int brace_level = 0; + + enum : uint8_t { NONE, STRING, HEREDOC }; + uint8_t in_state = BashFullState::NONE; + + bool line_cont = false; + + struct Lit { + std::string delim = ""; + int brace_level = 1; + bool allow_interp = false; + + bool operator==(const BashFullState::Lit &other) const { + return delim == other.delim && brace_level == other.brace_level && + allow_interp == other.allow_interp; + } + } lit; + + bool operator==(const BashFullState &other) const { + return in_state == other.in_state && lit == other.lit && + brace_level == other.brace_level && 
line_cont == other.line_cont; + } +}; + +struct BashState { + using full_state_type = BashFullState; + + int interp_level = 0; + std::stack> interp_stack; + std::shared_ptr full_state; + + bool operator==(const BashState &other) const { + return interp_level == other.interp_level && + interp_stack == other.interp_stack && + ((full_state && other.full_state && + *full_state == *other.full_state)); + } +}; + +bool bash_state_match(std::shared_ptr state_1, + std::shared_ptr state_2) { + if (!state_1 || !state_2) + return false; + return *std::static_pointer_cast(state_1) == + *std::static_pointer_cast(state_2); +} + +std::shared_ptr bash_parse(std::vector *tokens, + std::shared_ptr in_state, + const char *text, uint32_t len) { + static bool keywords_trie_init = false; + if (!keywords_trie_init) { + keywords_trie_init = true; + } + tokens->clear(); + auto state = ensure_state(std::static_pointer_cast(in_state)); + uint32_t i = 0; + while (len > 0 && (text[len - 1] == '\n' || text[len - 1] == '\r' || + text[len - 1] == '\t' || text[len - 1] == ' ')) + len--; + if (len == 0) + return state; + bool heredoc_first = false; + while (i < len) { + i += utf8_codepoint_width(text[i]); + } + return state; +} diff --git a/src/syntax/syntax.cc b/src/syntax/parser.cc similarity index 73% rename from src/syntax/syntax.cc rename to src/syntax/parser.cc index 682ebdf..16d9ac4 100644 --- a/src/syntax/syntax.cc +++ b/src/syntax/parser.cc @@ -1,12 +1,14 @@ +#include "syntax/parser.h" #include "io/knot.h" #include "main.h" +#include "syntax/decl.h" #include "syntax/langs.h" -#include "syntax/parser.h" + +std::array highlights = {}; Parser::Parser(Knot *n_root, std::shared_mutex *n_knot_mutex, std::string n_lang, uint32_t n_scroll_max) { scroll_max = n_scroll_max; - line_data.reserve(n_root->line_count + 1); knot_mutex = n_knot_mutex; lang = n_lang; auto pair = parsers.find(n_lang); @@ -24,11 +26,9 @@ void Parser::edit(Knot *n_root, uint32_t start_line, uint32_t old_end_line, 
std::lock_guard lock(data_mutex); root = n_root; if (((int64_t)old_end_line - (int64_t)start_line) > 0) - line_data.erase(line_data.begin() + start_line, - line_data.begin() + start_line + old_end_line - start_line); + line_tree.erase(start_line + 1, old_end_line - start_line); if (((int64_t)new_end_line - (int64_t)old_end_line) > 0) - line_data.insert(line_data.begin() + start_line, - new_end_line - old_end_line, LineData{}); + line_tree.insert(start_line + 1, new_end_line - start_line); dirty_lines.insert(start_line); } @@ -42,16 +42,18 @@ void Parser::work() { tmp_dirty.swap(dirty_lines); lock_data.unlock(); std::set remaining_dirty; + std::unique_lock lock(mutex); + lock.unlock(); for (uint32_t c_line : tmp_dirty) { if (c_line > scroll_max) { remaining_dirty.insert(c_line); continue; } - std::unique_lock lock(mutex); - uint32_t line_count = (uint32_t)line_data.size(); + uint32_t line_count = line_tree.count(); + lock_data.lock(); std::shared_ptr prev_state = - (c_line > 0) ? line_data[c_line - 1].out_state : nullptr; - lock.unlock(); + (c_line > 0) ? 
line_tree.at(c_line - 1)->out_state : nullptr; + lock_data.unlock(); while (c_line < line_count) { if (!running.load(std::memory_order_relaxed)) { free(text); @@ -70,14 +72,17 @@ void Parser::work() { if (c_line < scroll_max && ((scroll_max > 100 && c_line > scroll_max - 100) || c_line < 100)) lock.lock(); + if (c_line >= line_tree.count()) { + if (lock.owns_lock()) + lock.unlock(); + break; + } lock_data.lock(); + LineData *line_data = line_tree.at(c_line); std::shared_ptr new_state = - parse_func(&line_data[c_line].tokens, prev_state, text, r_len); - lock_data.unlock(); - line_data[c_line].in_state = prev_state; - line_data[c_line].out_state = new_state; - if (lock.owns_lock()) - lock.unlock(); + parse_func(&line_data->tokens, prev_state, text, r_len); + line_data->in_state = prev_state; + line_data->out_state = new_state; if (!running.load(std::memory_order_relaxed)) { free(text); return; @@ -85,16 +90,24 @@ void Parser::work() { prev_state = new_state; c_line++; if (c_line < line_count && c_line > scroll_max + 50) { + lock_data.unlock(); + if (lock.owns_lock()) + lock.unlock(); if (c_line > 0) remaining_dirty.insert(c_line - 1); remaining_dirty.insert(c_line); break; } - lock.lock(); if (c_line < line_count && - state_match_func(prev_state, line_data[c_line].in_state)) + state_match_func(prev_state, line_tree.at(c_line)->in_state)) { + lock_data.unlock(); + if (lock.owns_lock()) + lock.unlock(); break; - lock.unlock(); + } + lock_data.unlock(); + if (lock.owns_lock()) + lock.unlock(); } if (!running.load(std::memory_order_relaxed)) { free(text); @@ -110,20 +123,20 @@ void Parser::scroll(uint32_t line) { if (line != scroll_max) { scroll_max = line; uint32_t c_line = line > 100 ?
line - 100 : 0; - if (line_data.size() < c_line) + if (c_line >= line_tree.count()) return; - if (line_data[c_line].in_state || line_data[c_line].out_state) + std::unique_lock lock_data(data_mutex); + if (line_tree.at(c_line)->in_state || line_tree.at(c_line)->out_state) return; + lock_data.unlock(); std::shared_lock k_lock(*knot_mutex); k_lock.unlock(); uint32_t capacity = 256; char *text = (char *)calloc((capacity + 1), sizeof(char)); - std::unique_lock lock_data(data_mutex); - lock_data.unlock(); + uint32_t line_count = line_tree.count(); std::unique_lock lock(mutex); - uint32_t line_count = (uint32_t)line_data.size(); std::shared_ptr prev_state = - (c_line > 0) ? line_data[c_line - 1].out_state : nullptr; + (c_line > 0) ? line_tree.at(c_line - 1)->out_state : nullptr; lock.unlock(); while (c_line < line_count) { if (!running.load(std::memory_order_relaxed)) { @@ -143,12 +156,18 @@ void Parser::scroll(uint32_t line) { if (c_line < scroll_max && ((scroll_max > 100 && c_line > scroll_max - 100) || c_line < 100)) lock.lock(); + if (c_line >= line_tree.count()) { + if (lock.owns_lock()) + lock.unlock(); + break; + } lock_data.lock(); + LineData *line_data = line_tree.at(c_line); std::shared_ptr new_state = - parse_func(&line_data[c_line].tokens, prev_state, text, r_len); + parse_func(&line_data->tokens, prev_state, text, r_len); + line_data->in_state = nullptr; + line_data->out_state = new_state; lock_data.unlock(); - line_data[c_line].in_state = nullptr; - line_data[c_line].out_state = new_state; if (lock.owns_lock()) lock.unlock(); if (!running.load(std::memory_order_relaxed)) { diff --git a/src/syntax/ruby.cc b/src/syntax/ruby.cc index 3ea3c19..a67672e 100644 --- a/src/syntax/ruby.cc +++ b/src/syntax/ruby.cc @@ -1,24 +1,28 @@ +#include "syntax/decl.h" #include "syntax/langs.h" const static std::vector base_keywords = { - // style 4 - "if", "else", "elsif", "case", "rescue", "ensure", "do", "for", - "while", "until", "def", "class", "module", "begin", "end",
"unless", + "class", "module", "begin", "end", "else", "rescue", "ensure", "do", "when", +}; + +const static std::vector expecting_keywords = { + "if", "elsif", "case", "for", "while", "until", "unless", }; const static std::vector operator_keywords = { - // style 5 - "alias", "and", "BEGIN", "break", "catch", "defined?", "in", "next", - "not", "or", "redo", "rescue", "retry", "return", "super", "yield", - "self", "nil", "true", "false", "undef", "when", + "alias", "BEGIN", "break", "catch", "defined?", "in", "next", + "redo", "rescue", "retry", "super", "self", "nil", "undef", +}; + +const static std::vector expecting_operators = { + "and", "return", "not", "yield", "or", }; const static std::vector operators = { - "+", "-", "*", "/", "%", "**", "==", "!=", "===", - "<=>", ">", ">=", "<", "<=", "&&", "||", "!", "&", - "|", "^", "~", "<<", ">>", "=", "+=", "-=", "*=", - "/=", "%=", "**=", "&=", "|=", "^=", "<<=", ">>=", "..", - "...", "===", "=", "=>", "&.", "[]", "[]=", "`", "->", + "+", "-", "*", "/", "%", "**", "==", "!=", "===", "<=>", ">", + ">=", "<", "<=", "&&", "||", "!", "&", "|", "^", "~", "<<", + ">>", "=", "+=", "-=", "*=", "/=", "%=", "**=", "&=", "|=", "^=", + "<<=", ">>=", "..", "...", "===", "=", "=>", "&", "`", "->", "=~", }; struct HeredocInfo { @@ -34,19 +38,16 @@ struct HeredocInfo { }; struct RubyFullState { - // TODO: use this to highlight each level seperaletly like vscode colored - // braces extention thingy does int brace_level = 0; - int paren_level = 0; - int bracket_level = 0; enum : uint8_t { NONE, STRING, REGEXP, COMMENT, HEREDOC, END }; uint8_t in_state = RubyFullState::NONE; + bool expecting_expr = false; + struct Lit { char delim_start = '\0'; char delim_end = '\0'; - // For stuff like %Q{ { these braces are valid } this part is still str } int brace_level = 1; bool allow_interp = false; @@ -60,12 +61,13 @@ struct RubyFullState { bool operator==(const RubyFullState &other) const { return in_state == other.in_state && lit == 
other.lit && brace_level == other.brace_level && - paren_level == other.paren_level && - bracket_level == other.bracket_level; + expecting_expr == other.expecting_expr; } }; struct RubyState { + using full_state_type = RubyFullState; + int interp_level = 0; std::stack> interp_stack; std::shared_ptr full_state; @@ -80,32 +82,16 @@ struct RubyState { } }; -inline std::shared_ptr -ensure_state(std::shared_ptr state) { - if (!state) - state = std::make_shared(); - if (state.unique()) - return state; - return std::make_shared(*state); -} - -inline std::shared_ptr -ensure_full_state(std::shared_ptr state) { - state = ensure_state(state); - if (!state->full_state) - state->full_state = std::make_shared(); - else if (!state->full_state.unique()) - state->full_state = std::make_shared(*state->full_state); - return state; -} - -bool identifier_start_char(char c) { +inline static bool identifier_start_char(char c) { return !isascii(c) || isalpha(c) || c == '_'; } -bool identifier_char(char c) { return !isascii(c) || isalnum(c) || c == '_'; } +inline static bool identifier_char(char c) { + return !isascii(c) || isalnum(c) || c == '_'; +} -uint32_t get_next_word(const char *text, uint32_t i, uint32_t len) { +inline static uint32_t get_next_word(const char *text, uint32_t i, + uint32_t len) { if (i >= len || !identifier_start_char(text[i])) return 0; uint32_t width = 1; @@ -116,12 +102,12 @@ uint32_t get_next_word(const char *text, uint32_t i, uint32_t len) { return width; } -bool compare(const char *a, const char *b, size_t n) { - size_t i = 0; - for (; i < n; ++i) - if (a[i] != b[i]) - return false; - return true; +bool ruby_state_match(std::shared_ptr state_1, + std::shared_ptr state_2) { + if (!state_1 || !state_2) + return false; + return *std::static_pointer_cast(state_1) == + *std::static_pointer_cast(state_2); } std::shared_ptr ruby_parse(std::vector *tokens, @@ -129,21 +115,20 @@ std::shared_ptr ruby_parse(std::vector *tokens, const char *text, uint32_t len) { static 
bool keywords_trie_init = false; static Trie base_keywords_trie; + static Trie expecting_keywords_trie; static Trie operator_keywords_trie; + static Trie expecting_operators_trie; static Trie operator_trie; if (!keywords_trie_init) { base_keywords_trie.build(base_keywords); + expecting_keywords_trie.build(expecting_keywords); operator_keywords_trie.build(operator_keywords); + expecting_operators_trie.build(expecting_operators); operator_trie.build(operators); keywords_trie_init = true; } tokens->clear(); - if (!in_state) - in_state = std::make_shared(); - std::shared_ptr state = - std::static_pointer_cast(in_state); - if (!state->full_state) - state->full_state = std::make_shared(); + auto state = ensure_state(std::static_pointer_cast(in_state)); uint32_t i = 0; while (len > 0 && (text[len - 1] == '\n' || text[len - 1] == '\r' || text[len - 1] == '\t' || text[len - 1] == ' ')) @@ -152,15 +137,12 @@ std::shared_ptr ruby_parse(std::vector *tokens, return state; bool heredoc_first = false; while (i < len) { - if (state->full_state->in_state == RubyFullState::END) { - tokens->clear(); + if (state->full_state->in_state == RubyFullState::END) return state; - } if (state->full_state->in_state == RubyFullState::COMMENT) { - tokens->push_back({i, len, 1}); + tokens->push_back({i, len, TokenKind::Comment}); if (i == 0 && len == 4 && text[i] == '=' && text[i + 1] == 'e' && text[i + 2] == 'n' && text[i + 3] == 'd') { - state = ensure_full_state(state); state->full_state->in_state = RubyFullState::NONE; } return state; @@ -175,32 +157,32 @@ std::shared_ptr ruby_parse(std::vector *tokens, if (len - start == state->heredocs.front().delim.length() && compare(text + start, state->heredocs.front().delim.c_str(), state->heredocs.front().delim.length())) { - state = ensure_full_state(state); state->heredocs.pop_front(); if (state->heredocs.empty()) state->full_state->in_state = RubyFullState::NONE; - tokens->push_back({i, len, 10}); + tokens->push_back({i, len, 
TokenKind::Annotation}); return state; } } uint32_t start = i; if (!state->heredocs.front().allow_interpolation) { - tokens->push_back({i, len, 2}); + tokens->push_back({i, len, TokenKind::String}); return state; } else { while (i < len) { if (text[i] == '\\') { - // TODO: highlight the escape character + tokens->push_back({start, i, TokenKind::String}); + start = i; i++; if (i < len) i++; + tokens->push_back({start, i, TokenKind::Escape}); + start = i; continue; } if (text[i] == '#' && i + 1 < len && text[i + 1] == '{') { - tokens->push_back({start, i, 2}); - tokens->push_back({i, i + 2, 10}); + tokens->push_back({start, i, TokenKind::String}); + tokens->push_back({i, i + 2, TokenKind::Interpolation}); i += 2; - state = ensure_state(state); state->interp_stack.push(state->full_state); state->full_state = std::make_shared(); state->interp_level = 1; @@ -209,7 +191,7 @@ std::shared_ptr ruby_parse(std::vector *tokens, i++; } if (i == len) - tokens->push_back({start, len, 2}); + tokens->push_back({start, len, TokenKind::String}); continue; } } @@ -217,19 +199,20 @@ std::shared_ptr ruby_parse(std::vector *tokens, uint32_t start = i; while (i < len) { if (text[i] == '\\') { - // TODO: highlight the escape character - need to make priority work - // and this have higher + tokens->push_back({start, i, TokenKind::String}); + start = i; i++; if (i < len) i++; + tokens->push_back({start, i, TokenKind::Escape}); + start = i; continue; } if (state->full_state->lit.allow_interp && text[i] == '#' && i + 1 < len && text[i + 1] == '{') { - tokens->push_back({start, i, 2}); - tokens->push_back({i, i + 2, 10}); + tokens->push_back({start, i, TokenKind::String}); + tokens->push_back({i, i + 2, TokenKind::Interpolation}); i += 2; - state = ensure_state(state); state->interp_stack.push(state->full_state); state->full_state = std::make_shared(); state->interp_level = 1; @@ -238,23 +221,23 @@ std::shared_ptr ruby_parse(std::vector *tokens, if (text[i] == 
state->full_state->lit.delim_start &&
state->full_state->lit.delim_start != state->full_state->lit.delim_end) { - state = ensure_full_state(state); state->full_state->lit.brace_level++; } if (text[i] == state->full_state->lit.delim_end) { - state = ensure_full_state(state); if (state->full_state->lit.delim_start == state->full_state->lit.delim_end) { i++; - tokens->push_back({start, i, 2}); + tokens->push_back({start, i, TokenKind::String}); state->full_state->in_state = RubyFullState::NONE; + state->full_state->expecting_expr = false; break; } else { state->full_state->lit.brace_level--; if (state->full_state->lit.brace_level == 0) { i++; - tokens->push_back({start, i, 2}); + tokens->push_back({start, i, TokenKind::String}); state->full_state->in_state = RubyFullState::NONE; + state->full_state->expecting_expr = false; break; } } @@ -262,15 +245,67 @@ std::shared_ptr ruby_parse(std::vector *tokens, i++; } if (i == len) - tokens->push_back({start, len, 2}); + tokens->push_back({start, len, TokenKind::String}); + continue; + } + if (state->full_state->in_state == RubyFullState::REGEXP) { + uint32_t start = i; + while (i < len) { + if (text[i] == '\\') { + tokens->push_back({start, i, TokenKind::Regexp}); + start = i; + i++; + if (i < len) + i++; + tokens->push_back({start, i, TokenKind::Escape}); + start = i; + continue; + } + if (text[i] == '#' && i + 1 < len && text[i + 1] == '{') { + tokens->push_back({start, i, TokenKind::Regexp}); + tokens->push_back({i, i + 2, TokenKind::Interpolation}); + i += 2; + state->interp_stack.push(state->full_state); + state->full_state = std::make_shared(); + state->interp_level = 1; + break; + } + if (text[i] == state->full_state->lit.delim_start && + state->full_state->lit.delim_start != + state->full_state->lit.delim_end) { + state->full_state->lit.brace_level++; + } + if (text[i] == state->full_state->lit.delim_end) { + if (state->full_state->lit.delim_start == + state->full_state->lit.delim_end) { + i += 1; + tokens->push_back({start, i, TokenKind::Regexp}); +
state->full_state->in_state = RubyFullState::NONE; + state->full_state->expecting_expr = false; + break; + } else { + state->full_state->lit.brace_level--; + if (state->full_state->lit.brace_level == 0) { + i += 1; + tokens->push_back({start, i, TokenKind::Regexp}); + state->full_state->in_state = RubyFullState::NONE; + state->full_state->expecting_expr = false; + break; + } + } + } + i++; + } + if (i == len) + tokens->push_back({start, len, TokenKind::Regexp}); continue; } if (i == 0 && len == 6) { if (text[i] == '=' && text[i + 1] == 'b' && text[i + 2] == 'e' && text[i + 3] == 'g' && text[i + 4] == 'i' && text[i + 5] == 'n') { - state = ensure_full_state(state); state->full_state->in_state = RubyFullState::COMMENT; - tokens->push_back({0, len, 1}); + state->full_state->expecting_expr = false; + tokens->push_back({0, len, TokenKind::Comment}); return state; } } @@ -278,9 +313,9 @@ std::shared_ptr ruby_parse(std::vector *tokens, if (text[i] == '_' && text[i + 1] == '_' && text[i + 2] == 'E' && text[i + 3] == 'N' && text[i + 4] == 'D' && text[i + 5] == '_' && text[i + 6] == '_') { - state = ensure_full_state(state); tokens->clear(); state->full_state->in_state = RubyFullState::END; + state->full_state->expecting_expr = false; return state; } } @@ -291,7 +326,7 @@ std::shared_ptr ruby_parse(std::vector *tokens, indented = true; if (text[j] == '~' || text[j] == '-') j++; - tokens->push_back({i, j, 10}); + tokens->push_back({i, j, TokenKind::Operator}); if (j >= len) continue; std::string delim; @@ -304,12 +339,15 @@ std::shared_ptr ruby_parse(std::vector *tokens, while (j < len && text[j] != q) delim += text[j++]; } else { - while (j < len && identifier_char(text[j])) + if (j < len && identifier_start_char(text[j])) { delim += text[j++]; + while (j < len && identifier_char(text[j])) + delim += text[j++]; + } } + state->full_state->expecting_expr = false; if (!delim.empty()) { - tokens->push_back({s, j, 10}); - state = ensure_full_state(state); + tokens->push_back({s, 
j, TokenKind::Annotation}); state->heredocs.push_back({delim, interpolation, indented}); state->full_state->in_state = RubyFullState::HEREDOC; heredoc_first = true; @@ -317,18 +355,47 @@ std::shared_ptr ruby_parse(std::vector *tokens, i = j; continue; } - if (text[i] == '#') { - tokens->push_back({i, len, 1}); + if (text[i] == '/' && state->full_state->expecting_expr) { + tokens->push_back({i, i + 1, TokenKind::Regexp}); + state->full_state->in_state = RubyFullState::REGEXP; + state->full_state->expecting_expr = false; + state->full_state->lit.delim_start = '/'; + state->full_state->lit.delim_end = '/'; + state->full_state->lit.allow_interp = true; + i++; + continue; + } else if (text[i] == '#') { + tokens->push_back({i, len, TokenKind::Comment}); + state->full_state->expecting_expr = false; return state; + } else if (text[i] == '.') { + uint32_t start = i; + i++; + if (i < len && text[i] == '.') { + i++; + if (i < len && text[i] == '.') { + i++; + } + } + tokens->push_back({start, i, TokenKind::Operator}); + state->full_state->expecting_expr = false; + continue; } else if (text[i] == ':') { + state->full_state->expecting_expr = false; uint32_t start = i; i++; if (i >= len) { - tokens->push_back({start, i, 3}); + tokens->push_back({start, i, TokenKind::Operator}); + state->full_state->expecting_expr = true; + continue; + } + if (text[i] == ':') { + i++; continue; } if (text[i] == '\'' || text[i] == '"') { - tokens->push_back({start, i, 6}); + tokens->push_back({start, i, TokenKind::Operator}); + state->full_state->expecting_expr = true; continue; } if (text[i] == '$' || text[i] == '@') { @@ -338,24 +405,25 @@ std::shared_ptr ruby_parse(std::vector *tokens, i++; while (i < len && identifier_char(text[i])) i++; - tokens->push_back({start, i, 6}); + tokens->push_back({start, i, TokenKind::Label}); continue; } uint32_t op_len = operator_trie.match(text, i, len, identifier_char); if (op_len > 0) { - tokens->push_back({start, i + op_len, 6}); + tokens->push_back({start, 
i + op_len, TokenKind::Label}); i += op_len; continue; } if (identifier_start_char(text[i])) { uint32_t word_len = get_next_word(text, i, len); - tokens->push_back({start, i + word_len, 6}); + tokens->push_back({start, i + word_len, TokenKind::Label}); i += word_len; continue; } - tokens->push_back({start, i, 3}); + tokens->push_back({start, i, TokenKind::Operator}); continue; } else if (text[i] == '@') { + state->full_state->expecting_expr = false; uint32_t start = i; i++; if (i >= len) @@ -368,9 +436,10 @@ std::shared_ptr ruby_parse(std::vector *tokens, continue; while (i < len && identifier_char(text[i])) i++; - tokens->push_back({start, i, 7}); + tokens->push_back({start, i, TokenKind::VariableInstance}); continue; } else if (text[i] == '$') { + state->full_state->expecting_expr = false; uint32_t start = i; i++; if (i >= len) @@ -390,9 +459,10 @@ std::shared_ptr ruby_parse(std::vector *tokens, } else { continue; } - tokens->push_back({start, i, 8}); + tokens->push_back({start, i, TokenKind::VariableGlobal}); continue; } else if (text[i] == '?') { + state->full_state->expecting_expr = false; uint32_t start = i; i++; if (i < len && text[i] == '\\') { @@ -405,7 +475,7 @@ std::shared_ptr ruby_parse(std::vector *tokens, continue; if (i < len && isxdigit(text[i])) i++; - tokens->push_back({start, i, 7}); + tokens->push_back({start, i, TokenKind::Char}); continue; } else if (i < len && text[i] == 'u') { i++; @@ -425,42 +495,81 @@ std::shared_ptr ruby_parse(std::vector *tokens, i++; else continue; - tokens->push_back({start, i, 7}); + tokens->push_back({start, i, TokenKind::Char}); continue; } else if (i < len) { i++; - tokens->push_back({start, i, 7}); + tokens->push_back({start, i, TokenKind::Char}); continue; } } else if (i < len && text[i] != ' ') { i++; - tokens->push_back({start, i, 7}); + tokens->push_back({start, i, TokenKind::Char}); continue; } else { - tokens->push_back({start, i, 3}); + state->full_state->expecting_expr = true; + tokens->push_back({start, 
i, TokenKind::Operator}); continue; } } else if (text[i] == '{') { - tokens->push_back({i, i + 1, 3}); - state = ensure_state(state); + state->full_state->expecting_expr = true; + uint8_t brace_color = + (uint8_t)TokenKind::Brace1 + (state->full_state->brace_level % 5); + tokens->push_back({i, i + 1, (TokenKind)brace_color}); state->interp_level++; + state->full_state->brace_level++; i++; continue; } else if (text[i] == '}') { - state = ensure_full_state(state); + state->full_state->expecting_expr = false; state->interp_level--; if (state->interp_level == 0 && !state->interp_stack.empty()) { state->full_state = state->interp_stack.top(); state->interp_stack.pop(); - tokens->push_back({i, i + 1, 10}); + tokens->push_back({i, i + 1, TokenKind::Interpolation}); } else { - tokens->push_back({i, i + 1, 3}); + state->full_state->brace_level--; + uint8_t brace_color = + (uint8_t)TokenKind::Brace1 + (state->full_state->brace_level % 5); + tokens->push_back({i, i + 1, (TokenKind)brace_color}); } i++; continue; + } else if (text[i] == '(') { + state->full_state->expecting_expr = true; + uint8_t brace_color = + (uint8_t)TokenKind::Brace1 + (state->full_state->brace_level % 5); + tokens->push_back({i, i + 1, (TokenKind)brace_color}); + state->full_state->brace_level++; + i++; + continue; + } else if (text[i] == ')') { + state->full_state->expecting_expr = false; + state->full_state->brace_level--; + uint8_t brace_color = + (uint8_t)TokenKind::Brace1 + (state->full_state->brace_level % 5); + tokens->push_back({i, i + 1, (TokenKind)brace_color}); + i++; + continue; + } else if (text[i] == '[') { + state->full_state->expecting_expr = true; + uint8_t brace_color = + (uint8_t)TokenKind::Brace1 + (state->full_state->brace_level % 5); + tokens->push_back({i, i + 1, (TokenKind)brace_color}); + state->full_state->brace_level++; + i++; + continue; + } else if (text[i] == ']') { + state->full_state->expecting_expr = false; + state->full_state->brace_level--; + uint8_t brace_color = + 
(uint8_t)TokenKind::Brace1 + (state->full_state->brace_level % 5); + tokens->push_back({i, i + 1, (TokenKind)brace_color}); + i++; + continue; } else if (text[i] == '\'') { - tokens->push_back({i, i + 1, 2}); - state = ensure_full_state(state); + state->full_state->expecting_expr = false; + tokens->push_back({i, i + 1, TokenKind::String}); state->full_state->in_state = RubyFullState::STRING; state->full_state->lit.delim_start = '\''; state->full_state->lit.delim_end = '\''; @@ -468,8 +577,8 @@ std::shared_ptr ruby_parse(std::vector *tokens, i++; continue; } else if (text[i] == '"') { - tokens->push_back({i, i + 1, 2}); - state = ensure_full_state(state); + state->full_state->expecting_expr = false; + tokens->push_back({i, i + 1, TokenKind::String}); state->full_state->in_state = RubyFullState::STRING; state->full_state->lit.delim_start = '"'; state->full_state->lit.delim_end = '"'; @@ -477,8 +586,8 @@ std::shared_ptr ruby_parse(std::vector *tokens, i++; continue; } else if (text[i] == '`') { - tokens->push_back({i, i + 1, 2}); - state = ensure_full_state(state); + state->full_state->expecting_expr = false; + tokens->push_back({i, i + 1, TokenKind::String}); state->full_state->in_state = RubyFullState::STRING; state->full_state->lit.delim_start = '`'; state->full_state->lit.delim_end = '`'; @@ -486,6 +595,7 @@ std::shared_ptr ruby_parse(std::vector *tokens, i++; continue; } else if (text[i] == '%') { + state->full_state->expecting_expr = false; if (i + 1 >= len) { i++; continue; @@ -495,15 +605,24 @@ std::shared_ptr ruby_parse(std::vector *tokens, char delim_end = '\0'; bool allow_interp = true; int prefix_len = 1; + bool is_regexp = false; switch (type) { + case 'r': + is_regexp = true; + allow_interp = true; + prefix_len = 2; + break; case 'Q': case 'x': + case 'I': + case 'W': allow_interp = true; prefix_len = 2; break; case 'w': case 'q': case 'i': + case 's': allow_interp = false; prefix_len = 2; break; @@ -539,9 +658,10 @@ std::shared_ptr 
ruby_parse(std::vector *tokens, delim_end = delim_start; break; } - tokens->push_back({i, i + prefix_len + 1, 2}); - state = ensure_full_state(state); - state->full_state->in_state = RubyFullState::STRING; + tokens->push_back({i, i + prefix_len + 1, + (is_regexp ? TokenKind::Regexp : TokenKind::String)}); + state->full_state->in_state = + is_regexp ? RubyFullState::REGEXP : RubyFullState::STRING; state->full_state->lit.delim_start = delim_start; state->full_state->lit.delim_end = delim_end; state->full_state->lit.allow_interp = allow_interp; @@ -549,6 +669,7 @@ std::shared_ptr ruby_parse(std::vector *tokens, i += prefix_len + 1; continue; } else if (isdigit(text[i])) { + state->full_state->expecting_expr = false; uint32_t start = i; if (text[i] == '0') { i++; @@ -646,85 +767,137 @@ std::shared_ptr ruby_parse(std::vector *tokens, i--; } } - tokens->push_back({start, i, 9}); + tokens->push_back({start, i, TokenKind::Number}); continue; } else if (identifier_start_char(text[i])) { + state->full_state->expecting_expr = false; uint32_t length; - if ((length = base_keywords_trie.match(text, i, len, identifier_char)) > - 0) { - tokens->push_back({i, i + length, 4}); + if ((length = base_keywords_trie.match(text, i, len, identifier_char))) { + tokens->push_back({i, i + length, TokenKind::Keyword}); + i += length; + continue; + } else if ((length = expecting_keywords_trie.match(text, i, len, + identifier_char))) { + state->full_state->expecting_expr = true; + tokens->push_back({i, i + length, TokenKind::Keyword}); i += length; continue; } else if ((length = operator_keywords_trie.match(text, i, len, - identifier_char)) > 0) { - tokens->push_back({i, i + length, 5}); + identifier_char))) { + tokens->push_back({i, i + length, TokenKind::KeywordOperator}); + i += length; + continue; + } else if ((length = expecting_operators_trie.match( + text, i, len, identifier_char)) > 0) { + state->full_state->expecting_expr = true; + tokens->push_back({i, i + length, 
TokenKind::KeywordOperator}); i += length; continue; } else if (text[i] >= 'A' && text[i] <= 'Z') { uint32_t start = i; i += get_next_word(text, i, len); - tokens->push_back({start, i, 10}); + tokens->push_back({start, i, TokenKind::Constant}); continue; } else { uint32_t start = i; + if (i + 4 <= len && text[i] == 't' && text[i + 1] == 'r' && + text[i + 2] == 'u' && text[i + 3] == 'e' && + (i + 4 == len || !identifier_char(text[i + 4]))) { + i += 4; + tokens->push_back({start, i, TokenKind::True}); + continue; + } + if (i + 5 <= len && text[i] == 'f' && text[i + 1] == 'a' && + text[i + 2] == 'l' && text[i + 3] == 's' && text[i + 4] == 'e' && + (i + 5 == len || !identifier_char(text[i + 5]))) { + i += 5; + tokens->push_back({start, i, TokenKind::False}); + continue; + } + if (i + 3 <= len && text[i] == 'd' && text[i + 1] == 'e' && + text[i + 2] == 'f' && + (i + 3 == len || !identifier_char(text[i + 3]))) { + i += 3; + tokens->push_back({start, i, TokenKind::Keyword}); + while (i < len && (text[i] == ' ' || text[i] == '\t')) + i++; + while (i < len) { + if (identifier_start_char(text[i])) { + uint32_t width = get_next_word(text, i, len); + if (text[i] >= 'A' && text[i] <= 'Z') + tokens->push_back({i, i + width, TokenKind::Constant}); + else if (width == 4 && (text[i] == 's' && text[i + 1] == 'e' && + text[i + 2] == 'l' && text[i + 3] == 'f')) + tokens->push_back({i, i + width, TokenKind::Keyword}); + i += width; + if (i < len && text[i] == '.') { + i++; + continue; + } + tokens->push_back({i - width, i, TokenKind::Function}); + break; + } else { + break; + } + } + continue; + } while (i < len && identifier_char(text[i])) i++; if (i < len && text[i] == ':') { i++; - tokens->push_back({start, i, 6}); + tokens->push_back({start, i, TokenKind::Label}); continue; } else if (i < len && (text[i] == '!'
|| text[i] == '?')) { i++; + tokens->push_back({start, i, TokenKind::Function}); + } else { + uint32_t tmp = i; + if (tmp < len && (text[tmp] == '(' || text[tmp] == '{')) { + tokens->push_back({start, i, TokenKind::Function}); + continue; + } else if (tmp < len && (text[tmp] == ' ' || text[tmp] == '\t')) { + tmp++; + } else { + continue; + } + while (tmp < len && (text[tmp] == ' ' || text[tmp] == '\t')) + tmp++; + if (tmp >= len) + continue; + if (!isascii(text[tmp])) { + tokens->push_back({start, i, TokenKind::Function}); + continue; + } else if (text[tmp] == '-' || text[tmp] == '&' || text[tmp] == '%' || + text[tmp] == ':') { + if (tmp + 1 >= len || + (text[tmp + 1] == ' ' || text[tmp + 1] == '>')) + continue; + } else if (text[tmp] == ']' || text[tmp] == '}' || text[tmp] == ')' || + text[tmp] == ',' || text[tmp] == ';' || text[tmp] == '.' || + text[tmp] == '+' || text[tmp] == '*' || text[tmp] == '/' || + text[tmp] == '=' || text[tmp] == '?' || text[tmp] == '|' || + text[tmp] == '^' || text[tmp] == '<' || text[tmp] == '>') { + continue; + } + tokens->push_back({start, i, TokenKind::Function}); } continue; } } else { uint32_t op_len; - if ((op_len = operator_trie.match(text, i, len, - [](char) { return false; })) > 0) { - tokens->push_back({i, i + op_len, 3}); + if ((op_len = + operator_trie.match(text, i, len, [](char) { return false; }))) { + tokens->push_back({i, i + op_len, TokenKind::Operator}); i += op_len; + state->full_state->expecting_expr = true; + continue; + } else { + i += utf8_codepoint_width(text[i]); continue; } } - i += utf8_codepoint_width(text[i]); } return state; } -bool ruby_state_match(std::shared_ptr state_1, - std::shared_ptr state_2) { - if (!state_1 || !state_2) - return false; - return *std::static_pointer_cast(state_1) == - *std::static_pointer_cast(state_2); -} - -// function calls matched with alphanumeric names followed immediately by ! -// or ? 
or `(` immediately or siwth space or are followed by a non-keyword -// or non-operator (some operators like - for negating and ! for not or { -// for block might be allowed?) -// a word following :: or . is matched as a property -// and any random word is matched as a variable name -// or as a class/module name if it starts with a capital letter -// -// regex are matched as text within / and / as long as -// the first / is not -// following a literal (int/float/string) or variable or brace close -// and is following a keyword or operator liek return /regex/ or x = -// /regex/ . so maybe add feild expecting_expr to state that is true right -// after keyword or some operators like = , =~ , `,` etc? -// -// (left to implement) - -// -// words - breaks up into these submatches -// - Constants that start with a capital letter -// - a word following :: or . is matched as a property -// - function call if ending with ! or ? or ( or are followed by a -// non-keyword or non-operator . ill figure it out -// -// regex (and distinguish between / for division and / for regex) and -// %r{} ones too -// -// Matching brace colors by brace depth -// +// TODO: Add tries for builtins (e.g. Array, self) and highlight them +// separately. +// Also improve highlighting of structures inside regexes. diff --git a/src/utils/text.cc b/src/utils/text.cc index e0d9f31..093c89f 100644 --- a/src/utils/text.cc +++ b/src/utils/text.cc @@ -1,5 +1,13 @@ #include "utils/utils.h" +bool compare(const char *a, const char *b, size_t n) { + size_t i = 0; + for (; i < n; ++i) + if (a[i] != b[i]) + return false; + return true; +} + std::string percent_decode(const std::string &s) { std::string out; out.reserve(s.size()); diff --git a/themes/default.json b/themes/default.json new file mode 100644 index 0000000..36e6b83 --- /dev/null +++ b/themes/default.json @@ -0,0 +1,82 @@ +{ + "Default": { + "fg": "#EEEEEE" + }, + "Comment": { + "fg": "#AAAAAA", + "italic": true + }, + "String": { + "fg": "#AAD94C"
}, + "Escape": { + "fg": "#7dcfff" + }, + "Interpolation": { + "fg": "#7dcfff" + }, + "Regexp": { + "fg": "#D2A6FF" + }, + "Number": { + "fg": "#E6C08A" + }, + "True": { + "fg": "#0FFF0F" + }, + "False": { + "fg": "#FF0F0F" + }, + "Char": { + "fg": "#FFAF70" + }, + "Keyword": { + "fg": "#FF8F40" + }, + "KeywordOperator": { + "fg": "#F07178" + }, + "Operator": { + "fg": "#FFFFFF", + "italic": true + }, + "Function": { + "fg": "#FFAF70" + }, + "Type": { + "fg": "#F07178" + }, + "Constant": { + "fg": "#7dcfff" + }, + "VariableInstance": { + "fg": "#95E6CB" + }, + "VariableGlobal": { + "fg": "#F07178" + }, + "Annotation": { + "fg": "#7dcfff" + }, + "Directive": { + "fg": "#FF8F40" + }, + "Label": { + "fg": "#D2A6FF" + }, + "Brace1": { + "fg": "#D2A6FF" + }, + "Brace2": { + "fg": "#FFAFAF" + }, + "Brace3": { + "fg": "#FFFF00" + }, + "Brace4": { + "fg": "#0FFF0F" + }, + "Brace5": { + "fg": "#FF0F0F" + } +}
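The early exit in `Parser::work()` stops re-parsing once the state flowing out of a line equals the `in_state` already stored on the next line. That check is what keeps highlighting incremental. The following minimal, self-contained sketch shows the idea in isolation. `LexState`, `Line`, `lex_line` and `relex_from` are hypothetical names, and the toy lexer only tracks `/* ... */` block comments rather than the full Ruby state.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-line lexer state. Here it only records whether we are
// inside a /* ... */ comment; a real highlighter carries much more.
struct LexState {
  bool in_block_comment = false;
  bool operator==(const LexState &o) const {
    return in_block_comment == o.in_block_comment;
  }
};

struct Line {
  std::string text;
  LexState in_state;   // state this line was last lexed with
  LexState out_state;  // state left at the end of this line
  bool valid = false;  // lexed at least once?
};

// Toy lexer: "/*" opens and "*/" closes a block comment.
LexState lex_line(const std::string &text, LexState st) {
  for (std::size_t i = 0; i + 1 < text.size(); ++i) {
    if (!st.in_block_comment && text[i] == '/' && text[i + 1] == '*') {
      st.in_block_comment = true;
      ++i;
    } else if (st.in_block_comment && text[i] == '*' && text[i + 1] == '/') {
      st.in_block_comment = false;
      ++i;
    }
  }
  return st;
}

// Re-lex from line `dirty` onward. Stops as soon as the incoming state of a
// later line equals the state it was previously lexed with, because every
// line below it would produce the same result. Returns the lines re-lexed.
std::size_t relex_from(std::vector<Line> &lines, std::size_t dirty) {
  std::size_t lexed = 0;
  LexState st = dirty > 0 ? lines[dirty - 1].out_state : LexState{};
  for (std::size_t i = dirty; i < lines.size(); ++i) {
    if (i > dirty && lines[i].valid && lines[i].in_state == st)
      break;
    lines[i].in_state = st;
    st = lex_line(lines[i].text, st);
    lines[i].out_state = st;
    lines[i].valid = true;
    ++lexed;
  }
  return lexed;
}
```

Editing a line that neither opens nor closes a comment re-lexes only that line. Inserting `/*` re-lexes everything below it, because each following line now sees a different incoming state.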
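The new `expecting_expr` flag in `RubyFullState` lets the lexer tell a regex literal from division: `/` opens a regex only when an expression is expected. The sketch below applies that rule to a tiny Ruby-like subset. The `Kind`, `Tok` and `lex` names are hypothetical. Real Ruby has further ambiguous cases, such as `foo /x/` after a bare method name, which this sketch does not try to handle.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds for this sketch.
enum class Kind { Ident, Number, Operator, Divide, Regexp };

struct Tok {
  Kind kind;
  std::string text;
};

// Lexes a tiny Ruby-like subset. A '/' opens a regex literal only while an
// expression is expected: at the start of input, after an operator or '(',
// or after `return` / `if`. Anywhere else it is division.
std::vector<Tok> lex(const std::string &s) {
  std::vector<Tok> out;
  bool expecting_expr = true;
  std::size_t i = 0;
  while (i < s.size()) {
    unsigned char c = static_cast<unsigned char>(s[i]);
    if (c == ' ') {
      ++i;
    } else if (std::isalpha(c) || c == '_') {
      std::size_t j = i;
      while (j < s.size() &&
             (std::isalnum(static_cast<unsigned char>(s[j])) || s[j] == '_'))
        ++j;
      std::string word = s.substr(i, j - i);
      out.push_back({Kind::Ident, word});
      // Keywords such as `return` leave an expression pending; names do not.
      expecting_expr = (word == "return" || word == "if");
      i = j;
    } else if (std::isdigit(c)) {
      std::size_t j = i;
      while (j < s.size() && std::isdigit(static_cast<unsigned char>(s[j])))
        ++j;
      out.push_back({Kind::Number, s.substr(i, j - i)});
      expecting_expr = false;
      i = j;
    } else if (c == '/' && expecting_expr) {
      // Scan to the closing '/', skipping backslash escapes.
      std::size_t j = i + 1;
      while (j < s.size() && s[j] != '/')
        j += (s[j] == '\\' && j + 1 < s.size()) ? 2 : 1;
      std::size_t end = j < s.size() ? j + 1 : s.size();
      out.push_back({Kind::Regexp, s.substr(i, end - i)});
      expecting_expr = false;
      i = end;
    } else if (c == '/') {
      out.push_back({Kind::Divide, "/"});
      expecting_expr = true;
      ++i;
    } else {
      out.push_back({Kind::Operator, std::string(1, s[i])});
      // A closing paren ends an expression; other operators start one.
      expecting_expr = (c != ')');
      ++i;
    }
  }
  return out;
}
```

With these rules, `x = a / b` yields a division token and `x = /ab+/` yields a single regex token. `return /x/` is also lexed as a regex, because `return` leaves an expression pending.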
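The patch colors brackets with `TokenKind::Brace1 + (brace_level % 5)` and decrements `brace_level` on every closer. In C++, `%` keeps the sign of its left operand, so a stray `)` at depth 0 gives `-1 % 5 == -1`. The resulting value is `Brace1 - 1`, which is not a brace kind. The sketch below clamps the depth at zero. `bracket_colors` is a hypothetical helper that returns palette slots rather than `TokenKind` values.

```cpp
#include <string>
#include <vector>

// Returns one palette slot in [0, palette) per bracket in `s`. An opener
// takes the current depth and then deepens it; a closer first steps back out,
// so matching pairs share a slot. Depth is clamped at zero because in C++
// `-1 % 5 == -1`, which would fall outside the palette.
std::vector<int> bracket_colors(const std::string &s, int palette = 5) {
  std::vector<int> slots;
  int depth = 0;
  for (char c : s) {
    if (c == '(' || c == '[' || c == '{') {
      slots.push_back(depth % palette);
      ++depth;
    } else if (c == ')' || c == ']' || c == '}') {
      if (depth > 0)
        --depth;
      slots.push_back(depth % palette);
    }
  }
  return slots;
}
```

Whether an unmatched closer should leave the depth at zero, as here, or be flagged as an error is a design choice. Clamping simply keeps every slot inside the five `Brace1`..`Brace5` theme entries.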
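The hand-written `true`, `false` and `def` checks in `ruby_parse` compare characters directly. Without a word-boundary test, `def` also fires inside `default` and `true` inside `trueish`. The sketch below shows a boundary-aware match. `match_word` and `ident_char` are hypothetical helpers; `ident_char` mirrors the patch's `identifier_char` (non-ASCII, alphanumeric, or `_`).

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Mirrors identifier_char from the patch: non-ASCII bytes, letters, digits, '_'.
bool ident_char(char c) {
  unsigned char u = static_cast<unsigned char>(c);
  return u >= 0x80 || std::isalnum(u) || c == '_';
}

// True if `word` appears at text[i] as a whole word, i.e. it is not directly
// followed by another identifier character. A match may end at end of line.
bool match_word(std::string_view text, std::size_t i, std::string_view word) {
  if (i > text.size() || text.size() - i < word.size())
    return false;
  if (text.compare(i, word.size(), word) != 0)
    return false;
  std::size_t end = i + word.size();
  return end == text.size() || !ident_char(text[end]);
}
```

The sketch also accepts a keyword that ends exactly at the end of the line. A strict `i + 4 < len` bound would reject that case.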