Allow dynamic theming and improve ruby parser

2026-01-18 13:00:41 +00:00
parent 1fda5bf246
commit d0e811904c
18 changed files with 1029 additions and 496 deletions


@@ -6,7 +6,7 @@ Copyright 2025 Syed Daanish
Crib is a TUI based text editor built primaririly for personal use.<br> Crib is a TUI based text editor built primaririly for personal use.<br>
Crib has a vim-style editor modes system but navigation and shortcuts are very different.<br> Crib has a vim-style editor modes system but navigation and shortcuts are very different.<br>
It supports tree-sitter based text highlighting.<br> It supports superfast incremental syntax highlighting.<br>
And LSP for auto-completion, diagnostics, hover docs etc.<br> And LSP for auto-completion, diagnostics, hover docs etc.<br>
It aims to be complete general purpose IDE.<br> It aims to be complete general purpose IDE.<br>
(It is still very much a work in progress so a lot of things may seem incomplete)<br> (It is still very much a work in progress so a lot of things may seem incomplete)<br>
@@ -16,7 +16,7 @@ For now it is just a single file editor. I plan to add a multi-file support with
### Get started ### Get started
Make sure the repo is cloned with submodules to get most of the dependencies. Make sure the repo is cloned with submodules to get `libgrapheme`.
```bash ```bash
git clone --recurse-submodules https://git.syedm.dev/SyedM/crib.git git clone --recurse-submodules https://git.syedm.dev/SyedM/crib.git
@@ -26,7 +26,7 @@ git clone --recurse-submodules https://git.syedm.dev/SyedM/crib.git
#### System-wide libraries #### System-wide libraries
Make sure you have the following dependencies installed: Make sure you have the following dependencies installed (apart from the standard C++ libraries):
* **[nlohmann/json](https://github.com/nlohmann/json)** * **[nlohmann/json](https://github.com/nlohmann/json)**
Install it via your package manager. Once installed, the header should be available as: Install it via your package manager. Once installed, the header should be available as:
@@ -34,31 +34,26 @@ Make sure you have the following dependencies installed:
#include <nlohmann/json.hpp> #include <nlohmann/json.hpp>
``` ```
* **libmagic**
Install it so that you can include it in your code:
```cpp
#include <magic.h>
```
* **[PCRE2](https://github.com/PCRE2Project/pcre2)** * **[PCRE2](https://github.com/PCRE2Project/pcre2)**
Install the library to use its headers: Install the library to use its headers:
```cpp ```cpp
#include <pcre2.h> #include <pcre2.h>
``` ```
It also uses `xclip` at runtime for copying/pasting *(TODO: make it portable)*. * **libmagic**
Install it so that you can include it in your code (most *nix systems have it installed):
```cpp
#include <magic.h>
```
It also uses `xclip` at runtime for copying/pasting *(TODO: make it os portable)*.
And any modern terminal should work fine - preferably `kitty` or `wezterm`.<br> And any modern terminal should work fine - preferably `kitty` or `wezterm`.<br>
#### `./libs` folder #### `./libs` folder
Some other dependancies like `libgrapheme` and `tree-sitter*` and `unicode_width` are added as submodules or copied.<br> Some other dependancies like `libgrapheme` and `unicode_width` are added as submodules or copied.<br>
`unicode_width` is compiled by the makefile so nothing to do there.<br> `unicode_width` is compiled by the makefile so nothing to do there.<br>
`libgrapheme` needs to be compiled using `make` in it's folder.<br> `libgrapheme` needs to be compiled using `make` in it's folder.<br>
`tree-sitter` needs to be compiled using `make` in it's folder.<br>
For other tree-sitter grammars, run `make` in their folders except some for which `npm install` needs to be used (see their README.md)<br>
For any problems with `npm install` make sure to have older versions of node installed.<br>
For some even manual clang or gcc compilation may be required.<br>
*TODO: Make a detailed list of how to do compile each*<br>
#### LSPs #### LSPs
@@ -93,8 +88,7 @@ The following lsp's are supported and can be installed anywhere in your `$PATH`<
#### Compiler #### Compiler
`g++` and `clang++` should both work fine but `c++20+` is required. `g++` and `clang++` should both work fine but `c++20+` is required.
The makefile has been set to use g++ if made with `make test` and clang++ if made with `make release`<br> The makefile uses `clang++` by default.<br>
This can be changed but I have found clang++ builds to be slightly faster - also test builds do not have the flags needed to be used system wide or any optimizations.<br>
Can remove `ccache` if you want from the makefile.<br> Can remove `ccache` if you want from the makefile.<br>
#### Compliling #### Compliling
@@ -105,8 +99,8 @@ make release
### Running ### Running
Preferably add `bin` folder to PATH or move `bin/crib` to somewhere in PATH.<br> Preferably add the `bin` folder to PATH or move `bin/crib` to somewhere in PATH.<br>
But make sure that `scripts/` and `grammar/` are at `../` relative to the binary or it will crash.<br> But make sure that `scripts/` are at `../` relative to the binary or it will crash.<br>
`scripts/init.sh` and `scripts/exit.sh` can be used to add hooks to the editor on startup and exit `scripts/init.sh` and `scripts/exit.sh` can be used to add hooks to the editor on startup and exit
(Make sure to remove my `kitty` hooks from them if you want).<br> (Make sure to remove my `kitty` hooks from them if you want).<br>
For some LSP's to work properly `crib` needs to be run from the root folder of the project. *To be fixed*<br> For some LSP's to work properly `crib` needs to be run from the root folder of the project. *To be fixed*<br>
@@ -256,37 +250,38 @@ Activated by `:` or `;`.
- hooks jumping (bookmarking) - hooks jumping (bookmarking)
- color hex code highlighting - color hex code highlighting
- current line highlighting - current line highlighting
- current word under cursor highlighting <!-- - TODO: current word under cursor highlighting -->
#### Tree-sitter syntax highlighting and filetype detection (using extention or libmagic) for: #### syntax highlighting and filetype detection (using extention or libmagic) for:
- bash
- c/cpp (and headers)
- css
- fish
- go/gomod
- haskell
- html/erb
- javascript
- typescript/tsx
- json/jsonc
- ruby - ruby
- lua <!-- TODO: -->
- python <!-- - bash -->
- rust <!-- - c/cpp (and headers) -->
- php <!-- - css -->
- markdown <!-- - fish -->
- nginx <!-- - go/gomod -->
- toml <!-- - haskell -->
- yaml <!-- - html/erb -->
- sql <!-- - javascript -->
- make <!-- - typescript/tsx -->
- gdscript <!-- - json/jsonc -->
- man pages <!-- - lua -->
- diff/patch <!-- - python -->
- gitattributes/gitignore <!-- - rust -->
- tree-sitter queries <!-- - php -->
- regex <!-- - markdown -->
- ini <!-- - nginx -->
<!-- - toml -->
<!-- - yaml -->
<!-- - sql -->
<!-- - make -->
<!-- - gdscript -->
<!-- - man pages -->
<!-- - diff/patch -->
<!-- - gitattributes/gitignore -->
<!-- - tree-sitter queries -->
<!-- - regex -->
<!-- - ini -->
#### LSP-powered features: #### LSP-powered features:
- diagnostics - diagnostics

16
TODO.md

@@ -8,18 +8,17 @@ Copyright 2025 Syed Daanish
* [ ] **LSP Bug:** Check why `fish-lsp` is behaving so off with completions filtering. * [ ] **LSP Bug:** Check why `fish-lsp` is behaving so off with completions filtering.
* [ ] **File IO:** Normalize/validate unicode on file open (enforce UTF-8, handle other types gracefully). * [ ] **File IO:** Normalize/validate unicode on file open (enforce UTF-8, handle other types gracefully).
* [ ] **Critical Crash:** Fix bug where closing immediately while LSP is still loading hangs and then segfaults (especially on slow ones like fish-lsp). * [ ] **Critical Crash:** Fix bug where closing immediately while LSP is still loading hangs and then segfaults (especially on slow ones like fish-lsp where quick edits and exit can hang).
* [ ] **Navigation Bug:** Fix bug where `Alt+Up` at EOF adds an extra line. * [ ] **Line move:** fix the move line functions to work without the calculations from folds as folds are removed.
* [ ] **Modularize handle_events functions:** The function is over 700 lines with a lot of repeating blocks. Split into smaller functions. * [ ] **Modularize handle_events and renderer functions:** The function is over 700 lines with a lot of repeating blocks. Split into smaller functions.
* [ ] **Editor Indentation Fix:** * [ ] **Editor Indentation Fix:** - Main : merger indentation with the parser for more accurate results.
* [ ] Keep cache of language maps in engine to reduce lookup time. * [ ] Keep cache of language maps in engine to reduce lookup time.
* [ ] In indents add function to support tab which indents if before any content and inserts a pure \t otherwise. * [ ] In indents add function to support tab which indents if before any content and inserts a pure \t otherwise.
* [ ] And backspace which undents if before any content. * [ ] And backspace which undents if before any content.
* [ ] Add block indentation support. * [ ] Add block indentation support.
* [ ] Ignore comments/strings (maybe as-set by tree-sitter) when auto-indenting. * [ ] Ignore comments/strings from parser when auto-indenting.
* [ ] Just use span cursor to avoid strings/comments.. And use another map for c-style single line block and add stuff like operators to it.
* [ ] These will dedent when the block immediately after them is dedented * [ ] These will dedent when the block immediately after them is dedented
* [ ] Dont dedent is ending is valid starting is invalid but also empty * [ ] Dont dedent if ending is valid starting is invalid but also empty
* [ ] Just leave asis if starting is empty * [ ] Just leave asis if starting is empty
* [ ] **Readme:** Update readme to show indetation mechanics. * [ ] **Readme:** Update readme to show indetation mechanics.
* [ ] **LSP Bug:** Try to find out why emojis are breaking lsp edits. (check the ruby sample) * [ ] **LSP Bug:** Try to find out why emojis are breaking lsp edits. (check the ruby sample)
@@ -35,8 +34,9 @@ Copyright 2025 Syed Daanish
* make it faster for line inserts/deletes too (treeify the vector) * make it faster for line inserts/deletes too (treeify the vector)
* Try to make all functions better now that folds have been purged * Try to make all functions better now that folds have been purged
* Cleanup syntax and renderer files * Cleanup syntax and renderer files
* Fix ruby regexp not living across lines when edits are made
### Core Editing Mechanics * for ruby regex use hueristic where is a space is seen after the / it is not a regexp
* [ ] **Undo/Redo:** Add support for undo/redo history. * [ ] **Undo/Redo:** Add support for undo/redo history.


@@ -73,19 +73,76 @@ struct Highlight {
uint8_t flags; uint8_t flags;
}; };
inline static const std::unordered_map<uint8_t, Highlight> highlight_map = { enum struct TokenKind : uint8_t {
{0, {0xFFFFFF, 0, 0}}, {1, {0xAAAAAA, 0, CF_ITALIC}}, #define ADD(name) name,
{2, {0xAAD94C, 0, 0}}, {3, {0xFFFFFF, 0, CF_ITALIC}}, #include "syntax/tokens.def"
{4, {0xFF8F40, 0, 0}}, {5, {0xFFB454, 0, 0}}, #undef ADD
{6, {0xD2A6FF, 0, 0}}, {7, {0x95E6CB, 0, 0}}, Count
{8, {0xF07178, 0, 0}}, {9, {0xE6C08A, 0, 0}},
{10, {0x7dcfff, 0, 0}},
}; };
constexpr size_t TOKEN_KIND_COUNT = static_cast<size_t>(TokenKind::Count);
const std::unordered_map<std::string, TokenKind> kind_map = {
#define ADD(name) {#name, TokenKind::name},
#include "syntax/tokens.def"
#undef ADD
};
extern std::array<Highlight, TOKEN_KIND_COUNT> highlights;
inline void load_theme(std::string filename) {
uint32_t len = 0;
char *raw = load_file(filename.c_str(), &len);
if (!raw)
return;
std::string data(raw, len);
free(raw);
json j = json::parse(data);
Highlight default_hl = {0xFFFFFF, 0, 0};
if (j.contains("Default")) {
auto def = j["Default"];
if (def.contains("fg") && def["fg"].is_string())
default_hl.fg = HEX(def["fg"]);
if (def.contains("bg") && def["bg"].is_string())
default_hl.bg = HEX(def["bg"]);
if (def.contains("italic") && def["italic"].get<bool>())
default_hl.flags |= CF_ITALIC;
if (def.contains("bold") && def["bold"].get<bool>())
default_hl.flags |= CF_BOLD;
if (def.contains("underline") && def["underline"].get<bool>())
default_hl.flags |= CF_UNDERLINE;
if (def.contains("strikethrough") && def["strikethrough"].get<bool>())
default_hl.flags |= CF_STRIKETHROUGH;
}
for (auto &hl : highlights)
hl = default_hl;
for (auto &[key, value] : j.items()) {
if (key == "Default")
continue;
auto it = kind_map.find(key);
if (it == kind_map.end())
continue;
Highlight hl = {0xFFFFFF, 0, 0};
if (value.contains("fg") && value["fg"].is_string())
hl.fg = HEX(value["fg"]);
if (value.contains("bg") && value["bg"].is_string())
hl.bg = HEX(value["bg"]);
if (value.contains("italic") && value["italic"].get<bool>())
hl.flags |= CF_ITALIC;
if (value.contains("bold") && value["bold"].get<bool>())
hl.flags |= CF_BOLD;
if (value.contains("underline") && value["underline"].get<bool>())
hl.flags |= CF_UNDERLINE;
if (value.contains("strikethrough") && value["strikethrough"].get<bool>())
hl.flags |= CF_STRIKETHROUGH;
highlights[static_cast<uint8_t>(it->second)] = hl;
}
}
struct Token { struct Token {
uint32_t start; uint32_t start;
uint32_t end; uint32_t end;
uint8_t type; TokenKind type;
}; };
struct LineData { struct LineData {
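A reduced sketch of the fallback order used by `load_theme`: every slot first takes the `Default` entry, then named entries override their own slot, and names that are not token kinds are skipped. The three-entry `TokenKind`, the `Theme` alias and `apply_theme` are stand-ins invented for this illustration, and JSON parsing is left out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical, reduced stand-ins for the types in the diff.
struct Highlight {
  uint32_t fg;
  uint32_t bg;
  uint8_t flags;
};

enum struct TokenKind : uint8_t { Data, Comment, String, Count };
constexpr std::size_t TOKEN_KIND_COUNT =
    static_cast<std::size_t>(TokenKind::Count);

using Theme = std::array<Highlight, TOKEN_KIND_COUNT>;

// Fill every slot with the default, then apply overrides by name.
// Names that do not map to a token kind are ignored.
inline Theme apply_theme(
    const Highlight &default_hl,
    const std::unordered_map<std::string, Highlight> &overrides) {
  static const std::unordered_map<std::string, TokenKind> kind_map = {
      {"Data", TokenKind::Data},
      {"Comment", TokenKind::Comment},
      {"String", TokenKind::String},
  };
  Theme theme;
  theme.fill(default_hl);
  for (const auto &[name, hl] : overrides) {
    auto it = kind_map.find(name);
    if (it == kind_map.end())
      continue;
    theme[static_cast<std::size_t>(it->second)] = hl;
  }
  return theme;
}
```

In the diff, a named entry starts from `{0xFFFFFF, 0, 0}` rather than from the `Default` values, so a key that sets only `italic` presumably ends up with a white foreground instead of the default one.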


@@ -10,9 +10,27 @@
bool name##_state_match(std::shared_ptr<void> state_1, \ bool name##_state_match(std::shared_ptr<void> state_1, \
std::shared_ptr<void> state_2); std::shared_ptr<void> state_2);
#define LANG_A(name) {name##_parse, name##_state_match} #define LANG_A(name) \
{ \
#name, { name##_parse, name##_state_match } \
}
template <typename T>
inline std::shared_ptr<T> ensure_state(std::shared_ptr<T> state) {
using U = typename T::full_state_type;
if (!state)
state = std::make_shared<T>();
// shared_ptr::unique() was removed in C++20; use_count() expresses the same test
if (state.use_count() != 1)
state = std::make_shared<T>(*state);
if (!state->full_state)
state->full_state = std::make_shared<U>();
else if (state->full_state.use_count() != 1)
state->full_state = std::make_shared<U>(*state->full_state);
return state;
}
DEF_LANG(ruby); DEF_LANG(ruby);
DEF_LANG(bash);
inline static const std::unordered_map< inline static const std::unordered_map<
std::string, std::string,
@@ -22,7 +40,8 @@ inline static const std::unordered_map<
bool (*)(std::shared_ptr<void> state_1, bool (*)(std::shared_ptr<void> state_1,
std::shared_ptr<void> state_2)>> std::shared_ptr<void> state_2)>>
parsers = { parsers = {
{"ruby", LANG_A(ruby)}, LANG_A(ruby),
LANG_A(bash),
}; };
#endif #endif
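`ensure_state` follows a copy-on-write pattern: the state is cloned only when someone else still holds it. A minimal sketch of that idea follows, with a made-up `DemoState` and a helper named `detach` (neither is in the repository). It uses `use_count()` as the sole-owner test and assumes single-threaded use, since `use_count()` is only approximate when other threads may copy the pointer.

```cpp
#include <memory>
#include <vector>

// Hypothetical parser state; the real states live in the language files.
struct DemoState {
  std::vector<int> stack;
};

// Return a pointer the caller may mutate without affecting other owners.
// The parameter is taken by value, so callers that want to avoid a copy
// should pass it with std::move; otherwise the caller's own handle counts
// as a second owner and a clone is always made.
template <typename T>
std::shared_ptr<T> detach(std::shared_ptr<T> state) {
  if (!state)
    return std::make_shared<T>();
  if (state.use_count() != 1)
    return std::make_shared<T>(*state); // shared: clone before writing
  return state;                         // sole owner: reuse in place
}
```

A call such as `s = detach(std::move(s));` keeps the existing object when `s` was the only owner and clones it otherwise.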


@@ -1,212 +1,233 @@
// #include "syntax/decl.h" #ifndef LINE_TREE_H
// #define LINE_TREE_H
// struct LineTree {
// void clear() { #include "syntax/decl.h"
// clear_node(root);
// root = nullptr; struct LineTree {
// stack_size = 0; void clear() {
// } std::unique_lock lock(mtx);
// void build(uint32_t x) { root = build_node(x); } clear_node(root);
// LineData *at(uint32_t x) { root = nullptr;
// LineNode *n = root; stack_size = 0;
// while (n) { }
// uint32_t left_size = n->left ? n->left->size : 0; void build(uint32_t x) {
// if (x < left_size) { std::unique_lock lock(mtx);
// n = n->left; root = build_node(x);
// } else if (x < left_size + n->data.size()) { }
// return &n->data[x - left_size]; LineData *at(uint32_t x) {
// } else { std::shared_lock lock(mtx);
// x -= left_size + n->data.size(); LineNode *n = root;
// n = n->right; while (n) {
// } uint32_t left_size = n->left ? n->left->size : 0;
// } if (x < left_size) {
// return nullptr; n = n->left;
// } } else if (x < left_size + n->data.size()) {
// LineData *start_iter(uint32_t x) { return &n->data[x - left_size];
// stack_size = 0; } else {
// LineNode *n = root; x -= left_size + n->data.size();
// while (n) { n = n->right;
// uint32_t left_size = n->left ? n->left->size : 0; }
// if (x < left_size) { }
// push(n, 0); return nullptr;
// n = n->left; }
// } else if (x < left_size + n->data.size()) { LineData *start_iter(uint32_t x) {
// push(n, x - left_size + 1); std::shared_lock lock(mtx);
// return &n->data[x - left_size]; stack_size = 0;
// } else { LineNode *n = root;
// x -= left_size + n->data.size(); while (n) {
// push(n, UINT32_MAX); uint32_t left_size = n->left ? n->left->size : 0;
// n = n->right; if (x < left_size) {
// } push(n, 0);
// } n = n->left;
// return nullptr; } else if (x < left_size + n->data.size()) {
// } push(n, x - left_size + 1);
// void end_iter() { stack_size = 0; } return &n->data[x - left_size];
// LineData *next() { } else {
// while (stack_size) { x -= left_size + n->data.size();
// auto &f = stack[stack_size - 1]; push(n, UINT32_MAX);
// LineNode *n = f.node; n = n->right;
// if (f.index < n->data.size()) }
// return &n->data[f.index++]; }
// stack_size--; return nullptr;
// if (n->right) { }
// n = n->right; void end_iter() { stack_size = 0; }
// while (n) { LineData *next() {
// push(n, 0); std::shared_lock lock(mtx);
// if (!n->left) while (stack_size) {
// break; auto &f = stack[stack_size - 1];
// n = n->left; LineNode *n = f.node;
// } if (f.index < n->data.size())
// return &stack[stack_size - 1].node->data[0]; return &n->data[f.index++];
// } stack_size--;
// } if (n->right) {
// return nullptr; n = n->right;
// } while (n) {
// void insert(uint32_t x, uint32_t y) { root = insert_node(root, x, y); } push(n, 0);
// void erase(uint32_t x, uint32_t y) { root = erase_node(root, x, y); } if (!n->left)
// uint32_t count() { return subtree_size(root); } break;
// ~LineTree() { clear(); } n = n->left;
// }
// private: return &stack[stack_size - 1].node->data[0];
// struct LineNode { }
// LineNode *left = nullptr; }
// LineNode *right = nullptr; return nullptr;
// uint8_t depth = 1; }
// uint32_t size = 0; void insert(uint32_t x, uint32_t y) {
// std::vector<LineData> data; std::unique_lock lock(mtx);
// }; root = insert_node(root, x, y);
// struct Frame { }
// LineNode *node; void erase(uint32_t x, uint32_t y) {
// uint32_t index; std::unique_lock lock(mtx);
// }; root = erase_node(root, x, y);
// void push(LineNode *n, uint32_t x) { }
// stack[stack_size].node = n; uint32_t count() {
// stack[stack_size].index = x; std::shared_lock lock(mtx);
// stack_size++; return subtree_size(root);
// } }
// static void clear_node(LineNode *n) { ~LineTree() { clear(); }
// if (!n)
// return; private:
// clear_node(n->left); struct LineNode {
// clear_node(n->right); LineNode *left = nullptr;
// delete n; LineNode *right = nullptr;
// } uint8_t depth = 1;
// LineNode *root = nullptr; uint32_t size = 0;
// Frame stack[32]; std::vector<LineData> data;
// uint8_t stack_size = 0; };
// static constexpr uint32_t LEAF_TARGET = 256; struct Frame {
// LineTree::LineNode *erase_node(LineNode *n, uint32_t x, uint32_t y) { LineNode *node;
// if (!n) uint32_t index;
// return nullptr; };
// if (!n->left && !n->right) { void push(LineNode *n, uint32_t x) {
// n->data.erase(n->data.begin() + x, n->data.begin() + x + y); stack[stack_size].node = n;
// fix(n); stack[stack_size].index = x;
// return n; stack_size++;
// } }
// uint32_t left_size = subtree_size(n->left); static void clear_node(LineNode *n) {
// if (x < left_size) if (!n)
// n->left = erase_node(n->left, x, y); return;
// else clear_node(n->left);
// n->right = erase_node(n->right, x - left_size - n->data.size(), y); clear_node(n->right);
// if (n->left && n->right && delete n;
// subtree_size(n->left) + subtree_size(n->right) < 256) { }
// return merge(n->left, n->right); LineNode *root = nullptr;
// } Frame stack[32];
// return rebalance(n); std::atomic<uint8_t> stack_size = 0;
// } std::shared_mutex mtx;
// LineTree::LineNode *insert_node(LineNode *n, uint32_t x, uint32_t y) { static constexpr uint32_t LEAF_TARGET = 256;
// if (!n) { LineTree::LineNode *erase_node(LineNode *n, uint32_t x, uint32_t y) {
// auto *leaf = new LineNode(); if (!n)
// leaf->data.resize(y); return nullptr;
// leaf->size = y; if (!n->left && !n->right) {
// return leaf; n->data.erase(n->data.begin() + x, n->data.begin() + x + y);
// } fix(n);
// if (!n->left && !n->right) { return n;
// n->data.insert(n->data.begin() + x, y, LineData{}); }
// fix(n); uint32_t left_size = subtree_size(n->left);
// if (n->data.size() > 512) if (x < left_size)
// return split_leaf(n); n->left = erase_node(n->left, x, y);
// return n; else
// } n->right = erase_node(n->right, x - left_size - n->data.size(), y);
// uint32_t left_size = subtree_size(n->left); if (n->left && n->right &&
// if (x <= left_size) subtree_size(n->left) + subtree_size(n->right) < 256) {
// n->left = insert_node(n->left, x, y); return merge(n->left, n->right);
// else }
// n->right = insert_node(n->right, x - left_size - n->data.size(), y); return rebalance(n);
// return rebalance(n); }
// } LineTree::LineNode *insert_node(LineNode *n, uint32_t x, uint32_t y) {
// LineNode *build_node(uint32_t count) { if (!n) {
// if (count <= LEAF_TARGET) { auto *leaf = new LineNode();
// auto *n = new LineNode(); leaf->data.resize(y);
// n->data.resize(count); leaf->size = y;
// n->size = count; return leaf;
// return n; }
// } if (!n->left && !n->right) {
// uint32_t left_count = count / 2; n->data.insert(n->data.begin() + x, y, LineData{});
// uint32_t right_count = count - left_count; fix(n);
// auto *n = new LineNode(); if (n->data.size() > 512)
// n->left = build_node(left_count); return split_leaf(n);
// n->right = build_node(right_count); return n;
// fix(n); }
// return n; uint32_t left_size = subtree_size(n->left);
// } if (x <= left_size)
// static LineNode *split_leaf(LineNode *n) { n->left = insert_node(n->left, x, y);
// auto *right = new LineNode(); else
// size_t mid = n->data.size() / 2; n->right = insert_node(n->right, x - left_size - n->data.size(), y);
// right->data.assign(n->data.begin() + mid, n->data.end()); return rebalance(n);
// n->data.resize(mid); }
// fix(n); LineNode *build_node(uint32_t count) {
// fix(right); if (count <= LEAF_TARGET) {
// auto *parent = new LineNode(); auto *n = new LineNode();
// parent->left = n; n->data.resize(count);
// parent->right = right; n->size = count;
// fix(parent); return n;
// return parent; }
// } uint32_t left_count = count / 2;
// static LineNode *merge(LineNode *a, LineNode *b) { uint32_t right_count = count - left_count;
// a->data.insert(a->data.end(), b->data.begin(), b->data.end()); auto *n = new LineNode();
// delete b; n->left = build_node(left_count);
// fix(a); n->right = build_node(right_count);
// return a; fix(n);
// } return n;
// static void fix(LineNode *n) { }
// n->depth = 1 + MAX(height(n->left), height(n->right)); static LineNode *split_leaf(LineNode *n) {
// n->size = subtree_size(n->left) + n->data.size() + auto *right = new LineNode();
// subtree_size(n->right); size_t mid = n->data.size() / 2;
// } right->data.assign(n->data.begin() + mid, n->data.end());
// static LineNode *rotate_right(LineNode *y) { n->data.resize(mid);
// LineNode *x = y->left; fix(n);
// LineNode *T2 = x->right; fix(right);
// x->right = y; auto *parent = new LineNode();
// y->left = T2; parent->left = n;
// fix(y); parent->right = right;
// fix(x); fix(parent);
// return x; return parent;
// } }
// static LineNode *rotate_left(LineNode *x) { static LineNode *merge(LineNode *a, LineNode *b) {
// LineNode *y = x->right; a->data.insert(a->data.end(), b->data.begin(), b->data.end());
// LineNode *T2 = y->left; delete b;
// y->left = x; fix(a);
// x->right = T2; return a;
// fix(x); }
// fix(y); static void fix(LineNode *n) {
// return y; n->depth = 1 + MAX(height(n->left), height(n->right));
// } n->size = subtree_size(n->left) + n->data.size() + subtree_size(n->right);
// static LineNode *rebalance(LineNode *n) { }
// fix(n); static LineNode *rotate_right(LineNode *y) {
// int balance = int(height(n->left)) - int(height(n->right)); LineNode *x = y->left;
// if (balance > 1) { LineNode *T2 = x->right;
// if (height(n->left->left) < height(n->left->right)) x->right = y;
// n->left = rotate_left(n->left); y->left = T2;
// return rotate_right(n); fix(y);
// } fix(x);
// if (balance < -1) { return x;
// if (height(n->right->right) < height(n->right->left)) }
// n->right = rotate_right(n->right); static LineNode *rotate_left(LineNode *x) {
// return rotate_left(n); LineNode *y = x->right;
// } LineNode *T2 = y->left;
// return n; y->left = x;
// } x->right = T2;
// static uint8_t height(LineNode *n) { return n ? n->depth : 0; } fix(x);
// static uint32_t subtree_size(LineNode *n) { return n ? n->size : 0; } fix(y);
// }; return y;
}
static LineNode *rebalance(LineNode *n) {
fix(n);
int balance = int(height(n->left)) - int(height(n->right));
if (balance > 1) {
if (height(n->left->left) < height(n->left->right))
n->left = rotate_left(n->left);
return rotate_right(n);
}
if (balance < -1) {
if (height(n->right->right) < height(n->right->left))
n->right = rotate_right(n->right);
return rotate_left(n);
}
return n;
}
static uint8_t height(LineNode *n) { return n ? n->depth : 0; }
static uint32_t subtree_size(LineNode *n) { return n ? n->size : 0; }
};
#endif
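The lookup in `LineTree::at` relies on each node caching the number of lines in its subtree, so an index can be resolved by walking down and subtracting the sizes that were skipped. The sketch below shows only that descent on a simplified tree; `Node`, `build_tree`, `line_at` and the leaf size of 4 are illustrative choices, `int` stands in for `LineData`, and locking, rebalancing and the iterator stack are omitted.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

struct Node {
  std::unique_ptr<Node> left, right;
  uint32_t size = 0;     // number of lines stored in this subtree
  std::vector<int> data; // only leaves hold data in this sketch
};

constexpr uint32_t kLeafTarget = 4; // small on purpose; the diff uses 256

// Build a tree over the values first .. first + count - 1.
inline std::unique_ptr<Node> build_tree(uint32_t first, uint32_t count) {
  auto n = std::make_unique<Node>();
  n->size = count;
  if (count <= kLeafTarget) {
    for (uint32_t i = 0; i < count; i++)
      n->data.push_back(static_cast<int>(first + i));
    return n;
  }
  uint32_t left_count = count / 2;
  n->left = build_tree(first, left_count);
  n->right = build_tree(first + left_count, count - left_count);
  return n;
}

// Resolve index x by descending and subtracting the skipped sizes.
inline const int *line_at(const Node *n, uint32_t x) {
  while (n) {
    uint32_t left_size = n->left ? n->left->size : 0;
    if (x < left_size) {
      n = n->left.get();
    } else if (x < left_size + n->data.size()) {
      return &n->data[x - left_size];
    } else {
      x -= left_size + static_cast<uint32_t>(n->data.size());
      n = n->right.get();
    }
  }
  return nullptr; // index past the end
}
```

With the leaves balanced this way, a lookup touches a number of nodes proportional to the tree depth rather than to the line count.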


@@ -1,4 +1,8 @@
#ifndef SYNTAX_PARSER_H
#define SYNTAX_PARSER_H
#include "syntax/decl.h" #include "syntax/decl.h"
#include "syntax/line_tree.h"
struct Parser { struct Parser {
Knot *root; Knot *root;
@@ -12,7 +16,7 @@ struct Parser {
std::atomic<uint32_t> scroll_max{UINT32_MAX - 2048}; std::atomic<uint32_t> scroll_max{UINT32_MAX - 2048};
std::mutex mutex; std::mutex mutex;
std::mutex data_mutex; std::mutex data_mutex;
std::vector<LineData> line_data; LineTree line_tree;
std::set<uint32_t> dirty_lines; std::set<uint32_t> dirty_lines;
Parser(Knot *n_root, std::shared_mutex *n_knot_mutex, std::string n_lang, Parser(Knot *n_root, std::shared_mutex *n_knot_mutex, std::string n_lang,
@@ -21,13 +25,6 @@ struct Parser {
uint32_t new_end_line); uint32_t new_end_line);
void work(); void work();
void scroll(uint32_t line); void scroll(uint32_t line);
uint8_t get_type(Coord c) {
if (c.row >= line_data.size())
return 0;
const LineData &line = line_data[c.row];
for (const Token &t : line.tokens)
if (t.start <= c.col && c.col < t.end)
return t.type;
return 0;
}
}; };
#endif

51
include/syntax/tokens.def Normal file

@@ -0,0 +1,51 @@
ADD(Data)
ADD(Comment)
ADD(String)
ADD(Escape)
ADD(Interpolation)
ADD(Regexp)
ADD(Number)
ADD(True)
ADD(False)
ADD(Char)
ADD(Keyword)
ADD(KeywordOperator)
ADD(Operator)
ADD(Function)
ADD(Type)
ADD(Constant)
ADD(VariableInstance)
ADD(VariableGlobal)
ADD(Annotation)
ADD(Directive)
ADD(Label)
ADD(Brace1)
ADD(Brace2)
ADD(Brace3)
ADD(Brace4)
ADD(Brace5)
ADD(Heading1)
ADD(Heading2)
ADD(Heading3)
ADD(Heading4)
ADD(Heading5)
ADD(Heading6)
ADD(Blockquote)
ADD(List)
ADD(ListItem)
ADD(Code)
ADD(LanguageName)
ADD(LinkLabel)
ADD(ImageLabel)
ADD(Link)
ADD(Table)
ADD(TableHeader)
ADD(Italic)
ADD(Bold)
ADD(Underline)
ADD(Strikethrough)
ADD(HorixontalRule)
ADD(Tag)
ADD(Attribute)
ADD(CheckDone)
ADD(CheckNotDone)
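`tokens.def` is consumed as an X-macro list: the same `ADD(...)` entries are expanded once into enumerators and once into a name lookup table. A self-contained sketch of the technique is below. It uses a local macro list instead of an included file, and `DEMO_TOKENS`, `DemoKind` and `demo_kind_map` are names made up for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// A short, hypothetical token list standing in for syntax/tokens.def.
#define DEMO_TOKENS(X) \
  X(Data)              \
  X(Comment)           \
  X(String)

// First expansion: one enumerator per entry, plus a trailing Count.
enum struct DemoKind : uint8_t {
#define AS_ENUMERATOR(name) name,
  DEMO_TOKENS(AS_ENUMERATOR)
#undef AS_ENUMERATOR
  Count
};

constexpr std::size_t DEMO_KIND_COUNT =
    static_cast<std::size_t>(DemoKind::Count);

// Second expansion: map each entry's spelling to its enumerator.
inline const std::unordered_map<std::string, DemoKind> demo_kind_map = {
#define AS_MAP_ENTRY(name) {#name, DemoKind::name},
  DEMO_TOKENS(AS_MAP_ENTRY)
#undef AS_MAP_ENTRY
};
```

Because both expansions come from one list, adding a token kind means adding a single line, and the enum and the map cannot drift apart.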


@@ -71,6 +71,14 @@ struct Language {
#define UNUSED(x) (void)(x) #define UNUSED(x) (void)(x)
#define USING(x) UNUSED(sizeof(x)) #define USING(x) UNUSED(sizeof(x))
inline uint32_t HEX(const std::string &s) {
if (s.empty())
return 0xFFFFFF;
size_t start = (s.front() == '#') ? 1 : 0;
return static_cast<uint32_t>(std::stoul(s.substr(start), nullptr, 16));
}
bool compare(const char *a, const char *b, size_t n);
std::string clean_text(const std::string &input); std::string clean_text(const std::string &input);
std::string percent_encode(const std::string &s); std::string percent_encode(const std::string &s);
std::string percent_decode(const std::string &s); std::string percent_decode(const std::string &s);
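`HEX` relies on `std::stoul`, which throws `std::invalid_argument` when no digits can be parsed (for example `"#"` or `"#zzzzzz"`), so a malformed theme value would propagate an exception. One possible non-throwing variant is sketched below. `parse_hex_color` is a hypothetical name, and the strict six-digit rule is an assumption of this sketch, not something the diff enforces.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Parse "#RRGGBB" or "RRGGBB"; return std::nullopt for anything else.
inline std::optional<uint32_t> parse_hex_color(std::string_view s) {
  if (!s.empty() && s.front() == '#')
    s.remove_prefix(1);
  if (s.size() != 6)
    return std::nullopt;
  uint32_t value = 0;
  const char *first = s.data();
  const char *last = s.data() + s.size();
  auto [ptr, ec] = std::from_chars(first, last, value, 16);
  // Reject parse errors and trailing non-hex characters.
  if (ec != std::errc{} || ptr != last)
    return std::nullopt;
  return value;
}
```

A caller could then fall back to the default color with `parse_hex_color(text).value_or(0xFFFFFF)`.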


@@ -22,13 +22,29 @@ cjk_samples = [
] ]
# Ruby regex with unicode # Ruby regex with unicode
$unicode_regex = /[一-龯ぁ-んァ-ヶー々〆〤]/ $unicode_regex = /[一-龯ぁ-ん#{0x3000}
\-ヶー
s wow
々〆〤]/
UNICORE = %r{
{#{}}
}
UNINITCORE = %{
{{#{}}}
}
# Unicode identifiers (valid in Ruby) # Unicode identifiers (valid in Ruby)
= 0x5_4eddaee = 0x5_4eddaee
π = 3.14_159e+2, ?\u0234, ?\,, ?\x0A, ?s π = 3.14_159e+2, ?\u0234, ?\,, ?\x0A, ?s
= -> { "こんにちは" } = -> { "こんに \n ちは" }
# Method using unicode variable names # Method using unicode variable names
def math_test def math_test
@@ -53,7 +69,7 @@ multi = <<BASH
local n="$1" local n="$1"
if ((n <= 1)); then if ((n <= 1)); then
echo 1 echo 1
else else\ns
local prev local prev
prev=$(factorial $((n - 1))) prev=$(factorial $((n - 1)))
echo $((n * prev)) echo $((n * prev))
@@ -82,9 +98,11 @@ mixed = [
two_docs = <<DOC1 , <<DOC2 two_docs = <<DOC1 , <<DOC2
stuff for doc2 stuff for doc2
DOC1 DOC1
stuff for doc 2 with #{interpolation} and more stuff for doc 2 with \#{interpolation} and more
DOC2 DOC2
p = 0 <<22
mixed.each { |m| puts m } mixed.each { |m| puts m }
# Unicode in comments — highlight me! # Unicode in comments — highlight me!
@@ -100,6 +118,7 @@ end
escaped = "Line1\nLine2\tTabbed 😀" escaped = "Line1\nLine2\tTabbed 😀"
puts escaped puts escaped
p = 0 <<2
# Frozen string literal test # Frozen string literal test
# frozen_string_literal: true # frozen_string_literal: true
const_str = "定数文字列🔒".freeze const_str = "定数文字列🔒".freeze
@@ -199,6 +218,8 @@ def greet_person(name)
end end
end end
h = a / a
# Calling methods # Calling methods
greet_person("Alice") greet_person("Alice")
greet_person("Bob") greet_person("Bob")


@@ -32,8 +32,8 @@ Editor *new_editor(const char *filename_arg, Coord position, Coord size) {
if (editor->lang.name != "unknown") if (editor->lang.name != "unknown")
editor->parser = new Parser(editor->root, &editor->knot_mtx, editor->parser = new Parser(editor->root, &editor->knot_mtx,
editor->lang.name, size.row + 5); editor->lang.name, size.row + 5);
// if (len <= (1024 * 28)) if (len <= (1024 * 28))
// request_add_to_lsp(editor->lang, editor); request_add_to_lsp(editor->lang, editor);
editor->indents.compute_indent(editor); editor->indents.compute_indent(editor);
return editor; return editor;
} }


@@ -1,5 +1,7 @@
#include "editor/editor.h" #include "editor/editor.h"
#include "main.h" #include "main.h"
#include "syntax/decl.h"
#include "syntax/parser.h"
void render_editor(Editor *editor) { void render_editor(Editor *editor) {
uint32_t sel_start = 0, sel_end = 0; uint32_t sel_start = 0, sel_end = 0;
@@ -23,6 +25,15 @@ void render_editor(Editor *editor) {
std::unique_lock<std::mutex> lock; std::unique_lock<std::mutex> lock;
if (editor->parser) if (editor->parser)
lock = std::unique_lock<std::mutex>(editor->parser->mutex); lock = std::unique_lock<std::mutex>(editor->parser->mutex);
LineData *line_data = nullptr;
auto get_type = [&](uint32_t col) {
if (!line_data)
return 0;
for (auto const &token : line_data->tokens)
if (token.start <= col && token.end > col)
return (int)token.type;
return 0;
};
std::shared_lock knot_lock(editor->knot_mtx); std::shared_lock knot_lock(editor->knot_mtx);
if (editor->selection_active) { if (editor->selection_active) {
Coord start, end; Coord start, end;
@@ -82,6 +93,10 @@ void render_editor(Editor *editor) {
while (rendered_rows < editor->size.row) { while (rendered_rows < editor->size.row) {
uint32_t line_len; uint32_t line_len;
char *line = next_line(it, &line_len); char *line = next_line(it, &line_len);
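// parser is null for unknown languages, so guard before touching line_tree
if (editor->parser) {
if (line_data)
line_data = editor->parser->line_tree.next();
else
line_data = editor->parser->line_tree.start_iter(line_index);
}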
if (!line) if (!line)
break; break;
if (line_len > 0 && line[line_len - 1] == '\n') if (line_len > 0 && line[line_len - 1] == '\n')
@@ -140,9 +155,8 @@ void render_editor(Editor *editor) {
uint32_t absolute_byte_pos = uint32_t absolute_byte_pos =
global_byte_offset + current_byte_offset + local_render_offset; global_byte_offset + current_byte_offset + local_render_offset;
const Highlight *hl = nullptr; const Highlight *hl = nullptr;
if (editor->parser && editor->parser->line_data.size() > line_index) if (editor->parser)
hl = &highlight_map.at(editor->parser->get_type( hl = &highlights[get_type(current_byte_offset + local_render_offset)];
{line_index, current_byte_offset + local_render_offset}));
uint32_t fg = hl ? hl->fg : 0xFFFFFF; uint32_t fg = hl ? hl->fg : 0xFFFFFF;
uint32_t bg = hl ? hl->bg : 0; uint32_t bg = hl ? hl->bg : 0;
uint8_t fl = hl ? hl->flags : 0; uint8_t fl = hl ? hl->flags : 0;


@@ -16,19 +16,11 @@ static bool init_lsp(std::shared_ptr<LSPInstance> lsp) {
if (pid == 0) { if (pid == 0) {
dup2(in_pipe[0], STDIN_FILENO); dup2(in_pipe[0], STDIN_FILENO);
dup2(out_pipe[1], STDOUT_FILENO); dup2(out_pipe[1], STDOUT_FILENO);
#ifdef __clang__
int devnull = open("/dev/null", O_WRONLY); int devnull = open("/dev/null", O_WRONLY);
if (devnull >= 0) { if (devnull >= 0) {
dup2(devnull, STDERR_FILENO); dup2(devnull, STDERR_FILENO);
close(devnull); close(devnull);
} }
#else
int log = open("/tmp/lsp.log", O_WRONLY | O_CREAT | O_TRUNC, 0644);
if (log >= 0) {
dup2(log, STDERR_FILENO);
close(log);
}
#endif
close(in_pipe[0]); close(in_pipe[0]);
close(in_pipe[1]); close(in_pipe[1]);
close(out_pipe[0]); close(out_pipe[0]);


@@ -2,6 +2,7 @@
#include "editor/editor.h" #include "editor/editor.h"
#include "io/sysio.h" #include "io/sysio.h"
#include "lsp/lsp.h" #include "lsp/lsp.h"
#include "syntax/decl.h"
#include "ui/bar.h" #include "ui/bar.h"
#include "utils/utils.h" #include "utils/utils.h"
@@ -61,6 +62,8 @@ int main(int argc, char *argv[]) {
system(("bash " + get_exe_dir() + "/../scripts/init.sh").c_str()); system(("bash " + get_exe_dir() + "/../scripts/init.sh").c_str());
load_theme(get_exe_dir() + "/../themes/default.json");
Editor *editor = new_editor(filename, {0, 0}, {screen.row - 2, screen.col}); Editor *editor = new_editor(filename, {0, 0}, {screen.row - 2, screen.col});
Bar bar(screen); Bar bar(screen);

src/syntax/bash.cc (new file, 73 lines)

@@ -0,0 +1,73 @@
#include "syntax/decl.h"
#include "syntax/langs.h"
#include "utils/utils.h"
struct BashFullState {
int brace_level = 0;
enum : uint8_t { NONE, STRING, HEREDOC };
uint8_t in_state = BashFullState::NONE;
bool line_cont = false;
struct Lit {
std::string delim = "";
int brace_level = 1;
bool allow_interp = false;
bool operator==(const BashFullState::Lit &other) const {
return delim == other.delim && brace_level == other.brace_level &&
allow_interp == other.allow_interp;
}
} lit;
bool operator==(const BashFullState &other) const {
return in_state == other.in_state && lit == other.lit &&
brace_level == other.brace_level && line_cont == other.line_cont;
}
};
struct BashState {
using full_state_type = BashFullState;
int interp_level = 0;
std::stack<std::shared_ptr<BashFullState>> interp_stack;
std::shared_ptr<BashFullState> full_state;
bool operator==(const BashState &other) const {
return interp_level == other.interp_level &&
interp_stack == other.interp_stack &&
((full_state && other.full_state &&
*full_state == *other.full_state));
}
};
bool bash_state_match(std::shared_ptr<void> state_1,
std::shared_ptr<void> state_2) {
if (!state_1 || !state_2)
return false;
return *std::static_pointer_cast<BashState>(state_1) ==
*std::static_pointer_cast<BashState>(state_2);
}
std::shared_ptr<void> bash_parse(std::vector<Token> *tokens,
std::shared_ptr<void> in_state,
const char *text, uint32_t len) {
static bool keywords_trie_init = false;
if (!keywords_trie_init) {
keywords_trie_init = true;
}
tokens->clear();
auto state = ensure_state(std::static_pointer_cast<BashState>(in_state));
uint32_t i = 0;
while (len > 0 && (text[len - 1] == '\n' || text[len - 1] == '\r' ||
text[len - 1] == '\t' || text[len - 1] == ' '))
len--;
if (len == 0)
return state;
bool heredoc_first = false;
while (i < len) {
i += utf8_codepoint_width(text[i]);
}
return state;
}
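Note on `ensure_state`: both parsers now call a shared `ensure_state` whose definition is not part of this excerpt. Judging by the removed Ruby helpers and the new `full_state_type` alias, it is probably a generic copy-on-write helper. The sketch below is a guess at that shape, not the project's code. `DemoState` and `DemoFullState` are hypothetical types, and `use_count()` is used because `shared_ptr::unique()` was deprecated in C++17 and removed in C++20.

```cpp
#include <memory>

// Hypothetical stand-ins for a language's parser state types.
struct DemoFullState {
  int brace_level = 0;
};

struct DemoState {
  using full_state_type = DemoFullState;
  std::shared_ptr<DemoFullState> full_state;
};

// Copy-on-write: returns a state (and nested full_state) that the caller
// owns exclusively, cloning only when the input is shared or missing.
template <typename State>
std::shared_ptr<State> ensure_state(std::shared_ptr<State> state) {
  using Full = typename State::full_state_type;
  if (!state)
    state = std::make_shared<State>();
  else if (state.use_count() > 1)
    state = std::make_shared<State>(*state);
  if (!state->full_state)
    state->full_state = std::make_shared<Full>();
  else if (state->full_state.use_count() > 1)
    state->full_state = std::make_shared<Full>(*state->full_state);
  return state;
}
```

Since the parameter is taken by value, passing an lvalue always counts as shared and forces a copy. Only a moved-in pointer with no other owners is reused as-is. Cloning matters here because each line keeps its own `in_state` and `out_state`, so mutating a shared state in place would silently change earlier lines.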


@@ -1,12 +1,14 @@
#include "syntax/parser.h"
#include "io/knot.h" #include "io/knot.h"
#include "main.h" #include "main.h"
#include "syntax/decl.h"
#include "syntax/langs.h" #include "syntax/langs.h"
#include "syntax/parser.h"
std::array<Highlight, TOKEN_KIND_COUNT> highlights = {};
Parser::Parser(Knot *n_root, std::shared_mutex *n_knot_mutex, Parser::Parser(Knot *n_root, std::shared_mutex *n_knot_mutex,
std::string n_lang, uint32_t n_scroll_max) { std::string n_lang, uint32_t n_scroll_max) {
scroll_max = n_scroll_max; scroll_max = n_scroll_max;
line_data.reserve(n_root->line_count + 1);
knot_mutex = n_knot_mutex; knot_mutex = n_knot_mutex;
lang = n_lang; lang = n_lang;
auto pair = parsers.find(n_lang); auto pair = parsers.find(n_lang);
@@ -24,11 +26,9 @@ void Parser::edit(Knot *n_root, uint32_t start_line, uint32_t old_end_line,
std::lock_guard lock(data_mutex); std::lock_guard lock(data_mutex);
root = n_root; root = n_root;
if (((int64_t)old_end_line - (int64_t)start_line) > 0) if (((int64_t)old_end_line - (int64_t)start_line) > 0)
line_data.erase(line_data.begin() + start_line, line_tree.erase(start_line + 1, old_end_line - start_line);
line_data.begin() + start_line + old_end_line - start_line);
if (((int64_t)new_end_line - (int64_t)old_end_line) > 0) if (((int64_t)new_end_line - (int64_t)old_end_line) > 0)
line_data.insert(line_data.begin() + start_line, line_tree.insert(start_line + 1, new_end_line - start_line);
new_end_line - old_end_line, LineData{});
dirty_lines.insert(start_line); dirty_lines.insert(start_line);
} }
@@ -42,16 +42,18 @@ void Parser::work() {
tmp_dirty.swap(dirty_lines); tmp_dirty.swap(dirty_lines);
lock_data.unlock(); lock_data.unlock();
std::set<uint32_t> remaining_dirty; std::set<uint32_t> remaining_dirty;
std::unique_lock lock(mutex);
lock.unlock();
for (uint32_t c_line : tmp_dirty) { for (uint32_t c_line : tmp_dirty) {
if (c_line > scroll_max) { if (c_line > scroll_max) {
remaining_dirty.insert(c_line); remaining_dirty.insert(c_line);
continue; continue;
} }
std::unique_lock lock(mutex); uint32_t line_count = line_tree.count();
uint32_t line_count = (uint32_t)line_data.size(); lock_data.lock();
std::shared_ptr<void> prev_state = std::shared_ptr<void> prev_state =
(c_line > 0) ? line_data[c_line - 1].out_state : nullptr; (c_line > 0) ? line_tree.at(c_line - 1)->out_state : nullptr;
lock.unlock(); lock_data.unlock();
while (c_line < line_count) { while (c_line < line_count) {
if (!running.load(std::memory_order_relaxed)) { if (!running.load(std::memory_order_relaxed)) {
free(text); free(text);
@@ -70,14 +72,17 @@ void Parser::work() {
if (c_line < scroll_max && if (c_line < scroll_max &&
((scroll_max > 100 && c_line > scroll_max - 100) || c_line < 100)) ((scroll_max > 100 && c_line > scroll_max - 100) || c_line < 100))
lock.lock(); lock.lock();
if (line_tree.count() < c_line) {
if (lock.owns_lock())
lock.unlock();
continue;
}
lock_data.lock(); lock_data.lock();
LineData *line_data = line_tree.at(c_line);
std::shared_ptr<void> new_state = std::shared_ptr<void> new_state =
parse_func(&line_data[c_line].tokens, prev_state, text, r_len); parse_func(&line_data->tokens, prev_state, text, r_len);
lock_data.unlock(); line_data->in_state = prev_state;
line_data[c_line].in_state = prev_state; line_data->out_state = new_state;
line_data[c_line].out_state = new_state;
if (lock.owns_lock())
lock.unlock();
if (!running.load(std::memory_order_relaxed)) { if (!running.load(std::memory_order_relaxed)) {
free(text); free(text);
return; return;
@@ -85,16 +90,24 @@ void Parser::work() {
prev_state = new_state; prev_state = new_state;
c_line++; c_line++;
if (c_line < line_count && c_line > scroll_max + 50) { if (c_line < line_count && c_line > scroll_max + 50) {
lock_data.unlock();
if (lock.owns_lock())
lock.unlock();
if (c_line > 0) if (c_line > 0)
remaining_dirty.insert(c_line - 1); remaining_dirty.insert(c_line - 1);
remaining_dirty.insert(c_line); remaining_dirty.insert(c_line);
break; break;
} }
lock.lock();
if (c_line < line_count && if (c_line < line_count &&
state_match_func(prev_state, line_data[c_line].in_state)) state_match_func(prev_state, line_tree.at(c_line)->in_state)) {
lock_data.unlock();
if (lock.owns_lock())
lock.unlock();
break; break;
lock.unlock(); }
lock_data.unlock();
if (lock.owns_lock())
lock.unlock();
} }
if (!running.load(std::memory_order_relaxed)) { if (!running.load(std::memory_order_relaxed)) {
free(text); free(text);
@@ -110,20 +123,20 @@ void Parser::scroll(uint32_t line) {
if (line != scroll_max) { if (line != scroll_max) {
scroll_max = line; scroll_max = line;
uint32_t c_line = line > 100 ? line - 100 : 0; uint32_t c_line = line > 100 ? line - 100 : 0;
if (line_data.size() < c_line) if (line_tree.count() < c_line)
return; return;
if (line_data[c_line].in_state || line_data[c_line].out_state) std::unique_lock lock_data(data_mutex);
if (line_tree.at(c_line)->in_state || line_tree.at(c_line)->out_state)
return; return;
lock_data.unlock();
std::shared_lock k_lock(*knot_mutex); std::shared_lock k_lock(*knot_mutex);
k_lock.unlock(); k_lock.unlock();
uint32_t capacity = 256; uint32_t capacity = 256;
char *text = (char *)calloc((capacity + 1), sizeof(char)); char *text = (char *)calloc((capacity + 1), sizeof(char));
std::unique_lock lock_data(data_mutex); uint32_t line_count = line_tree.count();
lock_data.unlock();
std::unique_lock lock(mutex); std::unique_lock lock(mutex);
uint32_t line_count = (uint32_t)line_data.size();
std::shared_ptr<void> prev_state = std::shared_ptr<void> prev_state =
(c_line > 0) ? line_data[c_line - 1].out_state : nullptr; (c_line > 0) ? line_tree.at(c_line - 1)->out_state : nullptr;
lock.unlock(); lock.unlock();
while (c_line < line_count) { while (c_line < line_count) {
if (!running.load(std::memory_order_relaxed)) { if (!running.load(std::memory_order_relaxed)) {
@@ -143,12 +156,18 @@ void Parser::scroll(uint32_t line) {
if (c_line < scroll_max && if (c_line < scroll_max &&
((scroll_max > 100 && c_line > scroll_max - 100) || c_line < 100)) ((scroll_max > 100 && c_line > scroll_max - 100) || c_line < 100))
lock.lock(); lock.lock();
if (line_tree.count() < c_line) {
if (lock.owns_lock())
lock.unlock();
continue;
}
lock_data.lock(); lock_data.lock();
LineData *line_data = line_tree.at(c_line);
std::shared_ptr<void> new_state = std::shared_ptr<void> new_state =
parse_func(&line_data[c_line].tokens, prev_state, text, r_len); parse_func(&line_data->tokens, prev_state, text, r_len);
line_data->in_state = nullptr;
line_data->out_state = new_state;
lock_data.unlock(); lock_data.unlock();
line_data[c_line].in_state = nullptr;
line_data[c_line].out_state = new_state;
if (lock.owns_lock()) if (lock.owns_lock())
lock.unlock(); lock.unlock();
if (!running.load(std::memory_order_relaxed)) { if (!running.load(std::memory_order_relaxed)) {
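Note on the reparse loop in `Parser::work`: after an edit, lines are reparsed from the dirty line onward, and the loop breaks as soon as the state flowing into the next line matches the `in_state` that line was last parsed with. A toy version of that early stop is sketched below. The brace-depth "parser", the `Line` struct, and `reparse_from` are illustrative assumptions and leave out the locking and scroll-window logic of the real code.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

struct Line {
  std::string text;
  std::optional<int> in_state; // state the line was last parsed with
  int out_state = 0;           // state after the line
};

// Toy per-line parser: the carried state is just the brace depth.
int parse_line(const std::string &text, int depth) {
  for (char c : text) {
    if (c == '{')
      depth++;
    else if (c == '}')
      depth--;
  }
  return depth;
}

// Reparses from `dirty` and stops once the state entering the next line
// equals the state that line was previously parsed with. Returns how many
// lines were reparsed.
std::size_t reparse_from(std::vector<Line> &lines, std::size_t dirty) {
  int prev = dirty > 0 ? lines[dirty - 1].out_state : 0;
  std::size_t parsed = 0;
  for (std::size_t i = dirty; i < lines.size(); ++i) {
    lines[i].in_state = prev;
    lines[i].out_state = parse_line(lines[i].text, prev);
    prev = lines[i].out_state;
    ++parsed;
    // A never-parsed line has no in_state, so it cannot match.
    if (i + 1 < lines.size() && lines[i + 1].in_state == prev)
      break;
  }
  return parsed;
}
```

An edit that leaves the carried state unchanged therefore costs one line of parsing, while an edit that opens a new brace (or, in the real parsers, a string or heredoc) propagates until the states line up again.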


@@ -1,24 +1,28 @@
#include "syntax/decl.h"
#include "syntax/langs.h" #include "syntax/langs.h"
const static std::vector<std::string> base_keywords = { const static std::vector<std::string> base_keywords = {
// style 4 "class", "module", "begin", "end", "else", "rescue", "ensure", "do", "when",
"if", "else", "elsif", "case", "rescue", "ensure", "do", "for", };
"while", "until", "def", "class", "module", "begin", "end", "unless",
const static std::vector<std::string> expecting_keywords = {
"if", "elsif", "case", "for", "while", "until", "unless",
}; };
const static std::vector<std::string> operator_keywords = { const static std::vector<std::string> operator_keywords = {
// style 5 "alias", "BEGIN", "break", "catch", "defined?", "in", "next",
"alias", "and", "BEGIN", "break", "catch", "defined?", "in", "next", "redo", "rescue", "retry", "super", "self", "nil", "undef",
"not", "or", "redo", "rescue", "retry", "return", "super", "yield", };
"self", "nil", "true", "false", "undef", "when",
const static std::vector<std::string> expecting_operators = {
"and", "return", "not", "yield", "or",
}; };
const static std::vector<std::string> operators = { const static std::vector<std::string> operators = {
"+", "-", "*", "/", "%", "**", "==", "!=", "===", "+", "-", "*", "/", "%", "**", "==", "!=", "===", "<=>", ">",
"<=>", ">", ">=", "<", "<=", "&&", "||", "!", "&", ">=", "<", "<=", "&&", "||", "!", "&", "|", "^", "~", "<<",
"|", "^", "~", "<<", ">>", "=", "+=", "-=", "*=", ">>", "=", "+=", "-=", "*=", "/=", "%=", "**=", "&=", "|=", "^=",
"/=", "%=", "**=", "&=", "|=", "^=", "<<=", ">>=", "..", "<<=", ">>=", "..", "...", "===", "=", "=>", "&", "`", "->", "=~",
"...", "===", "=", "=>", "&.", "[]", "[]=", "`", "->",
}; };
struct HeredocInfo { struct HeredocInfo {
@@ -34,19 +38,16 @@ struct HeredocInfo {
}; };
struct RubyFullState { struct RubyFullState {
// TODO: use this to highlight each level seperaletly like vscode colored
// braces extention thingy does
int brace_level = 0; int brace_level = 0;
int paren_level = 0;
int bracket_level = 0;
enum : uint8_t { NONE, STRING, REGEXP, COMMENT, HEREDOC, END }; enum : uint8_t { NONE, STRING, REGEXP, COMMENT, HEREDOC, END };
uint8_t in_state = RubyFullState::NONE; uint8_t in_state = RubyFullState::NONE;
bool expecting_expr = false;
struct Lit { struct Lit {
char delim_start = '\0'; char delim_start = '\0';
char delim_end = '\0'; char delim_end = '\0';
// For stuff like %Q{ { these braces are valid } this part is still str }
int brace_level = 1; int brace_level = 1;
bool allow_interp = false; bool allow_interp = false;
@@ -60,12 +61,13 @@ struct RubyFullState {
bool operator==(const RubyFullState &other) const { bool operator==(const RubyFullState &other) const {
return in_state == other.in_state && lit == other.lit && return in_state == other.in_state && lit == other.lit &&
brace_level == other.brace_level && brace_level == other.brace_level &&
paren_level == other.paren_level && expecting_expr == other.expecting_expr;
bracket_level == other.bracket_level;
} }
}; };
struct RubyState { struct RubyState {
using full_state_type = RubyFullState;
int interp_level = 0; int interp_level = 0;
std::stack<std::shared_ptr<RubyFullState>> interp_stack; std::stack<std::shared_ptr<RubyFullState>> interp_stack;
std::shared_ptr<RubyFullState> full_state; std::shared_ptr<RubyFullState> full_state;
@@ -80,32 +82,16 @@ struct RubyState {
} }
}; };
inline std::shared_ptr<RubyState> inline static bool identifier_start_char(char c) {
ensure_state(std::shared_ptr<RubyState> state) {
if (!state)
state = std::make_shared<RubyState>();
if (state.unique())
return state;
return std::make_shared<RubyState>(*state);
}
inline std::shared_ptr<RubyState>
ensure_full_state(std::shared_ptr<RubyState> state) {
state = ensure_state(state);
if (!state->full_state)
state->full_state = std::make_shared<RubyFullState>();
else if (!state->full_state.unique())
state->full_state = std::make_shared<RubyFullState>(*state->full_state);
return state;
}
bool identifier_start_char(char c) {
return !isascii(c) || isalpha(c) || c == '_'; return !isascii(c) || isalpha(c) || c == '_';
} }
bool identifier_char(char c) { return !isascii(c) || isalnum(c) || c == '_'; } inline static bool identifier_char(char c) {
return !isascii(c) || isalnum(c) || c == '_';
}
uint32_t get_next_word(const char *text, uint32_t i, uint32_t len) { inline static uint32_t get_next_word(const char *text, uint32_t i,
uint32_t len) {
if (i >= len || !identifier_start_char(text[i])) if (i >= len || !identifier_start_char(text[i]))
return 0; return 0;
uint32_t width = 1; uint32_t width = 1;
@@ -116,12 +102,12 @@ uint32_t get_next_word(const char *text, uint32_t i, uint32_t len) {
return width; return width;
} }
bool compare(const char *a, const char *b, size_t n) { bool ruby_state_match(std::shared_ptr<void> state_1,
size_t i = 0; std::shared_ptr<void> state_2) {
for (; i < n; ++i) if (!state_1 || !state_2)
if (a[i] != b[i]) return false;
return false; return *std::static_pointer_cast<RubyState>(state_1) ==
return true; *std::static_pointer_cast<RubyState>(state_2);
} }
std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens, std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
@@ -129,21 +115,20 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
const char *text, uint32_t len) { const char *text, uint32_t len) {
static bool keywords_trie_init = false; static bool keywords_trie_init = false;
static Trie base_keywords_trie; static Trie base_keywords_trie;
static Trie expecting_keywords_trie;
static Trie operator_keywords_trie; static Trie operator_keywords_trie;
static Trie expecting_operators_trie;
static Trie operator_trie; static Trie operator_trie;
if (!keywords_trie_init) { if (!keywords_trie_init) {
base_keywords_trie.build(base_keywords); base_keywords_trie.build(base_keywords);
expecting_keywords_trie.build(expecting_keywords);
operator_keywords_trie.build(operator_keywords); operator_keywords_trie.build(operator_keywords);
expecting_operators_trie.build(expecting_operators);
operator_trie.build(operators); operator_trie.build(operators);
keywords_trie_init = true; keywords_trie_init = true;
} }
tokens->clear(); tokens->clear();
if (!in_state) auto state = ensure_state(std::static_pointer_cast<RubyState>(in_state));
in_state = std::make_shared<RubyState>();
std::shared_ptr<RubyState> state =
std::static_pointer_cast<RubyState>(in_state);
if (!state->full_state)
state->full_state = std::make_shared<RubyFullState>();
uint32_t i = 0; uint32_t i = 0;
while (len > 0 && (text[len - 1] == '\n' || text[len - 1] == '\r' || while (len > 0 && (text[len - 1] == '\n' || text[len - 1] == '\r' ||
text[len - 1] == '\t' || text[len - 1] == ' ')) text[len - 1] == '\t' || text[len - 1] == ' '))
@@ -152,15 +137,12 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
return state; return state;
bool heredoc_first = false; bool heredoc_first = false;
while (i < len) { while (i < len) {
if (state->full_state->in_state == RubyFullState::END) { if (state->full_state->in_state == RubyFullState::END)
tokens->clear();
return state; return state;
}
if (state->full_state->in_state == RubyFullState::COMMENT) { if (state->full_state->in_state == RubyFullState::COMMENT) {
tokens->push_back({i, len, 1}); tokens->push_back({i, len, TokenKind::Comment});
if (i == 0 && len == 4 && text[i] == '=' && text[i + 1] == 'e' && if (i == 0 && len == 4 && text[i] == '=' && text[i + 1] == 'e' &&
text[i + 2] == 'n' && text[i + 3] == 'd') { text[i + 2] == 'n' && text[i + 3] == 'd') {
state = ensure_full_state(state);
state->full_state->in_state = RubyFullState::NONE; state->full_state->in_state = RubyFullState::NONE;
} }
return state; return state;
@@ -175,32 +157,32 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
if (len - start == state->heredocs.front().delim.length() && if (len - start == state->heredocs.front().delim.length() &&
compare(text + start, state->heredocs.front().delim.c_str(), compare(text + start, state->heredocs.front().delim.c_str(),
state->heredocs.front().delim.length())) { state->heredocs.front().delim.length())) {
state = ensure_full_state(state);
state->heredocs.pop_front(); state->heredocs.pop_front();
if (state->heredocs.empty()) if (state->heredocs.empty())
state->full_state->in_state = RubyFullState::NONE; state->full_state->in_state = RubyFullState::NONE;
tokens->push_back({i, len, 10}); tokens->push_back({i, len, TokenKind::Annotation});
return state; return state;
} }
} }
uint32_t start = i; uint32_t start = i;
if (!state->heredocs.front().allow_interpolation) { if (!state->heredocs.front().allow_interpolation) {
tokens->push_back({i, len, 2}); tokens->push_back({i, len, TokenKind::String});
return state; return state;
} else { } else {
while (i < len) { while (i < len) {
if (text[i] == '\\') { if (text[i] == '\\') {
// TODO: highlight the escape character tokens->push_back({start, i, TokenKind::String});
start = i;
i++; i++;
if (i < len) if (i < len)
i++; i++;
tokens->push_back({start, i, TokenKind::Escape});
continue; continue;
} }
if (text[i] == '#' && i + 1 < len && text[i + 1] == '{') { if (text[i] == '#' && i + 1 < len && text[i + 1] == '{') {
tokens->push_back({start, i, 2}); tokens->push_back({start, i, TokenKind::String});
tokens->push_back({i, i + 2, 10}); tokens->push_back({i, i + 2, TokenKind::Interpolation});
i += 2; i += 2;
state = ensure_state(state);
state->interp_stack.push(state->full_state); state->interp_stack.push(state->full_state);
state->full_state = std::make_shared<RubyFullState>(); state->full_state = std::make_shared<RubyFullState>();
state->interp_level = 1; state->interp_level = 1;
@@ -209,7 +191,7 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
i++; i++;
} }
if (i == len) if (i == len)
tokens->push_back({start, len, 2}); tokens->push_back({start, len, TokenKind::String});
continue; continue;
} }
} }
@@ -217,19 +199,20 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
uint32_t start = i; uint32_t start = i;
while (i < len) { while (i < len) {
if (text[i] == '\\') { if (text[i] == '\\') {
// TODO: highlight the escape character - need to make priority work tokens->push_back({start, i, TokenKind::String});
// and this have higher start = i;
i++; i++;
if (i < len) if (i < len)
i++; i++;
tokens->push_back({start, i, TokenKind::Escape});
continue;
continue; continue;
} }
if (state->full_state->lit.allow_interp && text[i] == '#' && if (state->full_state->lit.allow_interp && text[i] == '#' &&
i + 1 < len && text[i + 1] == '{') { i + 1 < len && text[i + 1] == '{') {
tokens->push_back({start, i, 2}); tokens->push_back({start, i, TokenKind::String});
tokens->push_back({i, i + 2, 10}); tokens->push_back({i, i + 2, TokenKind::Interpolation});
i += 2; i += 2;
state = ensure_state(state);
state->interp_stack.push(state->full_state); state->interp_stack.push(state->full_state);
state->full_state = std::make_shared<RubyFullState>(); state->full_state = std::make_shared<RubyFullState>();
state->interp_level = 1; state->interp_level = 1;
@@ -238,23 +221,23 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
if (text[i] == state->full_state->lit.delim_start && if (text[i] == state->full_state->lit.delim_start &&
state->full_state->lit.delim_start != state->full_state->lit.delim_start !=
state->full_state->lit.delim_end) { state->full_state->lit.delim_end) {
state = ensure_full_state(state);
state->full_state->lit.brace_level++; state->full_state->lit.brace_level++;
} }
if (text[i] == state->full_state->lit.delim_end) { if (text[i] == state->full_state->lit.delim_end) {
state = ensure_full_state(state);
if (state->full_state->lit.delim_start == if (state->full_state->lit.delim_start ==
state->full_state->lit.delim_end) { state->full_state->lit.delim_end) {
i++; i++;
tokens->push_back({start, i, 2}); tokens->push_back({start, i, TokenKind::String});
state->full_state->in_state = RubyFullState::NONE; state->full_state->in_state = RubyFullState::NONE;
state->full_state->expecting_expr = false;
break; break;
} else { } else {
state->full_state->lit.brace_level--; state->full_state->lit.brace_level--;
if (state->full_state->lit.brace_level == 0) { if (state->full_state->lit.brace_level == 0) {
i++; i++;
tokens->push_back({start, i, 2}); tokens->push_back({start, i, TokenKind::String});
state->full_state->in_state = RubyFullState::NONE; state->full_state->in_state = RubyFullState::NONE;
state->full_state->expecting_expr = false;
break; break;
} }
} }
@@ -262,15 +245,67 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
i++; i++;
} }
if (i == len) if (i == len)
tokens->push_back({start, len, 2}); tokens->push_back({start, len, TokenKind::String});
continue;
}
if (state->full_state->in_state == RubyFullState::REGEXP) {
uint32_t start = i;
while (i < len) {
if (text[i] == '\\') {
tokens->push_back({start, i, TokenKind::Regexp});
;
start = i;
i++;
if (i < len)
i++;
tokens->push_back({start, i, TokenKind::Escape});
continue;
}
if (text[i] == '#' && i + 1 < len && text[i + 1] == '{') {
tokens->push_back({start, i, TokenKind::Regexp});
tokens->push_back({i, i + 2, TokenKind::Interpolation});
i += 2;
state->interp_stack.push(state->full_state);
state->full_state = std::make_shared<RubyFullState>();
state->interp_level = 1;
break;
}
if (text[i] == state->full_state->lit.delim_start &&
state->full_state->lit.delim_start !=
state->full_state->lit.delim_end) {
state->full_state->lit.brace_level++;
}
if (text[i] == state->full_state->lit.delim_end) {
if (state->full_state->lit.delim_start ==
state->full_state->lit.delim_end) {
i += 1;
tokens->push_back({start, i, TokenKind::Regexp});
state->full_state->in_state = RubyFullState::NONE;
state->full_state->expecting_expr = false;
break;
} else {
state->full_state->lit.brace_level--;
if (state->full_state->lit.brace_level == 0) {
i += 1;
tokens->push_back({start, i, TokenKind::Regexp});
state->full_state->in_state = RubyFullState::NONE;
state->full_state->expecting_expr = false;
break;
}
}
}
i++;
}
if (i == len)
tokens->push_back({start, len, TokenKind::Regexp});
continue; continue;
} }
if (i == 0 && len == 6) { if (i == 0 && len == 6) {
if (text[i] == '=' && text[i + 1] == 'b' && text[i + 2] == 'e' && if (text[i] == '=' && text[i + 1] == 'b' && text[i + 2] == 'e' &&
text[i + 3] == 'g' && text[i + 4] == 'i' && text[i + 5] == 'n') { text[i + 3] == 'g' && text[i + 4] == 'i' && text[i + 5] == 'n') {
state = ensure_full_state(state);
state->full_state->in_state = RubyFullState::COMMENT; state->full_state->in_state = RubyFullState::COMMENT;
tokens->push_back({0, len, 1}); state->full_state->expecting_expr = false;
tokens->push_back({0, len, TokenKind::Comment});
return state; return state;
} }
} }
@@ -278,9 +313,9 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
if (text[i] == '_' && text[i + 1] == '_' && text[i + 2] == 'E' && if (text[i] == '_' && text[i + 1] == '_' && text[i + 2] == 'E' &&
text[i + 3] == 'N' && text[i + 4] == 'D' && text[i + 5] == '_' && text[i + 3] == 'N' && text[i + 4] == 'D' && text[i + 5] == '_' &&
text[i + 6] == '_') { text[i + 6] == '_') {
state = ensure_full_state(state);
tokens->clear(); tokens->clear();
state->full_state->in_state = RubyFullState::END; state->full_state->in_state = RubyFullState::END;
state->full_state->expecting_expr = false;
return state; return state;
} }
} }
@@ -291,7 +326,7 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
indented = true; indented = true;
if (text[j] == '~' || text[j] == '-') if (text[j] == '~' || text[j] == '-')
j++; j++;
tokens->push_back({i, j, 10}); tokens->push_back({i, j, TokenKind::Operator});
if (j >= len) if (j >= len)
continue; continue;
std::string delim; std::string delim;
@@ -304,12 +339,15 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
while (j < len && text[j] != q) while (j < len && text[j] != q)
delim += text[j++]; delim += text[j++];
} else { } else {
while (j < len && identifier_char(text[j])) if (j < len && identifier_start_char(text[j])) {
delim += text[j++]; delim += text[j++];
while (j < len && identifier_char(text[j]))
delim += text[j++];
}
} }
state->full_state->expecting_expr = false;
if (!delim.empty()) { if (!delim.empty()) {
tokens->push_back({s, j, 10}); tokens->push_back({s, j, TokenKind::Annotation});
state = ensure_full_state(state);
state->heredocs.push_back({delim, interpolation, indented}); state->heredocs.push_back({delim, interpolation, indented});
state->full_state->in_state = RubyFullState::HEREDOC; state->full_state->in_state = RubyFullState::HEREDOC;
heredoc_first = true; heredoc_first = true;
@@ -317,18 +355,47 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
i = j; i = j;
continue; continue;
} }
if (text[i] == '#') { if (text[i] == '/' && state->full_state->expecting_expr) {
tokens->push_back({i, len, 1}); tokens->push_back({i, i + 1, TokenKind::Regexp});
state->full_state->in_state = RubyFullState::REGEXP;
state->full_state->expecting_expr = false;
state->full_state->lit.delim_start = '/';
state->full_state->lit.delim_end = '/';
state->full_state->lit.allow_interp = true;
i++;
continue;
} else if (text[i] == '#') {
tokens->push_back({i, len, TokenKind::Comment});
state->full_state->expecting_expr = false;
return state; return state;
} else if (text[i] == '.') {
uint32_t start = i;
i++;
if (i < len && text[i] == '.') {
i++;
if (i < len && text[i] == '.') {
i++;
}
}
tokens->push_back({start, i, TokenKind::Operator});
state->full_state->expecting_expr = false;
continue;
} else if (text[i] == ':') { } else if (text[i] == ':') {
state->full_state->expecting_expr = false;
uint32_t start = i; uint32_t start = i;
i++; i++;
if (i >= len) { if (i >= len) {
tokens->push_back({start, i, 3}); tokens->push_back({start, i, TokenKind::Operator});
state->full_state->expecting_expr = true;
continue;
}
if (text[i] == ':') {
i++;
continue; continue;
} }
if (text[i] == '\'' || text[i] == '"') { if (text[i] == '\'' || text[i] == '"') {
tokens->push_back({start, i, 6}); tokens->push_back({start, i, TokenKind::Operator});
state->full_state->expecting_expr = true;
continue; continue;
} }
if (text[i] == '$' || text[i] == '@') { if (text[i] == '$' || text[i] == '@') {
@@ -338,24 +405,25 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
i++; i++;
while (i < len && identifier_char(text[i])) while (i < len && identifier_char(text[i]))
i++; i++;
tokens->push_back({start, i, 6}); tokens->push_back({start, i, TokenKind::Label});
continue; continue;
} }
uint32_t op_len = operator_trie.match(text, i, len, identifier_char); uint32_t op_len = operator_trie.match(text, i, len, identifier_char);
if (op_len > 0) { if (op_len > 0) {
tokens->push_back({start, i + op_len, 6}); tokens->push_back({start, i + op_len, TokenKind::Label});
i += op_len; i += op_len;
continue; continue;
} }
if (identifier_start_char(text[i])) { if (identifier_start_char(text[i])) {
uint32_t word_len = get_next_word(text, i, len); uint32_t word_len = get_next_word(text, i, len);
tokens->push_back({start, i + word_len, 6}); tokens->push_back({start, i + word_len, TokenKind::Label});
i += word_len; i += word_len;
continue; continue;
} }
tokens->push_back({start, i, 3}); tokens->push_back({start, i, TokenKind::Operator});
continue; continue;
} else if (text[i] == '@') { } else if (text[i] == '@') {
state->full_state->expecting_expr = false;
uint32_t start = i; uint32_t start = i;
i++; i++;
if (i >= len) if (i >= len)
@@ -368,9 +436,10 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
continue; continue;
while (i < len && identifier_char(text[i])) while (i < len && identifier_char(text[i]))
i++; i++;
tokens->push_back({start, i, 7}); tokens->push_back({start, i, TokenKind::VariableInstance});
continue; continue;
} else if (text[i] == '$') { } else if (text[i] == '$') {
state->full_state->expecting_expr = false;
uint32_t start = i; uint32_t start = i;
i++; i++;
if (i >= len) if (i >= len)
@@ -390,9 +459,10 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
} else { } else {
continue; continue;
} }
tokens->push_back({start, i, 8}); tokens->push_back({start, i, TokenKind::VariableGlobal});
continue; continue;
} else if (text[i] == '?') { } else if (text[i] == '?') {
state->full_state->expecting_expr = false;
uint32_t start = i; uint32_t start = i;
i++; i++;
if (i < len && text[i] == '\\') { if (i < len && text[i] == '\\') {
@@ -405,7 +475,7 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
continue; continue;
if (i < len && isxdigit(text[i])) if (i < len && isxdigit(text[i]))
i++; i++;
tokens->push_back({start, i, 7}); tokens->push_back({start, i, TokenKind::Char});
continue; continue;
} else if (i < len && text[i] == 'u') { } else if (i < len && text[i] == 'u') {
i++; i++;
@@ -425,42 +495,81 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
i++; i++;
else else
continue; continue;
tokens->push_back({start, i, 7}); tokens->push_back({start, i, TokenKind::Char});
continue; continue;
} else if (i < len) { } else if (i < len) {
i++; i++;
tokens->push_back({start, i, 7}); tokens->push_back({start, i, TokenKind::Char});
continue; continue;
} }
} else if (i < len && text[i] != ' ') { } else if (i < len && text[i] != ' ') {
i++; i++;
tokens->push_back({start, i, 7}); tokens->push_back({start, i, TokenKind::Char});
continue; continue;
} else { } else {
tokens->push_back({start, i, 3}); state->full_state->expecting_expr = true;
tokens->push_back({start, i, TokenKind::Operator});
continue; continue;
} }
} else if (text[i] == '{') { } else if (text[i] == '{') {
tokens->push_back({i, i + 1, 3}); state->full_state->expecting_expr = true;
state = ensure_state(state); uint8_t brace_color =
(uint8_t)TokenKind::Brace1 + (state->full_state->brace_level % 5);
tokens->push_back({i, i + 1, (TokenKind)brace_color});
state->interp_level++; state->interp_level++;
state->full_state->brace_level++;
i++; i++;
continue; continue;
} else if (text[i] == '}') { } else if (text[i] == '}') {
state = ensure_full_state(state); state->full_state->expecting_expr = false;
state->interp_level--; state->interp_level--;
if (state->interp_level == 0 && !state->interp_stack.empty()) { if (state->interp_level == 0 && !state->interp_stack.empty()) {
state->full_state = state->interp_stack.top(); state->full_state = state->interp_stack.top();
state->interp_stack.pop(); state->interp_stack.pop();
tokens->push_back({i, i + 1, 10}); tokens->push_back({i, i + 1, TokenKind::Interpolation});
} else { } else {
tokens->push_back({i, i + 1, 3}); state->full_state->brace_level--;
uint8_t brace_color =
(uint8_t)TokenKind::Brace1 + (state->full_state->brace_level % 5);
tokens->push_back({i, i + 1, (TokenKind)brace_color});
} }
i++; i++;
continue; continue;
} else if (text[i] == '(') {
state->full_state->expecting_expr = true;
uint8_t brace_color =
(uint8_t)TokenKind::Brace1 + (state->full_state->brace_level % 5);
tokens->push_back({i, i + 1, (TokenKind)brace_color});
state->full_state->brace_level++;
i++;
continue;
} else if (text[i] == ')') {
state->full_state->expecting_expr = false;
state->full_state->brace_level--;
uint8_t brace_color =
(uint8_t)TokenKind::Brace1 + (state->full_state->brace_level % 5);
tokens->push_back({i, i + 1, (TokenKind)brace_color});
i++;
continue;
} else if (text[i] == '[') {
state->full_state->expecting_expr = true;
uint8_t brace_color =
(uint8_t)TokenKind::Brace1 + (state->full_state->brace_level % 5);
tokens->push_back({i, i + 1, (TokenKind)brace_color});
state->full_state->brace_level++;
i++;
continue;
} else if (text[i] == ']') {
state->full_state->expecting_expr = false;
state->full_state->brace_level--;
uint8_t brace_color =
(uint8_t)TokenKind::Brace1 + (state->full_state->brace_level % 5);
tokens->push_back({i, i + 1, (TokenKind)brace_color});
i++;
continue;
} else if (text[i] == '\'') { } else if (text[i] == '\'') {
tokens->push_back({i, i + 1, 2}); state->full_state->expecting_expr = false;
state = ensure_full_state(state); tokens->push_back({i, i + 1, TokenKind::String});
state->full_state->in_state = RubyFullState::STRING; state->full_state->in_state = RubyFullState::STRING;
state->full_state->lit.delim_start = '\''; state->full_state->lit.delim_start = '\'';
state->full_state->lit.delim_end = '\''; state->full_state->lit.delim_end = '\'';
@@ -468,8 +577,8 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
i++; i++;
continue; continue;
} else if (text[i] == '"') { } else if (text[i] == '"') {
tokens->push_back({i, i + 1, 2}); state->full_state->expecting_expr = false;
state = ensure_full_state(state); tokens->push_back({i, i + 1, TokenKind::String});
state->full_state->in_state = RubyFullState::STRING; state->full_state->in_state = RubyFullState::STRING;
state->full_state->lit.delim_start = '"'; state->full_state->lit.delim_start = '"';
state->full_state->lit.delim_end = '"'; state->full_state->lit.delim_end = '"';
@@ -477,8 +586,8 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
i++; i++;
continue; continue;
} else if (text[i] == '`') { } else if (text[i] == '`') {
tokens->push_back({i, i + 1, 2}); state->full_state->expecting_expr = false;
state = ensure_full_state(state); tokens->push_back({i, i + 1, TokenKind::String});
state->full_state->in_state = RubyFullState::STRING; state->full_state->in_state = RubyFullState::STRING;
state->full_state->lit.delim_start = '`'; state->full_state->lit.delim_start = '`';
state->full_state->lit.delim_end = '`'; state->full_state->lit.delim_end = '`';
@@ -486,6 +595,7 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
i++; i++;
continue; continue;
} else if (text[i] == '%') { } else if (text[i] == '%') {
state->full_state->expecting_expr = false;
if (i + 1 >= len) { if (i + 1 >= len) {
i++; i++;
continue; continue;
@@ -495,15 +605,24 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
char delim_end = '\0'; char delim_end = '\0';
bool allow_interp = true; bool allow_interp = true;
int prefix_len = 1; int prefix_len = 1;
bool is_regexp = false;
switch (type) { switch (type) {
case 'r':
is_regexp = true;
allow_interp = true;
prefix_len = 2;
break;
case 'Q': case 'Q':
case 'x': case 'x':
case 'I':
case 'W':
allow_interp = true; allow_interp = true;
prefix_len = 2; prefix_len = 2;
break; break;
case 'w': case 'w':
case 'q': case 'q':
case 'i': case 'i':
case 's':
allow_interp = false; allow_interp = false;
prefix_len = 2; prefix_len = 2;
break; break;
@@ -539,9 +658,10 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
delim_end = delim_start; delim_end = delim_start;
break; break;
} }
tokens->push_back({i, i + prefix_len + 1, 2}); tokens->push_back({i, i + prefix_len + 1,
state = ensure_full_state(state); (is_regexp ? TokenKind::Regexp : TokenKind::String)});
state->full_state->in_state = RubyFullState::STRING; state->full_state->in_state =
is_regexp ? RubyFullState::REGEXP : RubyFullState::STRING;
state->full_state->lit.delim_start = delim_start; state->full_state->lit.delim_start = delim_start;
state->full_state->lit.delim_end = delim_end; state->full_state->lit.delim_end = delim_end;
state->full_state->lit.allow_interp = allow_interp; state->full_state->lit.allow_interp = allow_interp;
@@ -549,6 +669,7 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
i += prefix_len + 1; i += prefix_len + 1;
continue; continue;
} else if (isdigit(text[i])) { } else if (isdigit(text[i])) {
state->full_state->expecting_expr = false;
uint32_t start = i; uint32_t start = i;
if (text[i] == '0') { if (text[i] == '0') {
i++; i++;
@@ -646,85 +767,137 @@ std::shared_ptr<void> ruby_parse(std::vector<Token> *tokens,
i--; i--;
} }
} }
tokens->push_back({start, i, 9}); tokens->push_back({start, i, TokenKind::Number});
continue; continue;
} else if (identifier_start_char(text[i])) { } else if (identifier_start_char(text[i])) {
state->full_state->expecting_expr = false;
uint32_t length; uint32_t length;
if ((length = base_keywords_trie.match(text, i, len, identifier_char)) > if ((length = base_keywords_trie.match(text, i, len, identifier_char))) {
0) { tokens->push_back({i, i + length, TokenKind::Keyword});
tokens->push_back({i, i + length, 4}); i += length;
continue;
} else if ((length = expecting_keywords_trie.match(text, i, len,
identifier_char))) {
state->full_state->expecting_expr = true;
tokens->push_back({i, i + length, TokenKind::Keyword});
i += length; i += length;
continue; continue;
} else if ((length = operator_keywords_trie.match(text, i, len, } else if ((length = operator_keywords_trie.match(text, i, len,
identifier_char)) > 0) { identifier_char))) {
tokens->push_back({i, i + length, 5}); tokens->push_back({i, i + length, TokenKind::KeywordOperator});
i += length;
continue;
} else if ((length = expecting_operators_trie.match(
text, i, len, identifier_char)) > 0) {
state->full_state->expecting_expr = true;
tokens->push_back({i, i + length, TokenKind::KeywordOperator});
i += length; i += length;
continue; continue;
} else if (text[i] >= 'A' && text[i] <= 'Z') { } else if (text[i] >= 'A' && text[i] <= 'Z') {
uint32_t start = i; uint32_t start = i;
i += get_next_word(text, i, len); i += get_next_word(text, i, len);
tokens->push_back({start, i, 10}); tokens->push_back({start, i, TokenKind::Constant});
continue; continue;
} else { } else {
uint32_t start = i; uint32_t start = i;
if (i + 4 < len && text[i] == 't' && text[i + 1] == 'r' &&
text[i + 2] == 'u' && text[i + 3] == 'e') {
i += 4;
tokens->push_back({start, i, TokenKind::True});
continue;
}
if (i + 5 < len && text[i] == 'f' && text[i + 1] == 'a' &&
text[i + 2] == 'l' && text[i + 3] == 's' && text[i + 4] == 'e') {
i += 5;
tokens->push_back({start, i, TokenKind::False});
continue;
}
if (i + 3 < len && text[i] == 'd' && text[i + 1] == 'e' &&
text[i + 2] == 'f') {
i += 3;
tokens->push_back({start, i, TokenKind::Keyword});
while (i < len && (text[i] == ' ' || text[i] == '\t'))
i++;
while (i < len) {
if (identifier_start_char(text[i])) {
uint32_t width = get_next_word(text, i, len);
if (text[i] >= 'A' && text[i] <= 'Z')
tokens->push_back({i, i + width, TokenKind::Constant});
else if (width == 4 && (text[i] == 's' && text[i + 1] == 'e' &&
text[i + 2] == 'l' && text[i + 3] == 'f'))
tokens->push_back({i, i + width, TokenKind::Keyword});
i += width;
if (i < len && text[i] == '.') {
i++;
continue;
}
tokens->push_back({i - width, i, TokenKind::Function});
break;
} else {
break;
}
}
continue;
}
while (i < len && identifier_char(text[i])) while (i < len && identifier_char(text[i]))
i++; i++;
if (i < len && text[i] == ':') { if (i < len && text[i] == ':') {
i++; i++;
tokens->push_back({start, i, 6}); tokens->push_back({start, i, TokenKind::Label});
continue; continue;
} else if (i < len && (text[i] == '!' || text[i] == '?')) { } else if (i < len && (text[i] == '!' || text[i] == '?')) {
i++; i++;
tokens->push_back({start, i, TokenKind::Function});
} else {
uint32_t tmp = i;
if (tmp < len && (text[tmp] == '(' || text[tmp] == '{')) {
tokens->push_back({start, i, TokenKind::Function});
continue;
} else if (tmp < len && (text[tmp] == ' ' || text[tmp] == '\t')) {
tmp++;
} else {
continue;
}
while (tmp < len && (text[tmp] == ' ' || text[tmp] == '\t'))
tmp++;
if (tmp >= len)
continue;
if (!isascii(text[tmp])) {
tokens->push_back({start, i, TokenKind::Function});
continue;
} else if (text[tmp] == '-' || text[tmp] == '&' || text[tmp] == '%' ||
text[tmp] == ':') {
if (tmp + 1 >= len ||
(text[tmp + 1] == ' ' || text[tmp + 1] == '>'))
continue;
} else if (text[tmp] == ']' || text[tmp] == '}' || text[tmp] == ')' ||
text[tmp] == ',' || text[tmp] == ';' || text[tmp] == '.' ||
text[tmp] == '+' || text[tmp] == '*' || text[tmp] == '/' ||
text[tmp] == '=' || text[tmp] == '?' || text[tmp] == '|' ||
text[tmp] == '^' || text[tmp] == '<' || text[tmp] == '>') {
continue;
}
tokens->push_back({start, i, TokenKind::Function});
} }
continue; continue;
} }
} else { } else {
uint32_t op_len; uint32_t op_len;
if ((op_len = operator_trie.match(text, i, len, if ((op_len =
[](char) { return false; })) > 0) { operator_trie.match(text, i, len, [](char) { return false; }))) {
tokens->push_back({i, i + op_len, 3}); tokens->push_back({i, i + op_len, TokenKind::Operator});
i += op_len; i += op_len;
state->full_state->expecting_expr = true;
continue;
} else {
i += utf8_codepoint_width(text[i]);
continue; continue;
} }
} }
i += utf8_codepoint_width(text[i]);
} }
return state; return state;
} }
bool ruby_state_match(std::shared_ptr<void> state_1, // TODO: Add tries for builtins and highlight them separately, like (Array /
std::shared_ptr<void> state_2) { // self etc)
if (!state_1 || !state_2) // And in regex better highlighting of regex structures
return false;
return *std::static_pointer_cast<RubyState>(state_1) ==
*std::static_pointer_cast<RubyState>(state_2);
}
// function calls matched with alphanumeric names followed immediately by !
// or ? or `(` immediately or with space or are followed by a non-keyword
// or non-operator (some operators like - for negating and ! for not or {
// for block might be allowed?)
// a word following :: or . is matched as a property
// and any random word is matched as a variable name
// or as a class/module name if it starts with a capital letter
//
// regex are matched as text within / and / as long as
// the first / is not
// following a literal (int/float/string) or variable or brace close
// and is following a keyword or operator like return /regex/ or x =
// /regex/ . so maybe add field expecting_expr to state that is true right
// after keyword or some operators like = , =~ , `,` etc?
//
// (left to implement) -
//
// words - breaks up into these submatches
// - Constants that start with a capital letter
// - a word following :: or . is matched as a property
// - function call if ending with ! or ? or ( or are followed by a
// non-keyword or non-operator . I'll figure it out
//
// regex (and distinguish between / for division and / for regex) and
// %r{} ones too
//
// Matching brace colors by brace depth
//
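
The brace handling in the hunk above picks a color from the current nesting depth, cycling through five kinds (`Brace1` to `Brace5`). A minimal standalone sketch of that idea follows. `BraceKind` and `color_braces` are hypothetical names used only for illustration, not part of the project, and the underflow guard on unbalanced input is an assumption that the diff itself does not show.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the project's TokenKind; only brace kinds are modelled.
enum class BraceKind : uint8_t { Brace1, Brace2, Brace3, Brace4, Brace5 };

// Returns one BraceKind per bracket character in `text`, in order of appearance.
// An opening bracket is colored with the depth before it is entered; a closing
// bracket is colored with the depth after it is left, so matching pairs agree.
std::vector<BraceKind> color_braces(const std::string &text) {
  std::vector<BraceKind> out;
  uint32_t depth = 0;
  for (char c : text) {
    if (c == '(' || c == '[' || c == '{') {
      out.push_back(static_cast<BraceKind>(depth % 5));
      depth++;
    } else if (c == ')' || c == ']' || c == '}') {
      if (depth > 0) // assumption: clamp at zero instead of wrapping around
        depth--;
      out.push_back(static_cast<BraceKind>(depth % 5));
    }
  }
  return out;
}
```

With this ordering, the outermost pair in `([{}])` both receive `Brace1`, and a sixth nesting level wraps back to `Brace1`.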


@@ -1,5 +1,13 @@
#include "utils/utils.h" #include "utils/utils.h"
bool compare(const char *a, const char *b, size_t n) {
size_t i = 0;
for (; i < n; ++i)
if (a[i] != b[i])
return false;
return true;
}
std::string percent_decode(const std::string &s) { std::string percent_decode(const std::string &s) {
std::string out; std::string out;
out.reserve(s.size()); out.reserve(s.size());

82
themes/default.json Normal file

@@ -0,0 +1,82 @@
{
"Default": {
"fg": "#EEEEEE"
},
"Comment": {
"fg": "#AAAAAA",
"italic": true
},
"String": {
"fg": "#AAD94C"
},
"Escape": {
"fg": "#7dcfff"
},
"Interpolation": {
"fg": "#7dcfff"
},
"Regexp": {
"fg": "#D2A6FF"
},
"Number": {
"fg": "#E6C08A"
},
"True": {
"fg": "#0FFF0F"
},
"False": {
"fg": "#FF0F0F"
},
"Char": {
"fg": "#FFAF70"
},
"Keyword": {
"fg": "#FF8F40"
},
"KeywordOperator": {
"fg": "#F07178"
},
"Operator": {
"fg": "#FFFFFF",
"italic": true
},
"Function": {
"fg": "#FFAF70"
},
"Type": {
"fg": "#F07178"
},
"Constant": {
"fg": "#7dcfff"
},
"VariableInstance": {
"fg": "#95E6CB"
},
"VariableGlobal": {
"fg": "#F07178"
},
"Annotation": {
"fg": "#7dcfff"
},
"Directive": {
"fg": "#FF8F40"
},
"Label": {
"fg": "#D2A6FF"
},
"Brace1": {
"fg": "#D2A6FF"
},
"Brace2": {
"fg": "#FFAFAF"
},
"Brace3": {
"fg": "#FFFF00"
},
"Brace4": {
"fg": "#0FFF0F"
},
"Brace5": {
"fg": "#FF0F0F"
}
}
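
Each theme entry above stores its foreground color as a `#RRGGBB` string. How the editor actually decodes these is not shown in this diff, so the following is only a sketch of one way to do it with the standard library (C++17). `Rgb` and `parse_hex_color` are hypothetical names, and rejecting anything other than exactly seven characters is an assumption.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical color type; the project's own representation may differ.
struct Rgb {
  uint8_t r, g, b;
};

// Parses "#RRGGBB" (upper- or lower-case hex digits) into its components.
// Returns std::nullopt for any other shape of input.
std::optional<Rgb> parse_hex_color(const std::string &s) {
  if (s.size() != 7 || s[0] != '#')
    return std::nullopt;
  uint32_t value = 0;
  for (size_t i = 1; i < 7; ++i) {
    char c = s[i];
    uint32_t digit;
    if (c >= '0' && c <= '9')
      digit = c - '0';
    else if (c >= 'a' && c <= 'f')
      digit = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F')
      digit = c - 'A' + 10;
    else
      return std::nullopt;
    value = (value << 4) | digit; // accumulate four bits per hex digit
  }
  return Rgb{static_cast<uint8_t>(value >> 16),
             static_cast<uint8_t>((value >> 8) & 0xFF),
             static_cast<uint8_t>(value & 0xFF)};
}
```

For instance, the `"Escape"` entry's `#7dcfff` would decode to red 125, green 207, blue 255 under this sketch.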