Initial Commit

2025-08-30 16:07:19 +01:00
commit d86c15e30c
169 changed files with 121377 additions and 0 deletions

@@ -0,0 +1,94 @@
.Dd 2022-10-06
.Dt GRAPHEME_NEXT_CHARACTER_BREAK_UTF8 3
.Os suckless.org
.Sh NAME
.Nm grapheme_next_character_break_utf8
.Nd determine byte-offset to next grapheme cluster break
.Sh SYNOPSIS
.In grapheme.h
.Ft size_t
.Fn grapheme_next_character_break_utf8 "const char *str" "size_t len"
.Sh DESCRIPTION
The
.Fn grapheme_next_character_break_utf8
function computes the offset (in bytes) to the next grapheme cluster
break (see
.Xr libgrapheme 7 )
in the UTF-8-encoded string
.Va str
of length
.Va len .
If a grapheme cluster begins at
.Va str ,
this offset is equal to the length of that grapheme cluster.
.Pp
If
.Va len
is set to
.Dv SIZE_MAX
(defined in
.In stdint.h ,
which
.In grapheme.h
already includes), the string
.Va str
is treated as NUL-terminated and processing stops at the first
NUL byte.
.Pp
For input that is not UTF-8-encoded,
.Xr grapheme_is_character_break 3
and
.Xr grapheme_next_character_break 3
can be used instead.
.Sh RETURN VALUES
The
.Fn grapheme_next_character_break_utf8
function returns the offset (in bytes) to the next grapheme cluster
break in
.Va str
or 0 if
.Va str
is
.Dv NULL .
.Sh EXAMPLES
.Bd -literal
/* cc (-static) -o example example.c -lgrapheme */
#include <grapheme.h>
#include <stdint.h>
#include <stdio.h>
int
main(void)
{
	/* UTF-8 encoded input */
	const char *s = "T\\xC3\\xABst \\xF0\\x9F\\x91\\xA8\\xE2\\x80\\x8D\\xF0"
	                "\\x9F\\x91\\xA9\\xE2\\x80\\x8D\\xF0\\x9F\\x91\\xA6 \\xF0"
	                "\\x9F\\x87\\xBA\\xF0\\x9F\\x87\\xB8 \\xE0\\xA4\\xA8\\xE0"
	                "\\xA5\\x80 \\xE0\\xAE\\xA8\\xE0\\xAE\\xBF!";
	size_t ret, len, off;
	printf("Input: \\"%s\\"\\n", s);
	/* print each grapheme cluster with byte-length */
	printf("grapheme clusters in NUL-delimited input:\\n");
	for (off = 0; s[off] != '\\0'; off += ret) {
		ret = grapheme_next_character_break_utf8(s + off, SIZE_MAX);
		printf("%2zu bytes | %.*s\\n", ret, (int)ret, s + off);
	}
	printf("\\n");
	/*
	 * do the same, but this time string is length-delimited;
	 * 17 bytes end in the middle of the first emoji sequence
	 */
	len = 17;
	printf("grapheme clusters in input delimited to %zu bytes:\\n", len);
	for (off = 0; off < len; off += ret) {
		ret = grapheme_next_character_break_utf8(s + off, len - off);
		printf("%2zu bytes | %.*s\\n", ret, (int)ret, s + off);
	}
	return 0;
}
.Ed
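.Pp
The loops above rely on every call returning a non-zero offset.
As a minimal sketch, assuming a caller that wants a reusable helper,
the following counts the clusters in length-delimited input and stops
as soon as a call reports 0, which the function does for a
.Dv NULL
string.
The names
.Fn count_clusters ,
.Fn stub_next_break
and
.Vt next_break_fn
are hypothetical and not part of libgrapheme; the stub only steps over
single UTF-8 sequences so that the sketch builds without the library.
.Bd -literal
```c
#include <stddef.h>

/* same signature as grapheme_next_character_break_utf8() */
typedef size_t (*next_break_fn)(const char *, size_t);

/*
 * Hypothetical stand-in: skips one lead byte plus any continuation
 * bytes. This is NOT grapheme cluster segmentation.
 */
size_t
stub_next_break(const char *str, size_t len)
{
	size_t i;

	if (str == NULL || len == 0) {
		return 0;
	}
	for (i = 1; i < len && ((unsigned char)str[i] & 0xC0) == 0x80; i++)
		;
	return i;
}

/* count the segments next_break finds in the first len bytes of str */
size_t
count_clusters(const char *str, size_t len, next_break_fn next_break)
{
	size_t off, ret, n = 0;

	if (str == NULL) {
		return 0;
	}
	for (off = 0; off < len; off += ret, n++) {
		ret = next_break(str + off, len - off);
		if (ret == 0) {
			/* no progress possible: stop instead of looping forever */
			break;
		}
	}
	return n;
}
```
.Ed
In real code,
.Fn grapheme_next_character_break_utf8
would be passed as the third argument in place of the stub.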
.Sh SEE ALSO
.Xr grapheme_is_character_break 3 ,
.Xr grapheme_next_character_break 3 ,
.Xr libgrapheme 7
.Sh STANDARDS
.Fn grapheme_next_character_break_utf8
is compliant with the Unicode 15.0.0 specification.
.Sh AUTHORS
.An Laslo Hunhold Aq Mt dev@frign.de