because of the way sxpp's streaming lexer (tokenizer) works, UTF-16 and UTF-32 input streams are already supported.
since all characters that control lexing are well below 0x7f, and the lexer doesn't output strings, only token types and locations, you can just feed it chars clamped to 0xff, and offsets and positions come out implicitly correct.
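
a rough sketch of the idea in python, not sxpp's actual code: the token set (`lparen`, `rparen`, `atom`), the delimiter table and all function names here are made up for illustration. it assumes the lexer only needs to tell delimiters and whitespace apart from "everything else", and that offsets are counted in code units of the input encoding.

```python
from typing import Iterable, Iterator, List, Tuple

# (token type, start offset, end offset), offsets counted in input code units
Token = Tuple[str, int, int]

# hypothetical delimiter/whitespace tables; the real lexer's set is not shown in the post
DELIMS = {ord("("): "lparen", ord(")"): "rparen"}
SPACES = {ord(" "), ord("\t"), ord("\n"), ord("\r")}


def clamp_units(units: Iterable[int]) -> Iterator[int]:
    """Clamp every code unit to 0xff so it fits an 8-bit lexer input."""
    for u in units:
        yield min(u, 0xFF)


def lex(units: Iterable[int]) -> List[Token]:
    """Toy lexer: emits token types and locations only, never token text."""
    tokens: List[Token] = []
    start = None  # start offset of the atom currently being scanned
    n = 0         # current offset in code units
    for u in units:
        if u in DELIMS or u in SPACES:
            if start is not None:
                tokens.append(("atom", start, n))
                start = None
            if u in DELIMS:
                tokens.append((DELIMS[u], n, n + 1))
        elif start is None:
            start = n
        n += 1
    if start is not None:
        tokens.append(("atom", start, n))
    return tokens


def utf16_units(text: str) -> List[int]:
    """Split text into 16-bit code units (surrogate pairs count as two)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def utf32_units(text: str) -> List[int]:
    """One code unit per code point."""
    return [ord(c) for c in text]


source = "(\u03bb \U0001F600)"  # paren, lambda, space, emoji, paren
tokens16 = lex(clamp_units(utf16_units(source)))
tokens32 = lex(clamp_units(utf32_units(source)))
```

here the emoji atom spans two code units in the UTF-16 stream and one in the UTF-32 stream, so the offsets index straight back into the original buffer in either encoding. note that this relies on clamping rather than masking: with `u & 0xff`, U+0128 would turn into 0x28, which is `(`.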
#devlog #sxpp #lsp