[PR #103] [CLOSED] A function to detect strings with incomplete (split) UTF8 bytes at the end. #153

Closed
opened 2026-05-05 03:41:40 -06:00 by gitea-mirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ultimatepp/ultimatepp/pull/103
Author: @ismail-yilmaz
Created: 10/1/2022
Status: Closed

Base: master ← Head: check_utf8_split


📝 Commits (10+)

  • e9b1e00 Core/CheckUtf8() function added.
  • f362fd6 autotest/IncompeteUTf8test added.
  • 9e961de Merge branch 'ultimatepp:master' into check_utf8_split
  • 37d0b2c Merge branch 'ultimatepp:master' into check_utf8_split
  • 8356086 Merge branch 'ultimatepp:master' into check_utf8_split
  • ee265bb Merge branch 'ultimatepp:master' into check_utf8_split
  • 3a43b06 Merge branch 'ultimatepp:master' into check_utf8_split
  • fa80ba9 Merge branch 'ultimatepp:master' into check_utf8_split
  • dde586a Merge branch 'ultimatepp:master' into check_utf8_split
  • 5c3a5b0 Merge branch 'ultimatepp:master' into check_utf8_split

📊 Changes

5 files changed (+66 additions, -0 deletions)

View changed files

➕ autotest/IncompleteUtf8Test/IncompleteUtf8Test.cpp (+25 -0)
➕ autotest/IncompleteUtf8Test/IncompleteUtf8Test.upp (+9 -0)
📝 uppsrc/Core/CharSet.h (+4 -0)
📝 uppsrc/Core/Utf.cpp (+11 -0)
📝 uppsrc/Core/src.tpp/Utf_en-us.tpp (+17 -0)

📄 Description

This patch adds a CheckUtf8Split() function (and its overloads) to the Core/Utf8 helper functions. It also adds an autotest and a Topic++ entry.

What it does:

  • Checks for incomplete (potentially split) UTF-8 byte sequences at the end of any given string buffer.
  • Returns 0 if no such incomplete UTF-8 sequence is found. Otherwise, it returns the position of the first byte of the incomplete sequence, counted back from the end of the buffer. For example, the function returns 2 if the buffer ends with a three-byte sequence that is missing its last byte.
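
The return-value convention described above can be sketched in plain standard C++. This is only an illustration of the idea, not the code from this patch (the contents of Utf.cpp are not shown here); the name `IncompleteUtf8Tail` and its signature are hypothetical, and the sketch only inspects lead and continuation bit patterns without validating the rest of the buffer.

```cpp
#include <cstddef>

// Hypothetical helper, NOT the PR's CheckUtf8Split(): returns the distance from
// the end of [s, s + len) to the lead byte of an incomplete trailing UTF-8
// sequence, or 0 if the buffer does not end in such a sequence.
inline int IncompleteUtf8Tail(const char *s, std::size_t len)
{
    // A UTF-8 sequence has at most 4 bytes, so an incomplete one has at most
    // 3 bytes present. Only those last 3 bytes need to be inspected.
    for(std::size_t back = 1; back <= 3 && back <= len; back++) {
        unsigned char c = static_cast<unsigned char>(s[len - back]);
        if((c & 0xC0) == 0x80)
            continue; // continuation byte (10xxxxxx): keep looking for the lead byte
        // Expected sequence length announced by the lead byte.
        std::size_t need = (c & 0xE0) == 0xC0 ? 2
                         : (c & 0xF0) == 0xE0 ? 3
                         : (c & 0xF8) == 0xF0 ? 4
                         : 0; // ASCII or a byte that cannot start a sequence
        return need > back ? static_cast<int>(back) : 0;
    }
    return 0; // empty buffer, or only continuation bytes in the last 3 positions
}
```

With this sketch, a buffer ending in the first two bytes of a three-byte sequence (for example `"abc\xE2\x82"`) yields 2, matching the example in the description, while a buffer ending in a complete sequence yields 0.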

What it doesn't do:

  • Traverse the whole buffer to check for invalid sequences (CheckUtf8 already does that).

The main purpose of this function is to help developers working in environments where incoming data chunks do not always end on a UTF-8 sequence boundary, so that a multi-byte sequence can be split across two chunks.
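
As a rough illustration of that use case, the sketch below holds back an incomplete trailing sequence and prepends it to the next chunk. It is written in standard C++ with `std::string` rather than U++ types, and the class `Utf8ChunkJoiner`, its `Feed` method, and the private `SplitTail` helper are all hypothetical names introduced for this example. They are not part of the patch or of U++.

```cpp
#include <cstddef>
#include <string>

// Hypothetical chunk reassembler, for illustration only.
class Utf8ChunkJoiner {
    std::string carry; // bytes of a sequence cut off at the end of the previous chunk

    // Number of trailing bytes that begin a UTF-8 sequence without finishing it
    // (0 if the string ends on a sequence boundary). Assumed stand-in for a
    // CheckUtf8Split()-style check; it does not validate the whole string.
    static std::size_t SplitTail(const std::string& s)
    {
        for(std::size_t back = 1; back <= 3 && back <= s.size(); back++) {
            unsigned char c = static_cast<unsigned char>(s[s.size() - back]);
            if((c & 0xC0) == 0x80)
                continue; // continuation byte: look further back for the lead byte
            std::size_t need = (c & 0xE0) == 0xC0 ? 2
                             : (c & 0xF0) == 0xE0 ? 3
                             : (c & 0xF8) == 0xF0 ? 4
                             : 0;
            return need > back ? back : 0;
        }
        return 0;
    }

public:
    // Returns the part of (carry + chunk) that ends on a sequence boundary and
    // keeps any incomplete tail for the next call.
    std::string Feed(const std::string& chunk)
    {
        std::string data = carry + chunk;
        std::size_t tail = SplitTail(data);
        carry = data.substr(data.size() - tail);
        data.resize(data.size() - tail);
        return data;
    }
};
```

Feeding the two halves of a split character in consecutive calls returns nothing usable from the first call and the whole character from the second, so downstream code only ever sees complete sequences.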

Please review.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

Reference: github-starred/ultimatepp#153