[GH-ISSUE #508] Error or warn with invalid Unicode input #395

Closed
opened 2026-05-05 12:14:37 -06:00 by gitea-mirror · 1 comment

Originally created by @billdenney on GitHub (Jan 16, 2026).
Original GitHub issue: https://github.com/jmcnamara/libxlsxwriter/issues/508

I am converting a lot of files from an information source to xlsx files. Some of the source files contain invalid Unicode, including `\u0001` and `\ufffe`. For my use case, I'm filtering out the bad characters when found. It may help others avoid this problem, or diagnose it more easily, if `libxlsxwriter` gave an error or warning on invalid Unicode.

Here is info on what is allowable: https://en.wikipedia.org/wiki/Valid_characters_in_XML

I understand that this is a data issue and not necessarily a libxlsxwriter issue, so feel free to close this if not of interest.


@jmcnamara commented on GitHub (Jan 21, 2026):

Hi Bill,

Thanks for the suggestion.

The \u0001 character should be escaped already: https://github.com/jmcnamara/libxlsxwriter/blob/main/src/xmlwriter.c#L253-L308

The Python library also escapes \uFFFE and \uFFFF but the C version doesn't: https://github.com/jmcnamara/XlsxWriter/blob/main/xlsxwriter/xmlwriter.py#L233

However, in general, I think this is out of scope for libxlsxwriter apart from the 1-byte characters. There are probably better tools that a user can use to sanitize their input data.
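
For readers who need the caller-side filtering discussed above, here is a minimal sketch in C. It assumes the input is a NUL-terminated UTF-8 string and uses the XML 1.0 `Char` ranges described on the Wikipedia page linked in the issue. The names `is_valid_xml_char`, `decode_utf8`, and `strip_invalid_xml_chars` are hypothetical helpers for illustration and are not part of the libxlsxwriter API. Note that it also drops control characters such as `\u0001`, which libxlsxwriter is said above to escape already, so a caller may prefer to keep those.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if cp is allowed by the XML 1.0 Char production, else 0. */
static int is_valid_xml_char(uint32_t cp)
{
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

/*
 * Decode one UTF-8 sequence starting at s. On success, store the code
 * point in *cp and return the number of bytes consumed (1 to 4).
 * Return 0 if the sequence is malformed, truncated, or overlong.
 */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
    size_t len, i;
    uint32_t min;

    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2; *cp = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; *cp = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; *cp = s[0] & 0x07; min = 0x10000;
    } else {
        return 0; /* stray continuation byte or invalid lead byte */
    }

    for (i = 1; i < len; i++) {
        /* A NUL terminator fails this check, so we never read past it. */
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }

    return (*cp >= min) ? len : 0; /* reject overlong encodings */
}

/*
 * Copy `in` to `out`, dropping malformed UTF-8 bytes and code points
 * that are not valid XML 1.0 characters. `out` must hold at least
 * strlen(in) + 1 bytes. Returns the number of input bytes dropped.
 */
size_t strip_invalid_xml_chars(const char *in, char *out)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t dropped = 0;

    while (*s) {
        uint32_t cp;
        size_t i, n = decode_utf8(s, &cp);

        if (n == 0) {           /* malformed: skip a single byte */
            s++;
            dropped++;
            continue;
        }
        if (is_valid_xml_char(cp)) {
            for (i = 0; i < n; i++)
                *out++ = (char)s[i];
        } else {
            dropped += n;
        }
        s += n;
    }
    *out = '\0';
    return dropped;
}
```

A caller could run each string through `strip_invalid_xml_chars()` before passing it to a write function, and treat a non-zero return value as the warning requested in this issue.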

Reference: github-starred/libxlsxwriter#395