Mirror of https://github.com/jmcnamara/libxlsxwriter.git, synced 2026-05-15 14:15:54 -06:00
[GH-ISSUE #508] Error or warn with invalid Unicode input #395
Originally created by @billdenney on GitHub (Jan 16, 2026).
Original GitHub issue: https://github.com/jmcnamara/libxlsxwriter/issues/508
I am converting a lot of files from an information source into xlsx files. Some of the source files contain invalid Unicode, including \u0001 and \ufffe. For my use case, I'm filtering out the bad characters when I find them. If libxlsxwriter gave an error or warning on invalid Unicode, others could avoid my issue or track it down more easily. Here is information on which characters are allowed: https://en.wikipedia.org/wiki/Valid_characters_in_XML
I understand that this is a data issue and not necessarily a libxlsxwriter issue, so feel free to close this if it's not of interest.
@jmcnamara commented on GitHub (Jan 21, 2026):
Hi Bill,
Thanks for the suggestion.
The \u0001 character should be escaped already: https://github.com/jmcnamara/libxlsxwriter/blob/main/src/xmlwriter.c#L253-L308
The Python library also escapes \uFFFE and \uFFFF, but the C version doesn't: https://github.com/jmcnamara/XlsxWriter/blob/main/xlsxwriter/xmlwriter.py#L233
However, in general, I think this is out of scope for libxlsxwriter, apart from the 1-byte characters. There are probably better tools that users can use to sanitize their input data.