[GH-ISSUE #290] German "Umlauts" corrupt the file. #233

Closed
opened 2026-05-05 12:00:29 -06:00 by gitea-mirror · 6 comments

Originally created by @FrankenApps on GitHub (May 19, 2020).
Original GitHub issue: https://github.com/jmcnamara/libxlsxwriter/issues/290

Originally assigned to: @jmcnamara on GitHub.

Hi, thanks for this great library.
Whenever I try to write German umlauts to a cell, the result is a corrupted file that Excel cannot repair. This is a problem because quite a few German names contain these letters, e.g.

Ä, Ö, Ü, ä, ö, ü and ß

A basic example would be

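```c
#include "xlsxwriter.h"

int main() {
    lxw_workbook  *workbook  = workbook_new("utf8.xlsx");
    lxw_worksheet *worksheet = workbook_add_worksheet(workbook, NULL);

    /* Literal typed directly in the editor. */
    worksheet_write_string(worksheet, 2, 1, "Krähe", NULL);

    /* "Masked" umlaut: octal \204 is byte 0x84 ("ä" in DOS code page 437/850),
     * which is not valid UTF-8. This call also overwrites the cell written above. */
    worksheet_write_string(worksheet, 2, 1, "Kr\204he", NULL);

    return workbook_close(workbook);
}
```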

I already tried saving my source file as UTF-8, but unfortunately had no luck so far.
When I insert the Russian letters from the utf8.c example, I get an intact workbook.

I do not understand why this problem arises, because all the .xml files inside the .xlsx file seem to be UTF-8 encoded and should therefore support these characters...

gitea-mirror closed this issue and added the "question" label (2026-05-05 12:00:29 -06:00).

@jmcnamara commented on GitHub (May 19, 2020):

@utelle Could you help with this question?


@jmcnamara commented on GitHub (May 19, 2020):

BTW, the first example works for me on macOS:

```c
#include "xlsxwriter.h"

int main() {
    lxw_workbook  *workbook  = workbook_new("utf8.xlsx");
    lxw_worksheet *worksheet = workbook_add_worksheet(workbook, NULL);
    /* Row 2, column 1 (zero-indexed) is cell B3. */
    worksheet_write_string(worksheet, 2, 1, "Krähe", NULL);
    return workbook_close(workbook);
}
```

Output:

Screenshot: https://user-images.githubusercontent.com/94267/82332091-e14d0f00-99dc-11ea-8066-9cd9e4936cde.png


@FrankenApps commented on GitHub (May 19, 2020):

That is interesting. When I run this code on Windows 10 (64-bit):

```c
#include "xlsxwriter.h"

int main() {
    lxw_workbook  *workbook  = workbook_new("utf8.xlsx");
    lxw_worksheet *worksheet = workbook_add_worksheet(workbook, NULL);
    worksheet_write_string(worksheet, 2, 1, "Krähe", NULL);
    return workbook_close(workbook);
}
```

I get the attached file.
I will try this on iOS soon and then give you an update.

BTW, this is where the problem occurs (in sharedStrings.xml):

```xml
<si><t>Kr�he</t></si>
```

Attachment: utf8.xlsx (https://github.com/jmcnamara/libxlsxwriter/files/4650901/utf8.xlsx)


@utelle commented on GitHub (May 19, 2020):

The German umlaut ä in sharedStrings.xml is encoded in ISO 8859-1 or Windows-1252 (as the single byte 0xE4). It is definitely not encoded as UTF-8, which is why Excel refuses to open the file.

Seeing encoding="UTF-8" in the XML files simply means that the strings are expected to be UTF-8 encoded. It is your responsibility as the developer to make sure that the strings you pass to the library really are UTF-8.

Apparently, your source code does not end up as UTF-8, but as Windows-1252 or ISO 8859-1.

There are 2 possible approaches to overcome the problem:

  1. Make sure that your C source file is really encoded in UTF-8.
  2. Encode the German umlaut explicitly.

The latter would be either

```c
worksheet_write_string(worksheet, 2, 1, "Kr\xc3\xa4he", NULL);
```

or

```c
worksheet_write_string(worksheet, 2, 1, "Kr\u00e4he", NULL);
```

However, I'm not sure whether the second form really works for standard C strings. It could be that this form can only be used for wide strings (`wchar_t`).

If the text strings containing German umlauts or other Unicode characters come from other sources (user interface, files, etc.), you will have to make sure that those sources are UTF-8 encoded.


@FrankenApps commented on GitHub (May 19, 2020):

@utelle Thanks a lot.
I figured it out; unfortunately, the problem was on my side.
I had saved the source file in Visual Studio as Unicode (UTF-8 with signature) - Codepage 65001. However, this does not seem to be the correct setting.
Saving the file as Unicode (UTF-8 without signature) - Codepage 65001 works as expected.
Using

```c
worksheet_write_string(worksheet, 2, 1, "Kr\xc3\xa4he", NULL);
```

works for me, too.


@jmcnamara commented on GitHub (May 19, 2020):

Thanks @utelle

Reference: github-starred/libxlsxwriter#233