[GH-ISSUE #238] Corrupted Excel files after inserting image with filename containing non-latin characters #190

Closed
opened 2026-05-05 11:52:06 -06:00 by gitea-mirror · 39 comments

Originally created by @RaFaeL-NN on GitHub (Jul 19, 2019).
Original GitHub issue: https://github.com/jmcnamara/libxlsxwriter/issues/238

Originally assigned to: @jmcnamara on GitHub.

I have an image with a non-Latin (Russian) filename ("лого.png") and I'm calling worksheet_insert_image(worksheet, 10, 15, "лого.png"). The library is being used on Windows. I have to pass the filename without UTF-8 encoding, otherwise I get error 9. Without encoding the image filename, the xlsx file is created with "no error", but Excel reports it as corrupted on opening (drawing1.xml). I think this is because the original filename is copied to the "descr" tag without being encoded as UTF-8 (there may be another problem, I really don't know). Corrupted file attached:
Report.xlsx.


@jmcnamara commented on GitHub (Jul 19, 2019):

Can you add a small working example that demonstrates the issue with the sample image.


@RaFaeL-NN commented on GitHub (Jul 19, 2019):

OK
test.zip


@jmcnamara commented on GitHub (Jul 19, 2019):

Seems to be a Windows only issue. It works fine on Linux/Mac OS.

I'll look into a fix.


@m00k12 commented on GitHub (Aug 9, 2019):

I'm seeing something very similar, writing from Linux.

Passing worksheet_write_string() a string that contains a byte that is neither valid ASCII nor valid UTF-8 causes Excel to fail to load the entire worksheet (LibreOffice dropped just the affected column of the worksheet). The example of this was a string containing a byte 0xf6, commonly interpreted as 'o diaeresis' or an 'o' with an umlaut where I am.

The fix may be to check that the passed string is valid UTF-8 or pure ASCII and return an lxw_error code and omit the cell so we get a chance of an error at generation time, rather than when loading into Excel.

For reference, Excel did produce a warning pointing to the invalid character in the input file. It was at column 549511 of a line over a million chars long and took a hex editor to find!

> Excel completed file level validation and repair. Some parts of this workbook may have been repaired or discarded.
> Replaced Part: /xl/worksheets/sheet1.xml part with XML error. Illegal xml character. Line 2, column 549511.

Great work though!


@jmcnamara commented on GitHub (Aug 9, 2019):

@m00k12

> I'm seeing something very similar, writing from Linux.

Technically it isn't quite the same on Linux.

In general, the filename and any strings that you use in libxlsxwriter (or Excel) must be UTF-8.

The issue reported, as far as I can see, relates to reading a Unicode .png file with the zlib interfaces on Windows. I still haven't had a chance to validate it in MSVC but I tried it on Linux and it works as expected.

> The fix may be to check that the passed string is valid UTF-8 or pure ASCII and return an lxw_error code and omit the cell so we get a chance of an error at generation time, rather than when loading into Excel.

I think, for now, the user will just have to ensure that they are passing valid UTF-8 strings to the library.


@Alexhuszagh commented on GitHub (Sep 16, 2019):

@jmcnamara, @m00k12 It would be fairly easy to provide a check that a C-string is valid UTF-8.

A comprehensive implementation follows. It should work in C89 as long as the compiler ships stdint.h and stdbool.h (non-conforming, but very common) and stdint.h has uint8_t and uintptr_t, both of which are likely. It should work in C99 or later as long as stdint.h has both those types.

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/**
 * Magic number to determine the number of continuation bytes required.
 *  A UTF-8 code point is either valid ASCII (highest bit is 0),
 *  or it comprises a start + continuation bytes.
 */
static const uint8_t UTF8_BYTES[256] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5};

/* Get the distance between two pointers. */
static __inline size_t distance(uint8_t const* first, uint8_t const* last)
{
    uintptr_t dist = (uintptr_t)last - (uintptr_t) first;
    return (size_t)dist;
}

/* Determine if byte is a valid continuation byte. */
static __inline bool is_continuation_byte(uint8_t c)
{
    return c >= 0x80 && c<= 0xbf;
}

/* Determine if byte is a valid start byte. */
static __inline bool is_start_byte(uint8_t c)
{
    return c > 0xbf;
}

/* Determine if the byte is valid. */
static __inline bool is_valid_byte(uint8_t byte, bool is_continuation)
{
    if (is_continuation) {
        return is_continuation_byte(byte);
    }
    return byte < 0x80 || is_start_byte(byte);
}

/**
 * Decode an entire codepoint and increment first.
 *  :param first: Pointer to a pointer to the first character in the sequence.
 *  :param last: Pointer to 1-past last character. Normally the null terminator.
 */
bool is_valid_codepoint(uint8_t const* * first, uint8_t const* last)
{
    assert(first != NULL);
    assert(last != NULL);

    uint8_t const* ptr = *first;
    assert(ptr != NULL);

    /* Get the expected number of UTF-8 continuation bytes in the code point. */
    uint8_t byte_count = UTF8_BYTES[*ptr];

    /**
     * If the required number of bytes would overrun our buffer,
     * exit early. We need to calculate the distance between our ptr
     * and last, and see if the byte_count >= that distance, since
     * incrementing ptr may go outside of the array, and could therefore
     * be optimized out.
     * See: ARR36-C in the CERT C Coding Standard.
     */
    size_t dist = distance(ptr, last);
    if (byte_count >= dist) {
        return false;
    }

    /**
     * This may seem to skip a check, since this loop doesn't care if the
     * first byte is a start or ascii byte. However, this is already
     * checked by the `UTF8_BYTES[*ptr]`, so the initial check is redundant
     * either way.
     *
     * NOTE: Fallthrough is **intentional**.
     */
    bool is_continuation = false;
    switch (byte_count) {
        case 5:
            /**
             * The original UTF-8 specification allowed for 6-byte code points.
             * The maximum now is 4.
             */
            return false;
        case 4:
            /**
             * The original UTF-8 specification allowed for 6-byte code points.
             * The maximum now is 4.
             */
            return false;
        case 3:
            if (!is_valid_byte(*ptr++, is_continuation))
                return false;
            is_continuation = true;
        case 2:
            if (!is_valid_byte(*ptr++, is_continuation))
                return false;
            is_continuation = true;
        case 1:
            if (!is_valid_byte(*ptr++, is_continuation))
                return false;
            is_continuation = true;
        case 0:
            if (!is_valid_byte(*ptr++, is_continuation))
                return false;
    }

    /* Increment our in-out variable. */
    *first = ptr;

    return true;
}

/**
 * Check if a UTF-8 sequence is valid.
 *  :param first: Pointer to the first character in the sequence.
 *  :param last: Pointer to 1-past last character. Normally the null terminator.
 */
bool is_valid_utf8(uint8_t const* first, uint8_t const* last)
{
    assert(first != NULL);
    assert(last != NULL);

    while (first != last) {
        if (!is_valid_codepoint(&first, last)) {
            return false;
        }
    }
    return true;
}

/**
 * Check if a UTF-8 C-string is valid.
 */
bool is_valid_utf8_cstring(char const* cstring)
{
    uint8_t const* first = (uint8_t const*) cstring;
    size_t length = strlen(cstring);
    uint8_t const* last = first + length;
    return is_valid_utf8(first, last);
}

/* REMOVE AFTER HERE */
/* --------------------------------- */
/**
 * Various tests to ensure the UTF-8 validator works.
 * Note these tests assume `char` is a signed data type, which the standard does not guarantee.
 * The rest of the code, however, should be standard conforming.
 */

#include <stdio.h>

int main()
{
    /* Valid ASCII */
    const char data1[] = {72, 97, 110, 103, 117, 108, 0};
    printf("ASCII is valid: %d\n", is_valid_utf8_cstring(data1));

    /* Valid UTF-8 (Hangul) */
    const char data2[] = {-19, -107, -100, -22, -75, -83, -20, -106, -76, 0};
    printf("UTF-8 is valid: %d\n", is_valid_utf8_cstring(data2));

    /* Invalid Code Point */
    /* -19 (237 as uint8_t) should be followed by 2 continuation bytes. */
    /* Here, we only have 1 continuation byte. */
    const char data3[] = {-19, -107, 72, -19, -107, -100, -22, -75, -83, -20, -106, -76, 0};
    printf("Invalid code point is valid: %d\n", is_valid_utf8_cstring(data3));

    /* Invalid Continuation Byte */
    /* -107 (149 as uint8_t) is a continuation byte without any start byte. */
    const char data4[] = {-107, 72, -19, -107, -100, -22, -75, -83, -20, -106, -76, 0};
    printf("Invalid continuation byte is valid: %d\n", is_valid_utf8_cstring(data4));

    /* Invalid trailing start byte */
    const char data5[] = {-19, -107, -100, -22, -75, -83, -20, -106, -76, -19, 0};
    printf("Invalid trailing start byte is valid: %d\n", is_valid_utf8_cstring(data5));

    /* Invalid trailing start byte + continuation byte */
    const char data6[] = {-19, -107, -100, -22, -75, -83, -20, -106, -76, -19, -107, 0};
    printf("Invalid trailing start byte + continuation byte is valid: %d\n", is_valid_utf8_cstring(data6));

    /* Invalid trailing continuation byte */
    const char data7[] = {-19, -107, -100, -22, -75, -83, -20, -106, -76, -107, 0};
    printf("Invalid trailing continuation byte is valid: %d\n", is_valid_utf8_cstring(data7));

    return 0;
}

When running this, I get the following results:

ASCII is valid: 1
UTF-8 is valid: 1
Invalid code point is valid: 0
Invalid continuation byte is valid: 0
Invalid trailing start byte is valid: 0
Invalid trailing start byte + continuation byte is valid: 0
Invalid trailing continuation byte is valid: 0
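
For comparison, here is a shorter sketch of a stricter check. It is not part of libxlsxwriter, and `utf8_is_well_formed()` is a hypothetical name. Following the rules in RFC 3629, it also rejects overlong encodings, UTF-16 surrogates, and code points above U+10FFFF, which a table of lead-byte lengths alone does not catch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a libxlsxwriter function): returns 1 if the
 * NUL-terminated string is well-formed UTF-8 per RFC 3629, otherwise 0. */
int utf8_is_well_formed(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t len, i = 0;

    assert(str != NULL);
    len = strlen(str);

    while (i < len) {
        unsigned char c = s[i];
        size_t n;           /* Number of continuation bytes expected. */
        size_t k;
        unsigned long cp;   /* Decoded code point. */
        unsigned long min;  /* Smallest code point allowed for this length. */

        if (c < 0x80) {
            i++;            /* Plain ASCII. */
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1; cp = c & 0x1F; min = 0x80;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F; min = 0x800;
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;       /* Stray continuation byte, 0xC0/0xC1 or 0xF5..0xFF. */
        }

        if (len - i <= n)
            return 0;       /* Sequence is truncated by the end of the string. */

        for (k = 1; k <= n; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;   /* Not a continuation byte. */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;       /* Overlong encoding. */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;       /* UTF-16 surrogate. */
        if (cp > 0x10FFFF)
            return 0;       /* Beyond the Unicode range. */

        i += n + 1;
    }
    return 1;
}
```

A caller could run a check like this on each string before handing it to the library and report the error at generation time instead of when the file is opened in Excel.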

@Alexhuszagh commented on GitHub (Sep 16, 2019):

If the UTF-8 validator has any interest, I'd gladly submit a PR with it included. However, I believe this only addresses the second issue raised, and not the original bug on MSVC (which I will look into later).


@jmcnamara commented on GitHub (Sep 16, 2019):

> If the UTF-8 validator has any interest, I'd gladly submit a PR with it included.

@Alexhuszagh Thanks for the input, but I don't think that is a feature that I want to add/maintain.


@Alexhuszagh commented on GitHub (Sep 16, 2019):

@jmcnamara No worries, just figured I'd make the suggestion since I've previously implemented this code.


@m00k12 commented on GitHub (Sep 16, 2019):

I used libiconv to clean all strings before output.
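
The comment does not say how iconv was called, so the following is only a minimal sketch of the idea. It assumes the incoming strings are ISO-8859-1 (where byte 0xf6 is 'ö'), and `latin1_to_utf8()` is a hypothetical helper name. It uses the standard `iconv_open()`, `iconv()` and `iconv_close()` calls.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated ISO-8859-1 string into a
 * newly allocated UTF-8 string. Returns NULL on failure. The caller frees
 * the result. The source encoding is an assumption for this sketch. */
char *latin1_to_utf8(const char *input)
{
    iconv_t cd;
    size_t in_left, out_size, out_left;
    char *in_ptr, *out_buf, *out_ptr;

    assert(input != NULL);

    cd = iconv_open("UTF-8", "ISO-8859-1");   /* (to, from) */
    if (cd == (iconv_t)-1)
        return NULL;

    in_left  = strlen(input);
    out_size = in_left * 2 + 1;   /* A Latin-1 byte needs at most 2 UTF-8 bytes. */
    out_left = out_size - 1;      /* Reserve room for the terminating NUL. */

    out_buf = malloc(out_size);
    if (out_buf == NULL) {
        iconv_close(cd);
        return NULL;
    }

    in_ptr  = (char *)input;      /* iconv() takes char **, the input is only read. */
    out_ptr = out_buf;

    if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t)-1) {
        free(out_buf);
        iconv_close(cd);
        return NULL;
    }

    *out_ptr = '\0';
    iconv_close(cd);
    return out_buf;
}
```

The converted string, rather than the original, would then be passed to worksheet_write_string(). On systems where iconv is a separate library the program may need to be linked with -liconv.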


@jmcnamara commented on GitHub (Oct 6, 2019):

I don't work on Windows very often so I keep failing to get to this to debug it. Maybe someone else on the thread can have a look.

A test program would look like this:

#include "xlsxwriter.h"

int main() {

    lxw_workbook  *workbook  = workbook_new("test.xlsx");
    lxw_worksheet *worksheet = workbook_add_worksheet(workbook, NULL);

    worksheet_insert_image(worksheet, 0, 0, "лого.png");

    workbook_close(workbook);

    return 0;
}

You can use any png image. The source file should be UTF-8 encoded.

There are two potential issues.

The first is that libxlsxwriter needs to open and parse the image added with worksheet_insert_image() to read the image metadata such as height, width and DPI. The 'fopen()' on the filename to read the image metadata probably won't work for a file with a non-ASCII name:

https://github.com/jmcnamara/libxlsxwriter/blob/master/src/worksheet.c#L5546


    /* Check that the image file exists and can be opened. */
    image_stream = fopen(filename, "rb");
    if (!image_stream) {
        LXW_WARN_FORMAT1("worksheet_insert_image()/_opt(): "
                         "file doesn't exist or can't be opened: %s.",
                         filename);
        return LXW_ERROR_PARAMETER_VALIDATION;
    }

The second potential issue is when the file is added to the zip file:

https://github.com/jmcnamara/libxlsxwriter/blob/master/src/packager.c#L245

If someone who works on Windows more regularly, and knows how this type of filename handling works, can take a look, that would be great.

+ @utelle


@RaFaeL-NN commented on GitHub (Oct 6, 2019):

fopen() on Windows works only with ANSI filenames.

The fopen function opens the file that is specified by filename. By default, a narrow filename string is interpreted using the ANSI codepage (CP_ACP)

Another solution is to convert the ANSI filename with MultiByteToWideChar(0,... and WideCharToMultiByte(65001,... before putting it in the XML, not before fopen()

A third variant is an option not to copy the filename to "descr", or an optional "descr" parameter in worksheet_insert_image() (with the filename as the default, if you like it that way)
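
The second suggestion above can be sketched roughly as follows. `ansi_to_utf8()` is a hypothetical helper, not a libxlsxwriter function. On Windows it goes from the active ANSI code page (CP_ACP, value 0) to UTF-16 and then to UTF-8 (CP_UTF8, value 65001). The non-Windows branch is only there so the sketch compiles elsewhere, and it assumes the input is already UTF-8.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: return a newly allocated UTF-8 copy of a string that
 * is encoded in the active ANSI code page. Returns NULL on failure. The
 * caller frees the result. */
char *ansi_to_utf8(const char *ansi)
{
    assert(ansi != NULL);

#ifdef _WIN32
    {
        wchar_t *wide;
        char *utf8;
        int wide_len, utf8_len;

        /* Step 1: ANSI code page -> UTF-16. A length of -1 means the input
         * is NUL-terminated and the returned count includes the NUL. */
        wide_len = MultiByteToWideChar(CP_ACP, 0, ansi, -1, NULL, 0);
        if (wide_len == 0)
            return NULL;

        wide = malloc((size_t)wide_len * sizeof(wchar_t));
        if (wide == NULL)
            return NULL;

        if (MultiByteToWideChar(CP_ACP, 0, ansi, -1, wide, wide_len) == 0) {
            free(wide);
            return NULL;
        }

        /* Step 2: UTF-16 -> UTF-8. */
        utf8_len = WideCharToMultiByte(CP_UTF8, 0, wide, -1, NULL, 0, NULL, NULL);
        if (utf8_len == 0) {
            free(wide);
            return NULL;
        }

        utf8 = malloc((size_t)utf8_len);
        if (utf8 != NULL &&
            WideCharToMultiByte(CP_UTF8, 0, wide, -1, utf8, utf8_len,
                                NULL, NULL) == 0) {
            free(utf8);
            utf8 = NULL;
        }

        free(wide);
        return utf8;
    }
#else
    {
        /* Assumption for non-Windows builds: the string is already UTF-8. */
        size_t size = strlen(ansi) + 1;
        char *copy = malloc(size);
        if (copy != NULL)
            memcpy(copy, ansi, size);
        return copy;
    }
#endif
}
```

With this approach the original ANSI string would still be used for fopen(), and only the text written into the XML would be the converted copy.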


@jmcnamara commented on GitHub (Oct 6, 2019):

I've pushed a fix to the win_fopen branch that uses _wfopen() on Windows (203c22e973614ccce6a51b2f1f9d030d37af4363).

I built that branch into a library and compiled the above code against it. It worked as expected with a sample лого.png image file:

[Image: aa_image]

Note, the source file must be UTF-8 encoded.

Can you try it out and let me know.
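
For readers who have not looked at the branch, the general technique looks roughly like the sketch below. This is not the code from the commit: `utf8_fopen()` is a hypothetical name, and the fixed MAX_PATH buffer is a simplifying assumption. On Windows the UTF-8 filename and mode are converted to UTF-16 with MultiByteToWideChar() and passed to _wfopen(). Elsewhere plain fopen() is used.

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: open a file whose name is a UTF-8 encoded string. */
FILE *utf8_fopen(const char *filename, const char *mode)
{
    assert(filename != NULL && mode != NULL);

#ifdef _WIN32
    {
        wchar_t wide_name[MAX_PATH];
        wchar_t wide_mode[8];

        /* Convert both arguments from UTF-8 to UTF-16. A return value of
         * 0 means the conversion failed or the buffer was too small. */
        if (MultiByteToWideChar(CP_UTF8, 0, filename, -1,
                                wide_name, MAX_PATH) == 0)
            return NULL;

        if (MultiByteToWideChar(CP_UTF8, 0, mode, -1, wide_mode, 8) == 0)
            return NULL;

        return _wfopen(wide_name, wide_mode);
    }
#else
    /* POSIX systems pass filename bytes through unchanged. */
    return fopen(filename, mode);
#endif
}
```

Note that a wrapper like this expects the caller to pass UTF-8, so a filename in the ANSI code page would have to be converted before the call.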


@RaFaeL-NN commented on GitHub (Oct 6, 2019):

No difference. I am using MinGW, not MSVC. Is the patch applicable to it?


@RaFaeL-NN commented on GitHub (Oct 6, 2019):

P.S. It works if I comment out the #ifdef _MSC_VER line and some lines at the bottom, and replace _MAX_PATH with 260


@jmcnamara commented on GitHub (Oct 6, 2019):

Ok. Good.

Can you try replacing _MSC_VER like this, and test it:

diff --git a/src/utility.c b/src/utility.c
index 4a63745..661ea16 100644
--- a/src/utility.c
+++ b/src/utility.c
@@ -602,7 +602,11 @@ lxw_hash_password(const char *password)
 }

 /* Make a simple portable version of fopen() for Windows. */
-#ifdef _MSC_VER
+#ifdef __MINGW32__
+#undef _WIN32
+#endif
+
+#ifdef _WIN32

 #include <windows.h>

@RaFaeL-NN commented on GitHub (Oct 6, 2019):

It doesn't work. If I undef _WIN32, how can it work?

It works if I replace "#ifdef _MSC_VER" with "#ifdef _WIN32" and _MAX_PATH with 260


@jmcnamara commented on GitHub (Oct 6, 2019):

What version of MinGW are you using, 32 or 64 bit? And can you send a link to the one you downloaded/use?


@RaFaeL-NN commented on GitHub (Oct 6, 2019):

32-bit. I really don't know the version, because I only use it to build the libxlsxwriter DLL and haven't updated it since installing it ~3 years ago. You can try the latest one from mingw.org

P.S. For now, the minimal working variant is

/* Make a simple portable version of fopen() for Windows. */
#ifdef _WIN32

#ifndef _MAX_PATH
#define _MAX_PATH MAX_PATH
#endif

#include <windows.h>

FILE *
lxw_fopen(const char *filename, const char *mode)

@jmcnamara commented on GitHub (Oct 7, 2019):

I understand that this fix works in your case but I'm not convinced that it is a general issue (outside of MSVC).

The fopen() function should work in MinGW32 without modification. Here is an example:


$ uname -a
MINGW32_NT-10.0-WOW john 2.6.0(0.304/5/3) 2016-09-07 21:23 i686 Msys

$ cd ~/libxlsxwriter

$ git checkout master

$ git pull origin
...

$ make examples V=1
make -C examples
...

# Add the worksheet_insert_image() line to an existing utf8 example.
$ vim examples/utf8.c

# Here is the resultant file:
$ cat examples/utf8.c
/*
 * A simple Unicode UTF-8 example using libxlsxwriter.
 *
 * Note: The source file must be UTF-8 encoded.
 *
 * Copyright 2014-2018, John McNamara, jmcnamara@cpan.org
 *
 */

#include "xlsxwriter.h"

int main() {

    lxw_workbook  *workbook  = workbook_new("utf8.xlsx");
    lxw_worksheet *worksheet = workbook_add_worksheet(workbook, NULL);

    worksheet_write_string(worksheet, 2, 1, "Это фраза на русском!", NULL);
    worksheet_insert_image(worksheet, 0, 0, "лого.png");


    return workbook_close(workbook);
}

# Copy a suitable image.
$ cp test/functional/src/images/red.png лого.png

# Re-make the examples.
$ make examples V=1
make -C examples
make[1]: Entering directory '/home/jmcnam2/libxlsxwriter/examples'
cc -I../include -g -Wall -Wextra utf8.c -o utf8 ../src/libxlsxwriter.a -lz
make[1]: Leaving directory '/home/jmcnam2/libxlsxwriter/examples'

# Run the example. Note, there are no errors/warnings.
$ ./examples/utf8.exe

# Note the output file.
$ ls -ltr
total 137
drwxr-xr-x 1 jmcnam2 Domain Users     0 Oct 18  2017 dev
...
-rw-r--r-- 1 jmcnam2 Domain Users   200 Oct  7 09:28 лого.png
drwxr-xr-x 1 jmcnam2 Domain Users     0 Oct  7 09:28 examples
-rw-r--r-- 1 jmcnam2 Domain Users  6854 Oct  7 09:29 utf8.xlsx

Output:

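![aa_image](https://user-images.githubusercontent.com/94267/66299698-f496e880-e8eb-11e9-8e31-60035ffe0020.png)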

Can you try to reproduce this in a more recent version of MinGW, using the example above and the more general libxlsxwriter MinGW instructions here: https://libxlsxwriter.github.io/getting_started.html#gsg_ming


@RaFaeL-NN commented on GitHub (Oct 7, 2019):

> The fopen() function should work in MinGW32 without modification.

No. A program compiled with MinGW depends on msvcrt.dll from Windows, and fopen() in msvcrt.dll uses ANSI only.


@jmcnamara commented on GitHub (Oct 7, 2019):

> No. A program compiled with MinGW depends on msvcrt.dll from Windows, and fopen() in msvcrt.dll uses ANSI only.

I don't understand. Why does the MinGW example I showed above work with fopen() and a non-ANSI filename?


@RaFaeL-NN commented on GitHub (Oct 7, 2019):

What version of Windows and what version of msvcrt.dll? Do you understand where fopen() is stored?

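![lib1](https://user-images.githubusercontent.com/34895749/66306956-b8738000-e90b-11e9-8af4-d560e026a229.PNG)
![lib2](https://user-images.githubusercontent.com/34895749/66306957-b8738000-e90b-11e9-8ddf-e01409a0d5cd.PNG)
![lib3](https://user-images.githubusercontent.com/34895749/66306958-b8738000-e90b-11e9-95a4-0e038835897d.PNG)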


@jmcnamara commented on GitHub (Oct 7, 2019):

> What version of Windows and what version of msvcrt.dll?

Windows 10 and msvcrt.dll ver 7.0.16299.


@RaFaeL-NN commented on GitHub (Oct 7, 2019):

Can you attach an archive with utf8.exe and лого.png?


@jmcnamara commented on GitHub (Oct 7, 2019):

I can't attach them, but you can rebuild them by following my instructions above. I used [MSYS2](http://www.msys2.org/) and the following setup:

pacman -S git gcc make zlib-devel

# Clone and build libxlsxwriter.
git clone https://github.com/jmcnamara/libxlsxwriter.git
cd libxlsxwriter/
make

@RaFaeL-NN commented on GitHub (Oct 7, 2019):

I need to check whether your filename on disk is in ANSI (EB EE E3 EE 2E 70 6E 67), not in UTF-8 (D0 BB D0 BE D0 B3 D0 BE 2E 70 6E 67). How can you rename it to Win-1251 without a Russian locale?


@jmcnamara commented on GitHub (Oct 8, 2019):

> I need to check whether your filename on disk is in ANSI (EB EE E3 EE 2E 70 6E 67), not in UTF-8 (D0 BB D0 BE D0 B3 D0 BE 2E 70 6E 67). How can you rename it to Win-1251 without a Russian locale?

It would be better if you built the environment with the instructions I gave above and tried it yourself.

Alternatively, you could give detailed steps on how I can replicate your environment.

Anyway, the filename is UTF-8 encoded (it has to be, to match the encoding of the utf8.c example):

$ ls -1 *.png
лого.png


$ ls -1 *.png | xxd
00000000: d0bb d0be d0b3 d0be 2e70 6e67 0a         .........png.

@RaFaeL-NN commented on GitHub (Oct 8, 2019):

> Anyway, the filename is UTF-8 encoded

Well, but the original issue is with a filename in ANSI (EB EE E3 EE 2E 70 6E 67) (read the first post). In Windows, filenames are stored in ANSI, not in UTF-8. If the filename is D0 BB D0 BE D0 B3 D0 BE 2E 70 6E 67, it will NOT be displayed in Windows Explorer (or any other program) as "лого.png", but as mojibake such as "Р»РѕРіРѕ.png" (not readable). If I had a file "лого.png" stored as D0 BB D0 BE D0 B3 D0 BE 2E 70 6E 67, of course I could insert it with the library. But I don't, and nobody does. So I need a way to insert an image whose filename "лого.png" is in ANSI (EB EE E3 EE 2E 70 6E 67) and get a correct xlsx on output. You can test it with the file from test.zip in the third post (you have to unzip it under Windows).

https://stackoverflow.com/questions/2050973/what-encoding-are-filenames-in-ntfs-stored-as
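
The two byte sequences quoted above are the same name, "лого.png", in two encodings. As a rough illustration of how one maps to the other, here is a minimal sketch of a converter. `cp1251_to_utf8` is a hypothetical helper, not part of libxlsxwriter, and it only handles ASCII plus the Windows-1251 letters in the range 0xC0-0xFF; a real program on Windows would more likely use `MultiByteToWideChar`/`WideCharToMultiByte`.

```c
#include <stddef.h>

/*
 * Hypothetical helper: convert a Windows-1251 string to UTF-8.
 *
 * Only ASCII (0x00-0x7F) and the letters in 0xC0-0xFF are handled. Those
 * letters map linearly onto U+0410-U+044F, which always encode as two
 * UTF-8 bytes. Any other byte is rejected.
 *
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 on an unsupported byte or if the output buffer is too small.
 */
int cp1251_to_utf8(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    if (dst_size == 0)
        return -1;

    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        if (c < 0x80) {
            /* ASCII is identical in both encodings. */
            if (out + 1 >= dst_size)
                return -1;
            dst[out++] = (char)c;
        }
        else if (c >= 0xC0) {
            /* Cyrillic letter: compute the code point, emit 2 bytes. */
            unsigned int cp = 0x0410u + (c - 0xC0u);
            if (out + 2 >= dst_size)
                return -1;
            dst[out++] = (char)(0xC0u | (cp >> 6));
            dst[out++] = (char)(0x80u | (cp & 0x3Fu));
        }
        else {
            /* 0x80-0xBF needs a lookup table, which this sketch omits. */
            return -1;
        }
    }

    dst[out] = '\0';
    return (int)out;
}
```

Feeding it the ANSI bytes EB EE E3 EE 2E 70 6E 67 yields the UTF-8 bytes D0 BB D0 BE D0 B3 D0 BE 2E 70 6E 67, which is the form that is safe to embed in XML.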


@Alexhuszagh commented on GitHub (Oct 8, 2019):

> I used libiconv to clean all strings before output.

Iconv is great; the main issue is the licensing, which is obviously not compatible here. Also, it's probably not worth adding a dependency, I'm guessing.


@Alexhuszagh commented on GitHub (Oct 8, 2019):

@jmcnamara Windows still uses various macros that may depend on your compiler configuration to determine whether fopen uses narrow or wide paths.

For higher-level APIs, this may be controlled via the UNICODE (and _UNICODE) macros:
https://docs.microsoft.com/en-us/windows/win32/learnwin32/working-with-strings#unicode-and-ansi-functions

For our use-case, this is controlled via parameters to fopen:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=vs-2019#remarks

> The fopen function opens the file that is specified by filename. By default, a narrow filename string is interpreted using the ANSI codepage (CP_ACP). In Windows Desktop applications this can be changed to the OEM codepage (CP_OEMCP) by using the SetFileApisToOEM function. You can use the AreFileApisANSI function to determine whether filename is interpreted using the ANSI or the system default OEM codepage. _wfopen is a wide-character version of fopen; the arguments to _wfopen are wide-character strings. Otherwise, _wfopen and fopen behave identically. Just using _wfopen does not affect the coded character set that is used in the file stream.

I would recommend that both of you ensure you are working with the same set of configurations, but most likely we should make our file utilities work with UTF-8-encoded paths (which will almost certainly require conversion to UTF-16). For example, on WINE, using the following source code, I get 1 for code compiled by both i686 and x86_64 MinGW, meaning it is using the ANSI encoding for filenames (which can vary widely depending on locale):

#include <windows.h>
#include <stdio.h>

int main() {
    printf("AreFileApisANSI: %d\n", AreFileApisANSI());
}

@Alexhuszagh commented on GitHub (Oct 8, 2019):

A simple conversion utility (and an example showing it works) is as follows. Please note that this example must be compiled with a UTF-8 source encoding, which for MSVC is provided with the "/utf-8" flag; on MinGW, I believe this is the default:

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>


wchar_t* utf8_to_utf16(const char* string)
{
    // We need to call MultiByteToWideChar twice:
    //  Once to get the length of the output string, to avoid overflows.
    //  The second time to actually convert it.
    int count = MultiByteToWideChar(CP_UTF8, 0, string, -1, NULL, 0);
    if (count == 0) {
        // Did not succeed, return null pointer.
        return NULL;
    }
    wchar_t* wstr = malloc(sizeof(wchar_t) * count);
    if (wstr == NULL) {
        // Allocation failed, return null pointer.
        return NULL;
    }
    if (MultiByteToWideChar(CP_UTF8, 0, string, -1, wstr, count) == 0) {
        // Did not succeed, free and return null pointer.
        free(wstr);
        return NULL;
    }
    return wstr;
}


int main()
{
    // Sample Korean text, just force a Unicode file.
    const char* narrow = "path/to/한글.xlsx";
    wchar_t* wide = utf8_to_utf16(narrow);
    if (wide == NULL) {
        // Conversion or allocation failed.
        return 1;
    }

    // Print as a string.
    wprintf(L"Printing as a string: %ls\n", wide);

    // Print bytes, to confirm it's UTF-16.
    wprintf(L"Printing as an array: [");
    size_t length = wcslen(wide);
    for (size_t i = 0; i < length; i++) {
        unsigned char* item = (unsigned char*)&wide[i];
        wprintf(L"%d, %d, ", item[0], item[1]);
    }
    wprintf(L"]\n");
    free(wide);

    return 0;
}

This should be sufficient to fix the issue, along with any NULL-checks to ensure the path isn't NULL (which may happen due to invalid input, or memory errors) while writing to file.
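
To make the connection to file opening concrete, here is a minimal sketch of how such a conversion could sit behind an `fopen()`-style wrapper. The names `fopen_utf8` and `utf8_to_wide` are hypothetical and this is not necessarily how libxlsxwriter's `lxw_fopen()` is implemented; it assumes the caller always passes UTF-8 paths and falls back to plain `fopen()` on non-Windows platforms.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#include <stdlib.h>
#include <wchar.h>

/* Convert a UTF-8 string to a newly allocated UTF-16 string, or NULL. */
static wchar_t *utf8_to_wide(const char *utf8)
{
    wchar_t *wide;
    int count = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);

    if (count == 0)
        return NULL;

    wide = malloc(sizeof(wchar_t) * (size_t)count);
    if (wide == NULL)
        return NULL;

    if (MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, count) == 0) {
        free(wide);
        return NULL;
    }

    return wide;
}
#endif

/*
 * Hypothetical wrapper: open a file whose name is UTF-8 encoded.
 * On Windows the name and mode are converted to UTF-16 and passed to
 * _wfopen(), so the ANSI code page is never involved. Elsewhere the
 * arguments are handed to fopen() unchanged.
 */
FILE *fopen_utf8(const char *filename, const char *mode)
{
    if (filename == NULL || mode == NULL)
        return NULL;

#ifdef _WIN32
    {
        FILE *file = NULL;
        wchar_t *wname = utf8_to_wide(filename);
        wchar_t *wmode = utf8_to_wide(mode);

        /* Only open the file if both conversions succeeded. */
        if (wname != NULL && wmode != NULL)
            file = _wfopen(wname, wmode);

        free(wname);
        free(wmode);
        return file;
    }
#else
    return fopen(filename, mode);
#endif
}
```

Note that a wrapper like this only helps callers who pass UTF-8. A caller who passes an ANSI (for example Win-1251) filename would need the opposite behaviour, which is why the two cases have to be distinguished.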


@jmcnamara commented on GitHub (Oct 9, 2019):

@Alexhuszagh Thanks for that.


@jmcnamara commented on GitHub (Oct 9, 2019):

@RaFaeL-NN

> So, I need a way to insert image with filename "лого.png" in ANSI

Ok. That should be possible to support. I'll add a workaround soon.


@RaFaeL-NN commented on GitHub (Oct 30, 2019):

Good. Can you move description next to y_scale in lxw_image_options? As you know, I cannot use C headers, so I have to create my own data structures before sending data to the library. Of course, I can insert dummy empty variables, but I hope you'll change

typedef struct lxw_image_options {

    /** Offset from the left of the cell in pixels. */
    int32_t x_offset;

    /** Offset from the top of the cell in pixels. */
    int32_t y_offset;

    /** X scale of the image as a decimal. */
    double x_scale;

    /** Y scale of the image as a decimal. */
    double y_scale;

    lxw_row_t row;
    lxw_col_t col;
    char *filename;
    char *description;

to

typedef struct lxw_image_options {

    /** Offset from the left of the cell in pixels. */
    int32_t x_offset;

    /** Offset from the top of the cell in pixels. */
    int32_t y_offset;

    /** X scale of the image as a decimal. */
    double x_scale;

    /** Y scale of the image as a decimal. */
    double y_scale;

    char *description;
    lxw_row_t row;
    lxw_col_t col;
    char *filename;
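
Because the struct is mirrored by hand in another language, what matters is the byte offset of each field. The sketch below prints the offset of `description` for both orderings. The two structs are cut-down, hypothetical mirrors containing only the fields shown above, and they assume `lxw_row_t` is `uint32_t` and `lxw_col_t` is `uint16_t`; the exact offsets depend on the compiler and target.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Field order as in the first listing (later fields omitted). */
struct image_options_current {
    int32_t  x_offset;
    int32_t  y_offset;
    double   x_scale;
    double   y_scale;
    uint32_t row;          /* assumed type of lxw_row_t */
    uint16_t col;          /* assumed type of lxw_col_t */
    char    *filename;
    char    *description;
};

/* Field order as in the second listing (later fields omitted). */
struct image_options_proposed {
    int32_t  x_offset;
    int32_t  y_offset;
    double   x_scale;
    double   y_scale;
    char    *description;
    uint32_t row;          /* assumed type of lxw_row_t */
    uint16_t col;          /* assumed type of lxw_col_t */
    char    *filename;
};

/* Print where description lands in each layout on this platform. */
void print_description_offsets(void)
{
    printf("current:  description at byte %zu\n",
           offsetof(struct image_options_current, description));
    printf("proposed: description at byte %zu\n",
           offsetof(struct image_options_proposed, description));
}
```

With the proposed order, `description` directly follows the four documented fields, so a hand-written mirror only needs those four fields plus one pointer and does not have to reproduce the padding around `row` and `col`.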

@jmcnamara commented on GitHub (Nov 10, 2019):

> Good. Can you move description next to y_scale in lxw_image_options?

Yes, I will. In fact I'm going to refactor the structs used in APIs so that all public fields are documented and available and the structs will be different internally if they contain any additional metadata that isn't public. I'll start with lxw_image_options as a test.

Once the description field is available publicly you can set it to "" (blank string) and the description/filename won't be copied internally and won't corrupt the file if it isn't UTF8. That should allow you to specify any encoding you want for the program to read the image file.


@jmcnamara commented on GitHub (Nov 17, 2019):

I'm going to close this issue. I'd recommend setting the description field of lxw_image_options to "" (blank string) to prevent any non-UTF-8 characters getting copied into the file and corrupting it.
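
For callers who want to decide at runtime whether a filename is safe to use as a description, a structural UTF-8 check is one option. The following is a minimal sketch: `is_structurally_valid_utf8` is a hypothetical helper, not a libxlsxwriter function, and it only checks lead and continuation byte patterns. It does not reject every overlong form or surrogate code point.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if the NUL-terminated string looks like
 * UTF-8 (every lead byte is followed by the right number of continuation
 * bytes), otherwise 0. This is a structural check only.
 */
int is_structurally_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p != 0) {
        int continuation;

        if (*p < 0x80)
            continuation = 0;              /* ASCII */
        else if (*p >= 0xC2 && *p <= 0xDF)
            continuation = 1;              /* 2-byte sequence */
        else if (*p >= 0xE0 && *p <= 0xEF)
            continuation = 2;              /* 3-byte sequence */
        else if (*p >= 0xF0 && *p <= 0xF4)
            continuation = 3;              /* 4-byte sequence */
        else
            return 0;                      /* stray or invalid lead byte */

        p++;
        while (continuation-- > 0) {
            /* A NUL here also fails, so we never read past the end. */
            if ((*p & 0xC0) != 0x80)
                return 0;
            p++;
        }
    }

    return 1;
}
```

The UTF-8 name D0 BB D0 BE D0 B3 D0 BE 2E 70 6E 67 passes this check, while the Win-1251 name EB EE E3 EE 2E 70 6E 67 fails it, in which case the caller could pass "" as the description instead.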


@RaFaeL-NN commented on GitHub (Nov 17, 2019):

I tested it (the user 'description' field) and it works well. So, for now and for me, the solution to the original issue is copying the UTF-8-encoded filename to the description if the user omits the 'description' field. I think the commit with lxw_fopen is actually not needed.


@jmcnamara commented on GitHub (Nov 20, 2019):

> I think the commit with lxw_fopen is actually not needed.

Yes. It wasn't needed for your case. It is still probably a worthwhile fix for people using MSVC though.

Reference: github-starred/libxlsxwriter#190