[GH-ISSUE #225] Inserting many images creates oversized and broken xlsx file #181

Closed
opened 2026-05-05 11:50:51 -06:00 by gitea-mirror · 7 comments

Originally created by @michalw69 on GitHub (May 10, 2019).
Original GitHub issue: https://github.com/jmcnamara/libxlsxwriter/issues/225

Originally assigned to: @jmcnamara on GitHub.

I've included a test case that demonstrates the problem stated in the title:
[main.cpp.txt](https://github.com/jmcnamara/libxlsxwriter/files/3165423/main.cpp.txt)

Each image is saved as a separate object. It seems the file names in the xlsx media folder overlap (there are multiple image0.png, image1.png, etc. files), which causes Excel to complain.

This is also inefficient in both time and space: after Excel repaired it, the original 46 MB file shrank to about 3 MB.

This could be solved by introducing bitmap objects, like this:
{ lxw_image *image = workbook_add_image(...);
worksheet_insert_image(worksheet, row, col, image); }

Alternatively, but less efficiently, an image cache could be implemented within worksheet_insert_image_buffer.
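To make the cache idea concrete, here is a minimal sketch of how identical buffers could be mapped to a single stored image. It assumes a content hash (FNV-1a here) plus a byte comparison to rule out collisions. All names (`image_cache`, `image_cache_get_index`, `fnv1a`) are hypothetical and are not part of libxlsxwriter, and this is not a description of how the library implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout: none of this is libxlsxwriter API. */
#define IMAGE_CACHE_MAX 64

typedef struct {
    uint64_t hash;             /* FNV-1a hash of the image bytes. */
    const unsigned char *data; /* Borrowed pointer: caller keeps it alive. */
    size_t size;               /* Length of the image in bytes. */
} image_cache_entry;

typedef struct {
    image_cache_entry entries[IMAGE_CACHE_MAX];
    int count;
} image_cache;

/* 64-bit FNV-1a hash over a byte buffer. */
uint64_t fnv1a(const unsigned char *data, size_t size)
{
    uint64_t hash = 0xcbf29ce484222325ULL;
    size_t i;

    for (i = 0; i < size; i++) {
        hash ^= data[i];
        hash *= 0x100000001b3ULL;
    }
    return hash;
}

/* Return the media index for an image, adding a new entry only when the
 * same bytes have not been seen before. Returns -1 if the cache is full. */
int image_cache_get_index(image_cache *cache,
                          const unsigned char *data, size_t size)
{
    uint64_t hash;
    int i;

    assert(cache != NULL && data != NULL);
    hash = fnv1a(data, size);

    for (i = 0; i < cache->count; i++) {
        const image_cache_entry *entry = &cache->entries[i];

        /* Compare the bytes as well, so a hash collision cannot alias
         * two different images. */
        if (entry->hash == hash && entry->size == size
            && memcmp(entry->data, data, size) == 0)
            return i;
    }

    if (cache->count == IMAGE_CACHE_MAX)
        return -1;

    cache->entries[cache->count].hash = hash;
    cache->entries[cache->count].data = data;
    cache->entries[cache->count].size = size;
    return cache->count++;
}
```

With something like this, every insert of the same buffer would resolve to the same index, so only one copy would need to be written to the media folder. The fixed-size table and linear scan keep the sketch short; a real cache would more likely use a hash table and own a copy of the data.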


@jmcnamara commented on GitHub (May 10, 2019):

Adding the sample program:

#include "xlsxwriter.h"

/* Simple array with some PNG data. */
unsigned char image_buffer[] = {
    0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00, 0x00, 0x00, 0x0d,
    0x49, 0x48, 0x44, 0x52, 0x00, 0x00, 0x00, 0x20, 0x00, 0x00, 0x00, 0x20,
    0x08, 0x02, 0x00, 0x00, 0x00, 0xfc, 0x18, 0xed, 0xa3, 0x00, 0x00, 0x00,
    0x01, 0x73, 0x52, 0x47, 0x42, 0x00, 0xae, 0xce, 0x1c, 0xe9, 0x00, 0x00,
    0x00, 0x04, 0x67, 0x41, 0x4d, 0x41, 0x00, 0x00, 0xb1, 0x8f, 0x0b, 0xfc,
    0x61, 0x05, 0x00, 0x00, 0x00, 0x20, 0x63, 0x48, 0x52, 0x4d, 0x00, 0x00,
    0x7a, 0x26, 0x00, 0x00, 0x80, 0x84, 0x00, 0x00, 0xfa, 0x00, 0x00, 0x00,
    0x80, 0xe8, 0x00, 0x00, 0x75, 0x30, 0x00, 0x00, 0xea, 0x60, 0x00, 0x00,
    0x3a, 0x98, 0x00, 0x00, 0x17, 0x70, 0x9c, 0xba, 0x51, 0x3c, 0x00, 0x00,
    0x00, 0x46, 0x49, 0x44, 0x41, 0x54, 0x48, 0x4b, 0x63, 0xfc, 0xcf, 0x40,
    0x63, 0x00, 0xb4, 0x80, 0xa6, 0x88, 0xb6, 0xa6, 0x83, 0x82, 0x87, 0xa6,
    0xce, 0x1f, 0xb5, 0x80, 0x98, 0xe0, 0x1d, 0x8d, 0x03, 0x82, 0xa1, 0x34,
    0x1a, 0x44, 0xa3, 0x41, 0x44, 0x30, 0x04, 0x08, 0x2a, 0x18, 0x4d, 0x45,
    0xa3, 0x41, 0x44, 0x30, 0x04, 0x08, 0x2a, 0x18, 0x4d, 0x45, 0xa3, 0x41,
    0x44, 0x30, 0x04, 0x08, 0x2a, 0x18, 0x4d, 0x45, 0x03, 0x1f, 0x44, 0x00,
    0xaa, 0x35, 0xdd, 0x4e, 0xe6, 0xd5, 0xa1, 0x22, 0x00, 0x00, 0x00, 0x00,
    0x49, 0x45, 0x4e, 0x44, 0xae, 0x42, 0x60, 0x82
};

unsigned int image_size = 200;

int main() {
    lxw_workbook  *workbook  = workbook_new("image_buffer.xlsx");

    for(int i=0; i < 3; ++i)
    {
        lxw_worksheet *worksheet = workbook_add_worksheet(workbook, NULL);

        for(int row=1; row<10000; ++row)
        {
            for(int col=1; col<6; ++col)
            {
                worksheet_insert_image_buffer(worksheet, row, col, image_buffer, image_size);
            }
        }
    }

    workbook_close(workbook);
    return 0;
}
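
For a sense of scale, the loop bounds in the sample can be counted directly. The sketch below only reproduces that arithmetic; the helper names are illustrative and are not part of libxlsxwriter.

```c
/* Count the insert calls made by the sample's three nested loops:
 * 3 worksheets, rows 1..9999 and columns 1..5. */
unsigned long count_insert_calls(void)
{
    unsigned long calls = 0;
    int sheet, row, col;

    for (sheet = 0; sheet < 3; ++sheet)
        for (row = 1; row < 10000; ++row)
            for (col = 1; col < 6; ++col)
                ++calls;

    return calls;
}

/* Raw image bytes stored if every call keeps its own copy of the image. */
unsigned long undeduplicated_bytes(unsigned long calls,
                                   unsigned long image_size)
{
    return calls * image_size;
}
```

That is 3 × 9,999 × 5 = 149,985 inserts of the same 200-byte PNG, or 29,997,000 bytes (about 30 MB) of raw image data if each insert is stored separately, before any per-file XML or zip overhead.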


@jmcnamara commented on GitHub (May 10, 2019):

Thanks, that is a bug. I'll look into fixing it soon.

> image cache could be implemented within worksheet_insert_image_buffer.

There are plans to do that. Probably in the short term.


@jmcnamara commented on GitHub (Jun 11, 2019):

I've fixed the issue that was causing broken files. The fix is on master.

Handling duplicate images is a feature request which will be addressed soon. In the meantime, add a +1 to the similar Python feature request and you will be notified when it is completed: https://github.com/jmcnamara/XlsxWriter/issues/615

The C version will follow soon after.


@jmcnamara commented on GitHub (Jun 12, 2019):

I had to add some additional fixes for some other edge cases. Now on master.


@michalw69 commented on GitHub (Jun 27, 2019):

Thanks John, the correction works.

My colleague reported only a minor problem building the library with MSVC: it includes <strings.h>, which is unavailable there and not necessary.
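
For context, since the header is described as unnecessary, the simplest fix is to drop the include. Where code does need a POSIX-only function from <strings.h>, such as strcasecmp, a common pattern is to guard the include and map to the MSVC equivalent. The sketch below assumes that pattern; `portable_strcasecmp` and `has_extension` are hypothetical names and are not taken from libxlsxwriter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef _MSC_VER
/* MSVC has no <strings.h>; _stricmp from <string.h> is the equivalent. */
#define portable_strcasecmp _stricmp
#else
#include <strings.h>
#define portable_strcasecmp strcasecmp
#endif

/* Return 1 if the file name ends with the extension, ignoring case. */
int has_extension(const char *filename, const char *ext)
{
    size_t name_len;
    size_t ext_len;

    assert(filename != NULL && ext != NULL);
    name_len = strlen(filename);
    ext_len = strlen(ext);

    if (ext_len > name_len)
        return 0;

    return portable_strcasecmp(filename + name_len - ext_len, ext) == 0;
}
```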


@jmcnamara commented on GitHub (Jun 27, 2019):

> My colleague reported only a minor problem building the library with MSVC: it includes <strings.h>, which is unavailable there and not necessary.

That is now fixed on master.


@jmcnamara commented on GitHub (Dec 26, 2019):

Note: libxlsxwriter now removes duplicate images, as Excel does, so the output files are smaller.

The fix is on master.
