[GH-ISSUE #461] Using the library in a docker results in a segfault on workbook_get_worksheet_by_name #363

Closed
opened 2026-05-05 12:12:06 -06:00 by gitea-mirror · 7 comments

Originally created by @BinarSkugga on GitHub (Oct 25, 2024).
Original GitHub issue: https://github.com/jmcnamara/libxlsxwriter/issues/461

Hello,

I am trying to use this library from Python. It works great on my computer, but when I run it inside a Docker container it segfaults. I'm using ctypes and libxlsxwriter 1.1.8. I ran it under gdb; here's the full log:

Starting program: /work/venv/bin/python -u service.py -ex
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7f3f1ea00700 (LWP 37)]
[New Thread 0x7f3f1e000700 (LWP 38)]
[New Thread 0x7f3f19600700 (LWP 39)]
[New Thread 0x7f3f16c00700 (LWP 40)]
[New Thread 0x7f3f16200700 (LWP 41)]
[New Thread 0x7f3f11800700 (LWP 42)]
[New Thread 0x7f3f0ee00700 (LWP 43)]
[New Thread 0x7f3f0c400700 (LWP 44)]
[New Thread 0x7f3f09a00700 (LWP 45)]
[New Thread 0x7f3f07000700 (LWP 46)]
[New Thread 0x7f3f04600700 (LWP 47)]
[New Thread 0x7f3f01c00700 (LWP 48)]
[New Thread 0x7f3eff200700 (LWP 49)]
[New Thread 0x7f3efc800700 (LWP 50)]
[New Thread 0x7f3ef9e00700 (LWP 51)]
[New Thread 0x7f3ef9400700 (LWP 52)]
[New Thread 0x7f3ef4a00700 (LWP 53)]
[New Thread 0x7f3ef2000700 (LWP 54)]
[New Thread 0x7f3eef600700 (LWP 55)]
[New Thread 0x7f3eecc00700 (LWP 56)]
[New Thread 0x7f3eea200700 (LWP 57)]
[New Thread 0x7f3ee7800700 (LWP 58)]
[New Thread 0x7f3ee6e00700 (LWP 59)]
Test 1

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007f3f22acd10c in workbook_get_worksheet_by_name (
    self=self@entry=0x610b5d30, name=name@entry=0x55ba615b8e90 "test")
    at workbook.c:2669
2669        if (!name)

Here's my dockerfile for building and running it:

# Second stage builds the final image. It packages the application and the venv created in the previous stage.
FROM python:3.12.7-slim-bullseye

WORKDIR /work

RUN apt-get update -y
RUN apt-get install -y git cmake zlib1g-dev

# Debug setup
RUN apt-get install -y gdb strace
ENV CFLAGS="-g -O0"

# Copy our service inside the final image.
COPY service.py .
COPY setup.cfg .
COPY src ./src

# RUN git clone https://github.com/jmcnamara/libxlsxwriter.git
COPY resources ./resources
RUN cd resources/libxlsxwriter-1.1.8 && make V=1
RUN cp ./resources/libxlsxwriter-1.1.8/lib/libxlsxwriter.so ./resources/libxlsxwriter.so

EXPOSE 8080

# Set our service's entrypoint as the command to be run when a container using this image starts.
CMD gdb --ex run --args python -u service.py -ex

And finally the Python code, which works on my host:

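import os
from typing import Any, Dict

# WorkbookOptions, Worksheet, Format, cstring and cref are defined elsewhere in the wrapper.


class Workbook:
    def __init__(self, book_name: str, options: WorkbookOptions, xlsxlib: Any):
        # xlsxlib is the libxlsxwriter shared library loaded through ctypes.
        self.xlsx = xlsxlib

        self.name = book_name
        self.options = options
        # Both maps are keyed by the value returned from the C library.
        self._sheets: Dict[int, Worksheet] = {}
        self._formats: Dict[int, Format] = {}

        tmp_base_path = os.path.join(os.getcwd(), "resources", "tmp")

        self._c_book = self.xlsx.workbook_new_opt(
            cstring(os.path.join(tmp_base_path, self.name)),
            cref(self.options.to_cstruct())
        )

    def add_sheet(self, sheet: Worksheet) -> Worksheet:
        # This is the call that segfaults inside the container.
        c_sheet = self.xlsx.workbook_add_worksheet(self._c_book, cstring(sheet.name))
        self._sheets[c_sheet] = sheet
        sheet.owner_book = self
        sheet.c_id = c_sheet
        return sheet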

cstring wraps its argument in a ctypes.c_char_p. The call that fails is workbook_add_worksheet.
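
One thing worth checking, offered as an assumption rather than a confirmed diagnosis: in the gdb output, `self` (0x610b5d30) has only 32 significant bits while `name` (0x55ba615b8e90) is a full 64-bit address. That pattern is consistent with a pointer that passed through `ctypes` without declared types, since `ctypes` assumes a foreign function returns a C `int` and passes Python integers as C `int` unless `restype` and `argtypes` say otherwise. Below is a minimal sketch of declaring the prototypes for the two calls shown above. `declare_prototypes` is a hypothetical helper name, the C signatures in the comments are as I understand the libxlsxwriter headers, and the "full" address in the demonstration is invented for illustration.

```python
import ctypes


def declare_prototypes(lib):
    """Tell ctypes that these libxlsxwriter calls take and return pointers.

    `lib` is assumed to be the object returned by ctypes.CDLL("libxlsxwriter.so").
    """
    # lxw_workbook *workbook_new_opt(const char *filename, lxw_workbook_options *options)
    lib.workbook_new_opt.restype = ctypes.c_void_p
    lib.workbook_new_opt.argtypes = [ctypes.c_char_p, ctypes.c_void_p]

    # lxw_worksheet *workbook_add_worksheet(lxw_workbook *workbook, const char *sheetname)
    lib.workbook_add_worksheet.restype = ctypes.c_void_p
    lib.workbook_add_worksheet.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    return lib


# Illustration of the suspected truncation. The upper bits here are made up;
# only the low 32 bits match the `self` value shown by gdb.
hypothetical_full_pointer = 0x55BA610B5D30
as_c_int = ctypes.c_int(hypothetical_full_pointer).value
print(hex(as_c_int))
```

If the prototypes are declared right after the library is loaded, `workbook_new_opt` returns the full address as a Python integer and `workbook_add_worksheet` receives it unchanged. Whether the wrapper's `cref` helper produces something a `c_void_p` parameter accepts depends on code not shown in this issue.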


@jmcnamara commented on GitHub (Oct 25, 2024):

Why not just use the Python version of the library, XlsxWriter: https://xlsxwriter.readthedocs.io/index.html


@BinarSkugga commented on GitHub (Oct 26, 2024):

My reports have upwards of 100k lines and it can take 3-5 minutes to generate them using the Python library. I am trying to gain some performance so the delay is not as bad. I tried using constant memory, and the port for this in Python seems abandoned.


@jmcnamara commented on GitHub (Oct 26, 2024):

> My reports have upwards of 100k lines and it can take 3-5 minutes to generate them using the Python library.

It shouldn't take that long. Here is a quick test I did with the performance example in the XlsxWriter repo (https://github.com/jmcnamara/XlsxWriter/blob/main/dev/performance/perf_pyx.py):

python dev/performance/perf_pyx.py 100000
100000,  50,  34.07, 0

It writes 100,000 rows by 50 columns of mixed numbers and strings in around 30 seconds. Try it out on your test machine. If you get similar results but your overall program takes 3 minutes then the bottleneck is elsewhere.

> I tried using constant memory and the port for this in python seems abandoned.

It isn't abandoned. I maintain both XlsxWriter and libxlsxwriter and the constant_memory functionality is exactly the same in both.
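
The suggestion above, comparing the benchmark time with the whole program's time to see where the bottleneck is, can be sketched by timing each stage separately. This is only an illustration using the standard library: `build_rows` and `write_rows` are hypothetical stand-ins for the report's data preparation and its XlsxWriter calls, and the `csv` writer is used purely so the sketch runs without extra dependencies.

```python
import csv
import io
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(stage):
    """Record the wall-clock duration of a named stage in `timings`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


def build_rows(row_count, col_count):
    # Hypothetical stand-in for fetching and shaping the report data.
    return [[f"r{r}c{c}" if c % 2 else r * c for c in range(col_count)]
            for r in range(row_count)]


def write_rows(rows):
    # Hypothetical stand-in for the loop that writes cells to the worksheet.
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


with timed("build"):
    rows = build_rows(1000, 50)
with timed("write"):
    output = write_rows(rows)

for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.3f}s")
```

If the "write" stage is close to the benchmark figure while the total is much larger, the time is going into the other stages rather than into the spreadsheet library.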


@BinarSkugga commented on GitHub (Oct 26, 2024):

I'll try that inside our Docker container. I believe the container is limited in RAM and CPU, which might be the issue, or it could be some abstraction we built around the library.

I didn't mean that the Python library is abandoned, sorry about that. I meant the Python wrapper around the C library: https://github.com/pyexcel/libxlsxwpy

Apart from this, do you have any idea about the segfault? I still want an alternative if the issue is something I have no control over.


@jmcnamara commented on GitHub (Oct 26, 2024):

> Apart from this, do you have any idea about the segfault?

I don't. It is not something that I have encountered or have seen reported.

> I still want an alternative if the issue is something I have no control over.

That is reasonable. If it is an option, the Rust version of this library, rust_xlsxwriter, has the speed of the C library and (if you use Rust) the usability of the Python version. It also supports constant memory mode if needed: https://github.com/jmcnamara/rust_xlsxwriter


@jmcnamara commented on GitHub (Oct 26, 2024):

I will need to close this because I don't believe it is a bug in libxlsxwriter. If you do find the source of the issue let me know.


@BinarSkugga commented on GitHub (Oct 27, 2024):

No worries, thanks for your help still :)

Reference: github-starred/libxlsxwriter#363