[GH-ISSUE #260] External MD5 Library #205

Closed
opened 2026-05-05 11:55:16 -06:00 by gitea-mirror · 15 comments
Owner

Originally created by @squinky86 on GitHub (Jan 2, 2020).
Original GitHub issue: https://github.com/jmcnamara/libxlsxwriter/issues/260

Originally assigned to: @jmcnamara on GitHub.

With the latest libxlsxwriter (0.9.1), my software has a linking error when I try to link it against openssl and libxlsxwriter:

/home/squinky86/w64/openssl-1.1.1d/lib/libcrypto.a(md5_dgst.o):md5_dgst.c:(.text+0x0): multiple definition of `MD5_Init'
/home/squinky86/w64/libxlsxwriter-RELEASE_0.9.1/libxlsxwriter.a(md5.c.obj):md5.c:(.text+0xca9): first defined here
/home/squinky86/w64/openssl-1.1.1d/lib/libcrypto.a(md5_dgst.o):md5_dgst.c:(.text+0x8b9): multiple definition of `MD5_Update'
/home/squinky86/w64/libxlsxwriter-RELEASE_0.9.1/libxlsxwriter.a(md5.c.obj):md5.c:(.text+0xcf5): first defined here
/home/squinky86/w64/openssl-1.1.1d/lib/libcrypto.a(md5_dgst.o):md5_dgst.c:(.text+0x9b5): multiple definition of `MD5_Final'
/home/squinky86/w64/libxlsxwriter-RELEASE_0.9.1/libxlsxwriter.a(md5.c.obj):md5.c:(.text+0xe3d): first defined here
collect2: error: ld returned 1 exit status

This is because openssl defines the MD5 functions that libxlsxwriter defines. I can fix this for this particular software by adding the "-DUSE_NO_MD5=ON" cmake flag, but then my software does not take advantage of the duplicate-image-detection optimizations of the latest libxlsxwriter release.

Suggested fix: libxlsxwriter should allow the external linking of a library that provides the md5 hash functions (similar to -DZLIB_LIBRARY=) or should rename its provided md5 functions to prevent a collision.

Originally created by @squinky86 on GitHub (Jan 2, 2020). Original GitHub issue: https://github.com/jmcnamara/libxlsxwriter/issues/260 Originally assigned to: @jmcnamara on GitHub. With the latest libxlsxwriter (0.9.1), my software has a linking error when I try to link it against openssl and libxlsxwriter: ``` /home/squinky86/w64/openssl-1.1.1d/lib/libcrypto.a(md5_dgst.o):md5_dgst.c:(.text+0x0): multiple definition of `MD5_Init' /home/squinky86/w64/libxlsxwriter-RELEASE_0.9.1/libxlsxwriter.a(md5.c.obj):md5.c:(.text+0xca9): first defined here /home/squinky86/w64/openssl-1.1.1d/lib/libcrypto.a(md5_dgst.o):md5_dgst.c:(.text+0x8b9): multiple definition of `MD5_Update' /home/squinky86/w64/libxlsxwriter-RELEASE_0.9.1/libxlsxwriter.a(md5.c.obj):md5.c:(.text+0xcf5): first defined here /home/squinky86/w64/openssl-1.1.1d/lib/libcrypto.a(md5_dgst.o):md5_dgst.c:(.text+0x9b5): multiple definition of `MD5_Final' /home/squinky86/w64/libxlsxwriter-RELEASE_0.9.1/libxlsxwriter.a(md5.c.obj):md5.c:(.text+0xe3d): first defined here collect2: error: ld returned 1 exit status ``` This is because openssl defines the MD5 functions that libxlsxwriter defines. I can fix this for this particular software by adding the "-DUSE_NO_MD5=ON" cmake flag, but then my software does not take advantage of the duplicate-image-detection optimizations of the latest libxlsxwriter release. Suggested fix: libxlsxwriter should allow the external linking of a library that provides the md5 hash functions (similar to -DZLIB_LIBRARY=) or should rename its provided md5 functions to prevent a collision.
gitea-mirror 2026-05-05 11:55:16 -06:00
Author
Owner

@jmcnamara commented on GitHub (Jan 2, 2020):

Ack. I should have thought of this. The MD5 lib I included uses OpenSSL compatible MD5 functions.

Suggested fix: libxlsxwriter should allow the external linking of a library that provides the md5 hash functions (similar to -DZLIB_LIBRARY=) or should rename its provided md5 functions to prevent a collision.

I don't know which is the best approach. Renaming the MD5 functions is the simplest. Linking against OpenSSL would remove duplication but will complicate the build systems and prerequisites.

I'll need to think about this before I make the change.

<!-- gh-comment-id:570340378 --> @jmcnamara commented on GitHub (Jan 2, 2020): Ack. I should have thought of this. The MD5 lib I included uses OpenSSL compatible MD5 functions. > Suggested fix: libxlsxwriter should allow the external linking of a library that provides the md5 hash functions (similar to -DZLIB_LIBRARY=) or should rename its provided md5 functions to prevent a collision. I don't know which is the best approach. Renaming the MD5 functions is the simplest. Linking against OpenSSL would remove duplication but will complicate the build systems and prerequisites. I'll need to think about this before I make the change.
Author
Owner

@jmcnamara commented on GitHub (Jan 2, 2020):

@Alexhuszagh, @evanmiller Any thoughts on this issue?

<!-- gh-comment-id:570340896 --> @jmcnamara commented on GitHub (Jan 2, 2020): @Alexhuszagh, @evanmiller Any thoughts on this issue?
Author
Owner

@evanmiller commented on GitHub (Jan 2, 2020):

Should something in the build be defining HAVE_OPENSSL?

1290420818/third_party/md5/md5.c (L38)

<!-- gh-comment-id:570390862 --> @evanmiller commented on GitHub (Jan 2, 2020): Should something in the build be defining `HAVE_OPENSSL`? https://github.com/jmcnamara/libxlsxwriter/blob/12904208185fb9d3fac54da33a154f07a4df906d/third_party/md5/md5.c#L38
Author
Owner

@jmcnamara commented on GitHub (Jan 6, 2020):

@evanmiller I don't think that is the issue. I think this is probably a link rather than build issue.

@squinky86 I can't reproduce this. I tried the following:

cd libxlsxwriter

make clean
make 
sudo make install

cd examples/

# Dynamic linking.
cc -g -Wall -Wextra demo.c -o demo -lxlsxwriter -lcrypto

# Static linking.
cc -g -Wall -Wextra demo.c -o demo /usr/local/lib/libxlsxwriter.a /usr/local/lib/openssl/libcrypto.a -lz

Do you have an example to reproduce it.

I'm currently leaning towards renaming the MD5 functions to avoid a conflict since I don't think changing the build systems to link against openssl/libcrypto is worth it. However, I'd link to be able to reproduce this first.

<!-- gh-comment-id:571101628 --> @jmcnamara commented on GitHub (Jan 6, 2020): @evanmiller I don't think that is the issue. I think this is probably a link rather than build issue. @squinky86 I can't reproduce this. I tried the following: ```bash cd libxlsxwriter make clean make sudo make install cd examples/ # Dynamic linking. cc -g -Wall -Wextra demo.c -o demo -lxlsxwriter -lcrypto # Static linking. cc -g -Wall -Wextra demo.c -o demo /usr/local/lib/libxlsxwriter.a /usr/local/lib/openssl/libcrypto.a -lz ``` Do you have an example to reproduce it. I'm currently leaning towards renaming the MD5 functions to avoid a conflict since I don't think changing the build systems to link against openssl/libcrypto is worth it. However, I'd link to be able to reproduce this first.
Author
Owner

@jmcnamara commented on GitHub (Jan 6, 2020):

I had another go at reproducing this with a simple openssl based example:

$ cat mymd5.c

#include <stdio.h>
#include <string.h>
#include <openssl/md5.h>

int
main(int argc, char **argv)
{
    MD5_CTX context;
    unsigned char checksum[16];
    int i;
    int j;

    if (argc < 2) {
        fprintf(stderr, "usage: %s string-to-hash\n", argv[0]);
        return (1);
    }
    for (j = 1; j < argc; ++j) {
        printf("MD5 (\"%s\") = ", argv[j]);
        MD5_Init(&context);
        MD5_Update(&context, argv[j], strlen(argv[j]));
        MD5_Final(checksum, &context);
        for (i = 0; i < 16; i++) {
            printf("%02x", (unsigned int) checksum[i]);
        }
        printf("\n");
    }
    return 0;
}

This will fail complation since we don't link the MD5 functions.

$ cc     mymd5.c   -o mymd5
/tmp/ccV4g3zd.o: In function `main':
mymd5.c:(.text+0x8e): undefined reference to `MD5_Init'
mymd5.c:(.text+0xdf): undefined reference to `MD5_Update'
mymd5.c:(.text+0xf2): undefined reference to `MD5_Final'
collect2: error: ld returned 1 exit status

Then link the MD5 functions from libcrypto and libxlsxwriter:

$ cc     mymd5.c   -o mymd5  /usr/local/lib/openssl/libcrypto.a # Works
$ cc     mymd5.c   -o mymd5  /usr/local/lib/libxlsxwriter.a # Also works
$ cc     mymd5.c   -o mymd5  /usr/local/lib/libxlsxwriter.a /usr/local/lib/openssl/libcrypto.a # No conflict

Again, I can't reproduce your error. I'm not saying that it doesn't exist, just that I can't reproduce it.

<!-- gh-comment-id:571115112 --> @jmcnamara commented on GitHub (Jan 6, 2020): I had another go at reproducing this with a simple openssl based example: ```C $ cat mymd5.c #include <stdio.h> #include <string.h> #include <openssl/md5.h> int main(int argc, char **argv) { MD5_CTX context; unsigned char checksum[16]; int i; int j; if (argc < 2) { fprintf(stderr, "usage: %s string-to-hash\n", argv[0]); return (1); } for (j = 1; j < argc; ++j) { printf("MD5 (\"%s\") = ", argv[j]); MD5_Init(&context); MD5_Update(&context, argv[j], strlen(argv[j])); MD5_Final(checksum, &context); for (i = 0; i < 16; i++) { printf("%02x", (unsigned int) checksum[i]); } printf("\n"); } return 0; } ``` This will fail complation since we don't link the MD5 functions. ```bash $ cc mymd5.c -o mymd5 /tmp/ccV4g3zd.o: In function `main': mymd5.c:(.text+0x8e): undefined reference to `MD5_Init' mymd5.c:(.text+0xdf): undefined reference to `MD5_Update' mymd5.c:(.text+0xf2): undefined reference to `MD5_Final' collect2: error: ld returned 1 exit status ``` Then link the MD5 functions from libcrypto and libxlsxwriter: ```bash $ cc mymd5.c -o mymd5 /usr/local/lib/openssl/libcrypto.a # Works $ cc mymd5.c -o mymd5 /usr/local/lib/libxlsxwriter.a # Also works $ cc mymd5.c -o mymd5 /usr/local/lib/libxlsxwriter.a /usr/local/lib/openssl/libcrypto.a # No conflict ``` Again, I can't reproduce your error. I'm not saying that it doesn't exist, just that I can't reproduce it.
Author
Owner

@Alexhuszagh commented on GitHub (Jan 6, 2020):

@jmcnamara I think the issue only occurs when links to libxlsxwriter and to OpenSSL, that is, both OpenSSL and libxlsxwriter will build fine, but downstream projects using both may have issues.

I currently don't have a test computer to work with (mine's in the repair shop, and I'm trying to find a way to get a Unix-like environment at the library), but I think this may be a decently trivial fix, one of 2 ways:

Most Compact Fix

If we want to minimize the binary size, ideally we'd avoiding duplicating any functionality whenever possible. To do this, we'd have to make the build system aware of OpenSSL, so we'd have:

1). An option to use MD5 or not (USE_MD5).
2). An option to use OpenSSL or not for all cryptographic functions (USE_OPENSSL).

If it uses OpenSSL, then it links to the OpenSSL MD5 functions, otherwise, it will link to the imported symbols (optionally renamed), and use certain macros to ensure the proper symbol names are used.

This, however, increases build system complexity, which may not be desired. If the symbols are not renamed, and the user accidentally uses both OpenSSL and libsxlwriter without making the build system aware of it, it may become the source of a lot of frustration and spurious bug reports.

Make OpenSSL a Dependency

Another solution would be to make OpenSSL a dependency if the MD5 option is used. I understand there are other reasons why this is not desired.

Simple Option

Just rename all the symbols. This will always work, with no issues, but will lead to slightly larger binary sizes if both OpenSSL and libxlsxwriter are both used in a project.

<!-- gh-comment-id:571168141 --> @Alexhuszagh commented on GitHub (Jan 6, 2020): @jmcnamara I think the issue only occurs when links to libxlsxwriter and to OpenSSL, that is, both OpenSSL and libxlsxwriter will build fine, but downstream projects using both may have issues. I currently don't have a test computer to work with (mine's in the repair shop, and I'm trying to find a way to get a Unix-like environment at the library), but I think this may be a decently trivial fix, one of 2 ways: **Most Compact Fix** If we want to minimize the binary size, ideally we'd avoiding duplicating any functionality whenever possible. To do this, we'd have to make the build system aware of OpenSSL, so we'd have: 1). An option to use MD5 or not (`USE_MD5`). 2). An option to use OpenSSL or not for all cryptographic functions (`USE_OPENSSL`). If it uses OpenSSL, then it links to the OpenSSL MD5 functions, otherwise, it will link to the imported symbols (optionally renamed), and use certain macros to ensure the proper symbol names are used. This, however, increases build system complexity, which may not be desired. If the symbols are not renamed, and the user accidentally uses both OpenSSL and libsxlwriter without making the build system aware of it, it may become the source of a lot of frustration and spurious bug reports. **Make OpenSSL a Dependency** Another solution would be to make OpenSSL a dependency if the MD5 option is used. I understand there are other reasons why this is not desired. **Simple Option** Just rename all the symbols. This will always work, with no issues, but will lead to slightly larger binary sizes if both OpenSSL and libxlsxwriter are both used in a project.
Author
Owner

@squinky86 commented on GitHub (Jan 6, 2020):

@Alexhuszagh is correct - this only occurs in downstream projects that link to both libxlsxwriter and openssl due to the name conflict. I'm about to create a pull request for the Simple Option - it makes the binary no bigger than it would be if the md5 functions that are in the third party folder were built-in. Though I do believe the OpenSSL dependency is more elegant.

<!-- gh-comment-id:571232051 --> @squinky86 commented on GitHub (Jan 6, 2020): @Alexhuszagh is correct - this only occurs in downstream projects that link to both libxlsxwriter and openssl due to the name conflict. I'm about to create a pull request for the Simple Option - it makes the binary no bigger than it would be if the md5 functions that are in the third party folder were built-in. Though I do believe the OpenSSL dependency is more elegant.
Author
Owner

@squinky86 commented on GitHub (Jan 6, 2020):

Another option: zlib provides a crc32 and adler32 which could be used instead of md5 without introducing additional overhead, since zlib is already a dependency.

<!-- gh-comment-id:571240222 --> @squinky86 commented on GitHub (Jan 6, 2020): Another option: zlib provides a crc32 and adler32 which could be used instead of md5 without introducing additional overhead, since zlib is already a dependency.
Author
Owner

@Alexhuszagh commented on GitHub (Jan 6, 2020):

@squinky86 I'm slightly worried about the number of bits possible with CRC32 and Adler32, both of which produce a 32-bit hash. MD5 has its own issues, but if we're worried about hash collisions, then we should probably prefer a more modern hash function, and preferably with more bits.

If it's purely accidental, the chance of collision being >= 50% is ~77,000 images with Adler32 or CRC32, while with MD5 it's ~2.17e19 images. A lot, but still, the chance of accidentally removing an image and replacing it with a false duplicate is noticeably non-zero with random data with a 32-bit checksum, but practically 0 for any 128-bit hash. There is a 1% collision chance with ADLER32 and CRC32 at ~9200 images. With enough users and enough data, this will eventually occur.

This is using the fast approximation of the birthday problem, where the approximate number of values required to reach a collision is approximately math.sqrt(-2*n*math.log(1-p, math.e)) (using Python code), where n is the number of possible hash values, and p is the probability of a collision.

I understand these probabilities are very low, and that pretty much any client chokes on larger number of images, but I'd like to mitigate any potential chance of a hash collision if possible.

<!-- gh-comment-id:571261206 --> @Alexhuszagh commented on GitHub (Jan 6, 2020): @squinky86 I'm slightly worried about the number of bits possible with CRC32 and Adler32, both of which produce a 32-bit hash. MD5 has its own issues, but if we're worried about hash collisions, then we should probably prefer a more modern hash function, and preferably with more bits. If it's purely accidental, the chance of collision being >= 50% is ~77,000 images with Adler32 or CRC32, while with MD5 it's ~2.17e19 images. A lot, but still, the chance of accidentally removing an image and replacing it with a false duplicate is noticeably non-zero with random data with a 32-bit checksum, but practically 0 for any 128-bit hash. There is a 1% collision chance with ADLER32 and CRC32 at ~9200 images. With enough users and enough data, this will eventually occur. This is using the fast approximation of the birthday problem, where the approximate number of values required to reach a collision is approximately `math.sqrt(-2*n*math.log(1-p, math.e))` (using Python code), where `n` is the number of possible hash values, and `p` is the probability of a collision. I understand these probabilities are very low, and that pretty much any client chokes on larger number of images, but I'd like to mitigate any potential chance of a hash collision if possible.
Author
Owner

@RaFaeL-NN commented on GitHub (Jan 6, 2020):

What if generate both 32-bit checksums and check their comparison? It will be something like 64-bit

<!-- gh-comment-id:571265818 --> @RaFaeL-NN commented on GitHub (Jan 6, 2020): What if generate both 32-bit checksums and check their comparison? It will be something like 64-bit
Author
Owner

@squinky86 commented on GitHub (Jan 6, 2020):

@jmcnamara: I was able to reproduce by compiling a static openssl (./Configure mingw64 no-shared no-asm no-err no-tests CROSS_COMPILE=x86_64-w64-mingw32-) and static libxlsxwriter (cmake ./ -DUSE_STANDARD_TMPFILE=OFF -DCMAKE_TOOLCHAIN_FILE=../cross-compile.cmake -DBUILD_SHARED_LIBS=OFF -DZLIB_INCLUDE_DIR=/home/squinky86/w64/zlib-1.2.11 -DZLIB_LIBRARY=/home/squinky86/w64/zlib-1.2.11/libzlibstatic.a) then link the hello world binary (https://libxlsxwriter.github.io/hello_8c-example.html) against both the libcrypto and libxlsxwriter libraries.

@Alexhuszagh: SHA/BLAKE2 make more sense as the "correct" solutions. MD5 is an acceptable solution, but you can always create a use case where there's a collision (https://www.hacksandsecurity.org/posts/two-images-have-same-md5-hash-md5-collision-example). Adler and CRC combination makes sense as the most efficient solution without increasing static binary size.

<!-- gh-comment-id:571271323 --> @squinky86 commented on GitHub (Jan 6, 2020): @jmcnamara: I was able to reproduce by compiling a static openssl (./Configure mingw64 no-shared no-asm no-err no-tests CROSS_COMPILE=x86_64-w64-mingw32-) and static libxlsxwriter (cmake ./ -DUSE_STANDARD_TMPFILE=OFF -DCMAKE_TOOLCHAIN_FILE=../cross-compile.cmake -DBUILD_SHARED_LIBS=OFF -DZLIB_INCLUDE_DIR=/home/squinky86/w64/zlib-1.2.11 -DZLIB_LIBRARY=/home/squinky86/w64/zlib-1.2.11/libzlibstatic.a) then link the hello world binary (https://libxlsxwriter.github.io/hello_8c-example.html) against both the libcrypto and libxlsxwriter libraries. @Alexhuszagh: SHA/BLAKE2 make more sense as the "correct" solutions. MD5 is an acceptable solution, but you can always create a use case where there's a collision (https://www.hacksandsecurity.org/posts/two-images-have-same-md5-hash-md5-collision-example). Adler and CRC combination makes sense as the most efficient solution without increasing static binary size.
Author
Owner

@Alexhuszagh commented on GitHub (Jan 6, 2020):

@RaFaeL-NN That's not truly an acceptable solution, because it would assume both checksums are independent, which I'm not sure they are. In fact, I'm highly dubious they are.

@squinky86 Yeah, ideally we'd use SHA-2 or SHA-3, but MD5 works as long as we're assuming no malicious data (malicious data would then reduce the collision chance to 2^24, which is even worse than the checksum). Anyway, it's up to @jmcnamara to ultimately decide what hash algorithm to use. Ideally, I'd be a proponent of SHA-2 or SHA-3, which can also be found in OpenSSL (or elsewhere). Blake would work too, for sure, but that may be overkill.

<!-- gh-comment-id:571332430 --> @Alexhuszagh commented on GitHub (Jan 6, 2020): @RaFaeL-NN That's not truly an acceptable solution, because it would assume both checksums are independent, which I'm not sure they are. In fact, I'm highly dubious they are. @squinky86 Yeah, ideally we'd use SHA-2 or SHA-3, but MD5 works as long as we're assuming no malicious data (malicious data would then reduce the collision chance to 2^24, which is even worse than the checksum). Anyway, it's up to @jmcnamara to ultimately decide what hash algorithm to use. Ideally, I'd be a proponent of SHA-2 or SHA-3, which can also be found in OpenSSL (or elsewhere). Blake would work too, for sure, but that may be overkill.
Author
Owner

@jmcnamara commented on GitHub (Jan 7, 2020):

Thanks everyone for the input. It was a good discussion.

I'm going to avoid getting involved with OpenSSL/libcrypto and just rename the embedded MD5 functions, which I should probably have done from the start.

<!-- gh-comment-id:571507912 --> @jmcnamara commented on GitHub (Jan 7, 2020): Thanks everyone for the input. It was a good discussion. I'm going to avoid getting involved with OpenSSL/libcrypto and just rename the embedded MD5 functions, which I should probably have done from the start.
Author
Owner

@jmcnamara commented on GitHub (Jan 7, 2020):

I've push a fix for this to master. @squinky86 can you try it when you get a chance.

<!-- gh-comment-id:571826567 --> @jmcnamara commented on GitHub (Jan 7, 2020): I've push a fix for this to master. @squinky86 can you try it when you get a chance.
Author
Owner

@jmcnamara commented on GitHub (Jan 9, 2020):

Fixed upstream. Closing.

<!-- gh-comment-id:572468889 --> @jmcnamara commented on GitHub (Jan 9, 2020): Fixed upstream. Closing.
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: github-starred/libxlsxwriter#205
No description provided.