[GH-ISSUE #272] Add third party dtoa library #219

Closed
opened 2026-05-05 11:57:26 -06:00 by gitea-mirror · 21 comments

Originally created by @jmcnamara on GitHub (Jan 26, 2020).
Original GitHub issue: https://github.com/jmcnamara/libxlsxwriter/issues/272

Originally assigned to: @jmcnamara on GitHub.

I have started to look at using a third party library to do dtoa (double to string) formatting. Currently this is on the dtoa branch.

This is in order to avoid locale issues with double sprintf() formatting (for example, getting "1,234" instead of the "1.234" required by Excel). For more background and a discussion of other workarounds, see #64.

The code is working on the dtoa branch with Mac/Linux/Windows, but I'm still testing it. If you would like to test it, please let me know how you get on.


@utelle commented on GitHub (Jan 27, 2020):

Great to see that you started to work on this feature. 😄

To make the *emyg_dtoa* code work with older C compilers, too, I had to make some further adjustments (see [emyg_dtoa.c](https://github.com/utelle/libxlsxwriter/blob/master/third_party/emyg/emyg_dtoa.c)). Maybe you could consider adopting those modifications.

@jmcnamara commented on GitHub (Jan 27, 2020):

> Great to see that you started to work on this feature.

I'm still testing it out. I need to add some double specific tests as well to see if there are any issues.

> To make the _emyg_dtoa_ code work with older C compilers, too,

Do you mean including a version of stdint.h? Are there other changes as well?

I don't plan to include a copy of stdint.h, so people with Windows compilers older than VS 2010 will either need to figure out a way of including it or use the standard dtoa formatting. I think that this will be a small subset of potential users.

@utelle commented on GitHub (Jan 27, 2020):

> I'm still testing it out. I need to add some double specific tests as well to see if there are any issues.

Sure, I understand that very well.

> Do you mean including a version of stdint.h? Are there other changes as well?

Yes, at several places I had to move variable definitions to the top of a code block, because older compilers (not compatible with C99) complain otherwise.

> I don't plan to include a copy of stdint.h so people with Windows compilers older than VS 2010 ...

Fair comment. However, quite a few people are still using VS 2010, even though it is a rather dated compiler, and VS 2010, for example, does not support C99 (as far as I know, no VS compiler version fully supports C99). So, if you want to support VS 2010, you need to address its incompatibilities (like not having `stdint.h`) somehow.

@znakeeye commented on GitHub (Apr 20, 2020):

Why choose an outdated, less performant version of `dtoa`?!

Since 2018, most C++ compilers switched to the brand new `Ryu` algorithm (paper can be found [here](https://dl.acm.org/doi/10.1145/3192366.3192369)). And [here's a post](https://news.ycombinator.com/item?id=20182632) from the author of `grisu` himself, citing this new algorithm. The author of Ryu, Ulf Adams, has an implementation in `C` [on GitHub](https://github.com/ulfjack/ryu).

Visual Studio incorporated this algorithm back in 2017. Release notes [here](https://docs.microsoft.com/en-us/visualstudio/releasenotes/vs2017-relnotes).

> We implemented the shortest round-trip decimal overloads of floating-point to_chars() in C++17's charconv header. For scientific notation, it is approximately 10x as fast as sprintf_s() "%.8e" for floats, and 30x as fast as sprintf_s() "%.16e" for doubles. This uses Ulf Adams' new algorithm, Ryu.

@utelle commented on GitHub (Apr 20, 2020):

I mentioned _Ryu_ already in January in one of my comments to [issue 64](https://github.com/jmcnamara/libxlsxwriter/issues/64#issuecomment-576413310).

It is true that Visual C++ 2017 adopted _Ryu_ for `to_chars`. However, this function is only accessible from C++17 code, and **libxlsxwriter** is written in C.

Of course, it would be possible to replace the _emyg\_dtoa_ code with Ulf Adams' _ryu_ C code. However, I found that the conversion results differ, although only slightly. While _emyg\_dtoa_ tries to produce the shortest possible string, _ryu_ always appends the exponent. Examples:

```c
double x1 = 1.23456789012345678;
/* emyg_dtoa(x1) => "1.2345678901234568" */
/* ryu(x1)       => "1.2345678901234567E0" */
double x2 = 0.61728394506172835;
/* emyg_dtoa(x2) => "0.6172839450617284" */
/* ryu(x2)       => "6.172839450617283E-1" */
```

The examples show another effect: the last significant digit is (at least sometimes) rounded differently by the two algorithms.

Most likely, neither difference matters much in practice.

@znakeeye commented on GitHub (Apr 20, 2020):

Interesting. And Excel seems to format the latter as `0,617283945061728`.

@utelle commented on GitHub (Apr 20, 2020):

In the GUI, Excel displays at most 15 significant digits. However, internally (that is, in the representation in the file itself) up to 17 significant digits are used.

In the GUI, the decimal separator depends on the user's locale (or the user's settings). In the file, a point is always used as the decimal separator.

@znakeeye commented on GitHub (Apr 20, 2020):

(I used a custom cell decimal format with 30 decimals. That didn't change the UI.)


@utelle commented on GitHub (Apr 20, 2020):

Such a custom cell format allows you to display a value like `1.23456789E-22` as `0.000000000000000000000123456789`. However, it does not affect the number of significant digits. If you enter more than 15 significant digits in the Excel UI, the digits beyond the 15th are cut off.

@utelle commented on GitHub (Apr 21, 2020):

In the meantime, I added `Ryu` as an alternative to `emyg` to my local `libxlsxwriter` repository, compiled it, and ran the functional tests. It works as expected, and Excel opens the resulting xlsx files flawlessly.

Resulting file sizes are slightly bigger, by 0.2 percent on average, which is negligible. There was no measurable effect on runtime, but that was to be expected for the small test cases. IMHO, replacing `emyg` with `Ryu` will only have an effect if the generated Excel file contains **many** floating-point values.

@jmcnamara commented on GitHub (Apr 22, 2020):

@utelle Can you push the RYU alternative to a branch of your repo so that I can test it?

@znakeeye commented on GitHub (Apr 22, 2020):

> In the meantime I added `Ryu` as an alternative to `emyg` to my local `libxlsxwriter` repository, compiled it, and ran the functional tests. It works as expected, and Excel opens the resulting xlsx files flawlessly.
>
> Resulting file sizes are slightly bigger, on average 0.2 percent - that is, neglectable. There was no measurable effect on runtime, but that was to be expected for the small test cases. IMHO replacing `emyg` by `Ryu` will have an effect only, if the generated Excel file contains **many** floating point values.

How many? 😛

Will it have an impact on, let's say, 200 000 values?

@utelle commented on GitHub (Apr 22, 2020):

> Can you push the RYU alternative to a branch of your repo so that I test it.

@jmcnamara I'll do that later today.

Since RYU adds an exponent field to **all** floating-point numbers, the generated Excel files differ from the reference files used for comparison in the test cases. That is, all tests fail on that comparison. Nevertheless, Excel can successfully open all generated files, and from the user's perspective they are identical.

@utelle commented on GitHub (Apr 22, 2020):

> How many? 😛

Good question. I have not done any performance tests yet.

> Will it have an impact on, let's say, 200 000 values?

Probably yes, but my guess is that the impact will be smaller than you might expect. With respect to speed, the algorithm behind `emyg_dtoa`, _grisu_, is already much faster than `sprintf`.

@utelle commented on GitHub (Apr 22, 2020):

@jmcnamara I created branch [ryu_test](https://github.com/utelle/libxlsxwriter/tree/ryu_test) in my repository. When invoking `make`, specify either `USE_DTOA_EMYG` or `USE_DTOA_RYU` in addition to `USE_DOUBLE_FUNCTION` to select which double formatting should be used, _EMYG_ or _RYU_.

@jmcnamara commented on GitHub (Jul 11, 2021):

@utelle I've dusted off this work again with the EMYG library on the `dtoa` branch and rebased it onto main. Can you try it when/if you get a chance and let me know if you encounter any issues?

It is an optional compilation setting, so you will need to pass `USE_DTOA_LIBRARY=1` to `make` or `-DUSE_DTOA_LIBRARY=ON` to `cmake`.

If there are no issues, I'll merge it into main and put it in the next release.

@utelle commented on GitHub (Jul 11, 2021):

I tested the `dtoa` branch. Unfortunately, the `dtoa` implementation is flawed. In [line 429](https://github.com/jmcnamara/libxlsxwriter/blob/6d2fe14ecc1bfb3e947dbb65feed7d5096b8ee64/third_party/dtoa/emyg_dtoa.c#L429) and [line 437](https://github.com/jmcnamara/libxlsxwriter/blob/6d2fe14ecc1bfb3e947dbb65feed7d5096b8ee64/third_party/dtoa/emyg_dtoa.c#L437) of `emyg_dtoa.c`, you added an explicit plus sign for the exponent to the output (not present in the original implementation). This leads to invalid output in the case of a negative exponent. For example, setting a cell to `1.2e-17` results in `1.2E+-17` in the resulting Excel file ... and Excel chokes on such a file and can't repair it.

Lines **429** and **437** should be **removed**, and the **array index has to be adjusted** in line 430 to

```c
WriteExponent(kk - 1, &buffer[2]);
```

and, respectively, in line 438 to

```c
WriteExponent(kk - 1, &buffer[0 + length + 2]);
```

If you want an explicit plus sign in the exponent, you should modify the function `WriteExponent` to do so.

@jmcnamara commented on GitHub (Jul 11, 2021):

Thanks. I'll fix that.


@jmcnamara commented on GitHub (Jul 11, 2021):

@utelle I've pushed a fix, with a test case. I used a force push, so you will need to get the latest code from the branch again.

@utelle commented on GitHub (Jul 11, 2021):

The new version now works as expected.

@jmcnamara commented on GitHub (Jul 12, 2021):

This has been merged to main and released in libxlsxwriter version 1.1.1.
