Mirror of https://github.com/jmcnamara/libxlsxwriter.git
synced 2026-05-15 14:15:54 -06:00
[GH-ISSUE #272] Add third party dtoa library #219
Originally created by @jmcnamara on GitHub (Jan 26, 2020).
Original GitHub issue: https://github.com/jmcnamara/libxlsxwriter/issues/272
Originally assigned to: @jmcnamara on GitHub.
I have started to look at using a third-party library to do dtoa (double to string) formatting. Currently this is on the dtoa branch.
This is in order to avoid locale issues with double sprintf() formatting (for example, getting "1,234" instead of the "1.234" required by Excel). For more background and a discussion of other workarounds, see #64.
The code is working on the dtoa branch with Mac/Linux/Windows, but I'm still testing it. If you would like to test it, please let me know how you get on.
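The locale problem behind #64 can be reproduced in plain C: printf-family functions use the decimal separator of the current LC_NUMERIC locale, so after setlocale(LC_NUMERIC, "de_DE.UTF-8") a "%g" conversion writes a comma. A minimal post-processing workaround is sketched below, assuming the locale's decimal point is a single character. format_double_dot() is a hypothetical helper, not part of libxlsxwriter, and this is not the approach taken on the dtoa branch, which replaces sprintf() with a third-party dtoa routine.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a double with snprintf() and then force '.'
 * as the decimal separator, whatever the current LC_NUMERIC locale is.
 * Assumes the locale's decimal point is a single character. */
static void format_double_dot(char *buf, size_t size, double value)
{
    char sep;
    char *p;

    assert(buf != NULL && size > 0);

    /* 17 significant digits are enough to round-trip any IEEE-754 double. */
    snprintf(buf, size, "%.17g", value);

    /* Swap the locale's separator (e.g. ',' in de_DE) back to '.'. */
    sep = localeconv()->decimal_point[0];
    if (sep != '.' && sep != '\0') {
        p = strchr(buf, sep);
        if (p != NULL)
            *p = '.';
    }
}
```

Note that localeconv() reads process-wide locale state, so this is only reliable if no other thread changes the locale while formatting.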
@utelle commented on GitHub (Jan 27, 2020):
Great to see that you started to work on this feature. 😄
To make the emyg_dtoa code work with older C compilers too, I had to make some further adjustments (see emyg_dtoa.c). Maybe you could consider adopting those modifications.
@jmcnamara commented on GitHub (Jan 27, 2020):
I'm still testing it out. I need to add some double specific tests as well to see if there are any issues.
Do you mean including a version of stdint.h? Are there other changes as well?
I don't plan to include a copy of stdint.h, so people with Windows compilers older than VS 2010 will either need to figure out a way of including it or use the standard sprintf() double formatting. I think that this will be a small subset of potential users.
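For users who do hit the missing header, a common pattern (not something libxlsxwriter ships) is a small conditional shim. The sketch below assumes that _MSC_VER values below 1600 identify Visual C++ versions older than VS 2010, the first release that shipped stdint.h. It defines only a few fixed-width types; a real shim would need whichever types the dtoa code actually uses.

```c
/* Hypothetical shim for compilers without <stdint.h>.
 * _MSC_VER < 1600 means Visual C++ older than VS 2010; every other
 * compiler is assumed to provide the standard header. */
#if defined(_MSC_VER) && _MSC_VER < 1600
typedef __int32          int32_t;
typedef unsigned __int32 uint32_t;
typedef __int64          int64_t;
typedef unsigned __int64 uint64_t;
#else
#include <stdint.h>
#endif
```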
@utelle commented on GitHub (Jan 27, 2020):
Sure, I understand that very well.
Yes, at several places I had to move variable definitions to the top of a code block, because older compilers (not compatible with C99) complain otherwise.
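As an illustration of the C89 restriction described here, the following is a small hypothetical helper (not code from emyg_dtoa.c) written so that a C89-only compiler accepts it: every declaration precedes the first statement of its block, and the loop counter is declared outside the for statement.

```c
#include <stddef.h>

/* Writes the decimal digits of value into out (which must hold at least
 * 21 bytes) and returns the number of digits written.
 * C89 style: all declarations come before the first statement in the
 * block, and only block comments are used. */
static size_t write_uint_c89(char *out, unsigned long value)
{
    char tmp[24];
    size_t len = 0;
    size_t i;

    /* Emit digits least-significant first. */
    do {
        tmp[len++] = (char)('0' + (int)(value % 10));
        value /= 10;
    } while (value != 0);

    /* Copy them back in reverse order. A C99 compiler would also accept
     * "for (size_t i = 0; ...)" here; a C89-only compiler rejects it. */
    for (i = 0; i < len; i++)
        out[i] = tmp[len - 1 - i];
    out[len] = '\0';
    return len;
}
```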
Fair comment. However, quite a few people are still using VS 2010, even though it is a rather dated compiler, and VS 2010, for example, does not support C99 (as far as I know, no VS compiler version fully supports C99). So, if you want to support VS 2010, you need to address its incompatibilities (like not having stdint.h) somehow.
@znakeeye commented on GitHub (Apr 20, 2020):
Why choose an outdated, less performant version of dtoa?! Since 2018, most C++ compilers have switched to the brand-new Ryu algorithm (paper can be found here). And here's a post from the author of Grisu himself, citing this new algorithm. The author of Ryu, Ulf Adams, has an implementation in C here at GitHub.
Visual Studio incorporated this algorithm back in 2017. Release notes here.
@utelle commented on GitHub (Apr 20, 2020):
I mentioned Ryu already in January in one of my comments to issue 64.
It is true that Visual C++ 2017 adopted Ryu for to_chars. However, this function is only accessible from C++17 code, and libxlsxwriter is written in C.
Of course, it would be possible to replace the emyg_dtoa code with Ulf Adams' Ryu C code. However, I found that the conversion results differ, although only slightly. While emyg_dtoa tries to produce the shortest possible string, Ryu always appends the exponent. Examples:
The example shows another effect: obviously the last significant digit is (at least sometimes) rounded differently by the two algorithms.
Most likely both differences don't matter much in practice.
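The "shortest possible string" property can be made concrete with a brute-force sketch: format with increasing precision and stop at the first string that parses back to exactly the same double. This only illustrates the goal; Grisu (used by emyg_dtoa) and Ryu compute such output far more efficiently and do not work this way. shortest_roundtrip() is a hypothetical name, and because it relies on snprintf() and strtod() it is still subject to the locale issue from #64.

```c
#include <stdio.h>
#include <stdlib.h>

/* Brute-force illustration of the "shortest string that round-trips"
 * property: try 1..17 significant digits and keep the first result that
 * strtod() parses back to exactly the same double.
 * This is NOT Grisu or Ryu, and it depends on LC_NUMERIC. */
static void shortest_roundtrip(char *buf, size_t size, double value)
{
    int precision;

    for (precision = 1; precision <= 17; precision++) {
        snprintf(buf, size, "%.*g", precision, value);
        if (strtod(buf, NULL) == value)
            return;
    }
    /* Only NaN gets here (it never compares equal); buf holds "%.17g". */
}
```

For example, 0.1 yields "0.1" and 1.0/3.0 yields "0.3333333333333333". Note that "%g" applies its own rules for choosing between plain and exponent notation, so the shape of the output can differ from what emyg_dtoa or Ryu emit.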
@znakeeye commented on GitHub (Apr 20, 2020):
Interesting. And Excel seems to format the latter as 0,617283945061728.
@utelle commented on GitHub (Apr 20, 2020):
In the GUI Excel displays at most 15 significant digits. However, internally (that is, for the representation in the file itself) up to 17 significant digits are used.
In the GUI the decimal separator depends on the user's locale (or the user's settings). In the file always a point is used as the decimal separator.
@znakeeye commented on GitHub (Apr 20, 2020):
(I used a custom cell decimal format with 30 decimals. That didn't change the UI.)
@utelle commented on GitHub (Apr 20, 2020):
Such a custom cell format allows you to display a value like 1.23456789E-22 as 0.000000000000000000000123456789. However, it does not affect the number of significant digits. If you enter more than 15 significant digits in the Excel UI, the excess digits will be cut off.
@utelle commented on GitHub (Apr 21, 2020):
In the meantime, I added Ryu as an alternative to emyg in my local libxlsxwriter repository, compiled it, and ran the functional tests. It works as expected, and Excel opens the resulting xlsx files flawlessly.
Resulting file sizes are slightly bigger, on average by 0.2 percent, which is negligible. There was no measurable effect on runtime, but that was to be expected for the small test cases. IMHO, replacing emyg with Ryu will only have an effect if the generated Excel file contains many floating-point values.
@jmcnamara commented on GitHub (Apr 22, 2020):
@utelle Can you push the Ryu alternative to a branch of your repo so that I can test it?
@znakeeye commented on GitHub (Apr 22, 2020):
How many? 😛
Will it have an impact on, let's say, 200 000 values?
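One rough way to answer this for a given data size is to time the formatting step in isolation. In the hypothetical sketch below, time_formatter() measures CPU time with clock() for any function matching the dbl_formatter signature, and format_with_snprintf() is a "%.17g" baseline. To compare emyg_dtoa or Ryu, each would need a small wrapper with the same signature; their exact APIs are not assumed here.

```c
#include <stdio.h>
#include <time.h>

/* Signature a candidate formatter must match: write value into buf. */
typedef void (*dbl_formatter)(char *buf, size_t size, double value);

/* Baseline formatter based on snprintf(). The "%.17g" format is an
 * assumption for the comparison, not necessarily libxlsxwriter's. */
static void format_with_snprintf(char *buf, size_t size, double value)
{
    snprintf(buf, size, "%.17g", value);
}

/* Rough micro-benchmark: format n distinct values and return the CPU time
 * in seconds as measured by clock(). The checksum keeps the compiler from
 * optimising the formatting calls away. */
static double time_formatter(dbl_formatter fmt, size_t n,
                             unsigned long *checksum)
{
    char buf[64];
    size_t i;
    clock_t start = clock();

    *checksum = 0;
    for (i = 0; i < n; i++) {
        fmt(buf, sizeof buf, (double)i * 0.123456789 + 1.0 / (double)(i + 1));
        *checksum += (unsigned char)buf[0];
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

For example, time_formatter(format_with_snprintf, 200000, &checksum) times 200,000 conversions. Formatting is only one part of writing an xlsx file, so such a measurement only bounds how much a faster dtoa could save overall.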
@utelle commented on GitHub (Apr 22, 2020):
@jmcnamara I'll do that later today.
Since RYU adds an exponent field to all floating point numbers, the resulting Excel files differ from the ones that are used for comparison in the test cases. That is, all tests fail in that respect. Nevertheless, Excel can successfully open all generated files and from the user's perspective they are identical.
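Since the generated files differ from the reference files only in how the numbers are written, one option for tests would be to compare cell values numerically rather than textually. The hypothetical helper below parses both strings with strtod() and treats them as equal only if each is a complete, valid number with the same value; it therefore also rejects malformed output such as "1.2E+-17". strtod() is locale-dependent, so this assumes the "C" locale.

```c
#include <stdlib.h>

/* Hypothetical test helper: compare two numeric strings by value instead
 * of by text, so "1.5" and an exponent form such as "1.5E0" are equal.
 * Returns 0 if either string is not entirely a valid number. */
static int numeric_strings_equal(const char *a, const char *b)
{
    char *end_a;
    char *end_b;
    double va = strtod(a, &end_a);
    double vb = strtod(b, &end_b);

    /* Reject empty input and trailing garbage such as "E+-17". */
    if (end_a == a || *end_a != '\0' || end_b == b || *end_b != '\0')
        return 0;
    return va == vb;
}
```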
@utelle commented on GitHub (Apr 22, 2020):
Good question. I have not done any performance tests yet.
Probably yes, but my guess is that it will be smaller than you may expect. With respect to speed, the emyg_dtoa algorithm, Grisu, is already much better than sprintf.
@utelle commented on GitHub (Apr 22, 2020):
@jmcnamara I created the branch ryu_test in my repository. When invoking make, specify either USE_DTOA_EMYG or USE_DTOA_RYU in addition to USE_DOUBLE_FUNCTION to select which double formatting should be used, EMYG or Ryu.
@jmcnamara commented on GitHub (Jul 11, 2021):
@utelle I've dusted off this work again with the EMYG library on the dtoa branch and rebased it onto main. Can you try it when/if you get a chance and let me know if you encounter any issues?
It is an optional compile-time feature, so you will need to pass "USE_DTOA_LIBRARY=1" to make or "-DUSE_DTOA_LIBRARY=ON" to cmake.
If there are no issues, I'll merge it into main and put it in the next release.
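Because the feature is selected at build time, the C side typically reduces to a preprocessor switch. A minimal sketch, assuming the build system turns the make/CMake option into a USE_DTOA_LIBRARY preprocessor definition; FORMAT_DOUBLE and third_party_dtoa() are placeholder names, not libxlsxwriter's or emyg_dtoa's actual API, and "%.16g" is an assumed default format.

```c
#include <stdio.h>

/* Hypothetical compile-time switch: when USE_DTOA_LIBRARY is defined, use
 * a third-party dtoa routine; otherwise fall back to snprintf(), which is
 * affected by the LC_NUMERIC locale. third_party_dtoa() is a placeholder. */
#ifdef USE_DTOA_LIBRARY
void third_party_dtoa(double value, char *buf);
#define FORMAT_DOUBLE(buf, size, value) \
    ((void)(size), third_party_dtoa((value), (buf)))
#else
#define FORMAT_DOUBLE(buf, size, value) \
    ((void)snprintf((buf), (size), "%.16g", (value)))
#endif
```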
@utelle commented on GitHub (Jul 11, 2021):
I tested the dtoa branch. Unfortunately, the dtoa implementation is flawed. In line 429 and line 437 of emyg_dtoa.c you added an explicit plus sign for the exponent to the output (not present in the original implementation). This leads to invalid output in the case of a negative exponent. For example, setting a cell to 1.2e-17 results in 1.2E+-17 in the resulting Excel file ... and Excel chokes on such a file and can't repair it.
Lines 429 and 437 should be removed, and the array index has to be adjusted in line 430 to
and, respectively, in line 438 to
If you want an explicit plus sign in the exponent, you should modify the function WriteExponent to do so.
@jmcnamara commented on GitHub (Jul 11, 2021):
Thanks. I'll fix that.
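The failure mode described above comes from emitting a '+' unconditionally and then writing a signed exponent, which turns -17 into "+-17". A hypothetical helper (not the WriteExponent function from emyg_dtoa.c) that chooses the sign once, from the value itself, avoids this:

```c
#include <stddef.h>

/* Hypothetical helper: write a decimal exponent with an explicit sign,
 * choosing '+' or '-' from the value so a sign is never written twice.
 * out must hold at least 13 bytes. Returns the characters written. */
static size_t write_signed_exponent(int exp, char *out)
{
    char digits[12];
    size_t n = 0;
    size_t len = 0;
    unsigned int mag;

    if (exp < 0) {
        out[len++] = '-';
        mag = 0u - (unsigned int)exp;   /* safe even for INT_MIN */
    }
    else {
        out[len++] = '+';
        mag = (unsigned int)exp;
    }

    /* Collect digits least-significant first, then copy them in order. */
    do {
        digits[n++] = (char)('0' + (int)(mag % 10));
        mag /= 10;
    } while (mag != 0);

    while (n > 0)
        out[len++] = digits[--n];
    out[len] = '\0';
    return len;
}
```

Appending the result for -17 to "1.2E" gives "1.2E-17", and for 17 gives "1.2E+17".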
@jmcnamara commented on GitHub (Jul 11, 2021):
@utelle I've pushed a fix, with a test case. I used a force push, so you will need to get the latest code from the branch again.
@utelle commented on GitHub (Jul 11, 2021):
The new version now works as expected.
@jmcnamara commented on GitHub (Jul 12, 2021):
This has been merged to main and released in libxlsxwriter version 1.1.1.