[GH-ISSUE #64] Corrupted Excel files due to locale dependent conversion of floating point values #55

Closed
opened 2026-05-05 11:32:46 -06:00 by gitea-mirror · 37 comments
Owner

Originally created by @utelle on GitHub (Jul 8, 2016).
Original GitHub issue: https://github.com/jmcnamara/libxlsxwriter/issues/64

Originally assigned to: @jmcnamara on GitHub.

I develop an i18n application. libxlsxwriter works fine as long as the default C locale or a locale that uses decimal points as the delimiter for the fractional part of a floating point value are used.

The problem is that the C functions used in libxlsxwriter, namely fprintf, lxw_snprintf, are not locale independent. That is, for example if a German locale is established, a decimal comma is used as the delimiter in formatting floating point numbers. However, Excel expects that floating point values are formatted with decimal points. Thus, libxlsxwriter creates corrupted Excel files.

To overcome this problem, MSVC provides _sprintf_l that takes a locale argument, and MinGW provides sprintf_l (with a slightly different function signature). However, I don't know whether a generic portable solution exists.

Originally created by @utelle on GitHub (Jul 8, 2016). Original GitHub issue: https://github.com/jmcnamara/libxlsxwriter/issues/64 Originally assigned to: @jmcnamara on GitHub. I develop an i18n application. `libxlsxwriter` works fine as long as the default C locale or a locale that uses **decimal points** as the delimiter for the fractional part of a floating point value are used. The problem is that the C functions used in `libxlsxwriter`, namely `fprintf`, `lxw_snprintf`, are not locale independent. That is, for example if a German locale is established, a **decimal comma** is used as the delimiter in formatting floating point numbers. However, Excel expects that floating point values are formatted with **decimal points**. Thus, `libxlsxwriter` creates corrupted Excel files. To overcome this problem, MSVC provides `_sprintf_l` that takes a locale argument, and MinGW provides `sprintf_l` (with a slightly different function signature). However, I don't know whether a generic portable solution exists.
gitea-mirror 2026-05-05 11:32:46 -06:00
  • closed this issue
  • added the
    bug
    label
Author
Owner

@jmcnamara commented on GitHub (Jul 8, 2016):

Can you override the locale at the application level, or at least around the Excel writing part?

<!-- gh-comment-id:231469739 --> @jmcnamara commented on GitHub (Jul 8, 2016): Can you override the locale at the application level, or at least around the Excel writing part?
Author
Owner

@evanmiller commented on GitHub (Jul 8, 2016):

@jmcnamara You should be able to save the current locale, change the locale to "C" or "POSIX", do your printing, then restore the original locale when done printing. E.g. something like

char *orig_locale = strdup(setlocale(LC_NUMERIC, NULL));
setlocale(LC_NUMERIC, "C");
printf(...)
setlocale(LC_NUMERIC, orig_locale);
free(orig_locale);
<!-- gh-comment-id:231473952 --> @evanmiller commented on GitHub (Jul 8, 2016): @jmcnamara You should be able to save the current locale, change the locale to "C" or "POSIX", do your printing, then restore the original locale when done printing. E.g. something like ``` C char *orig_locale = strdup(setlocale(LC_NUMERIC, NULL)); setlocale(LC_NUMERIC, "C"); printf(...) setlocale(LC_NUMERIC, orig_locale); free(orig_locale); ```
Author
Owner

@utelle commented on GitHub (Jul 8, 2016):

How can I encourage you?

Can you override the locale at the application level, or at least around the Excel writing part?

Well, in fact, it's my application that sets the locale - dependent on the choice of the user. That is, even on a German system it will use decimal points, if the user switches the application to English.

Overriding the locale around the Excel writing part is not such a good idea, since for C the locale is a global beast - changing it would for example affect also different threads ... with unpredictable consequences.

Since I had to make a release of my application (which uses libxlsxwriter internally) yesterday I implemented a quick hack for libxlsxwriter by defining a function in utility.c that converts a double to a string in a locale independent way. And that function is called from all places where libxlsxwriterneeds to format doubles. It works, but the solution is not generic.

In general, I think the approach to separate the formatting of the doubles into its own function is the right approach. However, the implementation should work on all supported platforms.

For my Open Source project wxPdfDocument, where I also have to format doubles in a locale independent way for output to PDF, I wrote my own methods to handle this. However, in fact, it's a bit like re-inventing the wheel.

It might be easier to determine the decimal delimiter of the current locale, format the double using for example sprintf, and replace the decimal delimiter by a decimal point afterwards.

<!-- gh-comment-id:231474948 --> @utelle commented on GitHub (Jul 8, 2016): How can I encourage you? > Can you override the locale at the application level, or at least around the Excel writing part? Well, in fact, it's my application that sets the locale - dependent on the choice of the user. That is, even on a German system it will use decimal points, if the user switches the application to English. Overriding the locale around the Excel writing part is not such a good idea, since for C the locale is a global beast - changing it would for example affect also different threads ... with unpredictable consequences. Since I had to make a release of my application (which uses `libxlsxwriter` internally) yesterday I implemented a quick hack for `libxlsxwriter` by defining a function in `utility.c` that converts a double to a string in a locale independent way. And that function is called from all places where `libxlsxwriter`needs to format doubles. It works, but the solution is not generic. In general, I think the approach to separate the formatting of the doubles into its own function is the right approach. However, the implementation should work on all supported platforms. For my Open Source project [wxPdfDocument](https://github.com/utelle/wxpdfdoc), where I also have to format doubles in a locale independent way for output to PDF, I wrote my own methods to handle this. However, in fact, it's a bit like re-inventing the wheel. It might be easier to determine the decimal delimiter of the current locale, format the double using for example `sprintf`, and replace the decimal delimiter by a decimal point afterwards.
Author
Owner

@utelle commented on GitHub (Jul 8, 2016):

@evanmiller: Changing the locale in that way you proposed can have unwanted side effects in a multi-threaded environment, since the locale is a global instance.

<!-- gh-comment-id:231475635 --> @utelle commented on GitHub (Jul 8, 2016): @evanmiller: Changing the locale in that way you proposed can have unwanted side effects in a multi-threaded environment, since the locale is a global instance.
Author
Owner

@evanmiller commented on GitHub (Jul 8, 2016):

@utelle You are right, my proposal will wreck the locale in a multi-threaded environment.

<!-- gh-comment-id:231476158 --> @evanmiller commented on GitHub (Jul 8, 2016): @utelle You are right, my proposal will wreck the locale in a multi-threaded environment.
Author
Owner

@evanmiller commented on GitHub (Jul 8, 2016):

What about uselocale? It is a thread-specific alternative, and part of POSIX.1-2008.

http://man7.org/linux/man-pages/man3/uselocale.3.html

<!-- gh-comment-id:231481249 --> @evanmiller commented on GitHub (Jul 8, 2016): What about `uselocale`? It is a thread-specific alternative, and part of POSIX.1-2008. http://man7.org/linux/man-pages/man3/uselocale.3.html
Author
Owner

@utelle commented on GitHub (Jul 8, 2016):

A function to convert a double to a string in a locale independent way could look like the following:

char* lxw_print_double(char* buffer, double value)
{
  struct lconv* lc;
  sprintf(buffer, "%.16g", value);
  lc = localeconv();
  if (lc->decimal_point[0] != '.')
  {
    char* pos = strchr(buffer, lc->decimal_point[0]);
    if (pos != NULL)
    {
      buffer[pos] = '.';
    }
  }
  return buffer;
}

The passed buffer has to be large enough to hold the result of the conversion. So maybe the size of the buffer should be passed to the function, too, and then snprintfresp lxw_snprint should be used instead of sprintf.

A local buffer should not be used to avoid problems in multi-threaded environments.

<!-- gh-comment-id:231481267 --> @utelle commented on GitHub (Jul 8, 2016): A function to convert a double to a string in a locale independent way could look like the following: ``` char* lxw_print_double(char* buffer, double value) { struct lconv* lc; sprintf(buffer, "%.16g", value); lc = localeconv(); if (lc->decimal_point[0] != '.') { char* pos = strchr(buffer, lc->decimal_point[0]); if (pos != NULL) { buffer[pos] = '.'; } } return buffer; } ``` The passed `buffer` has to be large enough to hold the result of the conversion. So maybe the size of the buffer should be passed to the function, too, and then `snprintf`resp `lxw_snprint` should be used instead of `sprintf`. A local buffer should not be used to avoid problems in multi-threaded environments.
Author
Owner

@utelle commented on GitHub (Jul 8, 2016):

@evanmiller

What about uselocale? It is a thread-specific alternative, and part of POSIX.1-2008.

In principle, a good idea. Unfortunately, the MSVC runtime library does not provide the function uselocale. You would have to take special measures to enable thread-specific locales. Not impossible, but cumbersome. So under Windows it takes more efforts to solve this issue.

<!-- gh-comment-id:231483325 --> @utelle commented on GitHub (Jul 8, 2016): @evanmiller > What about uselocale? It is a thread-specific alternative, and part of POSIX.1-2008. In principle, a good idea. Unfortunately, the MSVC runtime library does not provide the function `uselocale`. You would have to take special measures to enable thread-specific locales. Not impossible, but cumbersome. So under Windows it takes more efforts to solve this issue.
Author
Owner

@evanmiller commented on GitHub (Jul 9, 2016):

@utelle This StackOverflow thread may be useful:

http://stackoverflow.com/questions/6561723/thread-specific-locale-manipulation-in-c

Win32 provides _configthreadlocale(_ENABLE_PER_THREAD_LOCALE) which makes setlocale behave like uselocale.

<!-- gh-comment-id:231512909 --> @evanmiller commented on GitHub (Jul 9, 2016): @utelle This StackOverflow thread may be useful: http://stackoverflow.com/questions/6561723/thread-specific-locale-manipulation-in-c Win32 provides `_configthreadlocale(_ENABLE_PER_THREAD_LOCALE)` which makes `setlocale` behave like `uselocale`.
Author
Owner

@utelle commented on GitHub (Jul 9, 2016):

The more I think about this the more I get the impression that manipulating the global or thread-specific locale might not be the best approach. At least not on Windows.

libxlsxwriter is a library, and a library should not interfere with application specific settings. Using _configthreadlocale(_ENABLE_PER_THREAD_LOCALE) within libxlsxwriter might not be what the application wants.

IMHO the way to do it is to use functions like sprintf_l or to mimic their behaviour.

My code sample posted here might be a too simple approach, since there might exist locales where the decimal separator is not a single character. So one would have to replace the corresponding string, instead of just 1 character. Additionally, the active locale might request thousands separators - just to complicate matters.

<!-- gh-comment-id:231522079 --> @utelle commented on GitHub (Jul 9, 2016): The more I think about this the more I get the impression that manipulating the global or thread-specific locale might not be the best approach. At least not on Windows. `libxlsxwriter` is a library, and a library should not interfere with application specific settings. Using `_configthreadlocale(_ENABLE_PER_THREAD_LOCALE)` within `libxlsxwriter` might not be what the application wants. IMHO the way to do it is to use functions like `sprintf_l` or to mimic their behaviour. My code sample posted [here](#issuecomment-231481267) might be a too simple approach, since there might exist locales where the decimal separator is not a single character. So one would have to replace the corresponding string, instead of just 1 character. Additionally, the active locale might request thousands separators - just to complicate matters.
Author
Owner

@jmcnamara commented on GitHub (Jul 9, 2016):

libxlsxwriter is a library, and a library should not interfere with application specific settings.

I don't think locale changes should be handled in the library either. My current thoughts on how to handle this are:

  1. Do nothing, declare defeat and go home.
  2. Document the locale limitation and suggest ways that this can be handled at the application layer via setlocale() and uselocale() (thanks @evanmiller).
  3. Macro-ize all (or some) of the fprintf() calls, like all the snprintf calls, and allow Windows users to override them with the _l alternatives. Perhaps with some sample functions embedded in the code.
  4. Don't handle "%g" in fprintf() so that a modification to snprintf() is sufficient. The direct fprintf() handling was added in 5173f40ac7 (diff-20827109fa4f6c0bac299b5c9184f20fL2179) as a performance optimization. This could be #defined code so that the optimization is still available.
  5. A combination of all of the above, apart from number 1.

@utelle could you investigate if replacing lxw_snprintf() on Windows with a _l() variant will work. You can replace the code in write_number_cell() with the code from the diff above in number 4.

<!-- gh-comment-id:231536920 --> @jmcnamara commented on GitHub (Jul 9, 2016): > libxlsxwriter is a library, and a library should not interfere with application specific settings. I don't think locale changes should be handled in the library either. My current thoughts on how to handle this are: 1. Do nothing, declare defeat and go home. 2. Document the locale limitation and suggest ways that this can be handled at the application layer via `setlocale()` and `uselocale()` (thanks @evanmiller). 3. Macro-ize all (or some) of the `fprintf()` calls, like all the `snprintf` calls, and allow Windows users to override them with the `_l` alternatives. Perhaps with some sample functions embedded in the code. 4. Don't handle `"%g"` in `fprintf()` so that a modification to `snprintf()` is sufficient. The direct `fprintf()` handling was added in https://github.com/jmcnamara/libxlsxwriter/commit/5173f40ac73009472467b199f5bae1085612ff72#diff-20827109fa4f6c0bac299b5c9184f20fL2179 as a performance optimization. This could be `#defined` code so that the optimization is still available. 5. A combination of all of the above, apart from number 1. @utelle could you investigate if replacing `lxw_snprintf()` on Windows with a `_l()` variant will work. You can replace the code in `write_number_cell()` with the code from the diff above in number 4.
Author
Owner

@utelle commented on GitHub (Jul 9, 2016):

I don't think locale changes should be handled in the library either. My current thoughts on how to handle this are:
(1) Do nothing, declare defeat and go home.

Please don't. For developers like me support for i18n applications is important, and it would be a big advantage for libxlsxwriter to support i18n out of the box. I will certainly help to find solutions.

(2) Document the locale limitation and suggest ways that this can be handled at the application layer via setlocale() and uselocale() (thanks @evanmiller).

For i18n applications it would make life difficult for developers. One would always have to keep in mind that locales have to be changed before and reverted after doing Excel export. I would certainly prefer a solution that hides this difficulty from the developer using libxlsxwriter.

And IMHO it is not that difficult: encapsulate the formatting of doubles into a separate function and provide implementations that work for most if not all platforms and compilers. I've done it already, but the solution has to be extended to support at least all major platforms.

(3) Macro-ize all (or some) of the fprintf() calls, like all the snprintf calls, and allow Windows users to override them with the _l alternatives. Perhaps with some sample functions embedded in the code.

As already said above: just encapsulate the double formatting into a function that returns the formatted string. Then the rest of the code can stay as it is by just using %s instead of %.16g.

(4) Don't handle "%g" in fprintf() so that a modification to snprintf() is sufficient. The direct fprintf() handling was added in 5173f40#diff-20827109fa4f6c0bac299b5c9184f20fL2179 as a performance optimization. This could be #defined code so that the optimization is still available.

See above.

could you investigate if replacing lxw_snprintf() on Windows with a _l() variant will work. You can replace the code in write_number_cell() with the code from the diff above in number 4.

The _l variant works definitely for MSVC. However, as mentioned earlier MinGW defines a different variant, and other compilers might not have it at all. So I will do some more digging to find a solution that works in a more or less portable way.

<!-- gh-comment-id:231542138 --> @utelle commented on GitHub (Jul 9, 2016): > I don't think locale changes should be handled in the library either. My current thoughts on how to handle this are: > (1) Do nothing, declare defeat and go home. Please don't. For developers like me support for i18n applications is important, and it would be a big advantage for `libxlsxwriter` to support i18n out of the box. I will certainly help to find solutions. > (2) Document the locale limitation and suggest ways that this can be handled at the application layer via setlocale() and uselocale() (thanks @evanmiller). For i18n applications it would make life difficult for developers. One would always have to keep in mind that locales have to be changed before and reverted after doing Excel export. I would certainly prefer a solution that hides this difficulty from the developer using `libxlsxwriter`. And IMHO it is not **that** difficult: encapsulate the formatting of doubles into a separate function and provide implementations that work for most if not all platforms and compilers. I've done it already, but the solution has to be extended to support at least all major platforms. > (3) Macro-ize all (or some) of the fprintf() calls, like all the snprintf calls, and allow Windows users to override them with the _l alternatives. Perhaps with some sample functions embedded in the code. As already said above: just encapsulate the double formatting into a function that returns the formatted string. Then the rest of the code can stay as it is by just using `%s` instead of `%.16g`. > (4) Don't handle "%g" in fprintf() so that a modification to snprintf() is sufficient. The direct fprintf() handling was added in 5173f40#diff-20827109fa4f6c0bac299b5c9184f20fL2179 as a performance optimization. This could be #defined code so that the optimization is still available. See above. > could you investigate if replacing lxw_snprintf() on Windows with a _l() variant will work. You can replace the code in write_number_cell() with the code from the diff above in number 4. The `_l` variant works definitely for MSVC. However, as mentioned earlier MinGW defines a different variant, and other compilers might not have it at all. So I will do some more digging to find a solution that works in a more or less portable way.
Author
Owner

@utelle commented on GitHub (Jul 10, 2016):

I did some research yesterday. Since sprintf_l or variants thereof don't seem to be generally available (MSVC has it, BSD also, but not Linux), and since speed can also be an issue for converting doubles to strings, I would recommend to use a private version of dtoa.

I found this interesting project, dtoa benchmark, which includes a C implementation. The benchmark claims that the latter is about 9 times faster than sprintf. The license is very permissive.

I haven't tested it myself yet, but I would give it a try. That is, encapsulate the formatting of doubles into a separate function, which then uses the emyg_dtoa implementation to perform the actual conversion.

<!-- gh-comment-id:231574169 --> @utelle commented on GitHub (Jul 10, 2016): I did some research yesterday. Since `sprintf_l` or variants thereof don't seem to be generally available (_MSVC has it, BSD also, but not Linux_), and since speed can also be an issue for converting doubles to strings, I would recommend to use a private version of `dtoa`. I found this interesting project, [dtoa benchmark](https://github.com/miloyip/dtoa-benchmark), which includes a [C implementation](https://github.com/miloyip/dtoa-benchmark/tree/master/src/emyg). The benchmark claims that the latter is about 9 times faster than `sprintf`. The license is very permissive. I haven't tested it myself yet, but I would give it a try. That is, encapsulate the formatting of doubles into a separate function, which then uses the `emyg_dtoa` implementation to perform the actual conversion.
Author
Owner

@jmcnamara commented on GitHub (Jul 10, 2016):

After thinking about his for a bit I am now leaning towards handling this in a sprintf wrapper like your code above. I'll also have a look at the dtoa code.

I'm going to tackle the tempfile issue first and then move on to this.

Thanks for the research. That is very helpful.

<!-- gh-comment-id:231577520 --> @jmcnamara commented on GitHub (Jul 10, 2016): After thinking about his for a bit I am now leaning towards handling this in a sprintf wrapper like your code above. I'll also have a look at the dtoa code. I'm going to tackle the tempfile issue first and then move on to this. Thanks for the research. That is very helpful.
Author
Owner

@utelle commented on GitHub (Jul 10, 2016):

Conversion from double to string happens 7 times in your code (1 x chart.c, 1 x custom.c, 1 x xmlwriter.c, 4 x worksheet.c). Except for 2 cases in worksheet.c where you use fprintf with surrounding text elements in the format string, it is always just a plain conversion from doube to string. That is, it should be easy to isolate and encapsulate the code for the conversion.

Personally, I would probably go for the dtoa code, although plain replacement of commas by points should work in 99.99% of all cases. Either point or comma are used as decimal separators all over the world, except for a few East Arabic countries where another character is used (see Wikipedia: Decimal mark).

Solving this issue would certainly give libxlsxwriter a competitive edge - at least for developers of i18n applications. Most Excel export libraries I came across suffer from the very same problem.

<!-- gh-comment-id:231578866 --> @utelle commented on GitHub (Jul 10, 2016): Conversion from double to string happens 7 times in your code (1 x `chart.c`, 1 x `custom.c`, 1 x `xmlwriter.c`, 4 x `worksheet.c`). Except for 2 cases in `worksheet.c` where you use `fprintf` with surrounding text elements in the format string, it is always just a plain conversion from doube to string. That is, it should be easy to isolate and encapsulate the code for the conversion. Personally, I would probably go for the `dtoa` code, although plain replacement of commas by points should work in 99.99% of all cases. Either point or comma are used as decimal separators all over the world, except for a few East Arabic countries where another character is used (see [Wikipedia: Decimal mark](https://en.wikipedia.org/wiki/Decimal_mark)). Solving this issue would certainly give `libxlsxwriter` a competitive edge - at least for developers of i18n applications. Most Excel export libraries I came across suffer from the very same problem.
Author
Owner

@utelle commented on GitHub (Jul 11, 2016):

In the meantime I tested emyg_dtoa in my local version. I had to struggle a bit to get it compiled with MSVC 2010, since the code uses C99 features that are not supported by MSVC, but finally I managed to get it working. However, I wonder whether the emyg_dtoa code was ever compiled successfully with MSVC as is, since the code seems to contain items that are not even known to MSVC 2015. I will check with MSVC 2015 later on, too.

As far as I can tell libxlsxwriter functions properly using that code.

I defined a function lxw_print_double in utility.c with the following signature

char* lxw_print_double(char* buffer, size_t buffer_size, double number);

and replaced lxw_snprintf by lxw_print_double at the places where the format code %.16g was used to convert double to string (except for the calls to fprintf where I changed the format code to %s and replaced the double argument by a call to lxw_print_double).

<!-- gh-comment-id:231703378 --> @utelle commented on GitHub (Jul 11, 2016): In the meantime I tested `emyg_dtoa` in my local version. I had to struggle a bit to get it compiled with MSVC 2010, since the code uses C99 features that are not supported by MSVC, but finally I managed to get it working. However, I wonder whether the `emyg_dtoa` code was ever compiled successfully with MSVC **as is**, since the code seems to contain items that are not even known to MSVC 2015. I will check with MSVC 2015 later on, too. As far as I can tell `libxlsxwriter` functions properly using that code. I defined a function `lxw_print_double` in `utility.c` with the following signature ``` char* lxw_print_double(char* buffer, size_t buffer_size, double number); ``` and replaced `lxw_snprintf` by `lxw_print_double` at the places where the format code `%.16g` was used to convert double to string (except for the calls to `fprintf` where I changed the format code to `%s` and replaced the double argument by a call to `lxw_print_double`).
Author
Owner

@BongoVR commented on GitHub (Aug 2, 2017):

@utelle @jmcnamara
Is the libxlsxwriter code compiled into the executable?

This problem does not seem to arise if the libxlsxwriter code is compiled into a regular DLL (aka USRDLL) statically linking to the C runtime on Windows which is not loaded at startup (compiler switches /D "_USRDLL" and /MT for Microsoft Visual Studio).

My scenario is pretty much the same as the OP's. The application is calling setlocale(LC_ALL, "") during initialisation - whatever that may mean for a particular user.
However, the libxlsxwriter DLL is consumed as a so-called "delay loaded DLL" since I am using the code only for file export (/DELAYLOAD linker option for the exe in Visual Studio). The DLL is thus not loaded until the user actually needs the DLL. Since its DllMain function lacks a call to setlocale, the DLL is happily using the "C" locale. Therefore, I never experienced that problem.

I myself am running the application using a German locale and xlsx creation worked fine so far. No thread-unsafe calls to setlocale nor tinkering with user-defined special functions like lxw_print_double for formatting floating point values have been necessary.
Let me emphasise that either the delay-load feature or the static runtime linking alone might well do the trick.

<!-- gh-comment-id:319582611 --> @BongoVR commented on GitHub (Aug 2, 2017): @utelle @jmcnamara Is the libxlsxwriter code compiled into the executable? This problem does not seem to arise if the libxlsxwriter code is compiled into a regular DLL (aka USRDLL) statically linking to the C runtime on Windows which is not loaded at startup (compiler switches `/D "_USRDLL"` and `/MT` for Microsoft Visual Studio). My scenario is pretty much the same as the OP's. The application is calling `setlocale(LC_ALL, "")` during initialisation - whatever that may mean for a particular user. However, the libxlsxwriter DLL is consumed as a so-called "delay loaded DLL" since I am using the code only for file export (`/DELAYLOAD` linker option for the exe in Visual Studio). The DLL is thus not loaded until the user actually needs the DLL. Since its DllMain function lacks a call to setlocale, the DLL is happily using the "C" locale. Therefore, I never experienced that problem. I myself am running the application using a German locale and xlsx creation worked fine so far. No thread-unsafe calls to `setlocale` nor tinkering with user-defined special functions like `lxw_print_double` for formatting floating point values have been necessary. Let me emphasise that either the delay-load feature or the static runtime linking alone might well do the trick.
Author
Owner

@jmcnamara commented on GitHub (Aug 2, 2017):

@BongoVR thanks for the information.

<!-- gh-comment-id:319604541 --> @jmcnamara commented on GitHub (Aug 2, 2017): @BongoVR thanks for the information.
Author
Owner

@utelle commented on GitHub (Aug 6, 2017):

Using the static runtime library of MSVC in a DLL can cause a lot of problems, if not used very carefully, especially since the DLL will use its own memory management. Besides that this "solution" is MSVC specific.

Therefore I would prefer a generic solution that generates correct output for variables of type double independent of the locale.

As mentioned in my post of July 11, 2016, I adjusted the libxlsxwriter code to use a locale independent function for generating the string output for doubles. And finally I dedicated some time to apply the changes not only in my local development environment for my own application using libxlsxwriter.

I committed the changes to my libxlsxwriter clone - after syncing it with the current version of the original. I struggled a bit with adjusting the various Makefiles, but now the code compiles at least without errors. However, the CI on Travis fails in the step verifying the generated test files. @jmcnamara: it would be nice if you could take a look - maybe I just made a simple stupid mistake; thanks in advance.

<!-- gh-comment-id:320504897 --> @utelle commented on GitHub (Aug 6, 2017): Using the static runtime library of MSVC in a DLL can cause a lot of problems, if not used very carefully, especially since the DLL will use its own memory management. Besides that this "solution" is MSVC specific. Therefore I would prefer a generic solution that generates correct output for variables of type double independent of the locale. As mentioned in my post of July 11, 2016, I adjusted the libxlsxwriter code to use a locale independent function for generating the string output for doubles. And finally I dedicated some time to apply the changes not only in my local development environment for my own application using libxlsxwriter. I committed the changes to my [libxlsxwriter clone](https://github.com/utelle/libxlsxwriter) - after syncing it with the current version of the original. I struggled a bit with adjusting the various Makefiles, but now the code compiles at least without errors. However, the [CI on Travis](https://travis-ci.org/utelle/libxlsxwriter) fails in the step verifying the generated test files. @jmcnamara: it would be nice if you could take a look - maybe I just made a simple stupid mistake; thanks in advance.
Author
Owner

@jmcnamara commented on GitHub (Aug 8, 2017):

@utelle I will probably implement a macro for snprintf() with "%.16g" so that the user can override it with their own function. I'm not in favour of adding dtoa at this time however.

<!-- gh-comment-id:321014206 --> @jmcnamara commented on GitHub (Aug 8, 2017): @utelle I will probably implement a macro for `snprintf()` with `"%.16g"` so that the user can override it with their own function. I'm not in favour of adding dtoa at this time however.
Author
Owner

@utelle commented on GitHub (Aug 8, 2017):

Any specific reason for not using dtoa?

All what I can tell is that I use it in a commercial application and it works flawlessly. That is, the resulting files can be opened by Excel without any problems.

IMHO you should at least isolate the output of double values into a separate function as I did, so that a change is necessary at a single place only.

<!-- gh-comment-id:321025420 --> @utelle commented on GitHub (Aug 8, 2017): Any specific reason for not using dtoa? All what I can tell is that I use it in a commercial application and it works flawlessly. That is, the resulting files can be opened by Excel without any problems. IMHO you should at least isolate the output of double values into a separate function as I did, so that a change is necessary at a single place only.
Author
Owner

@utelle commented on GitHub (Aug 8, 2017):

I still think that libxlsxwriter should provide a locale independent solution for writing double values right out of the box instead of leaving it to the user of the library to build some sort of workaround (by using compiler specific functions like _sprintf_l, by isolating the library into a separate DLL as proposed by @BongoVR, or by any other means).

<!-- gh-comment-id:321031152 --> @utelle commented on GitHub (Aug 8, 2017): I still think that `libxlsxwriter` should provide a locale independent solution for writing double values right out of the box instead of leaving it to the user of the library to build some sort of workaround (by using compiler specific functions like `_sprintf_l`, by isolating the library into a separate DLL as proposed by @BongoVR, or by any other means).
Author
Owner

@utelle commented on GitHub (Aug 8, 2017):

I modified the dtoa code to not emit a decimal point and a trailing zero for doubles representing integer numbers ... and now CI on Travis passes all tests.

<!-- gh-comment-id:321038699 --> @utelle commented on GitHub (Aug 8, 2017): I modified the `dtoa` code to not emit a decimal point and a trailing zero for doubles representing integer numbers ... and now [CI on Travis](https://travis-ci.org/utelle/libxlsxwriter) passes all tests.
Author
Owner

@jmcnamara commented on GitHub (Aug 9, 2017):

Any specific reason for not using dtoa?

I looked at the emyg_dtoa.c implementation last year when you posted it and again today. I couldn't get it to compile without modification and it doesn't have any significant test suite. I see that you also had to patch it to get it to work. So for me it is a non-runner.

Of the others in the benchmark the Gay algorithm should be stable and tested, and it compiles everywhere but it is slower than sprintf(). Although that wouldn't be my main consideration.

Also fpconv compiles okay, runs faster than sprintf and seems to be maintained, so that may be an option.

Either way I would have to write a test suite before I added any of them.

IMHO you should at least isolate the output of double values into a separate function as I did, so that a change is necessary at a single place only

Yes. I will do that as step 1 in the near future.

I still think that libxlsxwriter should provide a locale independent solution for writing double values right out of the box

As above my main concern is finding a library that I can stand over. When/if I identify a suitable library I'll add it as a step 2.

<!-- gh-comment-id:321303587 --> @jmcnamara commented on GitHub (Aug 9, 2017): > Any specific reason for not using dtoa? I looked at the emyg_dtoa.c implementation last year when you posted it and again today. I couldn't get it to compile without modification and it doesn't have any significant test suite. I see that you also had to patch it to get it to work. So for me it is a non-runner. Of the others in the benchmark the Gay algorithm should be stable and tested, and it compiles everywhere but it is slower than `sprintf()`. Although that wouldn't be my main consideration. Also `fpconv` compiles okay, runs faster than `sprintf` and seems to be maintained, so that may be an option. Either way I would have to write a test suite before I added any of them. > IMHO you should at least isolate the output of double values into a separate function as I did, so that a change is necessary at a single place only Yes. I will do that as step 1 in the near future. > I still think that libxlsxwriter should provide a locale independent solution for writing double values right out of the box As above my main concern is finding a library that I can stand over. When/if I identify a suitable library I'll add it as a step 2.
Author
Owner

@utelle commented on GitHub (Aug 9, 2017):

Unfortunately you are right that the dtoa benchmark project seems not to be the best source for a dtoa implementation. Yes, I had to tweak emyg_dtoa to make it compile, but as far as I can tell the algorithm works properly.

If you tend to use the dtoa implementation of David Gay, then do not use the sources from the dtoa benchmark project - they are outdated. Get the latest version from netlib.

fpconv might be preferrable to Gay's dtoa, since it is a lot faster. In fact, it is based on the same algorithm (grisu) - originally developed by Florian Loitsch - as emyg_dtoa.

That you intend to do step 1 (isolating output of doubles) in the near future sounds really good. Then one can drop in whatever dtoa implementation without too much hassle.

<!-- gh-comment-id:321398715 --> @utelle commented on GitHub (Aug 9, 2017): Unfortunately you are right that the [dtoa benchmark](https://github.com/miloyip/dtoa-benchmark) project seems not to be the best source for a `dtoa` implementation. Yes, I had to tweak `emyg_dtoa` to make it compile, but as far as I can tell the algorithm works properly. If you tend to use the `dtoa` implementation of David Gay, then do not use the sources from the [dtoa benchmark](https://github.com/miloyip/dtoa-benchmark) project - they are outdated. Get the latest version from [netlib](http://www.ampl.com/netlib/fp/). `fpconv` might be preferrable to Gay's `dtoa`, since it is a lot faster. In fact, it is based on the same algorithm (`grisu`) - originally developed by Florian Loitsch - as `emyg_dtoa`. That you intend to do step 1 (isolating output of doubles) in the near future sounds really good. Then one can drop in whatever `dtoa` implementation without too much hassle.
Author
Owner

@m1h43l commented on GitHub (Mar 12, 2018):

Hi,

I just want to add my 2 cents. I am using this library on IBM i with the EBCDIC CCSID 273 (German) and the locale de_DE. This setup configures the decimal point as a comma which results in an invalid xlsx document. This library uses snprintf for transforming the double value to a character value. snprintf does not use the DECFMT job value but takes the LC_NUMERIC value into account which can be set as an env var for the job.

ADDENVVAR ENVVAR(LC_NUMERIC) VALUE('/QSYS.LIB/EN_US.LOCALE')

So users with a comma as a decimal point sign can change the env var LC_NUMERIC to get a correct document.

<!-- gh-comment-id:372238527 --> @m1h43l commented on GitHub (Mar 12, 2018): Hi, I just want to add my 2 cents. I am using this library on IBM i with the EBCDIC CCSID 273 (German) and the locale de_DE. This setup configures the decimal point as a comma which results in an invalid xlsx document. This library uses snprintf for transforming the double value to a character value. snprintf does not use the DECFMT job value but takes the LC_NUMERIC value into account which can be set as an env var for the job. ADDENVVAR ENVVAR(LC_NUMERIC) VALUE('/QSYS.LIB/EN_US.LOCALE') So users with a comma as a decimal point sign can change the env var LC_NUMERIC to get a correct document.
Author
Owner

@utelle commented on GitHub (Mar 12, 2018):

@m1h43l: for your use case the solution to set the decimal separator via environment variable may work sufficiently well. However, in the general case this solution has the side effect of changing the decimal separator globally for the system/user. This may not be acceptable for some user interface applications.

IMHO the preferred solution should be to make the library locale independent. In fact, this is not a big deal as I demonstrated in my own fork. In my own applications my approach works flawlessly ever since.

<!-- gh-comment-id:372431289 --> @utelle commented on GitHub (Mar 12, 2018): @m1h43l: for your use case the solution to set the decimal separator via environment variable may work sufficiently well. However, in the general case this solution has the side effect of changing the decimal separator globally for the system/user. This may not be acceptable for some user interface applications. IMHO the preferred solution should be to make the library locale independent. In fact, this is not a big deal as I demonstrated in [my own fork](https://github.com/utelle/libxlsxwriter). In my own applications my approach works flawlessly ever since.
Author
Owner

@m1h43l commented on GitHub (Mar 13, 2018):

@utelle I totally agree. The application should work without having the user to modify the environment. But for the time being it is better to have something than to have nothing and thus the workaround with the env var.

<!-- gh-comment-id:372576640 --> @m1h43l commented on GitHub (Mar 13, 2018): @utelle I totally agree. The application should work without having the user to modify the environment. But for the time being it is better to have something than to have nothing and thus the workaround with the env var.
Author
Owner

@jmcnamara commented on GitHub (Apr 25, 2018):

I've implemented a partial solution for this: I've added a function called lxw_sprintf_dbl() which can be enabled via the USE_DOUBLE_FUNCTION #define:

make USE_DOUBLE_FUNCTION=1

By default (when USE_DOUBLE_FUNCTION is not defined) lxw_sprintf_dbl() is just a macro.

See commit 14421b6629 for details.

The internals of the lxw_sprintf_dbl() function can be replaced with something more complete such as the emyg_dtoa function you use.

I'll look into adding something like dtoa as an optional compile at a later stage.

<!-- gh-comment-id:384224485 --> @jmcnamara commented on GitHub (Apr 25, 2018): I've implemented a partial solution for this: I've added a function called [lxw_sprintf_dbl()](https://github.com/jmcnamara/libxlsxwriter/blob/master/src/utility.c#L540) which can be enabled via the `USE_DOUBLE_FUNCTION` `#define`: ```bash make USE_DOUBLE_FUNCTION=1 ``` By default (when `USE_DOUBLE_FUNCTION` is not defined) `lxw_sprintf_dbl()` is just a macro. See commit 14421b662981423856b5a98eb0fa5415dcdf7925 for details. The internals of the `lxw_sprintf_dbl()` function can be replaced with something more complete such as the `emyg_dtoa` function you use. I'll look into adding something like dtoa as an optional compile at a later stage.
Author
Owner

@utelle commented on GitHub (Apr 25, 2018):

Great news. A step in the right direction towards a general solution of the problem.

However, I really don't understand why you chose to use a macro if USE_DOUBLE_FUNCTION is not defined. IMHO that makes the code less clear (for example in worksheet.c).

The variant of emyg_dtoa I use in my fork works flawlessly producing the exact same results as your original library.

<!-- gh-comment-id:384352831 --> @utelle commented on GitHub (Apr 25, 2018): Great news. A step in the right direction towards a general solution of the problem. However, I really don't understand why you chose to use a macro if `USE_DOUBLE_FUNCTION` is not defined. IMHO that makes the code less clear (for example in `worksheet.c`). The variant of `emyg_dtoa` I use in my fork works flawlessly producing the exact same results as your original library.
Author
Owner

@jmcnamara commented on GitHub (Apr 25, 2018):

I really don't understand why you chose to use a macro if USE_DOUBLE_FUNCTION is not defined.

For performance reasons in _write_number_cell() function which is in the inner loop. Although, having said that, I haven't performance tested both variants yet and an inlined function might be as fast as the macro. I'll look into that in the next few days.

<!-- gh-comment-id:384410574 --> @jmcnamara commented on GitHub (Apr 25, 2018): > I really don't understand why you chose to use a macro if USE_DOUBLE_FUNCTION is not defined. For performance reasons in `_write_number_cell()` function which is in the inner loop. Although, having said that, I haven't performance tested both variants yet and an inlined function might be as fast as the macro. I'll look into that in the next few days.
Author
Owner

@utelle commented on GitHub (Apr 25, 2018):

I doubt that one level of indirection really causes performance problems compared to the time required to do the actual double formatting. At the moment your macro approach looks like premature optimization to me.

BTW, I updated my fork to your latest version, but using my emyg_dtoa based implementation for your new macro/function lxw_sprintf_dbl ... it still passes all tests. 😄

<!-- gh-comment-id:384416860 --> @utelle commented on GitHub (Apr 25, 2018): I doubt that one level of indirection really causes performance problems compared to the time required to do the actual double formatting. At the moment your macro approach looks like premature optimization to me. BTW, I updated my fork to your latest version, but using my `emyg_dtoa` based implementation for your new macro/function `lxw_sprintf_dbl` ... it still passes all tests. :smile:
Author
Owner

@hdijkema commented on GitHub (Aug 15, 2019):

Please document the 'USE_DOUBLE_FUNCTION' somewhere in CMakefile.txt or in some building instructions. I have this problem on a Linux system (not on Windows) and I had to look into the generated excel XML to see what caused the problems. Also it doesn't seem to work with cmake? I did:

mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DUSE_SYSTEM_MINIZIP=On 
make USE_DOUBLE_FUNCTION=1 

Eventually I added '#define USE_DOUBLE_FUNCTION 1' in common.h during library compilation and removed it when installing...

<!-- gh-comment-id:521610620 --> @hdijkema commented on GitHub (Aug 15, 2019): Please document the 'USE_DOUBLE_FUNCTION' somewhere in CMakefile.txt or in some building instructions. I have this problem on a Linux system (not on Windows) and I had to look into the generated excel XML to see what caused the problems. Also it doesn't seem to work with cmake? I did: ``` mkdir build cd build cmake .. -DCMAKE_BUILD_TYPE=Release -DUSE_SYSTEM_MINIZIP=On make USE_DOUBLE_FUNCTION=1 ``` Eventually I added `'#define USE_DOUBLE_FUNCTION 1'` in `common.h` during library compilation and removed it when installing...
Author
Owner

@petricf commented on GitHub (Sep 19, 2019):

Can you integrate the following patch:
https://bugs.gentoo.org/attachment.cgi?id=590372

For reference the complete bug report is at: https://bugs.gentoo.org/627452

This adds the cmake option USE_DOUBLE_FUNCTION=(ON/OFF). Allows to be defined at cmake call time.
Useful for systems which build automatically from source (like gentoo).

<!-- gh-comment-id:533127676 --> @petricf commented on GitHub (Sep 19, 2019): Can you integrate the following patch: https://bugs.gentoo.org/attachment.cgi?id=590372 For reference the complete bug report is at: https://bugs.gentoo.org/627452 This adds the cmake option USE_DOUBLE_FUNCTION=(ON/OFF). Allows to be defined at cmake call time. Useful for systems which build automatically from source (like gentoo).
Author
Owner

@jmcnamara commented on GitHub (Sep 20, 2019):

@petricf Will do. Thanks.

<!-- gh-comment-id:533417632 --> @jmcnamara commented on GitHub (Sep 20, 2019): @petricf Will do. Thanks.
Author
Owner

@utelle commented on GitHub (Jan 20, 2020):

Instead of continuing the discussion in #265 (under a misleading title) I'd like to add some information here.

Outside of libxlsxwriter I compared the implementations emyg_dtoa and fpconv from dtoa-benchmark. The fpconv implementation is identical to night-shift/fpconv. Not sure, which fpconv implementation you had in mind.

emyg_dtoa as well as fpconv are based on grisu by Florian Loitsch.

Interestingly the results of emyg_dtoa and fpconv differed in roughly 43 % of 100.000 test conversions, although the results varied only in the last digit of the output. It should be tested whether this would have an effect on the results of the libxlsxwriter test suite.

Additionally, I found ryu by Ulf Adams with implementation(s) here.

<!-- gh-comment-id:576413310 --> @utelle commented on GitHub (Jan 20, 2020): Instead of continuing the discussion in #265 (under a misleading title) I'd like to add some information here. Outside of libxlsxwriter I compared the implementations `emyg_dtoa` and `fpconv` from [dtoa-benchmark](https://github.com/miloyip/dtoa-benchmark). The `fpconv` implementation is identical to [night-shift/fpconv](https://github.com/night-shift/fpconv). Not sure, which `fpconv` implementation you had in mind. `emyg_dtoa` as well as `fpconv` are based on [grisu](https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf) by Florian Loitsch. Interestingly the results of `emyg_dtoa` and `fpconv` differed in roughly 43 % of 100.000 test conversions, although the results varied only in the last digit of the output. It should be tested whether this would have an effect on the results of the libxlsxwriter test suite. Additionally, I found [ryu](https://dl.acm.org/doi/pdf/10.1145/3296979.3192369?download=true) by Ulf Adams with implementation(s) [here](https://github.com/ulfjack/ryu).
Author
Owner

@jmcnamara commented on GitHub (Jul 11, 2021):

Closing this as a duplicate of #272

<!-- gh-comment-id:877796666 --> @jmcnamara commented on GitHub (Jul 11, 2021): Closing this as a duplicate of #272
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: github-starred/libxlsxwriter#55
No description provided.