[GH-ISSUE #230] workbook_validate_sheet_name gives wrong answer, should be case-insensitive #185

Closed
opened 2026-05-05 11:51:13 -06:00 by gitea-mirror · 2 comments

Originally created by @mewalig on GitHub (Jun 5, 2019).
Original GitHub issue: https://github.com/jmcnamara/libxlsxwriter/issues/230

Originally assigned to: @jmcnamara on GitHub.

For example, if a workbook has a worksheet "abcde", workbook_validate_sheet_name("ABCDE") should return LXW_ERROR_SHEETNAME_ALREADY_USED, but does not.

I believe this issue exists because _worksheet_name_cmp() is not case-insensitive, but should be. _chartsheet_name_cmp() might have the same issue.

I understand this issue can be more complicated with multibyte characters. A full solution might use something like ICU or iconv for the case-insensitive comparison. However, a partial solution, which is easy and at least reduces the size of the problem, could be a char-by-char comparison that is case-insensitive for a-z/A-Z and strict for everything else.

Below is a possible partial solution (only for A-Z) that doesn't require using a library for multibyte case-insensitive compare:

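```
/* tolower() needs <ctype.h>; STATIC and lxw_worksheet_name come from
   libxlsxwriter's own headers. */
#include <ctype.h>

STATIC int
_worksheet_name_cmp(lxw_worksheet_name *name1, lxw_worksheet_name *name2)
{
  /* return strcmp(name1->name, name2->name); */
  const char *n1, *n2;
  for(n1 = name1->name, n2 = name2->name; *n1 && *n2; n1++, n2++) {
    /* Fold ASCII A-Z only; every other byte is compared as-is. */
    char c1 = (*n1 >= 'A' && *n1 <= 'Z') ? tolower(*n1) : *n1;
    char c2 = (*n2 >= 'A' && *n2 <= 'Z') ? tolower(*n2) : *n2;
    if(c1 != c2)
      return c1 > c2 ? 1 : -1;
  }
  /* The loop stopped because one or both strings reached '\0';
     compare those final bytes. */
  return *n1 > *n2 ? 1 : *n1 == *n2 ? 0 : -1;
}
```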
gitea-mirror 2026-05-05 11:51:13 -06:00

@jmcnamara commented on GitHub (Jun 5, 2019):

Thanks. That is a bug.

This is checked for in the Perl and Python versions, where the check is easier.


@jmcnamara commented on GitHub (Jun 7, 2019):

I've pushed a fix for this to master, along the lines of your suggestion.

From the updated docs:

> This function does an ASCII lowercase string comparison to determine
> if the sheet name is already in use. It doesn't take UTF-8 characters into
> account. Thus it would flag "Café" and "café" as a duplicate (just like
> Excel) but it wouldn't catch "CAFÉ". If you need a full UTF-8 case
> insensitive check you should use a third party library to implement it.

Reference: github-starred/libxlsxwriter#185