c++: Implement C++26 P1854R4 - Making non-encodable string literals ill-formed [PR110341]
author Jakub Jelinek <jakub@redhat.com>
Tue, 14 Nov 2023 17:28:34 +0000 (18:28 +0100)
committer Jakub Jelinek <jakub@redhat.com>
Tue, 14 Nov 2023 17:28:34 +0000 (18:28 +0100)
commit 194825f20619a1c4b51eaea84f20432fefc0db03
tree 93e0f44cfa40ba14f7585d7aee9464f25b3f15e7
parent 948b8b6e0e50958ecf56d4d9fb7ac16f245d9cc3

This paper, voted in as a DR, makes some multi-character literals ill-formed.
'abcd' stays valid, but e.g. 'á' is newly invalid with a UTF-8 exec charset
while it stays valid e.g. in ISO-8859-1, because it is a single character
which needs 2 bytes to be encoded in UTF-8.
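
As a rough illustration of the byte counts behind that distinction, the sketch below uses u8 string literals as a stand-in for a UTF-8 exec charset. The helper name code_units is made up for this example and is not part of the patch.

```cpp
#include <cstddef>

// Hypothetical helper (not from the patch): number of code units in a
// string literal, excluding the terminating NUL.
template <std::size_t N, typename CharT>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// Each of a, b, c, d is a single UTF-8 code unit, so a literal such as
// 'abcd' is made of four single-byte c-chars and stays an ordinary
// multi-character literal.
static_assert(code_units(u8"abcd") == 4, "four characters, four bytes");

// U+00E1 needs two UTF-8 code units, so with a UTF-8 exec charset the
// literal holding only that character is one c-char but two bytes.
static_assert(code_units(u8"\u00e1") == 2, "one character, two bytes");
```

In ISO-8859-1 the same character is the single byte 0xE1, which is why the literal remains valid there.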

The following patch implements that with a check which is done only when
pedantic (especially because it is a DR).  Whenever we'd emit a -Wmultichar
warning because the character constant has more than one byte in it, the
patch checks whether the number of source characters is equal to the number
of bytes in the multichar string.  If it is, it is a normal multi-character
literal and is diagnosed normally with -Wmultichar, otherwise at least one
of the c-chars in the sequence was encoded as 2+ bytes.
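
The comparison described above can be sketched roughly as follows. This is a simplified stand-in, not the libcpp code: it assumes the literal body is already valid UTF-8 in the exec charset, ignores escape sequences, and counts characters by skipping continuation bytes instead of reinterpreting the token as CPP_STRING32. The names count_code_points, CharLiteralKind and classify are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in valid UTF-8 by counting every
// byte that is not a continuation byte (bit pattern 10xxxxxx).
std::size_t count_code_points(const std::string &bytes) {
    std::size_t n = 0;
    for (unsigned char c : bytes)
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}

enum class CharLiteralKind { Single, MultiChar, NonEncodable };

// Classify the body of a narrow character literal (text between the quotes).
CharLiteralKind classify(const std::string &body) {
    // One byte or less: not a multi-character case at all.
    if (body.size() <= 1)
        return CharLiteralKind::Single;
    // As many characters as bytes: ordinary multi-character literal,
    // the case that -Wmultichar already covers.
    if (count_code_points(body) == body.size())
        return CharLiteralKind::MultiChar;
    // Fewer characters than bytes: some c-char needed 2+ bytes.
    return CharLiteralKind::NonEncodable;
}
```

With these assumptions, classify("abcd") yields MultiChar, while classify("\xC3\xA1") (the UTF-8 bytes of U+00E1) yields NonEncodable, which corresponds to the case the patch diagnoses when pedantic.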

2023-11-14  Jakub Jelinek  <jakub@redhat.com>

PR c++/110341
libcpp/
* charset.cc: Implement C++26 P1854R4 - Making non-encodable string
literals ill-formed.
(one_count_chars, convert_count_chars, count_source_chars): New
functions.
(narrow_str_to_charconst): Change last arg type from cpp_ttype to
const cpp_token *.  For C++ if pedantic and i > 1 in CPP_CHAR
interpret token also as CPP_STRING32 and if number of characters
in the CPP_STRING32 is larger than number of bytes in CPP_CHAR,
pedwarn on it.  Make the diagnostics more detailed.
(wide_str_to_charconst): Change last arg type from cpp_ttype to
const cpp_token *.  Make the diagnostics more detailed.
(cpp_interpret_charconst): Adjust narrow_str_to_charconst and
wide_str_to_charconst callers.
gcc/testsuite/
* g++.dg/cpp26/literals1.C: New test.
* g++.dg/cpp26/literals2.C: New test.
* g++.dg/cpp23/wchar-multi1.C: Adjust expected diagnostic wordings.
* g++.dg/cpp23/wchar-multi2.C: Likewise.
* gcc.dg/c23-utf8char-3.c: Likewise.
* gcc.dg/cpp/charconst-4.c: Likewise.
* gcc.dg/cpp/charconst.c: Likewise.
* gcc.dg/cpp/if-2.c: Likewise.
* gcc.dg/utf16-4.c: Likewise.
* gcc.dg/utf32-4.c: Likewise.
* g++.dg/cpp1z/utf8-neg.C: Likewise.
* g++.dg/cpp2a/ucn2.C: Likewise.
* g++.dg/ext/utf16-4.C: Likewise.
* g++.dg/ext/utf32-4.C: Likewise.
15 files changed:
gcc/testsuite/g++.dg/cpp1z/utf8-neg.C
gcc/testsuite/g++.dg/cpp23/wchar-multi1.C
gcc/testsuite/g++.dg/cpp23/wchar-multi2.C
gcc/testsuite/g++.dg/cpp26/literals1.C [new file with mode: 0644]
gcc/testsuite/g++.dg/cpp26/literals2.C [new file with mode: 0644]
gcc/testsuite/g++.dg/cpp2a/ucn2.C
gcc/testsuite/g++.dg/ext/utf16-4.C
gcc/testsuite/g++.dg/ext/utf32-4.C
gcc/testsuite/gcc.dg/c23-utf8char-3.c
gcc/testsuite/gcc.dg/cpp/charconst-4.c
gcc/testsuite/gcc.dg/cpp/charconst.c
gcc/testsuite/gcc.dg/cpp/if-2.c
gcc/testsuite/gcc.dg/utf16-4.c
gcc/testsuite/gcc.dg/utf32-4.c
libcpp/charset.cc