+2018-08-10 Paul Eggert <eggert@cs.ucla.edu>
+
+ regex: Gnulib unibyte RRI uses bytes not chars
+ Adjust the non-glibc code to agree with what Gawk needs for
+ rational range interpretation (RRI) for regular expression ranges.
+ In unibyte locales, Gawk wants ranges to use the underlying byte
+ rather than the character code point. This change does not affect
+ glibc proper.
+ * posix/regcomp.c (parse_byte) [!LIBC && RE_ENABLE_I18N]:
+ In unibyte locales, use the byte value rather than
+ running it through btowc.
+
2018-08-10 Joseph Myers <joseph@codesourcery.com>
* sysdeps/generic/math-tests-snan.h: New file.
# ifdef RE_ENABLE_I18N
/* Convert the byte B to the corresponding wide character. In a
- unibyte locale, treat B as itself if it is an encoding error.
- In a multibyte locale, return WEOF if B is an encoding error. */
+ unibyte locale, treat B as itself. In a multibyte locale, return
+ WEOF if B is an encoding error. */
static wint_t
parse_byte (unsigned char b, re_charset_t *mbcset)
{
- wint_t wc = __btowc (b);
- return wc == WEOF && !mbcset ? b : wc;
+ return mbcset == NULL ? b : __btowc (b);
}
-#endif
+# endif
/* Local function for parse_bracket_exp only used in case of NOT _LIBC.
Build the range expression which starts from START_ELEM, and ends