From: drh Date: Wed, 15 Apr 2020 17:39:39 +0000 (+0000) Subject: Clarification of the byte-order determination for UTF16 inputs to routines X-Git-Tag: version-3.32.0~77 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=c39b121a95c10b4ba36cbde89606037aab8dfc9f;p=thirdparty%2Fsqlite.git Clarification of the byte-order determination for UTF16 inputs to routines like sqlite3_bind_text16() and sqlite3_result_text16() and others that accept UTF16 input strings. FossilOrigin-Name: a42fdcf54bcbd72a301dad4a040346dc48e67cacab43479ec618f5c32108c55f --- diff --git a/manifest b/manifest index 3f838de2fc..4f7d83de2d 100644 --- a/manifest +++ b/manifest @@ -1,5 +1,5 @@ -C Build\sthe\sUINT\scollating\ssequence\sextension\sinto\sthe\sCLI. -D 2020-04-14T15:53:58.512 +C Clarification\sof\sthe\sbyte-order\sdetermination\sfor\sUTF16\sinputs\sto\sroutines\nlike\ssqlite3_bind_text16()\sand\ssqlite3_result_text16()\sand\sothers\sthat\naccept\sUTF16\sinput\sstrings. +D 2020-04-15T17:39:39.862 F .fossil-settings/empty-dirs dbb81e8fc0401ac46a1491ab34a7f2c7c0452f2f06b54ebb845d024ca8283ef1 F .fossil-settings/ignore-glob 35175cdfcf539b2318cb04a9901442804be81cd677d8b889fcc9149c21f239ea F LICENSE.md df5091916dbb40e6e9686186587125e1b2ff51f022cc334e886c19a0e9982724 @@ -534,7 +534,7 @@ F src/resolve.c d36a2b1639e1c33d7b508abfd3452a63e7fd81737f6f3940bfef085fca6f21f4 F src/rowset.c ba9515a922af32abe1f7d39406b9d35730ed65efab9443dc5702693b60854c92 F src/select.c ab4eb1aee1bd066feea5b6eff264220ae54459019654264e9f688368a7d0c0b5 F src/shell.c.in 792b901ae19c7cf8cbac06b3c58ef860328d0ca6d20149bdc469e5c4fc550a1d -F src/sqlite.h.in cc7d0949ac32bb68ed97acdb3e7ae91cd413a24d32d6ff049ef8308d620a4367 +F src/sqlite.h.in 4276f461ed8405630e3089b682dad77ae7a65c345d8daebcee52a13be8cd880c F src/sqlite3.rc 5121c9e10c3964d5755191c80dd1180c122fc3a8 F src/sqlite3ext.h 9c5269260409eb3275324ccace6a13a96f4ad330c708415f70ca6097901ff4ee F src/sqliteInt.h 0f3848c46310d197246003f052985b72d1cdbfc0b31e069db76cb5231062fa1d @@ -1861,7 +1861,7 @@ F vsixtest/vsixtest.tcl 6a9a6ab600c25a91a7acc6293828957a386a8a93 F vsixtest/vsixtest.vcxproj.data 2ed517e100c66dc455b492e1a33350c1b20fbcdc F vsixtest/vsixtest.vcxproj.filters 37e51ffedcdb064aad6ff33b6148725226cd608e F vsixtest/vsixtest_TemporaryKey.pfx e5b1b036facdb453873e7084e1cae9102ccc67a0 -P 6f46c6e3e3c471ca864d7596e0211ee90316b784c8fe22c7ae177c9d29731dc7 -R 55c0a7e7e6364ecba4d144377c4424a0 +P 2b8c6b035a276029850de02651712a5fd69f4dfee45083d24b9d1f998004829b +R 7af907123cde70d72ab14d313300d9ef U drh -Z 63a1ce21bf796da3b1d1d32831983a3d +Z e055c082ef6d50bf436bbdcfa90b28f7 diff --git a/manifest.uuid b/manifest.uuid index 52c41554f5..eebee4beb8 100644 --- a/manifest.uuid +++ b/manifest.uuid @@ -1 +1 @@ -2b8c6b035a276029850de02651712a5fd69f4dfee45083d24b9d1f998004829b \ No newline at end of file +a42fdcf54bcbd72a301dad4a040346dc48e67cacab43479ec618f5c32108c55f \ No newline at end of file diff --git a/src/sqlite.h.in b/src/sqlite.h.in index 86c3223801..f93fab1c59 100644 --- a/src/sqlite.h.in +++ b/src/sqlite.h.in @@ -4260,6 +4260,24 @@ typedef struct sqlite3_context sqlite3_context; ** ^If the third parameter to sqlite3_bind_text() or sqlite3_bind_text16() ** or sqlite3_bind_blob() is a NULL pointer then the fourth parameter ** is ignored and the end result is the same as sqlite3_bind_null(). +** ^If the third parameter to sqlite3_bind_text() is not NULL, then +** it should be a pointer to well-formed UTF8 text. +** ^If the third parameter to sqlite3_bind_text16() is not NULL, then +** it should be a pointer to well-formed UTF16 text. +** ^If the third parameter to sqlite3_bind_text64() is not NULL, then +** it should be a pointer to a well-formed unicode string that is +** either UTF8 if the sixth parameter is SQLITE_UTF8, or UTF16 +** otherwise. +** +** [[byte-order determination rules]] ^The byte-order of +** UTF16 input text is determined by the byte-order mark (BOM, U+FEFF) +** found in first character, which is removed, or in the absence of a BOM +** the byte order is the native byte order of the host +** machine for sqlite3_bind_text16() or the byte order specified in +** the 6th parameter for sqlite3_bind_text64().)^ +** ^If UTF16 input text contains invalid unicode +** characters, then SQLite might change those invalid characters +** into the unicode replacement character: U+FFFD. ** ** ^(In those routines that have a fourth argument, its value is the ** number of bytes in the parameter. To be clear: the value is the @@ -4273,7 +4291,7 @@ typedef struct sqlite3_context sqlite3_context; ** or sqlite3_bind_text16() or sqlite3_bind_text64() then ** that parameter must be the byte offset ** where the NUL terminator would occur assuming the string were NUL -** terminated. If any NUL characters occur at byte offsets less than +** terminated. If any NUL characters occurs at byte offsets less than ** the value of the fourth parameter then the resulting string value will ** contain embedded NULs. The result of expressions involving strings ** with embedded NULs is undefined. @@ -5598,8 +5616,9 @@ typedef void (*sqlite3_destructor_type)(void*); ** 2nd parameter of sqlite3_result_error() or sqlite3_result_error16() ** as the text of an error message. ^SQLite interprets the error ** message string from sqlite3_result_error() as UTF-8. ^SQLite -** interprets the string from sqlite3_result_error16() as UTF-16 in native -** byte order. ^If the third parameter to sqlite3_result_error() +** interprets the string from sqlite3_result_error16() as UTF-16 using +** the same [byte-order determination rules] as [sqlite3_bind_text16()]. +** ^If the third parameter to sqlite3_result_error() ** or sqlite3_result_error16() is negative then SQLite takes as the error ** message all text up through the first zero character. ** ^If the third parameter to sqlite3_result_error() or @@ -5667,6 +5686,25 @@ typedef void (*sqlite3_destructor_type)(void*); ** then SQLite makes a copy of the result into space obtained ** from [sqlite3_malloc()] before it returns. ** +** ^For the sqlite3_result_text16(), sqlite3_result_text16le(), and +** sqlite3_result_text16be() routines, and for sqlite3_result_text64() +** when the encoding is not UTF8, if the input UTF16 begins with a +** byte-order mark (BOM, U+FEFF) then the BOM is removed from the +** string and the rest of the string is interpreted according to the +** byte-order specified by the BOM. ^The byte-order specified by +** the BOM at the beginning of the text overrides the byte-order +** specified by the interface procedure. ^So, for example, if +** sqlite3_result_text16le() is invoked with text that begins +** with bytes 0xfe, 0xff (a big-endian byte-order mark) then the +** first two bytes of input are skipped and the remaining input +** is interpreted as UTF16BE text. +** +** ^For UTF16 input text to the sqlite3_result_text16(), +** sqlite3_result_text16be(), sqlite3_result_text16le(), and +** sqlite3_result_text64() routines, if the text contains invalid +** UTF16 characters, the invalid characters might be converted +** into the unicode replacement character, U+FFFD. +** ** ^The sqlite3_result_value() interface sets the result of ** the application-defined function to be a copy of the ** [unprotected sqlite3_value] object specified by the 2nd parameter. ^The