From: Victor Stinner Date: Sun, 18 Dec 2011 18:30:55 +0000 (+0100) Subject: Issue #13617: Document that the result of the conversion of a Unicode object to X-Git-Tag: v3.3.0a1~570^2~1 X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=0d81c1357d4b628a29a8f7e33e49782b50f1163a;p=thirdparty%2FPython%2Fcpython.git Issue #13617: Document that the result of the conversion of a Unicode object to wchar*, Py_UNICODE* and bytes may contain embedded null characters/bytes. Patch written by Arnaud Calmettes. --- 0d81c1357d4b628a29a8f7e33e49782b50f1163a diff --cc Doc/c-api/unicode.rst index a6f3a69bfe3e,35006547c646..43e3d2fef23b --- a/Doc/c-api/unicode.rst +++ b/Doc/c-api/unicode.rst @@@ -527,158 -328,31 +527,164 @@@ APIs Identical to :c:func:`PyUnicode_FromFormat` except that it takes exactly two arguments. + +.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, \ + const char *encoding, const char *errors) + + Coerce an encoded object *obj* to an Unicode object and return a reference with + incremented refcount. + + :class:`bytes`, :class:`bytearray` and other char buffer compatible objects + are decoded according to the given *encoding* and using the error handling + defined by *errors*. Both can be *NULL* to have the interface use the default + values (see the next section for details). + + All other objects, including Unicode objects, cause a :exc:`TypeError` to be + set. + + The API returns *NULL* if there was an error. The caller is responsible for + decref'ing the returned objects. + + +.. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode) + + Return the length of the Unicode object, in code points. + + .. versionadded:: 3.3 + + +.. c:function:: int PyUnicode_CopyCharacters(PyObject *to, Py_ssize_t to_start, \ + PyObject *to, Py_ssize_t from_start, Py_ssize_t how_many) + + Copy characters from one Unicode object into another. 
This function performs + character conversion when necessary and falls back to :c:func:`memcpy` if + possible. Returns ``-1`` and sets an exception on error, otherwise returns + ``0``. + + .. versionadded:: 3.3 + + +.. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \ + Py_UCS4 character) + + Write a character to a string. The string must have been created through + :c:func:`PyUnicode_New`. Since Unicode strings are supposed to be immutable, + the string must not be shared, or have been hashed yet. + + This function checks that *unicode* is a Unicode object, that the index is + not out of bounds, and that the object can be modified safely (i.e. that it + its reference count is one), in contrast to the macro version + :c:func:`PyUnicode_WRITE_CHAR`. + + .. versionadded:: 3.3 + + +.. c:function:: Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index) + + Read a character from a string. This function checks that *unicode* is a + Unicode object and the index is not out of bounds, in contrast to the macro + version :c:func:`PyUnicode_READ_CHAR`. + + .. versionadded:: 3.3 + + +.. c:function:: PyObject* PyUnicode_Substring(PyObject *str, Py_ssize_t start, \ + Py_ssize_t end) + + Return a substring of *str*, from character index *start* (included) to + character index *end* (excluded). Negative indices are not supported. + + .. versionadded:: 3.3 + + +.. c:function:: Py_UCS4* PyUnicode_AsUCS4(PyObject *u, Py_UCS4 *buffer, \ + Py_ssize_t buflen, int copy_null) + + Copy the string *u* into a UCS4 buffer, including a null character, if + *copy_null* is set. Returns *NULL* and sets an exception on error (in + particular, a :exc:`ValueError` if *buflen* is smaller than the length of + *u*). *buffer* is returned on success. + + .. versionadded:: 3.3 + + +.. c:function:: Py_UCS4* PyUnicode_AsUCS4Copy(PyObject *u) + + Copy the string *u* into a new UCS4 buffer that is allocated using + :c:func:`PyMem_Malloc`. 
If this fails, *NULL* is returned with a + :exc:`MemoryError` set. + + .. versionadded:: 3.3 + + +Deprecated Py_UNICODE APIs +"""""""""""""""""""""""""" + +.. deprecated-removed:: 3.3 4.0 + +These API functions are deprecated with the implementation of :pep:`393`. +Extension modules can continue using them, as they will not be removed in Python +3.x, but need to be aware that their use can now cause performance and memory hits. + + +.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size) + + Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u* + may be *NULL* which causes the contents to be undefined. It is the user's + responsibility to fill in the needed data. The buffer is copied into the new + object. + + If the buffer is not *NULL*, the return value might be a shared object. + Therefore, modification of the resulting Unicode object is only allowed when + *u* is *NULL*. + + If the buffer is *NULL*, :c:func:`PyUnicode_READY` must be called once the + string content has been filled before using any of the access macros such as + :c:func:`PyUnicode_KIND`. + + Please migrate to using :c:func:`PyUnicode_FromKindAndData` or + :c:func:`PyUnicode_New`. + + +.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode) + + Return a read-only pointer to the Unicode object's internal - :c:type:`Py_UNICODE` buffer, *NULL* if *unicode* is not a Unicode object. - This will create the :c:type:`Py_UNICODE` representation of the object if it - is not yet available. ++ :c:type:`Py_UNICODE` buffer, or *NULL* on error. This will create the ++ :c:type:`Py_UNICODE*` representation of the object if it is not yet ++ available. Note that the resulting :c:type:`Py_UNICODE` string may contain ++ embedded null characters, which would cause the string to be truncated when ++ used in most C functions. 
+ + Please migrate to using :c:func:`PyUnicode_AsUCS4`, + :c:func:`PyUnicode_Substring`, :c:func:`PyUnicode_ReadChar` or similar new + APIs. + + .. c:function:: PyObject* PyUnicode_TransformDecimalToASCII(Py_UNICODE *s, Py_ssize_t size) Create a Unicode object by replacing all decimal digits in :c:type:`Py_UNICODE` buffer of the given *size* by ASCII digits 0--9 - according to their decimal value. Return *NULL* if an exception - occurs. + according to their decimal value. Return *NULL* if an exception occurs. -.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode) +.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size) - Return a read-only pointer to the Unicode object's internal - :c:type:`Py_UNICODE` buffer, *NULL* if *unicode* is not a Unicode object. - Note that the resulting :c:type:`Py_UNICODE*` string may contain embedded - null characters, which would cause the string to be truncated when used in - most C functions. + Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:func:`Py_UNICODE` - array length in *size*. ++ array length in *size*. Note that the resulting :c:type:`Py_UNICODE*` string ++ may contain embedded null characters, which would cause the string to be ++ truncated when used in most C functions. + + .. versionadded:: 3.3 .. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode) Create a copy of a Unicode string ending with a nul character. Return *NULL* and raise a :exc:`MemoryError` exception on memory allocation failure, - otherwise return a new allocated buffer (use :c:func:`PyMem_Free` to free the - buffer). + otherwise return a new allocated buffer (use :c:func:`PyMem_Free` to free - the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may contain - embedded null characters, which would cause the string to be truncated when - used in most C functions. ++ the buffer). 
Note that the resulting :c:type:`Py_UNICODE*` string may ++ contain embedded null characters, which would cause the string to be ++ truncated when used in most C functions. .. versionadded:: 3.2 @@@ -850,10 -479,12 +857,12 @@@ wchar_t Suppor Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*. At most *size* :c:type:`wchar_t` characters are copied (excluding a possibly trailing 0-termination character). Return the number of :c:type:`wchar_t` characters -- copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t` ++ copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t*` string may or may not be 0-terminated. It is the responsibility of the caller -- to make sure that the :c:type:`wchar_t` string is 0-terminated in case this is - required by the application. ++ to make sure that the :c:type:`wchar_t*` string is 0-terminated in case this is + required by the application. Also, note that the :c:type:`wchar_t*` string + might contain null characters, which would cause the string to be truncated + when used with most C functions. .. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size) @@@ -863,9 -494,11 +872,11 @@@ of wide characters (excluding the trailing 0-termination character) into *\*size*. - Returns a buffer allocated by :c:func:`PyMem_Alloc` (use :c:func:`PyMem_Free` - to free it) on success. On error, returns *NULL*, *\*size* is undefined and - raises a :exc:`MemoryError`. + Returns a buffer allocated by :c:func:`PyMem_Alloc` (use + :c:func:`PyMem_Free` to free it) on success. On error, returns *NULL*, + *\*size* is undefined and raises a :exc:`MemoryError`. Note that the - resulting :c:type:`wchar_t*` string might contain null characters, which ++ resulting :c:type:`wchar_t` string might contain null characters, which + would cause the string to be truncated when used with most C functions. .. versionadded:: 3.2
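The notes added by this patch all make the same point: a converted buffer can contain null characters before its real end, so the length reported through *size* is the only reliable one. The sketch below illustrates that with plain C and does not call the Python C API. The array `sample` is a stand-in for a buffer such as the one `PyUnicode_AsWideCharString` returns, and `has_embedded_null` and `c_visible_length` are hypothetical helper names, not functions from CPython.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not part of the Python C API): return 1 if the first
   `size` characters of `w` contain a null character, otherwise 0. */
static int
has_embedded_null(const wchar_t *w, size_t size)
{
    assert(w != NULL);
    /* wmemchr() scans exactly `size` characters; unlike wcschr() it does not
       stop at the first L'\0'. */
    return wmemchr(w, L'\0', size) != NULL;
}

/* Hypothetical helper: the length that null-terminated C functions see. */
static size_t
c_visible_length(const wchar_t *w)
{
    assert(w != NULL);
    return wcslen(w);
}

/* Stand-in for a converted buffer holding "ab\0cd": five characters of
   content followed by the trailing terminator. */
static const wchar_t sample[] = { L'a', L'b', L'\0', L'c', L'd', L'\0' };
static const size_t sample_size = 5;
```

For `sample`, `c_visible_length(sample)` stops at the embedded null and reports 2, while the stored size is 5, which is the truncation the documentation warns about. A caller that must hand the buffer to a null-terminated C function could run a check like `has_embedded_null(buffer, size)` first and reject the string, assuming the length written to *size* is converted to `size_t`.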