=pod

=encoding utf8

=head1 NAME

passphrase-encoding
- How different parts of OpenSSL treat pass phrase character encoding

=head1 DESCRIPTION

In a modern world with all sorts of character encodings, the treatment of pass
phrases has become increasingly complex.
This manual page attempts to give an overview of how this problem is
currently addressed in different parts of the OpenSSL library.

=head2 The general case

The OpenSSL library doesn't treat pass phrases in any special way as a general
rule, and trusts the application or user to choose a suitable character set
and stick to that throughout the lifetime of affected objects.
This means that for an object that was encrypted using a pass phrase encoded in
ISO-8859-1, that object needs to be decrypted using a pass phrase encoded in
ISO-8859-1.
Using the wrong encoding is expected to cause a decryption failure.

=head2 PKCS#12

PKCS#12 is a bit different regarding pass phrase encoding.
The standard stipulates that the pass phrase shall be encoded as an ASN.1
BMPString, which consists of the code points of the basic multilingual plane,
encoded in big endian (UCS-2 BE).

OpenSSL tries to adapt to this requirement in one of the following ways:

=over 4

=item 1.

Treats the received pass phrase as UTF-8 encoded and tries to re-encode it to
UTF-16 (which is the same as UCS-2 for characters U+0000 to U+D7FF and U+E000
to U+FFFF, but becomes an expansion for any other character), or failing that,
proceeds with step 2.

=item 2.

Assumes that the pass phrase is encoded in ASCII or ISO-8859-1 and
opportunistically prepends each byte with a zero byte to obtain the UCS-2
encoding of the characters, which it stores as a BMPString.

Note that since there is no check of your locale, this may produce UCS-2 /
UTF-16 characters that do not correspond to the original pass phrase characters
if the pass phrase was entered in another character set, such as any ISO-8859-X
encoding other than ISO-8859-1 (or, on Windows, CP 1252, which matches
ISO-8859-1 except for the extra "graphical" characters in the 0x80-0x9F range).

=back

OpenSSL versions older than 1.1.0 do variant 2 only.
Newer versions keep it as a fallback so that they can still read files
produced with older versions.
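
The two variants can be illustrated with a minimal Python sketch.
It is not OpenSSL's actual implementation: the function names are hypothetical,
Python's strict UTF-8 decoder stands in for OpenSSL's own validity check, and
details such as any trailing terminator are left out.

```python
def pkcs12_bmpstring(passphrase: bytes) -> bytes:
    """Two-step fallback as described above (illustrative only)."""
    try:
        # Step 1: treat the input as UTF-8 and re-encode it as big-endian
        # UTF-16. Characters outside the BMP expand to surrogate pairs.
        return passphrase.decode("utf-8").encode("utf-16-be")
    except UnicodeDecodeError:
        # Step 2: assume ASCII / ISO-8859-1 and prefix each byte with 0x00.
        return legacy_bmpstring(passphrase)


def legacy_bmpstring(passphrase: bytes) -> bytes:
    """Variant 2 only, as done by OpenSSL older than 1.1.0."""
    return b"".join(bytes([0, b]) for b in passphrase)


# "naive" with a diaeresis, once as ISO-8859-1 bytes and once as UTF-8 bytes.
latin1_input = b"na\xefve"
utf8_input = b"na\xc3\xafve"

# The ISO-8859-1 bytes are not valid UTF-8, so step 2 handles them;
# the UTF-8 bytes are handled by step 1. Both give the same BMPString.
print(pkcs12_bmpstring(latin1_input).hex(" "))
print(pkcs12_bmpstring(utf8_input).hex(" "))
```

In this sketch both inputs end up as the same BMPString, which is the
situation the fallback is meant to cover.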

Note that this approach isn't entirely fault-free.

A pass phrase encoded in ISO-8859-2 could very well have a sequence such as
0xC3 0xAF (which is the two characters "LATIN CAPITAL LETTER A WITH BREVE"
and "LATIN CAPITAL LETTER Z WITH DOT ABOVE" in ISO-8859-2 encoding), but would
be misinterpreted as the perfectly valid UTF-8 encoded code point U+00EF (LATIN
SMALL LETTER I WITH DIAERESIS) I<if the pass phrase doesn't contain anything that
would be invalid UTF-8>.
A pass phrase that contains this kind of byte sequence will give a different
outcome in OpenSSL 1.1.0 and newer than in OpenSSL older than 1.1.0.

 0x00 0xC3 0x00 0xAF                    # OpenSSL older than 1.1.0
 0x00 0xEF                              # OpenSSL 1.1.0 and newer

By the same token, anything encoded in UTF-8 that was given to OpenSSL older
than 1.1.0 was misinterpreted as ISO-8859-1 sequences.
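
The ambiguity of the byte pair 0xC3 0xAF can be reproduced with Python's
standard codecs.
This is only an illustration of the byte arithmetic; the variable names are
made up for the example and nothing here calls into OpenSSL.

```python
raw = bytes([0xC3, 0xAF])

# What the user meant, assuming an ISO-8859-2 input method: two characters,
# U+0102 (A WITH BREVE) and U+017B (Z WITH DOT ABOVE).
intended = raw.decode("iso-8859-2")

# What a UTF-8 decoder sees: the single code point U+00EF.
as_utf8 = raw.decode("utf-8")

# Byte-prefix variant (OpenSSL older than 1.1.0): 00 c3 00 af
old_bmp = b"".join(bytes([0, b]) for b in raw)

# UTF-8 interpretation (OpenSSL 1.1.0 and newer): 00 ef
new_bmp = as_utf8.encode("utf-16-be")

# Neither matches the UCS-2 form of the intended characters: 01 02 01 7b
intended_bmp = intended.encode("utf-16-be")

print(old_bmp.hex(" "), "|", new_bmp.hex(" "), "|", intended_bmp.hex(" "))
```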

=head2 OSSL_STORE

L<ossl_store(7)> acts as a general interface to access all kinds of objects,
potentially protected with a pass phrase, a PIN or something else.
This API stipulates that pass phrases should be UTF-8 encoded, and that any
other pass phrase encoding may give undefined results.
This API relies on the application to ensure UTF-8 encoding, and doesn't check
that this is the case, so what it gets, it will also pass to the underlying
loader.

=head1 RECOMMENDATIONS

This section assumes that you know what pass phrase was used for encryption,
but that it may have been encoded in a different character encoding than the
one used by your current input method.
For example, the pass phrase may have been used at a time when your default
encoding was ISO-8859-1 (i.e. "naïve" resulting in the byte sequence 0x6E 0x61
0xEF 0x76 0x65), and you're now in an environment where your default encoding
is UTF-8 (i.e. "naïve" resulting in the byte sequence 0x6E 0x61 0xC3 0xAF 0x76
0x65).
Whenever it's mentioned that you should use a certain character encoding, it
should be understood that you either change the input method to use the
mentioned encoding when you type in your pass phrase, or use some suitable tool
to convert your pass phrase from your default encoding to the target encoding.
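
As one example of such a tool, the following Python sketch converts a pass
phrase between encodings.
The helper name C<recode> is hypothetical, and the source and target encodings
are assumptions that you have to supply yourself.

```python
def recode(data: bytes, source: str, target: str) -> bytes:
    """Reinterpret pass phrase bytes from one character encoding in another."""
    return data.decode(source).encode(target)


passphrase = "na\u00efve"  # "naive" spelled with U+00EF

latin1_bytes = passphrase.encode("iso-8859-1")  # 6e 61 ef 76 65
utf8_bytes = passphrase.encode("utf-8")         # 6e 61 c3 af 76 65

# Converting the bytes typed in a UTF-8 environment back to ISO-8859-1
# gives the byte sequence that was used in the older environment.
assert recode(utf8_bytes, "utf-8", "iso-8859-1") == latin1_bytes

print(latin1_bytes.hex(" "))
print(utf8_bytes.hex(" "))
```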

Also note that the sub-sections below discuss human readable pass phrases.
This is particularly relevant for PKCS#12 objects, where human readable pass
phrases are assumed.
For other objects, it's just as legitimate to use any byte sequence (such as a
sequence of bytes from F</dev/urandom> that's been saved away), which makes any
character encoding discussion irrelevant; in such cases, simply use the same
byte sequence as it is.

=head2 Creating new objects

For creating new pass phrase protected objects, make sure the pass phrase is
encoded using UTF-8.
This is the default on most modern Unixes, but may involve an effort on other
platforms.
Specifically for Windows, setting the environment variable
B<OPENSSL_WIN32_UTF8> will have anything entered at the Windows console prompt
converted to UTF-8 (command line and separately prompted pass phrases alike).

=head2 Opening existing objects

For opening pass phrase protected objects where you know what character
encoding was used for the encryption pass phrase, make sure to use the same
encoding again.

For opening pass phrase protected objects where the character encoding that was
used is unknown, or where the producing application is unknown, try one of the
following:

=over 4

=item 1.

Try the pass phrase that you have as it is in the character encoding of your
environment.
It's possible that its byte sequence is exactly right.

=item 2.

Convert the pass phrase to UTF-8 and try with the result.
Specifically with PKCS#12, this should open up any object that was created
according to the specification.

=item 3.

Do a naïve (i.e. purely mathematical) ISO-8859-1 to UTF-8 conversion and try
with the result.
This differs from the previous attempt because ISO-8859-1 maps directly to
U+0000 to U+00FF, which other non-UTF-8 character sets do not.

This also takes care of the case when a UTF-8 encoded string was used with
OpenSSL older than 1.1.0.
(For example, C<ï>, which is 0xC3 0xAF when encoded in UTF-8, would become 0xC3
0x83 0xC2 0xAF when re-encoded in the naïve manner.
The conversion to BMPString would then yield 0x00 0xC3 0x00 0xAF 0x00 0x00, the
erroneous/non-compliant encoding used by OpenSSL older than 1.1.0.)

=back
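
The three attempts above can be generated mechanically.
The following Python sketch is one way to do it; the function name and the
C<env_encoding> parameter are hypothetical, and the caller is assumed to know
the encoding of the current environment.

```python
def candidate_passphrases(typed: bytes, env_encoding: str) -> list:
    """Return the byte sequences to try, in the order listed above."""
    candidates = [typed]  # 1. the bytes exactly as entered

    try:
        # 2. proper conversion from the environment's encoding to UTF-8
        candidates.append(typed.decode(env_encoding).encode("utf-8"))
    except UnicodeDecodeError:
        pass  # the bytes are not valid in env_encoding; skip this attempt

    # 3. naive conversion: map every byte to U+0000..U+00FF, then encode
    candidates.append(typed.decode("iso-8859-1").encode("utf-8"))

    # Drop duplicates while keeping the order of the attempts.
    return list(dict.fromkeys(candidates))


# A UTF-8 environment where the pass phrase is the single character U+00EF.
for candidate in candidate_passphrases(b"\xc3\xaf", "utf-8"):
    print(candidate.hex(" "))
```

In a UTF-8 environment, attempts 1 and 2 produce the same bytes, so only two
distinct candidates remain in this example.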

=head1 SEE ALSO

L<evp(7)>,
L<ossl_store(7)>,
L<EVP_BytesToKey(3)>, L<EVP_DecryptInit(3)>,
L<PEM_do_header(3)>,
L<PKCS12_parse(3)>, L<PKCS12_newpass(3)>,
L<d2i_PKCS8PrivateKey_bio(3)>

=head1 COPYRIGHT

Copyright 2018 The OpenSSL Project Authors. All Rights Reserved.

Licensed under the Apache License 2.0 (the "License").  You may not use
this file except in compliance with the License.  You can obtain a copy
in the file LICENSE in the source distribution or at
L<https://www.openssl.org/source/license.html>.

=cut