From da9a6c6ebd84d2910c04b0b546d58748c4b34185 Mon Sep 17 00:00:00 2001
From: Viktor Dukhovni
Date: Mon, 28 Jul 2025 17:45:18 +1000
Subject: [PATCH] Add design doc for rfc4514 DN output format
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

Reviewed-by: Saša Nedvědický
Reviewed-by: Tomas Mraz
Reviewed-by: Todd Short
(Merged from https://github.com/openssl/openssl/pull/28104)
---
 doc/designs/rfc4514.md | 258 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 258 insertions(+)
 create mode 100644 doc/designs/rfc4514.md

diff --git a/doc/designs/rfc4514.md b/doc/designs/rfc4514.md
new file mode 100644
index 00000000000..93b707e2c7f
--- /dev/null
+++ b/doc/designs/rfc4514.md
@@ -0,0 +1,258 @@
+RFC4514: String Representation of Distinguished Names
+=====================================================
+
+Introduction
+------------
+
+[RFC4514], which obsoletes [RFC2253], defines the standard string format for
+representing *Distinguished Names* (**DN**s) and *Relative Distinguished Names*
+(**RDN**s) within LDAP. The format is also used more broadly, e.g., as the
+string representation of issuer and subject names in X.509 certificates.
+
+Distinguished Names (DNs)
+-------------------------
+
+A *Distinguished Name* (**DN**) is a sequence of *Relative Distinguished Names*
+(**RDNs**).
+
+### String representation of DNs
+
+The string representation of a DN consists of strings representing each of its
+`RDNs` separated by commas (`,`). The [RFC4514] specification lists the RDNs
+**in reverse order**: the most specific elements, such as `CommonName`, are
+output first, and the most general, such as `CountryName`, last. Within the
+encoded `DN` itself, the `RDNs` are expected to appear in the opposite order,
+with the most general names first.
+
+Empty DNs are represented by an **empty** string.
+
+**Example:** `cn=John Doe,ou=People,dc=example,dc=com`
+
+### Attributes and Values
+
+Each RDN is composed of one or more attribute-value pairs.
+An attribute-value pair is represented as `type=value`. The type names are
+not case-sensitive.
+
+**Example:** `cn=John Doe`. Here `cn` (CommonName) is the attribute type and
+`John Doe` is the attribute value.
+
+Relative Distinguished Names (RDNs)
+-----------------------------------
+
+A Relative Distinguished Name (RDN) identifies an entry uniquely within its
+immediate superior entry. An RDN can consist of a single attribute-value pair
+or a set of multiple attribute-value pairs.
+
+### Multiple Attribute-Value Pairs in an RDN
+
+The string representation of multiple attribute-value pairs within a single RDN
+separates these by plus signs (`+`). In most cases each RDN consists of just a
+single attribute-value pair. The order of these pairs within an RDN is not
+significant (the ASN.1 abstract syntax designates them as a **SET** rather than
+as a **SEQUENCE**).
+
+**Example:** `cn=John Doe+uid=jdoe`
+
+String Representation Rules
+---------------------------
+
+RFC4514 specifies detailed rules for the string representation of DNs and RDNs
+to handle special characters and ensure unambiguous parsing.
+
+### Escaping Special Characters
+
+Certain characters have special meaning in DN strings and must be escaped if
+they appear in an attribute value. The special characters are:
+
+* Comma (`,`)
+* Plus (`+`)
+* Double Quote (`"`)
+* Backslash (`\`)
+* Less than (`<`)
+* Greater than (`>`)
+* Semicolon (`;`)
+* Leading hash (`#`)
+* Leading or trailing space
+* Optionally escaped: equals (`=`), non-leading hash (`#`)
+
+These characters are escaped by preceding them with a backslash (`\`). Other
+characters **may** be escaped by encoding **each octet** of their UTF-8
+encoding as two hexadecimal digits preceded by a backslash.
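
The mandatory escaping rules listed above can be sketched in C as follows.
This is a minimal illustration, not OpenSSL code: `rfc4514_escape_value()` is a
hypothetical helper. It assumes the value is already a NUL-terminated UTF-8
string, applies only the mandatory backslash escapes, and does not attempt the
optional hexadecimal escaping or handle embedded NUL characters.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of OpenSSL: escape a NUL-terminated UTF-8
 * attribute value using only the mandatory backslash escapes listed above.
 * Returns the escaped length (excluding the terminating NUL), or (size_t)-1
 * if the output buffer is too small.
 */
static size_t rfc4514_escape_value(const char *in, char *out, size_t outlen)
{
    size_t inlen = strlen(in), i, o = 0;

    if (outlen == 0)
        return (size_t)-1;

    for (i = 0; i < inlen; i++) {
        char c = in[i];
        size_t esc = 0;

        if (strchr(",+\"\\<>;", c) != NULL)
            esc = 1;            /* always escaped */
        else if (i == 0 && (c == '#' || c == ' '))
            esc = 1;            /* leading hash or leading space */
        else if (i == inlen - 1 && c == ' ')
            esc = 1;            /* trailing space */

        /* Need room for the optional backslash, the byte, and the NUL. */
        if (o + esc + 2 > outlen)
            return (size_t)-1;
        if (esc)
            out[o++] = '\\';
        out[o++] = c;
    }
    out[o] = '\0';
    return o;
}
```

With this sketch, the value `Doe, John` is written as `Doe\, John`, and a hash
mark is escaped only when it is the first character of the value.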
+
+**Example:** `cn=Doe\, John` (escaping a comma in the value)
+
+### Hexadecimal Escaping
+
+Any character **may** be represented by separately encoding each byte of its
+UTF-8 encoding as its hexadecimal value preceded by a backslash (`\`). This is
+particularly applicable to non-ASCII characters.
+
+**Example:** `cn=John\20Doe` (representing a space using its hexadecimal code)
+**Example:** `cn=Doe\2c John` (escaping a comma in the value)
+**Example:** `cn=Виктор \d0\94\d1\83\d1\85\d0\be\d0\b2\d0\bd\d1\8b\d0\b9`
+(escaping each UTF-8 byte of the last name).
+
+### Leading and Trailing Spaces, or a leading hash mark
+
+Leading or trailing spaces and any leading hash mark in an attribute value must
+be escaped. Spaces in the middle of a value do not need to be escaped.
+
+**Example:** `cn=\ John Doe\ ` (escaping leading and trailing spaces)
+
+### Character Sets
+
+The string must first be converted to UTF-8, prior to any escaping. In
+particular, some strings in X.509 certificates may be encoded in 16-bit Unicode
+(BMP) form. As a first step, these need to be converted to UTF-8.
+
+Tests should include some examples of non-ASCII, non-UTF8 strings that require
+conversion to UTF-8 as part of encoding. The output should not contain the
+`\U` or `\W` forms seen in `do_esc_char()`.
+
+Attribute Type Names
+--------------------
+
+The core attribute type names "c", "l", "o", "ou", etc., are specified
+directly in [RFC4519] Sections 2 and 4. These names are not case-sensitive.
+We may wish to expand the set of recognised type names to include some that
+are new in [RFC4519] or in the IANA [LDAP descriptor registry].
+
+Only the entries of type "A" (Attribute Type) are potentially relevant.
+All the *mainstream* attribute types are already listed in
+`crypto/objects/objects.txt` and should already be supported:
+
+| Attribute Name | OID | Reference |
+|---|---|---|
+| uid | 0.9.2342.19200300.100.1.1 | [RFC4519] |
+| userId | 0.9.2342.19200300.100.1.1 | [RFC4519] |
+| mail | 0.9.2342.19200300.100.1.3 | [RFC4524] |
+| RFC822Mailbox | 0.9.2342.19200300.100.1.3 | [RFC4524] |
+| DC | 0.9.2342.19200300.100.1.25 | [RFC4519] |
+| domainComponent | 0.9.2342.19200300.100.1.25 | [RFC4519] |
+| email | 1.2.840.113549.1.9.1 | [RFC3280] |
+| emailAddress | 1.2.840.113549.1.9.1 | [RFC3280] |
+| cn | 2.5.4.3 | [RFC4519] |
+| commonName | 2.5.4.3 | [RFC4519] |
+| sn | 2.5.4.4 | [RFC4519] |
+| surname | 2.5.4.4 | [RFC4519] |
+| serialNumber | 2.5.4.5 | [RFC4519] |
+| c | 2.5.4.6 | [RFC4519] |
+| countryName | 2.5.4.6 | [RFC4519] |
+| L | 2.5.4.7 | [RFC4519] |
+| localityName | 2.5.4.7 | [RFC4519] |
+| st | 2.5.4.8 | [RFC4519] |
+| stateOrProvinceName | 2.5.4.8 | [RFC2256] |
+| street | 2.5.4.9 | [RFC4519] |
+| streetAddress | 2.5.4.9 | [RFC2256] |
+| o | 2.5.4.10 | [RFC4519] |
+| organizationName | 2.5.4.10 | [RFC4519] |
+| ou | 2.5.4.11 | [RFC4519] |
+| organizationalUnitName | 2.5.4.11 | [RFC4519] |
+| title | 2.5.4.12 | [RFC4519] |
+| description | 2.5.4.13 | [RFC4519] |
+| businessCategory | 2.5.4.15 | [RFC4519] |
+| postalAddress | 2.5.4.16 | [RFC4519] |
+| postalCode | 2.5.4.17 | [RFC4519] |
+| postOfficeBox | 2.5.4.18 | [RFC4519] |
+| physicalDeliveryOfficeName | 2.5.4.19 | [RFC4519] |
+| telephoneNumber | 2.5.4.20 | [RFC4519] |
+| name | 2.5.4.41 | [RFC4519] |
+| givenName | 2.5.4.42 | [RFC4519] |
+| initials | 2.5.4.43 | [RFC4519] |
+| generationQualifier | 2.5.4.44 | [RFC4519] |
+| pseudonym | 2.5.4.65 | [RFC3280] |
+
+When an attribute type OID is not one of the known values it is represented by
+its dotted-decimal form, and the attribute value must then be encoded with a
+leading `#` character followed by the hexadecimal encoding of the DER-encoded
+value,
+see section 2.4 of [RFC4514]. This form may also be used when the value
+has no suitable string representation.
+
+I have not checked whether we implement case-insensitive string comparison for
+any of the attributes for which this is expected in LDAP. In certificates I do
+not expect to find case-variants of RDNs that need to be considered equivalent
+when comparing subject and issuer DNs.
+
+Parsing of Names
+----------------
+
+The parsing of X.509 directory names (e.g. the `-subj` option of the x509
+command) is performed by the `parse_name()` function in `apps/lib/apps.c`.
+This currently assumes that its input uses the output format of the legacy
+`X509_NAME_oneline()` function. That format always starts with a `/`
+character. A single slash by itself represents an **empty** RDN sequence.
+
+The `parse_name()` function is used in the `ca`, `cmp`, `req`, `storeutl`, and
+`x509` commands.
+
+If or when we switch to outputting the [RFC4514] format, we also need to accept
+it on input. Therefore, `parse_name()` needs to be updated to treat strings
+starting with a `/` as legacy oneline forms, and other strings as the RFC4514
+format.
+
+Parsing of the [RFC4514] syntax is covered in Section 3 of that specification.
+Currently, our parser does not support RDNs with ad hoc dotted-decimal OIDs;
+only known named attribute types are supported. We should consider allowing
+explicit dotted-decimal OIDs and using `X509_NAME_add_entry_by_OBJ()` to add
+these.
+
+Names in the configuration file
+-------------------------------
+
+In configuration files, we represent directory names as a "section" with one
+"attr = value" line per RDN component. Relevant documentation is in
+`x509v3_config(5)` and `openssl-req(1)`. For example:
+
+    subjectAltName = dirName:dir_sect
+    [dir_sect]
+    C = UK
+    O = My Organization
+    OU = My Unit
+    CN = My Name
+
+So in the configuration file, we only have to handle the syntax of the
+individual value elements; the DN as a whole is not parsed.
+The `string_mask` affects the encoding of the various strings, and defaults to
+`utf8only` (other values are not recommended).
+
+Only the `ca` and `req` commands process the string mask, though user
+applications can do the same by calling `ASN1_STRING_set_default_mask_asc()`,
+which is an undocumented and non-thread-safe function. The comments above the
+code say:
+
+    /*-
+     * This function sets the default to various "flavours" of configuration.
+     * based on an ASCII string. Currently this is:
+     * MASK:XXXX : a numerical mask value.
+     * default : use Printable, IA5, T61, BMP, and UTF8 string types
+     * nombstr : any string type except variable-sized BMPStrings or UTF8Strings
+     * pkix : PKIX recommendation in RFC 5280
+     * utf8only : this is the default, use UTF8Strings
+     */
+
+The bottom line is that for most users the DN components in the configuration
+file are already UTF8-friendly. The only thing to check is whether we support
+the desired set of attribute type names, both in the configuration file and
+while parsing a string representation of a complete DN.
+
+
+
+[RFC2253]:
+
+
+[RFC2256]:
+
+
+[RFC3280]:
+
+
+[RFC4514]:
+
+
+[RFC4519]:
+
+
+[RFC4524]:
+
+
+[LDAP descriptor registry]:
+
-- 
2.47.3