--- /dev/null
+RFC4514: String Representation of Distinguished Names
+=====================================================
+
+Introduction
+------------
+
+[RFC4514], which obsoletes [RFC2253], defines the standard string format for
+representing *Distinguished Names* (**DN**s) and *Relative Distinguished Names*
+(**RDN**s) within LDAP. The format is also used more broadly, e.g., as the
+string representation of issuer and subject names in X.509 certificates.
+
+Distinguished Names (DNs)
+-------------------------
+
+A *Distinguished Name* (**DN**) is a sequence of *Relative Distinguished Names*
+(**RDNs**).
+
+### String representation of DNs
+
+The string representation of a DN consists of the strings representing each of
+its `RDNs`, separated by commas (`,`). [RFC4514] lists the RDNs **in reverse
+order**: the most specific elements, such as `CommonName`, are output first,
+and the most general, such as `CountryName`, last. Within the encoded `DN`
+itself, the expected order of the `RDNs` is the opposite, with the most general
+names first.
+
+Empty DNs are represented by an **empty** string.
+
+**Example:** `cn=John Doe,ou=People,dc=example,dc=com`
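+
+As a minimal sketch of the ordering rule, assume the RDNs are already available
+as formatted strings in encoded order (most general first). The helper name
+`dn_to_string` is hypothetical and only illustrates the reversal; it is not
+part of any existing API.
+

```python
def dn_to_string(rdns_in_encoded_order):
    """Join RDN strings RFC 4514 style: last encoded RDN first, comma-separated."""
    return ",".join(reversed(rdns_in_encoded_order))


# Encoded order runs from the most general to the most specific name.
encoded = ["dc=com", "dc=example", "ou=People", "cn=John Doe"]
print(dn_to_string(encoded))
print(repr(dn_to_string([])))  # an empty DN yields an empty string
```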
+
+### Attributes and Values
+
+Each RDN is composed of one or more attribute-value pairs. An attribute-value
+pair is represented as `type=value`. The type names are not case-sensitive.
+
+**Example:** `cn=John Doe`. Here `cn` (CommonName) is the attribute type and
+`John Doe` is the attribute value.
+
+Relative Distinguished Names (RDNs)
+-----------------------------------
+
+A Relative Distinguished Name (RDN) identifies an entry uniquely within its
+immediate superior entry. An RDN can consist of a single attribute-value pair
+or a set of multiple attribute-value pairs.
+
+### Multiple Attribute-Value Pairs in an RDN
+
+The string representation of multiple attribute-value pairs within a single RDN
+separates these by plus signs (`+`). In most cases each RDN consists of just a
+single attribute-value pair. The order of these pairs within an RDN is not
+significant (the ASN.1 abstract syntax designates them as a **SET** rather than
+as a **SEQUENCE**).
+
+**Example:** `cn=John Doe+uid=jdoe`
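+
+The following sketch illustrates both points: pairs are joined with `+`, and
+two RDNs with the same pairs in a different order are treated as equal. The
+names `rdn_to_string` and `same_rdn` are hypothetical, values are assumed to
+need no escaping, and comparing values as exact strings is a simplifying
+assumption rather than the matching rule of any particular attribute type.
+

```python
def rdn_to_string(pairs):
    """Format the (type, value) pairs of one RDN, joined with '+'."""
    return "+".join(f"{atype}={value}" for atype, value in pairs)


def same_rdn(a, b):
    """Compare two RDNs as unordered sets of pairs.

    Type names are compared case-insensitively; values are compared as
    exact strings (an assumption made for this sketch).
    """
    def normalise(pairs):
        return frozenset((atype.lower(), value) for atype, value in pairs)

    return normalise(a) == normalise(b)


first = [("cn", "John Doe"), ("uid", "jdoe")]
second = [("UID", "jdoe"), ("CN", "John Doe")]
print(rdn_to_string(first))
print(same_rdn(first, second))
```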
+
+String Representation Rules
+---------------------------
+
+RFC4514 specifies detailed rules for the string representation of DNs and RDNs
+to handle special characters and ensure unambiguous parsing.
+
+### Escaping Special Characters
+
+Certain characters have special meaning in DN strings and must be escaped if
+they appear in an attribute value. The special characters are:
+
+* Comma (`,`)
+* Plus (`+`)
+* Double Quote (`"`)
+* Backslash (`\`)
+* Less than (`<`)
+* Greater than (`>`)
+* Semicolon (`;`)
+* Leading hash (`#`)
+* Leading or trailing space
+* Null (U+0000), which is always written as the hexadecimal escape `\00`
+* Optionally escaped: equals (`=`), non-leading hash (`#`)
+
+These characters are escaped by preceding them with a backslash (`\`). Other
+characters **may** be escaped by encoding **each octet** of their UTF-8
+encoding as two hexadecimal digits preceded by a backslash.
+
+**Example:** `cn=Doe\, John` (escaping a comma in the value)
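+
+The escaping rules listed above can be sketched as follows. This is an
+illustration only: the function name `escape_value` is hypothetical, the input
+is assumed to be an already decoded Unicode string, and the optional escapes
+(`=` and a non-leading `#`) are deliberately left alone.
+

```python
# Characters that must always be escaped with a backslash.
SPECIALS = set(',+"\\<>;')


def escape_value(value):
    """Escape an attribute value using the minimal set of required escapes."""
    out = []
    last = len(value) - 1
    for i, ch in enumerate(value):
        if ch in SPECIALS:
            out.append("\\" + ch)
        elif ch == "\x00":
            # RFC 4514 requires NUL to be written as a hex pair.
            out.append("\\00")
        elif i == 0 and ch in " #":
            out.append("\\" + ch)  # leading space or hash
        elif i == last and ch == " ":
            out.append("\\" + ch)  # trailing space
        else:
            out.append(ch)
    return "".join(out)


print("cn=" + escape_value("Doe, John"))
print("cn=" + escape_value(" John Doe "))
```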
+
+### Hexadecimal Escaping
+
+Any character **may** be represented by encoding each byte of its UTF-8
+encoding as two hexadecimal digits preceded by a backslash (`\`). This is
+particularly useful for non-ASCII characters.
+
+**Example:** `cn=John\20Doe` (representing a space using its hexadecimal code)
+
+**Example:** `cn=Doe\2c John` (escaping a comma in the value)
+
+**Example:** `cn=Виктор \d0\94\d1\83\d1\85\d0\be\d0\b2\d0\bd\d1\8b\d0\b9`
+(escaping each UTF-8 byte of the last name)
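+
+A small sketch of hexadecimal escaping and its inverse is shown here. The names
+`hex_escape` and `unescape_value` are hypothetical. The decoder is lenient by
+assumption: a backslash followed by two hex digits is read as one byte, and a
+backslash followed by any other character yields that character.
+

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def hex_escape(text):
    """Escape every UTF-8 byte of text as a backslash plus two hex digits."""
    return "".join(f"\\{byte:02x}" for byte in text.encode("utf-8"))


def unescape_value(escaped):
    """Undo backslash and hex escapes, returning the decoded string."""
    out = bytearray()
    i = 0
    while i < len(escaped):
        ch = escaped[i]
        if ch != "\\":
            out += ch.encode("utf-8")
            i += 1
        elif (i + 2 < len(escaped)
              and escaped[i + 1] in HEX_DIGITS
              and escaped[i + 2] in HEX_DIGITS):
            out.append(int(escaped[i + 1:i + 3], 16))  # hex pair -> one byte
            i += 3
        elif i + 1 < len(escaped):
            out += escaped[i + 1].encode("utf-8")  # escaped single character
            i += 2
        else:
            raise ValueError("dangling backslash at end of value")
    return out.decode("utf-8")


print(hex_escape(","))
print(unescape_value("Doe\\2c John"))
```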
+
+### Leading and Trailing Spaces, or a Leading Hash Mark
+
+Leading or trailing spaces and any leading hash mark in an attribute value must
+be escaped. Spaces in the middle of a value do not need to be escaped.
+
+**Example:** `cn=\ John Doe\ ` (escaping leading and trailing spaces)
+
+### Character Sets
+
+The string must be converted to UTF-8 before any escaping is applied. In
+particular, some strings in X.509 certificates may be encoded in 16-bit Unicode
+(BMP) form; as a first step, these need to be converted to UTF-8.
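+
+The conversion step can be sketched as below. The helper name `bmp_to_utf8` is
+hypothetical, and decoding the 16-bit big-endian content with Python's
+`utf-16-be` codec is an assumption made for brevity (it accepts a superset of
+what a BMP string may contain).
+

```python
def bmp_to_utf8(raw):
    """Convert 16-bit big-endian (BMP) string content to UTF-8 bytes."""
    return raw.decode("utf-16-be").encode("utf-8")


# Build sample BMP content for a non-ASCII name, then convert it.
bmp_content = "Виктор".encode("utf-16-be")
utf8_content = bmp_to_utf8(bmp_content)
print(len(bmp_content), len(utf8_content))
print(utf8_content.decode("utf-8"))
```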
+
+Tests should include some examples of non-ASCII, non-UTF8 strings that require
+conversion to UTF-8 as part of encoding. The output should not contain the
+`\U<xxxx>` or `\W<xxxxxxxx>` forms seen in `do_esc_char()`.
+
+Attribute Type Names
+--------------------
+
+The core attribute type names "c", "l", "o", "ou", etc., are specified directly in
+[RFC4519] Sections 2 and 4. These names are not case sensitive. We may wish
+to expand the set of recognised type names to include some that are new in
+[RFC4519] or in the IANA [LDAP descriptor registry].
+
+Only the entries of type "A" (Attribute Type) are potentially relevant. All
+the *mainstream* attribute types are already listed in
+`crypto/objects/objects.txt` and should be already supported:
+
+| Attribute Name | OID | Reference |
+|---|---|---|
+| uid | 0.9.2342.19200300.100.1.1 | [RFC4519] |
+| userId | 0.9.2342.19200300.100.1.1 | [RFC4519] |
+| mail | 0.9.2342.19200300.100.1.3 | [RFC4524] |
+| RFC822Mailbox | 0.9.2342.19200300.100.1.3 | [RFC4524] |
+| DC | 0.9.2342.19200300.100.1.25 | [RFC4519] |
+| domainComponent | 0.9.2342.19200300.100.1.25 | [RFC4519] |
+| email | 1.2.840.113549.1.9.1 | [RFC3280] |
+| emailAddress | 1.2.840.113549.1.9.1 | [RFC3280] |
+| cn | 2.5.4.3 | [RFC4519] |
+| commonName | 2.5.4.3 | [RFC4519] |
+| sn | 2.5.4.4 | [RFC4519] |
+| surname | 2.5.4.4 | [RFC4519] |
+| serialNumber | 2.5.4.5 | [RFC4519] |
+| c | 2.5.4.6 | [RFC4519] |
+| countryName | 2.5.4.6 | [RFC4519] |
+| L | 2.5.4.7 | [RFC4519] |
+| localityName | 2.5.4.7 | [RFC4519] |
+| st | 2.5.4.8 | [RFC4519] |
+| stateOrProvinceName | 2.5.4.8 | [RFC2256] |
+| street | 2.5.4.9 | [RFC4519] |
+| streetAddress | 2.5.4.9 | [RFC2256] |
+| o | 2.5.4.10 | [RFC4519] |
+| organizationName | 2.5.4.10 | [RFC4519] |
+| ou | 2.5.4.11 | [RFC4519] |
+| organizationalUnitName | 2.5.4.11 | [RFC4519] |
+| title | 2.5.4.12 | [RFC4519] |
+| description | 2.5.4.13 | [RFC4519] |
+| businessCategory | 2.5.4.15 | [RFC4519] |
+| postalAddress | 2.5.4.16 | [RFC4519] |
+| postalCode | 2.5.4.17 | [RFC4519] |
+| postOfficeBox | 2.5.4.18 | [RFC4519] |
+| physicalDeliveryOfficeName | 2.5.4.19 | [RFC4519] |
+| telephoneNumber | 2.5.4.20 | [RFC4519] |
+| name | 2.5.4.41 | [RFC4519] |
+| givenName | 2.5.4.42 | [RFC4519] |
+| initials | 2.5.4.43 | [RFC4519] |
+| generationQualifier | 2.5.4.44 | [RFC4519] |
+| pseudonym | 2.5.4.65 | [RFC3280] |
+
+When an attribute type OID is not one of the known values, it is represented by
+its dotted-decimal form, and the attribute value must then be encoded as a
+leading `#` character followed by the hexadecimal encoding of the DER-encoded
+value (see Section 2.4 of [RFC4514]). This form may also be used when the value
+has no suitable string representation.
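+
+A minimal sketch of this form follows. The function name `format_unknown_attr`
+is hypothetical, and the DER bytes are assumed to be supplied by the caller;
+the sample value is an OCTET STRING holding the two bytes `0x48 0x69`.
+

```python
def format_unknown_attr(oid, der_value):
    """Format an attribute with an unrecognised OID as '<oid>=#<hex of DER>'."""
    return f"{oid}=#{der_value.hex()}"


# DER: tag 0x04 (OCTET STRING), length 0x02, contents 0x48 0x69.
der = bytes([0x04, 0x02, 0x48, 0x69])
print(format_unknown_attr("1.3.6.1.4.1.1466.0", der))
```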
+
+I have not checked whether we implement case-insensitive string comparison for
+any of the attributes for which this is expected in LDAP. In certificates I do
+not expect to find case-variants of RDNs that need to be considered equivalent
+when comparing subject and issuer DNs.
+
+Parsing of Names
+----------------
+
+The parsing of X.509 directory names (e.g. the `-subj` option of the `x509`
+command) is performed by the `parse_name()` function in `apps/lib/apps.c`.
+This currently assumes that the input is in the format produced by the legacy
+`X509_NAME_oneline()` function. That format always starts with a `/`
+character. A single slash by itself represents an **empty** RDN sequence.
+
+The `parse_name()` function is used in the `ca`, `cmp`, `req`, `storeutl`, and
+`x509` commands.
+
+If or when we switch to outputting the [RFC4514] format, we also need to accept
+it on input. Therefore, `parse_name()` needs to be updated to treat strings
+starting with a `/` as legacy oneline forms, and other strings as the [RFC4514]
+format.
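+
+The dispatch could look roughly like the sketch below. It is written in Python
+for brevity and is not the `parse_name()` implementation: the names
+`split_unescaped` and `parse_name_sketch` are hypothetical, RDNs are returned as
+raw strings in encoded order, and multi-valued RDNs and value unescaping are
+left out.
+

```python
def split_unescaped(text, sep):
    """Split text on sep, ignoring any separator preceded by a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escape pair intact
            i += 2
        elif ch == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def parse_name_sketch(name):
    """Return (format, RDN strings in encoded order) for a name string."""
    if name.startswith("/"):
        body = name[1:]
        rdns = split_unescaped(body, "/") if body else []
        return ("oneline", rdns)
    # RFC 4514 strings list the RDNs in reverse order, so reverse them back.
    rdns = split_unescaped(name, ",") if name else []
    return ("rfc4514", list(reversed(rdns)))


print(parse_name_sketch("/C=UK/O=My Organization/CN=My Name"))
print(parse_name_sketch("cn=Doe\\, John,ou=People,dc=example,dc=com"))
```
+
+Both branches return the RDNs in the same (encoded) order, so the code that
+builds the name does not need to know which input syntax was used.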
+
+Parsing of the [RFC4514] syntax is covered in Section 3 of that document.
+Currently, our parser does not support RDNs with ad hoc dotted-decimal OIDs;
+only known named attribute types are supported. We should consider allowing
+explicit dotted-decimal OIDs and using `X509_NAME_add_entry_by_OBJ()` to add
+these.
+
+Names in the configuration file
+-------------------------------
+
+In configuration files, we represent directory names as a "section" with one
+"attr = value" line per RDN component. Relevant documentation is in
+`x509v3_config(3)` and `openssl-req(1)`. For example:
+
+    subjectAltName = dirName:dir_sect
+    [dir_sect]
+    C = UK
+    O = My Organization
+    OU = My Unit
+    CN = My Name
+
+So in the configuration file, we only have to handle the syntax of the
+individual value elements; the DN as a whole is not parsed. The `string_mask`
+setting affects the encoding of the various strings, and defaults to `utf8only`
+(other values are not recommended).
+
+Only the `ca` and `req` commands process the string mask, though user
+applications can do the same by calling `ASN1_STRING_set_default_mask_asc()`,
+which is an undocumented and non-thread-safe function. The comments above the
+code say:
+
+    /*-
+     * This function sets the default to various "flavours" of configuration.
+     * based on an ASCII string. Currently this is:
+     * MASK:XXXX : a numerical mask value.
+     * default : use Printable, IA5, T61, BMP, and UTF8 string types
+     * nombstr : any string type except variable-sized BMPStrings or UTF8Strings
+     * pkix : PKIX recommendation in RFC 5280
+     * utf8only : this is the default, use UTF8Strings
+     */
+
+The bottom line is that for most users the DN components in the configuration
+file are already UTF8-friendly. The only thing to check is whether we support
+the desired set of attribute type names, both in the configuration file and
+while parsing a string representation of a complete DN.
+
+<!-- Links -->
+
+[RFC2253]:
+ <https://www.rfc-editor.org/rfc/rfc2253.html>
+
+[RFC2256]:
+ <https://www.rfc-editor.org/rfc/rfc2256.html>
+
+[RFC3280]:
+ <https://www.rfc-editor.org/rfc/rfc3280.html>
+
+[RFC4514]:
+ <https://www.rfc-editor.org/rfc/rfc4514.html>
+
+[RFC4519]:
+ <https://www.rfc-editor.org/rfc/rfc4519.html>
+
+[RFC4524]:
+ <https://www.rfc-editor.org/rfc/rfc4524.html>
+
+[LDAP descriptor registry]:
+ <https://www.iana.org/assignments/ldap-parameters/ldap-parameters.xhtml#ldap-parameters-3>