From: Ezekiel Newren
Date: Wed, 29 Oct 2025 22:19:39 +0000 (+0000)
Subject: doc: define unambiguous type mappings across C and Rust
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=5d32e50a3325ac50de558a2de74fa63799546368;p=thirdparty%2Fgit.git

doc: define unambiguous type mappings across C and Rust

Document other nuances with crossing the FFI boundary. Other language
mappings may be added in the future.

Signed-off-by: Ezekiel Newren
Signed-off-by: Junio C Hamano
---

diff --git a/Documentation/technical/unambiguous-types.adoc b/Documentation/technical/unambiguous-types.adoc
new file mode 100644
index 0000000000..658a5b578e
--- /dev/null
+++ b/Documentation/technical/unambiguous-types.adoc
@@ -0,0 +1,229 @@
+= Unambiguous types
+
+Most of these mappings are obvious, but there are some nuances and gotchas with
+Rust FFI (Foreign Function Interface).
+
+This document defines clear, one-to-one mappings between primitive types in C
+and Rust (and possibly other languages in the future). Its purpose is to
+eliminate ambiguity in type widths, signedness, and binary representation
+across platforms and languages.
+
+For Git, the only header required to use these unambiguous types in C is
+`git-compat-util.h`.
+
+== Boolean types
+[cols="1,1", options="header"]
+|===
+| C Type | Rust Type
+| bool^1^ | bool
+|===
+
+== Integer types
+
+In C, `` (or an equivalent) must be included.
+
+[cols="1,1", options="header"]
+|===
+| C Type | Rust Type
+| uint8_t | u8
+| uint16_t | u16
+| uint32_t | u32
+| uint64_t | u64
+
+| int8_t | i8
+| int16_t | i16
+| int32_t | i32
+| int64_t | i64
+|===
+
+== Floating-point types
+
+Rust requires IEEE-754 semantics.
+In C, that is typically true, but not guaranteed by the standard.
+
+[cols="1,1", options="header"]
+|===
+| C Type | Rust Type
+| float^2^ | f32
+| double^2^ | f64
+|===
+
+== Size types
+
+These types represent pointer-sized integers and are typically defined in
+`` or an equivalent header.
+
+Size types should be used any time pointer arithmetic is performed, e.g. when
+indexing an array or describing the number of elements in memory.
+
+[cols="1,1", options="header"]
+|===
+| C Type | Rust Type
+| size_t^3^ | usize
+| ptrdiff_t^4^ | isize
+|===
+
+== Character types
+
+This is where C and Rust don't have a clean one-to-one mapping. A C `char` is
+an 8-bit type whose signedness is implementation-defined (it is distinct from
+both `signed char` and `unsigned char`), which causes problems with e.g.
+`make DEVELOPER=1`. Rust's `char` is a 32-bit type that holds a Unicode scalar
+value. A C `char` is the same width as `u8`, so a `char` that describes bytes
+in memory should be converted to `u8`. If a C `char` is not describing bytes,
+then it should be converted to a more accurate unambiguous type.
+
+While you could specify `char` in the C code and `u8` in Rust code, it's not as
+clear what the appropriate type is, but it would work across the FFI boundary.
+However, the bigger problem comes from code generation tools like cbindgen and
+bindgen. When cbindgen sees `u8` in Rust it will generate `uint8_t` on the C
+side, which will cause "differ in signedness" warnings/errors. Similarly, if
+bindgen sees `char` on the C side it will generate `std::ffi::c_char`, which
+has its own problems.
+
+=== Notes
+^1^ This is only true if stdbool.h (or equivalent) is used.
+
+^2^ C does not enforce IEEE-754 compatibility, but Rust expects it. If the
+platform/arch for C does not follow IEEE-754 then this equivalence does not
+hold. Also, it's assumed that `float` is 32 bits and `double` is 64, but
+there may be a strange platform/arch where even this isn't true.
+
+^3^ C also defines uintptr_t, but this should not be used in Git.
+
+^4^ C also defines ssize_t and intptr_t, but these should not be used in Git.
+
+
+== Problems with std::ffi::c_* types in Rust
+TL;DR: They're not guaranteed to match C types for all possible C
+compilers/platforms/architectures.
+
+Only a few of Rust's C FFI types are considered safe and semantically clear to
+use:
+
+
+* `c_void`
+* `CStr`
+* `CString`
+
+Even then, they should be used sparingly, and only where the semantics match
+exactly.
+
+The std::os::raw::c_* types (which are deprecated) directly inherit the
+problems of core::ffi, which changes over time and seems to make a best guess
+at the correct definition for a given platform/target. This probably isn't a
+problem for any platform that Rust currently supports, but can anyone say that
+Rust got it right for all C compilers of all platforms/targets?
+
+On top of all of that, we're targeting an older version of Rust, which doesn't
+have the latest mappings.
+
+To give an example, the definition of c_long differs between Rust versions
+footnote:[https://doc.rust-lang.org/1.63.0/src/core/ffi/mod.rs.html#175-189[c_long in 1.63.0]]
+footnote:[https://doc.rust-lang.org/1.89.0/src/core/ffi/primitives.rs.html#135-151[c_long in 1.89.0]]
+
+=== Rust version 1.63.0
+
+[source]
+----
+mod c_long_definition {
+    cfg_if! {
+        if #[cfg(all(target_pointer_width = "64", not(windows)))] {
+            pub type c_long = i64;
+            pub type NonZero_c_long = crate::num::NonZeroI64;
+            pub type c_ulong = u64;
+            pub type NonZero_c_ulong = crate::num::NonZeroU64;
+        } else {
+            // The minimal size of `long` in the C standard is 32 bits
+            pub type c_long = i32;
+            pub type NonZero_c_long = crate::num::NonZeroI32;
+            pub type c_ulong = u32;
+            pub type NonZero_c_ulong = crate::num::NonZeroU32;
+        }
+    }
+}
+----
+
+=== Rust version 1.89.0
+
+[source]
+----
+mod c_long_definition {
+    crate::cfg_select! {
+        any(
+            all(target_pointer_width = "64", not(windows)),
+            // wasm32 Linux ABI uses 64-bit long
+            all(target_arch = "wasm32", target_os = "linux")
+        ) => {
+            pub(super) type c_long = i64;
+            pub(super) type c_ulong = u64;
+        }
+        _ => {
+            // The minimal size of `long` in the C standard is 32 bits
+            pub(super) type c_long = i32;
+            pub(super) type c_ulong = u32;
+        }
+    }
+}
+----
+
+Even for the cases where C types are correctly mapped to Rust types via
+std::ffi::c_* there are still problems. Let's take c_char for example. On some
+platforms it's u8, on others it's i8.
+
+=== Subtraction underflow in debug mode
+
+The following code will panic in debug builds on platforms that define c_char
+as u8, but won't if it's an i8.
+
+[source]
+----
+let mut x: std::ffi::c_char = 0;
+x -= 1;
+----
+
+=== Inconsistent shift behavior
+
+`x` will be 0xC0 for platforms that use i8, but will be 0x40 where it's u8.
+
+[source]
+----
+let mut x = 0x80u8 as std::ffi::c_char; // a bare 0x80 does not fit in i8
+x >>= 1;
+----
+
+=== Equality fails to compile on some platforms
+
+The following will not compile on platforms that define c_char as i8, but will
+if it's u8. You can cast x, e.g. `assert_eq!(x as u8, b'a');`, but then you get
+a warning on platforms that use u8 and a clean compilation where i8 is used.
+
+[source]
+----
+let x: std::ffi::c_char = 0x61;
+assert_eq!(x, b'a');
+----
+
+== Enum types
+Rust enum types should not be used as FFI types. Rust enum types are more like
+C union types than C enums. For something like:
+
+[source]
+----
+#[repr(C, u8)]
+enum Fruit {
+    Apple,
+    Banana,
+    Cherry,
+}
+----
+
+It's easy enough to make sure the Rust enum matches what C would expect, but
+consider a more complex type like:
+
+[source]
+----
+enum HashResult {
+    SHA1([u8; 20]),
+    SHA256([u8; 32]),
+}
+----
+
+Here the Rust compiler has to add a discriminant to the enum to distinguish
+between the variants. The width, location, and values of that discriminant are
+up to the Rust compiler and are not ABI stable.
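
A note on the enum guidance in the patch: one way to follow it, sketched here under assumptions and not taken from the patch itself, is to keep the Rust enum private to the Rust side and pass only a fixed-width integer tag across the boundary. The `FRUIT_*` constants, the `TryFrom` conversion, and the `fruit_is_valid` function are hypothetical names invented for this illustration; the values would have to be kept in sync with matching definitions on the C side.

```rust
use std::convert::TryFrom;

/// Hypothetical Rust-side enum. It never crosses the FFI boundary itself.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Fruit {
    Apple,
    Banana,
    Cherry,
}

// Hypothetical tag values; the C side would mirror them as uint8_t constants.
const FRUIT_APPLE: u8 = 0;
const FRUIT_BANANA: u8 = 1;
const FRUIT_CHERRY: u8 = 2;

impl TryFrom<u8> for Fruit {
    type Error = u8;

    // Reject unknown tags instead of trusting whatever value C passed in.
    fn try_from(raw: u8) -> Result<Self, Self::Error> {
        match raw {
            FRUIT_APPLE => Ok(Fruit::Apple),
            FRUIT_BANANA => Ok(Fruit::Banana),
            FRUIT_CHERRY => Ok(Fruit::Cherry),
            other => Err(other),
        }
    }
}

impl From<Fruit> for u8 {
    // Convert back to the agreed tag value before returning to C.
    fn from(fruit: Fruit) -> u8 {
        match fruit {
            Fruit::Apple => FRUIT_APPLE,
            Fruit::Banana => FRUIT_BANANA,
            Fruit::Cherry => FRUIT_CHERRY,
        }
    }
}

/// Hypothetical exported function: only `u8` and `bool` appear in the
/// signature, both of which have unambiguous C counterparts.
pub extern "C" fn fruit_is_valid(raw: u8) -> bool {
    Fruit::try_from(raw).is_ok()
}
```

With this shape the C declaration would use only `uint8_t` and `bool`, so neither side depends on how the Rust compiler lays out the enum.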