man/man7/pathname.7: Pathnames are opaque C strings
On Mon, Jan 27, 2025 at 07:27:59PM +0100, наб wrote:
> Skimming the thread: UNIX paths are sequences of non-NUL bytes.
>
> It is never correct to expect to be able to have a (parse, unparse)
> operation pair for which unparse(parse(x)) = x for path x.
>
> It's obviously wrong to reject a pathname just because you dont like it.
>
> Thus, when displaying a path, either (a) dump it directly to the output
> (the user has configured their display device to understand the paths they use),
> or if that's not possible (b) setlocale(LC_ALL, "") + mbrtowc() loop
> and render the result (applying usual ?/� substitutions for mbrtowc()
> errors makes sense here).
>
> There are very few operations on paths that are actually reasonable
> to do, ever; those are: appending stuff, prepending stuff
> (this is just appending stuff with the arguments backwards),
> and cleaving at /es;
> the "stuff" better be copied whole-sale from some other path
> or an unprocessed argument (or, sure, the PFCS).
>
> If you're getting bytes to append to a path, do that directly.
>
> If you're getting characters to append to a path,
> then wctomb(3) is the only non-invalid solution,
> since that (obviously) turns characters into bytes in the current
> locale, which (ex def) is the operation desired.
>
> I don't understand what the UTF-32 dance is supposed to be.
>
> If you're recommending transcoding paths, don't.
>
> To re-iterate: paths are not character sequences.
> They do not represent characters.
> You can't meaningfully coerce them thusly without loss of precision
> (this is ok to do for display! and nothing else).
> If at any point you find yourself turning wchar_t -> char
> you are doing something wrong;
> if you find yourself doing char -> wchar_t for anything beside display
> you should probably reconsider.
>
> This is different under Win32 of course. But that concerns us naught.