Daniel Stenberg [Wed, 14 Sep 2022 07:18:30 +0000 (09:18 +0200)]
urlapi: detect scheme better when not guessing
When the parser is not allowed to guess the scheme, it should consider the
word ending at the first colon to be the scheme, independently of the number
of slashes.
The parser now checks that the scheme is known before it counts slashes,
to improve the error message for URLs with unknown schemes and possibly no
slashes.
When following redirects, no scheme guessing is allowed and therefore
this change effectively prevents redirects to unknown schemes such as
"data".
Daniel Stenberg [Mon, 12 Sep 2022 07:57:01 +0000 (09:57 +0200)]
urldata: use a curl_prot_t type for storing protocol bits
This internal-use-only storage type can be bumped to a curl_off_t once
we need to use bit 32, since the previous 'unsigned int' can then no
longer hold them all.
The websocket protocols take bits 30 and 31, so they are the last ones
that fit within 32 bits, but they cannot properly be exported through
APIs since those use *signed* 32-bit types (long) in places.
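To illustrate why bit 31 is the sticking point, here is a small sketch with a hypothetical `prot_bits_t` type and hypothetical `PROTO_WS`/`PROTO_WSS` names (not curl's definitions): bit 30 still fits as a positive value in a signed 32-bit integer, bit 31 does not.

```c
#include <stdint.h>

/* Hypothetical stand-in for an internal protocol bitmask type. An unsigned
   32-bit type can hold bits 0..31. */
typedef uint32_t prot_bits_t;

#define PROTO_WS  ((prot_bits_t)1 << 30)
#define PROTO_WSS ((prot_bits_t)1 << 31)

/* Returns 1 if the bit can be represented as a positive value in a signed
   32-bit integer, i.e. passed through an API that uses such a type. */
int fits_signed32(prot_bits_t bit)
{
  return bit <= (prot_bits_t)INT32_MAX;
}
```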
formdata: fix warning: 'CURLformoption' is promoted to 'int'
curl/lib/formdata.c: In function 'FormAdd':
curl/lib/formdata.c:249:31: warning: 'CURLformoption' is promoted to 'int' when passed through '...'
249 | option = va_arg(params, CURLformoption);
| ^
curl/lib/formdata.c:249:31: note: (so you should pass 'int' not 'CURLformoption' to 'va_arg')
curl/lib/formdata.c:249:31: note: if this code is reached, the program will abort
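The usual remedy for this warning is to read the promoted type with `va_arg` and convert afterwards. A sketch under that assumption, with a hypothetical `form_option` enum standing in for `CURLformoption` (the actual patch may differ):

```c
#include <stdarg.h>

/* Hypothetical enum standing in for CURLformoption. */
typedef enum { OPT_END = 0, OPT_NAME, OPT_CONTENTS } form_option;

/* Walk an OPT_END-terminated variadic list of options and count them.
   Enum values are passed as int through '...', so they are read back as
   int and converted to the enum type afterwards. */
int count_options(int first, ...)
{
  va_list params;
  int count = 0;
  form_option option = (form_option)first;

  va_start(params, first);
  while(option != OPT_END) {
    count++;
    option = (form_option)va_arg(params, int);
  }
  va_end(params);
  return count;
}
```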
Prior to this commit, non-persistent pointers were being used to store
sessions. When a WOLFSSL object was then freed, that freed the session
it owned, and thus invalidated the pointer held in curl's cache. This
commit makes it so we get a persistent (deep copied) session pointer
that we then add to the cache. Accordingly, wolfssl_session_free, which
was previously a no-op, now needs to actually call SSL_SESSION_free.
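The ownership problem can be sketched generically. The structs and functions below are hypothetical and do not use the wolfSSL API; they only show why a cache must hold its own copy of a session and, consequently, must free it itself:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical session object owned by a connection object. */
struct session { unsigned char id[32]; };
struct conn { struct session *sess; };

/* Hypothetical one-slot session cache. */
static struct session *cached;

/* Freeing the owner also frees the session it owns, so any borrowed
   pointer to c->sess becomes invalid here. */
void conn_free(struct conn *c)
{
  free(c->sess);
  free(c);
}

/* With an owned copy, the free function can no longer be a no-op. */
void cache_session_free(struct session *s)
{
  free(s);
}

/* Store a deep copy so the cache entry outlives the connection. */
int cache_add_copy(const struct conn *c)
{
  struct session *copy = malloc(sizeof(*copy));
  if(!copy)
    return 0;
  memcpy(copy, c->sess, sizeof(*copy));
  cache_session_free(cached); /* replace any previous entry */
  cached = copy;
  return 1;
}
```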
Daniel Stenberg [Tue, 13 Sep 2022 07:17:53 +0000 (09:17 +0200)]
docs: use "WebSocket" in singular
This is how the RFC calls the protocol. Also rename the file in docs/ to
WEBSOCKET.md in uppercase to match how we have done it for many other
protocol docs in similar fashion.
content_encoding: use writer struct subclasses for different encodings
The variable-sized encoding-specific storage of a struct contenc_writer
currently relies on void * alignment that may be insufficient with
regard to the specific storage fields, although it has not caused any
problems yet.
In addition, gcc 11.3 issues a warning on access to fields of partially
allocated structures that can occur when the specific storage size is 0:
content_encoding.c: In function ‘Curl_build_unencoding_stack’:
content_encoding.c:980:21: warning: array subscript ‘struct contenc_writer[0]’ is partly outside array bounds of ‘unsigned char[16]’ [-Warray-bounds]
980 | writer->handler = handler;
| ~~~~~~~~~~~~~~~~^~~~~~~~~
In file included from content_encoding.c:49:
memdebug.h:115:29: note: referencing an object of size 16 allocated by ‘curl_dbg_calloc’
115 | #define calloc(nbelem,size) curl_dbg_calloc(nbelem, size, __LINE__, __FILE__)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
content_encoding.c:977:60: note: in expansion of macro ‘calloc’
977 | struct contenc_writer *writer = (struct contenc_writer *)calloc(1, sz);
To solve both these problems, the current commit replaces the
contenc_writer/params structure pairs by "subclasses" of struct
contenc_writer. These are structures that contain a contenc_writer at
offset 0. Proper field alignment is therefore handled by the compiler and
full structure allocation is performed, silencing the warnings.
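A minimal sketch of the "subclass" pattern, using hypothetical `writer_base`/`gzip_writer` structs rather than curl's real definitions: the base struct is the first member, and the allocation uses the full size of the derived struct.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical simplified base "class". */
struct writer_base {
  const char *name;
  struct writer_base *downstream;
};

/* A "subclass": the base sits at offset 0, so a struct gzip_writer pointer
   converts to a struct writer_base pointer and back. The compiler takes care
   of aligning the encoding-specific fields. */
struct gzip_writer {
  struct writer_base super;
  unsigned long bytes_in;  /* encoding-specific state */
  int state;
};

/* Allocate a complete object of the given size, which must cover at least
   the base struct. */
struct writer_base *writer_new(const char *name, size_t size)
{
  struct writer_base *w;
  if(size < sizeof(struct writer_base))
    return NULL;
  w = calloc(1, size);
  if(w)
    w->name = name;
  return w;
}
```

Callers pass `sizeof(struct gzip_writer)`, so the whole object is allocated and no access can fall outside it.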
Daniel Stenberg [Fri, 9 Sep 2022 13:11:14 +0000 (15:11 +0200)]
tests: add websockets tests
- add websockets support to sws
- 2300: first very basic websockets test
- 2301: first libcurl test for ws (not working yet)
- 2302: use the ws callback
- 2303: test refused upgrade
Daniel Stenberg [Fri, 9 Sep 2022 10:46:01 +0000 (12:46 +0200)]
strtoofft: after space, there cannot be a control code
With the change from ISSPACE() to ISBLANK() this function no longer
deals with (ignores) control codes the same way, which could lead to
this function returning unexpected values like in the case of
"Content-Length: \r-12354".
Daniel Stenberg [Wed, 7 Sep 2022 07:51:51 +0000 (09:51 +0200)]
headers: reset the requests counter at transfer start
If not, reusing an easy handle to do a subsequent transfer would
continue the counter from the previous invocation, which would then make
use of the header API difficult or impossible since the request counter
would not match.
Add libtest 1947 to verify.
Reported-by: Andrew Lambert
Fixes #9424
Closes #9447
Daniel Stenberg [Wed, 7 Sep 2022 13:41:03 +0000 (15:41 +0200)]
http2: make nghttp2 less picky about field whitespace
nghttp2 1.49.0 returns an error on leading and trailing whitespace in
header fields, following language in the recently shipped RFC 9113.
nghttp2 1.50.0 introduces an option to switch off this strict check, and
this change enables that option by default, which should make curl behave
more similarly to how it did with nghttp2 1.48.0 and earlier.
We might want to consider making this an option in the future.
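The condition that the strict check rejects can be sketched as follows; `has_edge_whitespace()` is a hypothetical helper, not part of nghttp2 or curl. RFC 9113 does not allow a field value to start or end with SP or HTAB:

```c
#include <stddef.h>

/* Returns 1 if a field value starts or ends with SP or HTAB. A strict
   implementation treats such a field as an error; a lenient one may strip
   the whitespace and carry on. */
int has_edge_whitespace(const char *value, size_t len)
{
  if(len == 0)
    return 0;
  return value[0] == ' ' || value[0] == '\t' ||
         value[len - 1] == ' ' || value[len - 1] == '\t';
}
```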
Michael Heimpold [Wed, 24 Aug 2022 16:58:02 +0000 (18:58 +0200)]
ftp: ignore a 550 response to MDTM
The 550 code is overused as a return code for multiple error cases, e.g.
file not found and/or insufficient permissions to access the file.
So we cannot fail hard in this case.
Adjust test 511 since we now fail later.
Add new test 3027, which checks that when MDTM fails but the file can
actually be retrieved, no filetime is provided.
Reported-by: Michael Heimpold
Fixes #9357
Closes #9387
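A sketch of the relaxed MDTM handling, assuming a hypothetical `handle_mdtm()` function that receives the full reply line: a 213 reply yields a timestamp, a 550 reply means "no filetime" and the transfer continues, and any other code is an error.

```c
#include <stdio.h>

/* Hypothetical outcomes of handling an FTP MDTM reply. */
enum mdtm_result { MDTM_TIME, MDTM_NOTIME, MDTM_ERROR };

/* Interpret a reply such as "213 20220824165802". On 213, copy the 14-digit
   YYYYMMDDHHMMSS stamp into 'stamp' (needs room for 15 bytes). 550 is not
   treated as fatal because servers also use it for missing files and
   permission problems. */
enum mdtm_result handle_mdtm(const char *reply, char *stamp, size_t stamplen)
{
  int code;
  if(sscanf(reply, "%3d", &code) != 1)
    return MDTM_ERROR;
  if(code == 213) {
    if(stamplen < 15 || sscanf(reply + 3, " %14[0-9]", stamp) != 1)
      return MDTM_ERROR;
    return MDTM_TIME;
  }
  if(code == 550)
    return MDTM_NOTIME;
  return MDTM_ERROR;
}
```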
Daniel Stenberg [Thu, 1 Sep 2022 08:16:24 +0000 (10:16 +0200)]
urlapi: leaner with fewer allocs
Slightly faster with more robust code. Uses fewer and smaller mallocs.
- remove two fields from the URL handle struct
- reduce copies and allocs
- use dynbuf buffers more instead of custom malloc + copies
- use dynbuf to build the host name, which reduces serial alloc+free within
the same function.
- move dedotdotify into urlapi.c and make it static, do not strdup the input,
and optimize it by checking for . and / before using strncmp
- remove a few strlen() calls
- add Curl_dyn_setlen() that can "trim" an existing dynbuf
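To show what a "setlen"-style trim does, here is a hypothetical minimal growable buffer; the `dbuf_*` names and the growth policy are assumptions, not curl's dynbuf implementation:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal growable, zero-terminated buffer. */
struct dbuf {
  char *mem;
  size_t len;
  size_t alloc;
};

/* Append n bytes, growing the allocation when needed. */
int dbuf_add(struct dbuf *b, const char *s, size_t n)
{
  if(b->len + n + 1 > b->alloc) {
    size_t newsize = (b->len + n + 1) * 2;
    char *p = realloc(b->mem, newsize);
    if(!p)
      return -1;
    b->mem = p;
    b->alloc = newsize;
  }
  memcpy(b->mem + b->len, s, n);
  b->len += n;
  b->mem[b->len] = '\0';
  return 0;
}

/* Trim the contents to a shorter length in place, without a new
   allocation or copy. Growing is not allowed. */
int dbuf_setlen(struct dbuf *b, size_t len)
{
  if(len > b->len)
    return -1;
  b->len = len;
  if(b->mem)
    b->mem[len] = '\0';
  return 0;
}
```

Trimming in place is what lets a parser drop, for example, a trailing dot from a host name without a strdup of the shortened string.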
Jay Satiro [Mon, 29 Aug 2022 07:59:23 +0000 (03:59 -0400)]
setup-win32: no longer define UNICODE/_UNICODE implicitly
- If UNICODE or _UNICODE is defined but the other isn't then error
instead of implicitly defining it.
As Marcel pointed out, it is too late at this point to add such a define
because Windows headers may already be included, so likely it never
worked. We never noticed because build systems that can make Windows
Unicode builds always define both. If one is defined but not the other
then something went wrong during the build configuration.
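A sketch of the error-instead-of-define approach using the preprocessor; the `is_unicode_build()` helper is hypothetical and only there to make the fragment usable on its own:

```c
/* Refuse an inconsistent Unicode configuration instead of patching it up
   with a late #define, since Windows headers may already have been
   included with the other setting. */
#if defined(UNICODE) && !defined(_UNICODE)
#error "UNICODE is defined but _UNICODE is not"
#elif defined(_UNICODE) && !defined(UNICODE)
#error "_UNICODE is defined but UNICODE is not"
#endif

/* Returns 1 for a Unicode build, 0 otherwise. */
int is_unicode_build(void)
{
#ifdef UNICODE
  return 1;
#else
  return 0;
#endif
}
```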
Daniel Stenberg [Mon, 5 Sep 2022 21:21:15 +0000 (23:21 +0200)]
misc: ISSPACE() => ISBLANK()
Replace instances of ISSPACE() that should rather use ISBLANK(). I think
ISSPACE() was somewhat carelessly used because it sounds as if it only
checks for space or whitespace, but it also includes %0a to %0d.
For parsing purposes, we should only accept what we must and not be
overly liberal. It leads to surprises and surprises lead to bad things.
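The difference between the two classes is easy to see in a small sketch; `is_blank()`, `is_space()` and `skip_prefix()` are hypothetical helpers that mirror what the macro names suggest, not curl's definitions:

```c
#include <stddef.h>

/* Space or horizontal tab only. */
int is_blank(int c)
{
  return c == ' ' || c == '\t';
}

/* Space plus the control codes 0x09-0x0d: tab, LF, VT, FF and CR. */
int is_space(int c)
{
  return c == ' ' || (c >= 0x09 && c <= 0x0d);
}

/* Number of leading characters a parser would skip with the predicate. */
size_t skip_prefix(const char *s, int (*pred)(int))
{
  size_t n = 0;
  while(s[n] && pred((unsigned char)s[n]))
    n++;
  return n;
}
```

For the input " \t\r\nX", a blank-skipping parser stops at the CR (2 characters skipped), while a space-skipping parser silently consumes the CR and LF as well (4 characters).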
Daniel Stenberg [Thu, 1 Sep 2022 07:23:22 +0000 (09:23 +0200)]
NPN: remove support for and use of
Next Protocol Negotiation is a TLS extension that was created and used
for agreeing to use the SPDY protocol (the precursor to HTTP/2) for
HTTPS. In the early days of HTTP/2, before the spec was finalized and
shipped, the protocol could be enabled using this extension with some
servers.
curl has supported the NPN extension with some TLS backends since then,
with a command line option `--npn` and in libcurl with
`CURLOPT_SSL_ENABLE_NPN`.
HTTP/2 proper is made to use the ALPN (Application-Layer Protocol
Negotiation) extension, and the NPN extension no longer serves any
purpose. The HTTP/2 spec was published in May 2015.
Today, use of NPN in the wild should be extremely rare and most likely
totally extinct. Chrome removed NPN support in Chrome 51, shipped in
June 2016. Firefox removed it in version 53, April 2017.
Samuel Henrique [Thu, 1 Sep 2022 21:32:49 +0000 (22:32 +0100)]
configure: fail if '--without-ssl' + explicit parameter for an ssl lib
A side effect of a previous change to configure (576e507c78bdd2ec88)
exposed a non-critical issue that can happen if configure is called with
both '--without-ssl' and some parameter setting the use of an SSL library
(e.g. --with-gnutls). The configure script would end up assuming this is
a MultiSSL build, due to the way the case statement is written.
I have changed the order of the variables in the string concatenation
for the case statement and also tweaked the options so that
--without-ssl never turns the build into a MultiSSL one, and configure
now clearly states that there are conflicting parameters if the user
sets them as described above.
Daniel Stenberg [Fri, 2 Sep 2022 12:24:25 +0000 (14:24 +0200)]
tests/certs/scripts: insert standard curl source headers
... including the SPDX-License-Identifier.
These omissions were not detected by the REUSE CI job or the copyright.pl
scanners because we have a general wildcard in .reuse/dep5 for
"tests/certs/*".
Reported-by: Samuel Henrique
Fixes #9417
Closes #9420
Samuel Henrique [Fri, 2 Sep 2022 11:02:02 +0000 (12:02 +0100)]
CURLOPT_WILDCARDMATCH.3: Fix backslash escaping under single quotes
Lintian (on Debian) has been complaining about this for a while but
I didn't bother initially as the groff parser that we use is not
affected by this.
But I have now noticed that the online manpage is affected by it:
https://curl.se/libcurl/c/CURLOPT_WILDCARDMATCH.html
(I'm using double quotes for quoting-only down below)
The section that should be parsed as "'\'" ends up being parsed as
"'´".
This is due to roffit not parsing "'\\'" correctly, which is fine
as the "correct" way of writing "'\'" is "'\e'" instead.
Note that this fix is not enough to fix the online manpage at
curl's website, as roffit seems to parse it wrongly either way.
My intent is to at least fix the manpage so that roffit can
be changed to parse "'\e'" correctly (although I suggest making
roffit parse both ways correctly, since that's what groff does).
More details at:
https://bugs.debian.org/966803
https://salsa.debian.org/lintian/lintian/-/blob/930b18e4b28b7540253f458ef42a884cca7965c3/tags/a/acute-accent-in-manual-page.tag
Daniel Stenberg [Wed, 31 Aug 2022 13:57:46 +0000 (15:57 +0200)]
tool_operate: prevent over-queuing in parallel mode
When doing a huge amount of parallel transfers, we must not add them to
the per_transfer list frivolously since they all use memory after all.
This was previously done without really considering millions or billions
of transfers. Massive parallelism would use a lot of memory for no good
purpose.
The queue is now limited to twice the parallelism number.
This makes the 'Qd' value in the parallel progress meter mostly useless
for users, but works for now for us as a debug display.
Reported-by: justchen1369 on github
Fixes #8933
Closes #9389
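The cap can be sketched as a simple predicate; `may_queue_more()` and `fill_queue()` are hypothetical names, and the real tool_operate logic is more involved:

```c
#include <stddef.h>

/* Allow another transfer onto the pending list only while the list holds
   fewer than twice the parallelism level. */
int may_queue_more(size_t queued, size_t parallel_max)
{
  return queued < parallel_max * 2;
}

/* How many of 'total' remaining transfers get queued now, given the number
   already queued and the cap above. */
size_t fill_queue(size_t already_queued, size_t total, size_t parallel_max)
{
  size_t added = 0;
  while(added < total && may_queue_more(already_queued + added, parallel_max))
    added++;
  return added;
}
```

With a parallelism of 50 and a million pending URLs, only 100 transfers are set up at a time instead of all of them.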
Viktor Szakats [Wed, 31 Aug 2022 11:57:24 +0000 (11:57 +0000)]
cmake: fix original MinGW builds
1. Re-enable `HAVE_GETADDRINFO` detection on Windows
Commit d08ee3c83d6bd416aef62ff844c98e47c4682429 (in 2013) added logic
that automatically assumed `getaddrinfo()` to be present for builds
with IPv6 enabled. As it turns out, certain toolchains (e.g. original
MinGW) by default target older Windows versions, and thus do not
support `getaddrinfo()` out of the box. The issue was masked for
a while by CMake builds forcing a newer Windows version, but that
logic got deleted in commit 8ba22ffb2030ed91312fc8634e29516cdf0a9761.
Since then, some CI builds started failing due to IPv6 being enabled and
`HAVE_GETADDRINFO` being set, but `getaddrinfo()` in fact missing.
It also turns out that IPv6 works without `getaddrinfo()` since commit
67a08dca27a6a07b36c7f97252e284ca957ff1a5 (from 2019, via #4662). So,
to resolve all this, we can now revert the initial commit, thus
restoring `getaddrinfo()` detection and supporting IPv6 regardless of its
outcome.
Reported-by: Daniel Stenberg
2. Omit `bcrypt` with original MinGW
Original (aka legacy/old) MinGW versions do not support `bcrypt`
(introduced with Vista). We already have logic to handle that in
`lib/rand.c` and autotools builds, where we do not call the
unsupported API and do not link `bcrypt`, respectively, when using
original MinGW.
This patch ports that logic to CMake, fixing the link error:
`c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: cannot find -lbcrypt`