read_id() function will decode and return the value 4294959095, and the
subsequent assignment:
int maxsize;
[...]
maxsize = read_id(&data, 0);
will experience an integer overflow on platforms where signed int has 4-byte
size (e.g. x86_64).
The same flaw is possible at the next line:
allsize = read_id(&data, 0);
Subsequent arithmetic will interpret the value as a negative number,
possibly making wrong decisions:
maxsize += 5; /* so we can read the next schema of an array */
if (maxsize > allsize)
maxsize = allsize;
and finally, the negative value passed to solv_calloc():
buf = solv_calloc(maxsize + DATA_READ_CHUNK + 4, 1); /* 4 extra bytes to detect overflows */
will be coerced to an unsigned type (size_t), leading to allocation of a smaller
buffer than intended. Writing to that small buffer then causes a heap
buffer overflow:
l = maxsize;
if (l < DATA_READ_CHUNK)
l = DATA_READ_CHUNK;
if (l > allsize)
l = allsize;
if (!l || fread(buf, l, 1, data.fp) != 1)
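The arithmetic above can be replayed in isolation. This is a minimal sketch, not code from libsolv: as_signed32() and alloc_size_for() are hypothetical helpers, DATA_READ_CHUNK is assumed to be 8192 (the size of the read in the ASAN report), and allsize is assumed large enough that the clamp does not fire.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed value, chosen to match the 8192-byte read in the ASAN report. */
#define DATA_READ_CHUNK 8192

/* Hypothetical helper: reinterpret a decoded 32-bit value as a signed
 * 32-bit integer, which is what storing it in "int maxsize" amounts to
 * on x86_64. Written so that no out-of-range conversion is needed. */
static int32_t as_signed32(uint32_t raw)
{
    if (raw <= (uint32_t)INT32_MAX)
        return (int32_t)raw;
    return (int32_t)(raw - (uint32_t)INT32_MAX - 1u) + INT32_MIN;
}

/* Hypothetical helper: the size that reaches the allocator for a given
 * decoded value. Only meant for inputs where "+ 5" cannot overflow. */
static size_t alloc_size_for(uint32_t raw)
{
    int32_t maxsize = as_signed32(raw);

    maxsize += 5;  /* mirrors "maxsize += 5" */
    return (size_t)(maxsize + DATA_READ_CHUNK + 4);
}
```

Under these assumptions 4294959095 becomes -8201, and the requested allocation size works out to 0 bytes, while the following fread() still asks for DATA_READ_CHUNK bytes.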
This flaw can be demonstrated by passing that solv file to the dumpsolv tool, which
will crash if compiled with ASAN:
$ /tmp/b/tools/dumpsolv /tmp/vuln_1_101_1_negative_maxsize.solv
=================================================================
==17608==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7c0a2ede00b1 at pc 0x7fea30451468 bp 0x7ffe07220a50 sp 0x7ffe07220220
WRITE of size 8192 at 0x7c0a2ede00b1 thread T0
#0 0x7fea30451467 in fread.part.0 (/lib64/libasan.so.8+0x51467) (BuildId: 80bfc4ae44fdec6ef5fecfb01e2b57d28660991c)
#1 0x7fea3028eef1 in repo_add_solv /home/test/libsolv/src/repo_solv.c:1034
#2 0x0000004041cc in main /home/test/libsolv/tools/dumpsolv.c:471
#3 0x7fea3003c680 in __libc_start_call_main (/lib64/libc.so.6+0x3680) (BuildId: c04494d63bca865bedf571a4075ef8867ccf9fa9)
#4 0x7fea3003c797 in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x3797) (BuildId: c04494d63bca865bedf571a4075ef8867ccf9fa9)
#5 0x000000400694 in _start (/tmp/b/tools/dumpsolv+0x400694) (BuildId: 0a70b5b14e5cd81f90a309bb2ff3219dfbf30bb8)
0x7c0a2ede00b1 is located 0 bytes after 1-byte region [0x7c0a2ede00b0,0x7c0a2ede00b1)
allocated by thread T0 here:
#0 0x7fea304ef41f in malloc (/lib64/libasan.so.8+0xef41f) (BuildId: 80bfc4ae44fdec6ef5fecfb01e2b57d28660991c)
#1 0x7fea302e4b4c in solv_calloc /home/test/libsolv/src/util.c:77
#2 0x7fea3028ee38 in repo_add_solv /home/test/libsolv/src/repo_solv.c:1025
#3 0x0000004041cc in main /home/test/libsolv/tools/dumpsolv.c:471
#4 0x7fea3003c680 in __libc_start_call_main (/lib64/libc.so.6+0x3680) (BuildId: c04494d63bca865bedf571a4075ef8867ccf9fa9)
#5 0x7fea3003c797 in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x3797) (BuildId: c04494d63bca865bedf571a4075ef8867ccf9fa9)
#6 0x000000400694 in _start (/tmp/b/tools/dumpsolv+0x400694) (BuildId: 0a70b5b14e5cd81f90a309bb2ff3219dfbf30bb8)
SUMMARY: AddressSanitizer: heap-buffer-overflow /home/test/libsolv/src/repo_solv.c:1034 in repo_add_solv
This patch catches the integer overflow, sets an error and jumps to the end of
the function just after deallocation of the buffer (which would contain an
undefined pointer). This patch also handles a possible integer overflow at
"maxsize += 5" line.
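The kind of check described here could look roughly like the following. This is a sketch under assumptions, not the actual patch: checked_size_add() is a hypothetical helper, and the real code sets a repository error and jumps to the cleanup label instead of returning -1.

```c
#include <limits.h>

/* Hypothetical helper: validate a decoded size before it is stored in
 * an int and increased by "headroom".
 * Returns 0 and stores the sum in *out on success, -1 otherwise. */
static int checked_size_add(unsigned int raw, int headroom, int *out)
{
    /* The decoded value must be representable as a non-negative int. */
    if (headroom < 0 || raw > (unsigned int)INT_MAX)
        return -1;
    /* The later addition must not overflow either. */
    if ((int)raw > INT_MAX - headroom)
        return -1;
    *out = (int)raw + headroom;
    return 0;
}
```

A caller would use it as `if (checked_size_add(value, 5, &maxsize) != 0)` followed by its own error handling.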
I originally wanted to replace read_id() with read_u32(), but the
complementary repowriter_write() function also stores the value as
a signed integer, so I guess the Id type is intended there.
There are probably other ways to fix it, like passing an INT_MAX-5
limit to read_id(), though the error message would be less
understandable.
It's also possible to reject this patch with an explanation that loading
untrusted solv files is not supported. Though some kind of
fortification would be welcomed by people who debug solver problems
from reported solv files.
Petr Písař [Wed, 22 Apr 2026 07:18:29 +0000 (09:18 +0200)]
Fix a buffer overflow when copying SHA-384/512 checksum from a Debian repository
When parsing a Debian repository, control2solvable() copies a package
checksum string from the repository into a stack-allocated "char
checksum[32 * 2 + 1]" array.
If the repository defined a SHA384 or SHA512 tag, a buffer overflow
occurred (as can be seen when compiling libsolv with CFLAGS='-O0 -g
-fsanitize=address') because those tag values are longer:
$ cat /tmp/Packages
Package: p
Version: 1
Architecture: all
SHA512: 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
$ /tmp/b/tools/deb2solv -r /tmp/Packages
=================================================================
==3695==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7b685ecf0071 at pc 0x7f6861683722 bp 0x7fff37e3e7a0 sp 0x7fff37e3df60
WRITE of size 129 at 0x7b685ecf0071 thread T0
#0 0x7f6861683721 in strcpy.part.0 (/lib64/libasan.so.8+0x83721) (BuildId: 80bfc4ae44fdec6ef5fecfb01e2b57d28660991c)
#1 0x7f6861d7f34d in control2solvable /home/test/libsolv/ext/repo_deb.c:491
#2 0x7f6861d804ea in repo_add_debpackages /home/test/libsolv/ext/repo_deb.c:622
#3 0x000000400fd5 in main /home/test/libsolv/tools/deb2solv.c:134
#4 0x7f686123c680 in __libc_start_call_main (/lib64/libc.so.6+0x3680) (BuildId: c04494d63bca865bedf571a4075ef8867ccf9fa9)
#5 0x7f686123c797 in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x3797) (BuildId: c04494d63bca865bedf571a4075ef8867ccf9fa9)
#6 0x000000400694 in _start (/tmp/b/tools/deb2solv+0x400694) (BuildId: a3350337819a51edd0c75293970d3458b5033bc9)
Address 0x7b685ecf0071 is located in stack of thread T0 at offset 113 in frame
#0 0x7f6861d7de2a in control2solvable /home/test/libsolv/ext/repo_deb.c:365
This frame has 1 object(s):
[48, 113) 'checksum' (line 371) <== Memory access at offset 113 overflows this variable
This patch fixes it by enlarging the buffer to accommodate the longest
supported digest string.
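For illustration, a buffer sized for the longest digest mentioned here (SHA-512, 64 bytes, 128 hex characters) might be combined with a length check. This is only a sketch: MAX_DIGEST_HEX_LEN and copy_checksum() are hypothetical names, and the length check goes beyond what the commit message describes.

```c
#include <string.h>

/* SHA-512 produces 64 bytes, i.e. 128 hexadecimal characters. */
#define MAX_DIGEST_HEX_LEN (64 * 2)

/* Hypothetical helper: copy a hex digest into a fixed-size buffer and
 * refuse values that do not fit instead of writing past the end.
 * Returns 0 on success, -1 if the value is too long. */
static int copy_checksum(char dst[MAX_DIGEST_HEX_LEN + 1], const char *src)
{
    size_t len = strlen(src);

    if (len > MAX_DIGEST_HEX_LEN)
        return -1;
    memcpy(dst, src, len + 1);  /* includes the terminating NUL */
    return 0;
}
```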
Alberto Ruiz [Wed, 18 Mar 2026 03:54:01 +0000 (03:54 +0000)]
Extract makeevr helper from makeevr_atts with memcpy optimization
Split makeevr_atts into a parsing wrapper and a makeevr helper that
takes pre-parsed epoch/ver/rel strings. The helper uses memcpy with
precomputed lengths instead of strcpy+strlen, avoiding redundant
string walks when building the EVR string. This also enables callers
that already have the parsed components to skip the attribute scan.
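The memcpy approach can be sketched as follows. build_evr() is a hypothetical stand-in, not the real makeevr helper; it assumes an "epoch:ver-rel" layout and does not reproduce any special-casing the real code may apply to the epoch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: build "[epoch:]ver[-rel]" with one allocation.
 * Each input is measured once and then copied with memcpy.
 * The caller frees the result. Returns NULL on allocation failure. */
static char *build_evr(const char *epoch, const char *ver, const char *rel)
{
    size_t el = epoch ? strlen(epoch) : 0;
    size_t vl = ver ? strlen(ver) : 0;
    size_t rl = rel ? strlen(rel) : 0;
    /* Room for the ':' and '-' separators plus the terminating NUL. */
    char *out = malloc(el + vl + rl + 3);
    char *p = out;

    if (!out)
        return NULL;
    if (el) {
        memcpy(p, epoch, el);
        p += el;
        *p++ = ':';
    }
    if (vl) {
        memcpy(p, ver, vl);
        p += vl;
    }
    if (rl) {
        *p++ = '-';
        memcpy(p, rel, rl);
        p += rl;
    }
    *p = '\0';
    return out;
}
```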
Alberto Ruiz [Wed, 18 Mar 2026 03:53:57 +0000 (03:53 +0000)]
Use hash table for XML element name lookup in solv_xmlparser
Replace the per-state linked list traversal with an open-addressing
hash table keyed on (fromstate, element_name). The old approach did
a linear scan with strcmp over all valid elements for the current
state on every XML start tag, which for STATE_SOLVABLE meant ~40
strcmp calls per tag. The hash table reduces this to typically one
hash probe plus one strcmp for confirmation.
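A minimal sketch of such a table, assuming a fixed power-of-two size and linear probing. All names (elem_table, elem_insert, elem_lookup) and the hash function are hypothetical and are not taken from solv_xmlparser.

```c
#include <string.h>

#define ELEM_TABLE_SIZE 64  /* power of two, must exceed the entry count */

struct elem_entry {
    int fromstate;
    const char *name;  /* NULL marks an empty slot */
    int tostate;
};

/* The table must be zero-initialized before use. */
struct elem_table {
    struct elem_entry slots[ELEM_TABLE_SIZE];
};

/* Hypothetical hash over the (fromstate, element name) pair. */
static unsigned int elem_hash(int fromstate, const char *name)
{
    unsigned int h = (unsigned int)fromstate * 2654435761u;

    while (*name)
        h = h * 31u + (unsigned char)*name++;
    return h & (ELEM_TABLE_SIZE - 1);
}

/* Insert a transition. The caller keeps the table less than full,
 * otherwise the probe loop would not terminate. */
static void elem_insert(struct elem_table *t, int fromstate,
                        const char *name, int tostate)
{
    unsigned int i = elem_hash(fromstate, name);

    while (t->slots[i].name)
        i = (i + 1) & (ELEM_TABLE_SIZE - 1);
    t->slots[i].fromstate = fromstate;
    t->slots[i].name = name;
    t->slots[i].tostate = tostate;
}

/* Return the target state for (fromstate, name), or -1 if unknown.
 * strcmp runs only on slots along the probe sequence. */
static int elem_lookup(const struct elem_table *t, int fromstate,
                       const char *name)
{
    unsigned int i = elem_hash(fromstate, name);

    while (t->slots[i].name) {
        if (t->slots[i].fromstate == fromstate &&
            strcmp(t->slots[i].name, name) == 0)
            return t->slots[i].tostate;
        i = (i + 1) & (ELEM_TABLE_SIZE - 1);
    }
    return -1;
}
```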
Alberto Ruiz [Wed, 18 Mar 2026 07:23:09 +0000 (07:23 +0000)]
Add hash table to dirpool for O(1) directory lookup
Profiling repo2solv with perf on a full primary+filelists workload
(Fedora 42, ~75K packages) revealed that dirpool_add_dir consumed
48.6% of all cycles and caused 92% of last-level cache misses.
The root cause: directories sharing a parent are stored across
multiple non-contiguous blocks in the dirs[] array, linked through
the dirtraverse[] auxiliary array. Looking up whether a (parent,
component) pair already exists required scanning every block for
that parent -- O(B*C) where B is the number of blocks and C the
average block size. Popular parents like /usr accumulate hundreds
of blocks from filelists data, causing cache-hostile pointer chasing
across scattered memory.
Replace the dirtraverse-based lookup with an open-addressed hash
table mapping (parent, comp) pairs to directory ids. The hash is
computed via relhash() (same scheme used for reldeps), and parent
is verified by a short backward walk to the block header -- always
within a few entries of the matched slot, so it stays in the same
cache line.
The hash table is built on demand and resized at 50% load factor.
The dirtraverse array is kept for dirpool_child/dirpool_sibling
traversal APIs but is no longer needed for the add_dir fast path.
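The lookup scheme can be illustrated with a simplified table. This sketch differs from the description above in two labeled ways: it stores parent and comp directly in each slot instead of verifying the parent by walking back to the block header, and pair_hash() is a hypothetical mixing function rather than relhash(). All names are assumptions.

```c
#include <stdlib.h>

struct dir_entry {
    int parent, comp;
    int id;  /* 0 marks an empty slot, so real ids must be non-zero */
};

struct dir_hash {
    struct dir_entry *slots;
    unsigned int mask;  /* table size - 1, size is a power of two */
    unsigned int used;
};

/* Hypothetical mixing function standing in for relhash(). */
static unsigned int pair_hash(int parent, int comp)
{
    return (unsigned int)parent * 2654435761u + (unsigned int)comp * 40503u;
}

static int dir_hash_init(struct dir_hash *h)
{
    h->mask = 15;
    h->used = 0;
    h->slots = calloc((size_t)h->mask + 1, sizeof(*h->slots));
    return h->slots ? 0 : -1;
}

/* Linear probing: returns the matching slot or the first empty one. */
static struct dir_entry *find_slot(struct dir_entry *slots, unsigned int mask,
                                   int parent, int comp)
{
    unsigned int i = pair_hash(parent, comp) & mask;

    while (slots[i].id &&
           (slots[i].parent != parent || slots[i].comp != comp))
        i = (i + 1) & mask;
    return &slots[i];
}

/* Double the table and reinsert every occupied slot. */
static int dir_hash_grow(struct dir_hash *h)
{
    unsigned int newmask = h->mask * 2 + 1, i;
    struct dir_entry *n = calloc((size_t)newmask + 1, sizeof(*n));

    if (!n)
        return -1;
    for (i = 0; i <= h->mask; i++)
        if (h->slots[i].id)
            *find_slot(n, newmask, h->slots[i].parent, h->slots[i].comp) =
                h->slots[i];
    free(h->slots);
    h->slots = n;
    h->mask = newmask;
    return 0;
}

/* Return the id stored for (parent, comp); if the pair is new, store
 * next_id and return it. Returns 0 on allocation failure. */
static int dir_hash_add(struct dir_hash *h, int parent, int comp, int next_id)
{
    struct dir_entry *e = find_slot(h->slots, h->mask, parent, comp);

    if (e->id)
        return e->id;
    if ((h->used + 1) * 2 > h->mask + 1) {  /* keep load at or below 50% */
        if (dir_hash_grow(h))
            return 0;
        e = find_slot(h->slots, h->mask, parent, comp);
    }
    e->parent = parent;
    e->comp = comp;
    e->id = next_id;
    h->used++;
    return next_id;
}
```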
Benchmark results (Fedora 42 metadata, 5 runs, median wall-clock):
primary + filelists: 10,256 ms vs 20,028 ms baseline (49% faster)
primary only: 2,629 ms vs 2,546 ms baseline (within noise)
perf profile after the change shows dirpool_add_dir dropped from
48.6% to 1.78% of cycles. Output is byte-identical to baseline.
rewrite_suse_dep: always make a copy of the dependency string
Unfortunately we cannot call strn2id on a buffer coming from
id2str, as strn2id may reallocate the string space and then the copy
of the string will lead to access of freed memory.
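The hazard and the copy-first pattern can be shown with a toy string pool. This is a sketch only: struct strpool, pool_addn() and pool_readd_prefix() are hypothetical and merely imitate the way a string space may move when it grows; they are not libsolv's id2str or strn2id.

```c
#include <stdlib.h>
#include <string.h>

/* Toy string space: NUL-terminated strings stored back to back. */
struct strpool {
    char *space;
    size_t len;
};

/* Append the first n bytes of s and return the offset of the new
 * string. The realloc may move p->space, so any pointer into the old
 * space, including s itself, must not be relied on after this call. */
static size_t pool_addn(struct strpool *p, const char *s, size_t n)
{
    size_t off = p->len;
    char *ns = realloc(p->space, p->len + n + 1);

    if (!ns)
        abort();
    p->space = ns;
    memcpy(ns + off, s, n);
    ns[off + n] = '\0';
    p->len += n + 1;
    return off;
}

/* Safe pattern: copy the pooled string to private memory first, then
 * add the copy, so the source stays valid during the realloc. */
static size_t pool_readd_prefix(struct strpool *p, size_t off, size_t n)
{
    size_t full = strlen(p->space + off);
    size_t newoff;
    char *copy;

    if (n > full)
        n = full;
    copy = malloc(n + 1);
    if (!copy)
        abort();
    memcpy(copy, p->space + off, n);
    copy[n] = '\0';
    newoff = pool_addn(p, copy, n);
    free(copy);
    return newoff;
}
```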
Hijacking the isdefault variable makes the code hard to understand
and also can lead to the SOLVABLE_ISDEFAULT flag being set on
the environment if the last groupid used the default flag.
So just add a new "groupid_isdefault" flag. We also no longer downgrade
a "Requires" to a "Recommends" if the default attribute is set
in a grouplist entry.
Respect the "default" attribute in environment optionlist in comps xml
Add the default groups in optionlist with the "recommends" dependency
instead of "suggests" to differentiate between them.
The "reqtype" attribute of the "parsedata" structure cannot be used
directly, because it wouldn't be clear whether to use the "suggests" or
"requires" dependency for non-default groups, since the information was
already processed in the parent element and is not available at the time
of processing the individual "groupid" elements other than through the
"reqtype" attribute. Therefore, the "isdefault" attribute is used for
this.
Gong Zhile [Wed, 22 Oct 2025 10:14:30 +0000 (18:14 +0800)]
Fix qsort_r preprocessor for musl and FreeBSD
The qsort_r signature difference mentioned originally now only exists in DragonFly
BSD and macOS. However, the preprocessor check also broke compilation on musl+Linux
and FreeBSD, leading to a compilation error on Buildroot.
solver_addbestrules: recalculate pointer to current rule after adding a new rule
The code adds new rules in a FOR_RULELITERALS loop. The iterator needs
to access elements of the rule it iterates over, so if we add a new
rule in the loop body, we have to make sure that the pointer to the
current rule stays valid.
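The general pattern, recomputing a pointer from a stable index after the array may have been reallocated, can be sketched like this. The types and functions (struct ruleset, add_rule, derive_rule) are hypothetical and much simpler than the solver's rule storage.

```c
#include <stdlib.h>

struct rule {
    int p;
    int d;
};

struct ruleset {
    struct rule *rules;
    int nrules;
};

/* Append a rule and return its index, or -1 on allocation failure.
 * The realloc may move the whole array. */
static int add_rule(struct ruleset *s, int p, int d)
{
    struct rule *n = realloc(s->rules, ((size_t)s->nrules + 1) * sizeof(*n));

    if (!n)
        return -1;
    s->rules = n;
    n[s->nrules].p = p;
    n[s->nrules].d = d;
    return s->nrules++;
}

/* Add a rule derived from rule "idx", then link the original to it.
 * The pointer r is recomputed from the index after add_rule(). */
static int derive_rule(struct ruleset *s, int idx)
{
    struct rule *r = s->rules + idx;
    int newidx = add_rule(s, -r->p, r->d);

    if (newidx < 0)
        return -1;
    r = s->rules + idx;  /* the array may have moved, refresh the pointer */
    r->d = newidx;
    return newidx;
}
```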
repo_autopattern: support creation of obsoletes for product packages
This adds support for provides of the type "product-obsoletes(name)".
We translate this to "Obsoletes: product:<name>" in the generated
product pseudo package.
We need this because people used "Obsoletes: product:name" in the
"release" package, but this is no longer allowed in newer rpm versions.
Besides, the obsoletes is kind of wrong in the "release" package
anyway, it belongs in the generated "product:" package.
Implement color filtering when adding update targets
The old code created update jobs spanning multiple architectures
even if "implicitobsoleteusescolors" was set.
Also add color filtering in replaces_installed_package, where it
also seems to be missing.