A ScalarArrayOpExpr used for either NOT IN or <>/= ALL will never
evaluate to true when the array contains a NULL constant. Here we add an
explicit short-circuit in scalararraysel() to account for this and return
a selectivity of 0.0 when we see that a NULL exists. When the array is a
constant, we can very quickly see if there are any NULL values and return
early before going to much effort in scalararraysel(). For non-const
arrays, we short-circuit after finding the first NULL and forgo
selectivity estimation of any remaining elements.
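The reasoning above can be sketched in a few lines of standalone C. This
is only an illustration of SQL three-valued logic for a strict operator,
not PostgreSQL code: TriBool, Elem, strict_ne() and ne_all() are
hypothetical names invented for this sketch. It shows that x <> ALL (...)
can yield false or NULL, but never true, once any element is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* SQL-style three-valued logic result (hypothetical type for this sketch). */
typedef enum { TV_FALSE, TV_TRUE, TV_NULL } TriBool;

/* Hypothetical array element: an int that may be NULL. */
typedef struct
{
    int  value;
    bool isnull;
} Elem;

/* x <> elem for a strict operator: a NULL input yields NULL. */
static TriBool
strict_ne(int x, Elem e)
{
    if (e.isnull)
        return TV_NULL;
    return (x != e.value) ? TV_TRUE : TV_FALSE;
}

/*
 * x <> ALL (elems): the AND of the per-element results.  Any false makes
 * the result false; otherwise any NULL makes the result NULL.
 */
static TriBool
ne_all(int x, const Elem *elems, size_t n)
{
    TriBool result = TV_TRUE;

    assert(elems != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        TriBool r = strict_ne(x, elems[i]);

        if (r == TV_FALSE)
            return TV_FALSE;
        if (r == TV_NULL)
            result = TV_NULL;
    }
    return result;
}
```

Since a qual only passes rows for which it returns true, and the result
here is at best NULL when a NULL element is present, an estimate of 0.0
is consistent with this model, under the assumption that the operator is
strict.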
In the future, it might be better to do something for this case in
constant folding. We would need to be careful to only do this for
strict operators on expressions located in places that don't care about
distinguishing false from NULL returns, i.e. EXPRKIND_QUAL expressions.
Doing that requires a bit more thought and effort, so here we just fix
some needlessly slow selectivity estimation for ScalarArrayOpExprs
containing many array elements and at least one NULL.
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Reviewed-by: David Geier <geidav.pg@gmail.com>
Reviewed-by: Zsolt Parragi <zsolt.parragi@percona.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/eaa2598c-5356-4e1e-9ec3-5fd6eb1cd704@tantorlabs.com
if (arrayisnull) /* qual can't succeed if null array */
return (Selectivity) 0.0;
arrayval = DatumGetArrayTypeP(arraydatum);
+
+ /*
+ * When the array contains a NULL constant, same as var_eq_const, we
+ * assume the operator is strict and nothing will match, thus return
+ * 0.0.
+ */
+ if (!useOr && array_contains_nulls(arrayval))
+ return (Selectivity) 0.0;
+
get_typlenbyvalalign(ARR_ELEMTYPE(arrayval),
&elmlen, &elmbyval, &elmalign);
deconstruct_array(arrayval,
List *args;
Selectivity s2;
+ /*
+ * When the array contains a NULL constant, same as var_eq_const,
+ * we assume the operator is strict and nothing will match, thus
+ * return 0.0.
+ */
+ if (!useOr && IsA(elem, Const) && ((Const *) elem)->constisnull)
+ return (Selectivity) 0.0;
+
/*
* Theoretically, if elem isn't of nominal_element_type we should
* insert a RelabelType, but it seems unlikely that any operator
Function Scan on generate_series g (cost=N..N rows=1000 width=N)
(1 row)
+--
+-- Test ScalarArrayOpExpr row estimates for <> ALL for arrays with NULLs. We
+-- expect the planner to estimate 1 row will match in both of the following
+-- tests.
+--
+-- Try a const array containing a NULL
+SELECT explain_mask_costs($$
+SELECT * FROM tenk1 WHERE unique1 <> ALL (ARRAY[1, 2, 99, NULL]);$$,
+false, true, false, true);
+ explain_mask_costs
+---------------------------------------------------------
+ Seq Scan on tenk1 (cost=N..N rows=1 width=N)
+ Filter: (unique1 <> ALL ('{1,2,99,NULL}'::integer[]))
+(2 rows)
+
+-- Try a non-const array containing a NULL
+SELECT explain_mask_costs($$
+SELECT * FROM tenk1 WHERE unique1 <> ALL (ARRAY[1, 2, 98, (SELECT 99), NULL]);$$,
+false, true, false, true);
+ explain_mask_costs
+-------------------------------------------------------------------------------------
+ Seq Scan on tenk1 (cost=N..N rows=1 width=N)
+ Filter: (unique1 <> ALL (ARRAY[1, 2, 98, (InitPlan expr_1).col1, NULL::integer]))
+ InitPlan expr_1
+ -> Result (cost=N..N rows=1 width=N)
+(4 rows)
+
DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);
SELECT * FROM generate_series(25.0, 2.0, 0.0) g(s);$$,
false, true, false, true);
+--
+-- Test ScalarArrayOpExpr row estimates for <> ALL for arrays with NULLs. We
+-- expect the planner to estimate 1 row will match in both of the following
+-- tests.
+--
+
+-- Try a const array containing a NULL
+SELECT explain_mask_costs($$
+SELECT * FROM tenk1 WHERE unique1 <> ALL (ARRAY[1, 2, 99, NULL]);$$,
+false, true, false, true);
+
+-- Try a non-const array containing a NULL
+SELECT explain_mask_costs($$
+SELECT * FROM tenk1 WHERE unique1 <> ALL (ARRAY[1, 2, 98, (SELECT 99), NULL]);$$,
+false, true, false, true);
DROP FUNCTION explain_mask_costs(text, bool, bool, bool, bool);