The compiler has no way of knowing that _M_node has only a single call
site, in _M_dfs (it is not called directly from other library headers
or from user code), and so decides not to inline it. Use the
always_inline attribute to force the inlining. This seems sufficient to
make all _M_dfs subroutines get inlined away, and speeds up the executor
by 30% on some microbenchmarks.
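
For illustration, the same pattern on a standalone type might look like
the sketch below. Walker, step and sum_below are hypothetical names and
are not part of libstdc++. The attribute is applied only under
__OPTIMIZE__ so that unoptimized builds are unaffected, and the function
is also declared inline, since GCC may otherwise warn that an
always_inline function might not be inlinable.

```cpp
// Hypothetical stand-in for a class whose helper has one call site.
struct Walker
{
  int total = 0;

  void run(int n);   // the only caller of step()
  void step(int i);  // small helper we want folded into run()
};

// Force inlining only when optimizing; at -O0 __OPTIMIZE__ is not
// defined and the helper stays an ordinary inline function.
#ifdef __OPTIMIZE__
[[__gnu__::__always_inline__]]
#endif
inline void
Walker::step(int i)
{ total += i; }

// Defined after step() so that its body is visible at the call site.
inline void
Walker::run(int n)
{
  for (int i = 0; i < n; ++i)
    step(i);
}

// Hypothetical convenience wrapper: returns 0 + 1 + ... + (n - 1).
inline int
sum_below(int n)
{
  Walker w;
  w.run(n);
  return w.total;
}
```

Whether the helper was really inlined can be checked by inspecting the
generated assembly for a remaining call to it; the attribute changes
code generation only, not behavior.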

libstdc++-v3/ChangeLog:

	* include/bits/regex_executor.tcc (__detail::_Executor::_M_node)
	[__OPTIMIZE__]: Add [[gnu::always_inline]] attribute.  Declare
	inline.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

template<typename _BiIter, typename _Alloc, typename _TraitsT,
bool __dfs_mode>
- void _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
+#ifdef __OPTIMIZE__
+ [[__gnu__::__always_inline__]]
+#endif
+ inline void _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
_M_node(_Match_mode __match_mode, _StateIdT __i)
{
if (_M_states._M_visited(__i))