Here is how the hinting works in the libstdc++ implementation of unordered containers, and the rationale behind this behavior.
In the following text, the phrase equivalent to refer
to the result of the invocation of the equal predicate imposed on the
container by its key_equal
object, which defaults to (basically)
“==”.
Unordered containers can be seen as a std::vector
of
std::forward_list
. The std::vector
represents
the buckets and each std::forward_list
is the list of nodes
belonging to the same bucket. When inserting an element in such a data
structure we first need to compute the element hash code to find the
bucket to insert the element to, the second step depends on the uniqueness
of elements in the container.
In the case of std::unordered_set
and
std::unordered_map
you need to look through all bucket's
elements for an equivalent one. If there is none the insertion can be
achieved, otherwise the insertion fails. As we always need to loop though
all bucket's elements, the hint doesn't tell us if the element is already
present, and we don't have any constraint on where the new element is to
be inserted, the hint won't be of any help and will then be ignored.
In the case of std::unordered_multiset
and std::unordered_multimap
equivalent elements must be
linked together so that the equal_range(const key_type&)
can return the range of iterators pointing to all equivalent elements.
This is where hinting can be used to point to another equivalent element
already part of the container and so skip all non equivalent elements of
the bucket. So to be useful the hint shall point to an element equivalent
to the one being inserted. The new element will be then inserted right
after the hint. Note that because of an implementation detail inserting
after a node can require updating the bucket of the following node. To
check if the next bucket is to be modified we need to compute the
following node's hash code. So if you want your hint to be really efficient
it should be followed by another equivalent element, the implementation
will detect this equivalence and won't compute next element hash code.
It is highly advised to start using unordered containers hints only if you have a benchmark that will demonstrate the benefit of it. If you don't then do not use hints, it might do more harm than good.
The unordered containers in libstdc++ may cache the hash code for each element alongside the element itself. In some cases not recalculating the hash code every time it's needed can improve performance, but the additional memory overhead can also reduce performance, so whether an unordered associative container caches the hash code or not depends on a number of factors. The caching policy for GCC 4.8 is described below.
The C++ standard requires that erase
and swap
operations must not throw exceptions. Those operations might need an
element's hash code, but cannot use the hash function if it could
throw.
This means the hash codes will be cached unless the hash function
has a non-throwing exception specification such as noexcept
or throw()
.
Secondly, libstdc++ also needs the hash code in the implementation of
local_iterator
and const_local_iterator
in
order to know when the iterator has reached the end of the bucket.
This means that the local iterator types will embed a copy of the hash
function when possible.
Because the local iterator types must be DefaultConstructible and
CopyAssignable, if the hash function type does not model those concepts
then it cannot be embedded and so the hash code must be cached.
Note that a hash function might not be safe to use when
default-constructed (e.g if it a function pointer) so a hash
function that is contained in a local iterator won't be used until
the iterator is valid, so the hash function has been copied from a
correctly-initialized object.
If the hash function is non-throwing, DefaultConstructible and CopyAssignable then libstdc++ doesn't need to cache the hash code for correctness, but might still do so for performance if computing a hash code is an expensive operation, as it may be for arbitrarily long strings. As an extension libstdc++ provides a trait type to describe whether a hash function is fast. By default hash functions are assumed to be fast unless the trait is specialized for the hash function and the trait's value is false, in which case the hash code will always be cached. The trait can be specialized for user-defined hash functions like so:
#include <unordered_set> struct hasher { std::size_t operator()(int val) const noexcept { // Some very slow computation of a hash code from an int ! ... } } namespace std { template<> struct __is_fast_hash<hasher> : std::false_type { }; }