28 Algorithms library [algorithms]

28.4 Parallel algorithms [algorithms.parallel]

This section describes components that C++ programs may use to perform operations on containers and other sequences in parallel.

28.4.1 Terms and definitions [algorithms.parallel.defns]

A parallel algorithm is a function template listed in this International Standard with a template parameter named ExecutionPolicy.
Parallel algorithms access objects indirectly accessible via their arguments by invoking the following functions:
  • All operations of the categories of the iterators that the algorithm is instantiated with.
  • Operations on those sequence elements that are required by its specification.
  • User-provided function objects to be applied during the execution of the algorithm, if required by the specification.
  • Operations on those function objects required by the specification.
    [Note
    : end note
    ]
These functions are herein called element access functions.
[Example
:
The sort function may invoke the following element access functions:
  • Operations of the random-access iterator of the actual template argument (as per [random.access.iterators]), as implied by the name of the template parameter RandomAccessIterator.
  • The swap function on the elements of the sequence (as per the preconditions specified in [sort]).
  • The user-provided Compare function object.
end example
]

28.4.2 Requirements on user-provided function objects [algorithms.parallel.user]

Unless otherwise specified, function objects passed into parallel algorithms as objects of type Predicate, BinaryPredicate, Compare, UnaryOperation, BinaryOperation, BinaryOperation1, BinaryOperation2, and the operators used by the analogous overloads to these parallel algorithms that could be formed by the invocation with the specified default predicate or operation (where applicable) shall not directly or indirectly modify objects via their arguments, nor shall they rely on the identity of the provided objects.

28.4.3 Effect of execution policies on algorithm execution [algorithms.parallel.exec]

Parallel algorithms have template parameters named ExecutionPolicy ([execpol]) which describe the manner in which the execution of these algorithms may be parallelized and the manner in which they apply the element access functions.
Unless otherwise stated, implementations may make arbitrary copies of elements (with type T) from sequences where is_­trivially_­copy_­constructible_­v<T> and is_­trivially_­destructible_­v<T> are true.
[Note
:
This implies that user-supplied function objects should not rely on object identity of arguments for such input sequences.
Users for whom the object identity of the arguments to these function objects is important should consider using a wrapping iterator that returns a non-copied implementation object such as reference_­wrapper<T> ([refwrap]) or some equivalent solution.
end note
]
The invocations of element access functions in parallel algorithms invoked with an execution policy object of type execution​::​sequenced_­policy all occur in the calling thread of execution.
[Note
:
The invocations are not interleaved; see [intro.execution].
end note
]
The invocations of element access functions in parallel algorithms invoked with an execution policy object of type execution​::​parallel_­policy are permitted to execute in either the invoking thread of execution or in a thread of execution implicitly created by the library to support parallel algorithm execution.
If the threads of execution created by thread ([thread.thread.class]) provide concurrent forward progress guarantees ([intro.progress]), then a thread of execution implicitly created by the library will provide parallel forward progress guarantees; otherwise, the provided forward progress guarantee is implementation-defined.
Any such invocations executing in the same thread of execution are indeterminately sequenced with respect to each other.
[Note
:
It is the caller's responsibility to ensure that the invocation does not introduce data races or deadlocks.
end note
]
[Example
:
int a[] = {0,1};
std::vector<int> v;
std::for_each(std::execution::par, std::begin(a), std::end(a), [&](int i) {
  v.push_back(i*2+1); // incorrect: data race
});
The program above has a data race because of the unsynchronized access to the container v.
end example
]
[Example
:
std::atomic<int> x{0};
int a[] = {1,2};
std::for_each(std::execution::par, std::begin(a), std::end(a), [&](int) {
  x.fetch_add(1, std::memory_order_relaxed);
  // spin wait for another iteration to change the value of x
  while (x.load(std::memory_order_relaxed) == 1) { } // incorrect: assumes execution order
});
The above example depends on the order of execution of the iterations, and will not terminate if both iterations are executed sequentially on the same thread of execution.
end example
]
[Example
:
int x = 0;
std::mutex m;
int a[] = {1,2};
std::for_each(std::execution::par, std::begin(a), std::end(a), [&](int) {
  std::lock_guard<mutex> guard(m);
  ++x;
});
The above example synchronizes access to object x ensuring that it is incremented correctly.
end example
]
The invocations of element access functions in parallel algorithms invoked with an execution policy of type execution​::​parallel_­unsequenced_­policy are permitted to execute in an unordered fashion in unspecified threads of execution, and unsequenced with respect to one another within each thread of execution.
These threads of execution are either the invoking thread of execution or threads of execution implicitly created by the library; the latter will provide weakly parallel forward progress guarantees.
[Note
:
This means that multiple function object invocations may be interleaved on a single thread of execution, which overrides the usual guarantee from [intro.execution] that function executions do not interleave with one another.
end note
]
Since execution​::​parallel_­unsequenced_­policy allows the execution of element access functions to be interleaved on a single thread of execution, blocking synchronization, including the use of mutexes, risks deadlock.
Thus, the synchronization with execution​::​parallel_­unsequenced_­policy is restricted as follows: A standard library function is vectorization-unsafe if it is specified to synchronize with another function invocation, or another function invocation is specified to synchronize with it, and if it is not a memory allocation or deallocation function.
Vectorization-unsafe standard library functions may not be invoked by user code called from execution​::​parallel_­unsequenced_­policy algorithms.
[Note
:
Implementations must ensure that internal synchronization inside standard library functions does not prevent forward progress when those functions are executed by threads of execution with weakly parallel forward progress guarantees.
end note
]
[Example
:
int x = 0;
std::mutex m;
int a[] = {1,2};
std::for_each(std::execution::par_unseq, std::begin(a), std::end(a), [&](int) {
  std::lock_guard<mutex> guard(m); // incorrect: lock_­guard constructor calls m.lock()
  ++x;
});
The above program may result in two consecutive calls to m.lock() on the same thread of execution (which may deadlock), because the applications of the function object are not guaranteed to run on different threads of execution.
end example
]
[Note
:
The semantics of the execution​::​parallel_­policy or the execution​::​parallel_­unsequenced_­policy invocation allow the implementation to fall back to sequential execution if the system cannot parallelize an algorithm invocation due to lack of resources.
end note
]
If an invocation of a parallel algorithm uses threads of execution implicitly created by the library, then the invoking thread of execution will either
  • temporarily block with forward progress guarantee delegation ([intro.progress]) on the completion of these library-managed threads of execution, or
  • eventually execute an element access function;
the thread of execution will continue to do so until the algorithm is finished.
[Note
:
In blocking with forward progress guarantee delegation in this context, a thread of execution created by the library is considered to have finished execution as soon as it has finished the execution of the particular element access function that the invoking thread of execution logically depends on.
end note
]
The semantics of parallel algorithms invoked with an execution policy object of implementation-defined type are implementation-defined.

28.4.4 Parallel algorithm exceptions [algorithms.parallel.exceptions]

During the execution of a parallel algorithm, if temporary memory resources are required for parallelization and none are available, the algorithm throws a bad_­alloc exception.
During the execution of a parallel algorithm, if the invocation of an element access function exits via an uncaught exception, the behavior is determined by the ExecutionPolicy.

28.4.5 ExecutionPolicy algorithm overloads [algorithms.parallel.overloads]

Parallel algorithms are algorithm overloads.
Each parallel algorithm overload has an additional template type parameter named ExecutionPolicy, which is the first template parameter.
Additionally, each parallel algorithm overload has an additional function parameter of type ExecutionPolicy&&, which is the first function parameter.
[Note
:
Not all algorithms have parallel algorithm overloads.
end note
]
Unless otherwise specified, the semantics of ExecutionPolicy algorithm overloads are identical to their overloads without.
Unless otherwise specified, the complexity requirements of ExecutionPolicy algorithm overloads are relaxed from the complexity requirements of the overloads without as follows: when the guarantee says “at most expr” or “exactly expr” and does not specify the number of assignments or swaps, and expr is not already expressed with notation, the complexity of the algorithm shall be .
Parallel algorithms shall not participate in overload resolution unless is_­execution_­policy_­v<decay_­t<ExecutionPolicy>> is true.