Index-based parallel-for performs parallel iterations over a range <tt>[first, last)</tt> with the given @c step size. The indices must be of an @em integral type.
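To make the half-open range and step concrete, the following is a minimal sequential sketch in standard C++ of which indices such a loop would visit. It is not the library's implementation: the helpers `iteration_count` and `visited_indices` are hypothetical names introduced here for illustration, and the sketch assumes a non-zero step and a signed index type when the step is negative. Mapping iteration number `k` to the index `first + k * step` is what lets the iterations be distributed independently across workers.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical helper (not a library function): number of iterations
// in the half-open range [first, last) for a non-zero step.
template <typename T>
std::size_t iteration_count(T first, T last, T step) {
  static_assert(std::is_integral<T>::value, "indices must be an integral type");
  assert(step != 0);
  if (step > 0) {
    // Ascending range: round up (last - first) / step.
    return first < last
      ? static_cast<std::size_t>((last - first + step - 1) / step)
      : 0;
  }
  // Descending range: round up (first - last) / (-step).
  return first > last
    ? static_cast<std::size_t>((first - last - step - 1) / (-step))
    : 0;
}

// Hypothetical helper: the indices visited, in iteration order.
// Iteration k touches index first + k * step, independent of all others.
template <typename T>
std::vector<T> visited_indices(T first, T last, T step) {
  const std::size_t n = iteration_count(first, last, step);
  std::vector<T> out(n);
  for (std::size_t k = 0; k < n; ++k) {
    out[k] = first + static_cast<T>(k) * step;
  }
  return out;
}
```

For example, `visited_indices(0, 10, 3)` yields the indices 0, 3, 6, 9, and `visited_indices(10, 0, -3)` yields 10, 7, 4, 1. In both cases @c last itself is excluded.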
You need to include the header file, `%taskflow/cuda/algorithm/for_each.hpp`, for creating a single-threaded task.

@section SingleTaskCUDASingleTask Run a Task with a ...