Sometimes you need to copy parts of a vector into another vector or adding data from one vector to an existing vector. This can be done in a few ways in C++.

This note will try to compare different ways to copy data from one vector to another in C++.

Naive method using vector::push_back

Naive method to loop source vector and copy each element to target vector.

std::vector<int> source = {1,1,1,1,1};
std::vector<int> target;
for (auto& a : source) {
    target.push_back(a);
}

Using std::copy

Another way is using std:copy() like this:

std::vector<int> source = {1,1,1,1,1};
std::vector<int> target;
std::copy(std::begin(source), std::end(source), std::back_inserter(target));

The std::back_inserter will create an iterator that makes it possible to add elements to the back of target vector.

Using range insert

Use the vector insert method like this:

std::vector<int> target;
std::vector<int> source = {1,1,1,1,1};
target.insert(std::end(target), std::begin(source), std::end(source));

Comparing execution times

Compare the methods by calling the methods multiple times in a loop:

std::vector<int> v = {123,123,123,123,123,123,123,123};
for (int i = 0; i < 10000000; ++i)
{
#ifdef VECTOR_COPY_PRE
        std::vector<int> t(v.size());
        std::copy(std::begin(v), std::end(v), std::begin(t));
#elif VECTOR_COPY
        std::vector<int> t;
        std::copy(std::begin(v), std::end(v), std::back_inserter(t));
#elif VECTOR_PUSH
        std::vector<int> t;
        for (auto& a : v) {
            t.push_back(a);
        }
#else
        std::vector<int> t;
        t.insert(std::end(t), std::begin(v), std::end(v));
#endif
}

Compile with:

$ g++ --std=c++11 vector_p.cpp -DVECTOR_COPY -o vector_copy
$ g++ --std=c++11 vector_p.cpp -DVECTOR_COPY_PRE -o vector_copy_pre
$ g++ --std=c++11 vector_p.cpp -DVECTOR_PUSH -o vector_copy_push
$ g++ --std=c++11 vector_p.cpp -o vector_insert

Measure execution time:

$ time ./vector_push

real    0m15,993s
user    0m15,822s
sys     0m0,007s

$ time ./vector_copy

real    0m16,371s
user    0m16,370s
sys     0m0,001s
$ time ./vector_insert

real    0m3,862s
user    0m3,812s
sys     0m0,007s

There is a remarkable difference in execution time. The range insert is way faster.

The std::copy cannot do anything about the size of the target vector, elements has to be added one at a time to the end of target vector, eventually causing a reallocation when growing beyond its capacity. The same holds for push_back method.

With the range insert the target vector can be allocated before copying data to it making this more efficient. Since the vector also is continous in memory the insert can work on the entire memory chunk and not on each individual object.

Improving std::copy performance

What’s the result if target vector is pre-allocated before doing std::copy(). Since the vector is already allocated there is no need for std::back_inserter and copying is being done to the beginning of the vector.

std::vector<int> source = {1,1,1,1,1};
std::vector<int> target(source.size());
std::copy(std::begin(source), std::end(source), std::begin(target));
$ time ./vector_copy_pre

real	0m2,007s
user	0m1,992s
sys     0m0,004s

Now it’s a lot faster and even faster than range insert.

What is the reason for this improvment? How can this be faster than range inserts?

Profiling the execution shows that there are some extra copy and move operations prepended with uninitialized for the insert case which is skipped in the std::copy case.

Profiling range insert:

  526.256 us   13.683 us           1  main
  403.389 us   20.154 us          10  std::vector::insert
  353.452 us    5.080 us          10  std::vector::_M_insert_dispatch
  347.215 us   47.567 us          10  std::vector::_M_range_insert
  144.146 us   12.661 us          20  std::__uninitialized_move_if_noexcept_a
  106.393 us    5.591 us          20  std::__uninitialized_copy_a
  100.802 us    5.850 us          20  std::uninitialized_copy
   94.952 us    5.467 us          20  std::__uninitialized_copy::__uninit_copy
   89.485 us   12.385 us          20  std::copy
   88.946 us    1.107 us           1  _GLOBAL__sub_I_main
   87.839 us    5.849 us           1  __static_initialization_and_destruction_0
   81.990 us   81.990 us           1  std::ios_base::Init::Init
   51.846 us    2.922 us          10  std::__uninitialized_copy_a
   48.924 us    2.886 us          10  std::uninitialized_copy
   47.826 us    7.441 us          11  std::vector::~vector
   46.962 us   15.653 us          10  std::vector::_M_check_len
   ...

Profiling std::copy to pre-allocated vector:

  256.866 us   19.317 us           1  main
  101.321 us    0.769 us           1  _GLOBAL__sub_I_main
  100.552 us    5.880 us           1  __static_initialization_and_destruction_0
   94.672 us   94.672 us           1  std::ios_base::Init::Init
   66.468 us    5.023 us          10  std::vector::vector
   55.917 us   11.418 us          11  std::vector::~vector
   54.190 us    7.921 us          10  std::copy
   43.687 us   11.120 us          10  std::__copy_move_a2
   35.401 us    5.504 us          10  std::_Vector_base::_Vector_base
   34.061 us    5.515 us          11  std::_Vector_base::~_Vector_base
   27.044 us    5.474 us          10  std::vector::_M_default_initialize
   21.032 us    3.483 us          11  std::_Vector_base::_M_allocate
   20.254 us    3.061 us          10  std::__uninitialized_default_n_a
   19.986 us    3.071 us          10  std::_Vector_base::_M_create_storage
   19.639 us    3.375 us          11  std::_Vector_base::_M_deallocate
   18.550 us   14.647 us          30  std::__niter_base
   17.549 us    3.466 us          11  std::allocator_traits::allocate
   17.299 us    1.161 us           1  std::vector::vector
   ...

Enabling optimizer

Compiling with optimization enabled

$ g++ --std=c++11 vector_p.cpp -DVECTOR_COPY -O3 -o vector_copy_o3
$ g++ --std=c++11 vector_p.cpp -DVECTOR_COPY_PRE -O3 -o vector_copy_pre_o3
$ g++ --std=c++11 vector_p.cpp -DVECTOR_PUSH -O3 -o vector_copy_push_o3
$ g++ --std=c++11 vector_p.cpp -O3 -o vector_insert_o3

Same result with O3 enabled

$ time ./vector_copy_push_o3 

real    0m1,009s
user    0m1,008s
sys     0m0,001s
$ time ./vector_copy_o3 

real    0m1,014s
user    0m1,013s
sys     0m0,000s
$ time ./vector_insert_o3

real    0m0,153s
user    0m0,151s
sys     0m0,001s
$ time ./vector_copy_pre_o3

real    0m0,155s
user    0m0,155s
sys     0m0,000s

Optimization is really working well here. There is basically no difference between range insert and copy to pre allocated vector, it might be that the copying is optimized away completely since this dummy program does not do anything with the target vector. Still a huge gap to more naive methods.

Conclusion

If the size of the target vector is known in advance and is empty to start with it is an advantage to allocate the target vector and use std::copy to copy data to it. If appending data to an exsting vector the range insert method to the end of target vector is preferable. Optimization seem to even out the differences between those methods.