C++ vector insert performance
Sometimes you need to copy parts of a vector into another vector or adding data from one vector to an existing vector. This can be done in a few ways in C++.
This note will try to compare different ways to copy data from one vector to another in C++.
Naive method using vector::push_back
Naive method to loop source vector and copy each element to target vector.
std::vector<int> source = {1,1,1,1,1};
std::vector<int> target;
for (auto& a : source) {
target.push_back(a);
}
Using std::copy
Another way is using std:copy() like this:
std::vector<int> source = {1,1,1,1,1};
std::vector<int> target;
std::copy(std::begin(source), std::end(source), std::back_inserter(target));
The std::back_inserter will create an iterator that makes it possible to add elements to the back of target vector.
Using range insert
Use the vector insert method like this:
std::vector<int> target;
std::vector<int> source = {1,1,1,1,1};
target.insert(std::end(target), std::begin(source), std::end(source));
Comparing execution times
Compare the methods by calling the methods multiple times in a loop:
std::vector<int> v = {123,123,123,123,123,123,123,123};
for (int i = 0; i < 10000000; ++i)
{
#ifdef VECTOR_COPY_PRE
std::vector<int> t(v.size());
std::copy(std::begin(v), std::end(v), std::begin(t));
#elif VECTOR_COPY
std::vector<int> t;
std::copy(std::begin(v), std::end(v), std::back_inserter(t));
#elif VECTOR_PUSH
std::vector<int> t;
for (auto& a : v) {
t.push_back(a);
}
#else
std::vector<int> t;
t.insert(std::end(t), std::begin(v), std::end(v));
#endif
}
Compile with:
$ g++ --std=c++11 vector_p.cpp -DVECTOR_COPY -o vector_copy
$ g++ --std=c++11 vector_p.cpp -DVECTOR_COPY_PRE -o vector_copy_pre
$ g++ --std=c++11 vector_p.cpp -DVECTOR_PUSH -o vector_copy_push
$ g++ --std=c++11 vector_p.cpp -o vector_insert
Measure execution time:
$ time ./vector_push
real 0m15,993s
user 0m15,822s
sys 0m0,007s
$ time ./vector_copy
real 0m16,371s
user 0m16,370s
sys 0m0,001s
$ time ./vector_insert
real 0m3,862s
user 0m3,812s
sys 0m0,007s
There is a remarkable difference in execution time. The range insert is way faster.
The std::copy cannot do anything about the size of the target vector, elements has to be added one at a time to the end of target vector, eventually causing a reallocation when growing beyond its capacity. The same holds for push_back method.
With the range insert the target vector can be allocated before copying data to it making this more efficient. Since the vector also is continous in memory the insert can work on the entire memory chunk and not on each individual object.
Improving std::copy performance
What’s the result if target vector is pre-allocated before doing std::copy(). Since the vector is already allocated there is no need for std::back_inserter and copying is being done to the beginning of the vector.
std::vector<int> source = {1,1,1,1,1};
std::vector<int> target(source.size());
std::copy(std::begin(source), std::end(source), std::begin(target));
$ time ./vector_copy_pre
real 0m2,007s
user 0m1,992s
sys 0m0,004s
Now it’s a lot faster and even faster than range insert.
What is the reason for this improvment? How can this be faster than range inserts?
Profiling the execution shows that there are some extra copy and move operations prepended with uninitialized
for the insert case which is skipped in the std::copy case.
Profiling range insert:
526.256 us 13.683 us 1 main
403.389 us 20.154 us 10 std::vector::insert
353.452 us 5.080 us 10 std::vector::_M_insert_dispatch
347.215 us 47.567 us 10 std::vector::_M_range_insert
144.146 us 12.661 us 20 std::__uninitialized_move_if_noexcept_a
106.393 us 5.591 us 20 std::__uninitialized_copy_a
100.802 us 5.850 us 20 std::uninitialized_copy
94.952 us 5.467 us 20 std::__uninitialized_copy::__uninit_copy
89.485 us 12.385 us 20 std::copy
88.946 us 1.107 us 1 _GLOBAL__sub_I_main
87.839 us 5.849 us 1 __static_initialization_and_destruction_0
81.990 us 81.990 us 1 std::ios_base::Init::Init
51.846 us 2.922 us 10 std::__uninitialized_copy_a
48.924 us 2.886 us 10 std::uninitialized_copy
47.826 us 7.441 us 11 std::vector::~vector
46.962 us 15.653 us 10 std::vector::_M_check_len
...
Profiling std::copy to pre-allocated vector:
256.866 us 19.317 us 1 main
101.321 us 0.769 us 1 _GLOBAL__sub_I_main
100.552 us 5.880 us 1 __static_initialization_and_destruction_0
94.672 us 94.672 us 1 std::ios_base::Init::Init
66.468 us 5.023 us 10 std::vector::vector
55.917 us 11.418 us 11 std::vector::~vector
54.190 us 7.921 us 10 std::copy
43.687 us 11.120 us 10 std::__copy_move_a2
35.401 us 5.504 us 10 std::_Vector_base::_Vector_base
34.061 us 5.515 us 11 std::_Vector_base::~_Vector_base
27.044 us 5.474 us 10 std::vector::_M_default_initialize
21.032 us 3.483 us 11 std::_Vector_base::_M_allocate
20.254 us 3.061 us 10 std::__uninitialized_default_n_a
19.986 us 3.071 us 10 std::_Vector_base::_M_create_storage
19.639 us 3.375 us 11 std::_Vector_base::_M_deallocate
18.550 us 14.647 us 30 std::__niter_base
17.549 us 3.466 us 11 std::allocator_traits::allocate
17.299 us 1.161 us 1 std::vector::vector
...
Enabling optimizer
Compiling with optimization enabled
$ g++ --std=c++11 vector_p.cpp -DVECTOR_COPY -O3 -o vector_copy_o3
$ g++ --std=c++11 vector_p.cpp -DVECTOR_COPY_PRE -O3 -o vector_copy_pre_o3
$ g++ --std=c++11 vector_p.cpp -DVECTOR_PUSH -O3 -o vector_copy_push_o3
$ g++ --std=c++11 vector_p.cpp -O3 -o vector_insert_o3
Same result with O3 enabled
$ time ./vector_copy_push_o3
real 0m1,009s
user 0m1,008s
sys 0m0,001s
$ time ./vector_copy_o3
real 0m1,014s
user 0m1,013s
sys 0m0,000s
$ time ./vector_insert_o3
real 0m0,153s
user 0m0,151s
sys 0m0,001s
$ time ./vector_copy_pre_o3
real 0m0,155s
user 0m0,155s
sys 0m0,000s
Optimization is really working well here. There is basically no difference between range insert and copy to pre allocated vector, it might be that the copying is optimized away completely since this dummy program does not do anything with the target vector. Still a huge gap to more naive methods.
Conclusion
If the size of the target vector is known in advance and is empty to start with it is an advantage to allocate the target vector and use std::copy to copy data to it. If appending data to an exsting vector the range insert method to the end of target vector is preferable. Optimization seem to even out the differences between those methods.