What is the Fastest Method to Iterate Over a String?

Few days ago I decided to check what is really the fastest method to iterate over strings in C++. As a string class I chose string class from STL as it is very popular and provides a couple of ways to iterate it. So how can one iterate over an std::string?

  1. By using indexes. E.g. str[i] and running i from zero to the length of the string.
  2. By using the at method. string::at(size_t pos) provides similar interface to indexes with the exceptions that it checks whether the given position is past the end of the string and then throws an exception. One may see it as the safe version of the regular index.
  3. Treating the string as a sequence of characters and and iterate over it using iterators.
  4. Using string::c_str() to get a pointer to a regular C string representation of the string stored in the std::string and treating it as array, e.g. using indexes to go over it.
  5. The last way to iterate over the string is to get a pointer to a C string representation using string::c_str() and advancing the pointer itself to iterate over the string.

The third method is the native method of iterating over objects in STL, and like the last two it can’t be used if the iteration changes the string itself (e.g. inserting or deleting characters). The first and second method are similar to the fourth (treating the pointer to the C string as an array), except that they aren’t so problematic as the latter when changing the string. The second method is the safest as it’s the only one that does range checks and throws exception if trying to access positions which are outside the string.

To benchmark and find out which method is the fastest method to iterate over a string I’ve created a huge string of random characters ranging from ‘a’ to ‘z’ and five executables, each one implementing one of the above iteration methods to do a simple task
(count the number of occurrences of each letter). The string is fifty million characters long which, as the longer the string the less important the overhead becomes.

The executables for the benchmark of every version were compiled with the default setting of g++ (without optimization as the compiler might change the iteration methods when optimizing). The benchmark executables where timed by using the time command and redirecting the executables output to /dev/null. The tests were run both on 64bit Gentoo (with 1 GB RAM) and on 32bit Kubuntu (with 512 MB RAM), to make sure the overall results (which method it better not the runtime itself) isn’t system depended.

Now to the result itself. After running the benchmark couple of times I came up with the following conclusions: Don’t use iterators for string iteration. Iterators came last on every test and usually times up to three times more than the slowest method besides it. Iterators may be the native STL way to iterate over STL containers but it provides very slow way to so. While iterators can be called STL pointers as the work and behave much the same way pointers do, they didn’t preform even close to pointers. Iterating using pointers came up as the fastest way to iterate over a string with a small margin over string::at() and indexes (both over usual C strings representation and std::string). When inspecting the rest of the methods one may find that string::at() and the indexes came up with very close timings, but on most tests the indexes over std::string where faster. It is obvious why they should preform better than string::at(), as they don’t do range checks as the latter does. For some reason I got that iterating using indexes over the std::string directly is faster than iterating over the the C string representation (by very minor margin) and that the C string is timing roughly the same as the string::at().

To conclude the benchmark pointers are the fastest way to iterate by a small margin, but using the std::string indexes and string::at is preferable in my opinion as the performance difference isn’t that big and the latter methods provide safer way (that can handle string manipulation that may cause the string data to be copied to another place in the memory) than the pointers. The indexes over C string representation suffer the same disadvantages as the pointers but don’t operate as fast. Stay away from iterators at all costs! Iterators suffer similar disadvantages as pointers do (regarding string manipulation) and don’t give anything in return except for horrible run time.

My timings

Here is the output of one of the runs of the benchmark. The results where typical to almost all of the test. To try the benchmark on your computer, see instructions bellow.

guy@Guy_Computer ~/temp/stringbenchmark $ ./benchmark
iteration using indexes

real    0m0.690s
user    0m0.644s
sys     0m0.048s
iteration using indexes over C string

real    0m0.752s
user    0m0.720s
sys     0m0.028s
iteration using string::at()

real    0m0.762s
user    0m0.708s
sys     0m0.036s
iteration using pointers over C string

real    0m0.642s
user    0m0.604s
sys     0m0.036s
iteration using iterators

real    0m2.323s
user    0m2.272s
sys     0m0.052s

Running the Tests on Your System

If you want you can run these test on your system to see the exact results. To do so download the tar archive and go to the directory you downloaded it into. Open the command line on this directory and execute:
tar -zxvf stringbenchmark.tar.gz
cd stringbenchmark
make
./benchmark

You will see the test results printed on your screen. While exact timings might differ from run to run, the overall trends should be clear.

6 thoughts on “What is the Fastest Method to Iterate Over a String?”

  1. The results are right, without optimization.
    However if I turn on optimizations, with -O2 I will get the following result:

    jancsi@dew:~/$ ./benchmark
    iteration using indexes
    real 0m0.325s
    user 0m0.280s
    sys 0m0.040s

    iteration using indexes over C string
    real 0m0.327s
    user 0m0.260s
    sys 0m0.060s

    iteration using string::at()
    real 0m0.425s
    user 0m0.348s
    sys 0m0.068s

    iteration using pointers over C string
    real 0m0.703s
    user 0m0.612s
    sys 0m0.072s

    iteration using iterators
    real 0m0.355s
    user 0m0.304s
    sys 0m0.048s

    J.

  2. One method that was left out is using the .data member of std::string. This accesses the underlying data where the c_str() does not and cannot (std::string must be able to have embedded ”’s).

    That method ought to be the fastest (hopefully tied with iterators):

    char * cur = s.data();
    char * end = cur + s.size();
    for( ; cur!=end; ++cur )
    {
    f(*cur);
    }

  3. Your iterator code is flawed, move the .begin() and .end() code outside the for loop, otherwise the functions (which are not inline) will be run EVERY loop iteration!

    Also you need to change the iter ‘tmp’ from tmp++ to ++tmp. The former is postfix, meaning the compiler will create a temporary object to store the contents of ‘tmp’, then increment it. Using prefix notation will elliminate the temporary.

    Heres a better example:
    std::string::iterator tmp(hugestring.begin());
    std::string::iterator const tmp_end(hugestring.end());
    for(; tmp != tmp_end; ++tmp)
    {
    //->Code Here
    }

    Note how I use the constructor call for storing the begin() and end() values. If not done this way, the compiler will create a temporary for both those lines!

  4. ^ ^
    Like RichardK said, not exactly apples to apples. Also, how often do you know the size of your string at compile time?

Leave a Reply

Your email address will not be published.

 

This site uses Akismet to reduce spam. Learn how your comment data is processed.