22.09.07
Profiling Code Using clock_gettime
After raising the issue of the low resolution problem of the timer provided by clock() in Resolution Problems in clock(), I’ve ended the post by mentioning to two more functions that should provide high-resolution timers suitable for profiling code. In this post I will discuss one of them, clock_gettime().
The clock_gettime() provides access to several useful timers with the resolution of nanoseconds. First, the prototype for the function is as follows:
int clock_gettime(clockid_t clk_id, struct timespect *tp);
The clk_id allows us to select a specific clock from the several
offered by the system, which includes:
-
CLOCK_REALTIME, a system-wide realtime clock. -
CLOCK_PROCESS_CPUTIME_ID, high-resolution timer provided by the CPU for each process. -
CLOCK_THREAD_CPUTIME_ID, high-resolution timer provided by the CPU for each of the threads.
Usually, there are more clocks provided, but I find these three the most useful as they allow to get the execution time spent in the system level, process level and the thread level.
The current clock time, for the chosen clock is stored int the struct provided by the *tp pointer. The timespec struct is defined as follows:
struct timespec { time_t tv_sec; /* seconds */ long tv_nsec; /* nanoseconds */ };
To time the processing time some function took, one should call clock_gettime() twice. Once before the function call and once right after it and subtract the returned timings to get the actual runtime.
Getting the difference between two timespec structs isn’t very complicated and can be acheived using the function diff() defined bellow:
timespec diff(timespec start, timespec end) { timespec temp; if ((end.tv_nsec-start.tv_nsec)<0) { temp.tv_sec = end.tv_sec-start.tv_sec-1; temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec; } else { temp.tv_sec = end.tv_sec-start.tv_sec; temp.tv_nsec = end.tv_nsec-start.tv_nsec; } return temp; }
Now let’s move to some real example:
#include <iostream> #include <time.h> using namespace std; timespec diff(timespec start, timespec end); int main() { timespec time1, time2; int temp; clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time1); for (int i = 0; i< 242000000; i++) temp+=temp; clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time2); cout<<diff(time1,time2).tv_sec<<":"<<diff(time1,time2).tv_nsec<<endl; return 0; } timespec diff(timespec start, timespec end) { timespec temp; if ((end.tv_nsec-start.tv_nsec)<0) { temp.tv_sec = end.tv_sec-start.tv_sec-1; temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec; } else { temp.tv_sec = end.tv_sec-start.tv_sec; temp.tv_nsec = end.tv_nsec-start.tv_nsec; } return temp; }
To use clock_gettime you need to include time.h and to link to librt.a. If you use gcc just make sure you add -lrt to your list of arguments.
Play a bit with the length of the for loop. As you can see clock_gettime provides much more accurate results and can register very short processing time too. Just remember that as the case with any profiling functions, this function adds a little overhead to your program, so make sure you disable the profiling code, using preprocessor commands for example, in the production release.
26/9/2007 - Update
You may want to take a look at The Revised String Iteration Benchmark post for another, larger, example of using clock_gettime to time performance of code.







Isa said,
June 25, 2008 at 6:55 pm
Guy,
Are you sure that the following diff routine is correct?
timespec diff(timespec start, timespec end)
{
timespec temp;
if ((end.tv_nsec-start.tv_nsec)<0) {
temp.tv_sec = end.tv_sec-start.tv_sec-1;
temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;
} else {
temp.tv_sec = end.tv_sec-start.tv_sec;
temp.tv_nsec = end.tv_nsec-start.tv_nsec;
}
return temp;
}
What is your ratoionale for adding 1000000000, According to me what you need to take into accunt LONG_INT_MAX rather than 1000000000. Thereby
if ((end.tv_nsec-start.tv_nsec)<0) {
temp.tv_sec = end.tv_sec-start.tv_sec-1;
temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;
}
Needs to be replaced by
if ((end.tv_nsec < start.tv_nsec) ) {
temp.tv_sec = end.tv_sec-start.tv_sec-1;
temp.tv_nsec = LONG_INT_MAX+ (end.tv_nsec-start.tv_nsec);
}
Cheers!
Isa
Isa said,
June 25, 2008 at 7:06 pm
Whoops. I meant LONG_MAX instead of LONG_INT_MAX.
Cheers!
Isa
Guy said,
June 25, 2008 at 9:00 pm
Hi Isa,
To understand why the code works we first need to understand how the timespec struct is structured and what the code tries to achieve. The timespec stores the time since certain point using two fields, tv_sec stores the seconds and tv_nsec stores the nanoseconds the represent the remaining part of the second. The if condition handles the case when the end time nanosecond count is smaller then the start-time’s. What we want to do is to subtract the timespecs field by field, however doing so in this case leads to negative value in the nanoseconds field, which will result in an overflow and undesired behavior.
The solution for this is the same method used when doing long subtraction in elementary school, borrow one from the next more important field. In this case I borrow a second from the tv_sec counter, so this is 1,000,000,000 nanoseconds (and hence the constant), now after borrowing I no longer have negative value in the nanoseconds counter.
As this is the same principle as in regular long subtraction, using LONG_MAX, is wrong as it doesn’t represent 1 second in nanoseconds.
You intention would have worked if we only had one field, that counted time as nanoseconds, and utilized the whole range of values in long data-type, and in this case the “if” check for overflow. But in this way you can’t distinguish between one overflow and more, it is still flawed. Luckily for us timespec keeps two fields, seconds, and nanoseconds, which allows us to overcome this pitfall and implement the timespec diff as I did.
Regards,
Guy
Isa said,
June 27, 2008 at 8:47 am
Guy,
Thanks for the explanation.