Profiling Code Using `clock_gettime`

After raising the issue of the low-resolution problem of the timer provided by clock() in Resolution Problems in clock(), I ended the post by mentioning two more functions that should provide high-resolution timers suitable for profiling code. In this post, I will discuss one of them, clock_gettime().

The clock_gettime() function provides access to several useful timers with a resolution of nanoseconds. First, the prototype for the function is as follows:

int clock_gettime(clockid_t clk_id, struct timespec *tp);

The clk_id allows us to select a specific clock from the several
offered by the system, which include:

CLOCK_REALTIME, a system-wide realtime clock.
CLOCK_PROCESS_CPUTIME_ID, a high-resolution timer provided by the CPU for each process.
CLOCK_THREAD_CPUTIME_ID, a high-resolution timer provided by the CPU for each thread.

Usually, there are more clocks provided, but I find these three the most useful, as they allow us to get the execution time spent at the system level, process level, and thread level.

The current clock time for the chosen clock is stored in the struct provided by the *tp pointer. The timespec struct is defined as follows:

struct timespec {
    time_t tv_sec; /* seconds */
    long tv_nsec; /* nanoseconds */
};

To time how long some function took to process, one should call clock_gettime() twice: once before the function call and once right after it, and subtract the returned timings to get the actual runtime.

Getting the difference between two timespec structs isn’t very complicated and can be achieved using the function diff() defined below:

timespec diff(timespec start, timespec end)
{
    timespec temp;
    if ((end.tv_nsec-start.tv_nsec)<0) {
        temp.tv_sec = end.tv_sec-start.tv_sec-1;
        temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;
    } else {
        temp.tv_sec = end.tv_sec-start.tv_sec;
        temp.tv_nsec = end.tv_nsec-start.tv_nsec;
    }
    return temp;
}

Now let’s move to a real example:

#include <iostream>
#include <time.h>
using namespace std;

timespec diff(timespec start, timespec end);

int main()
{
    timespec time1, time2;
    int temp;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time1);
    for (int i = 0; i< 242000000; i++)
        temp+=temp;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time2);
    cout<<diff(time1,time2).tv_sec<<":"<<diff(time1,time2).tv_nsec<<endl;
    return 0;
}

timespec diff(timespec start, timespec end)
{
    timespec temp;
    if ((end.tv_nsec-start.tv_nsec)<0) {
        temp.tv_sec = end.tv_sec-start.tv_sec-1;
        temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;
    } else {
        temp.tv_sec = end.tv_sec-start.tv_sec;
        temp.tv_nsec = end.tv_nsec-start.tv_nsec;
    }
    return temp;
}

To use clock_gettime, you need to include time.h and link to librt.a. If you use gcc, just make sure you add -lrt to your list of arguments.

Play a bit with the length of the for loop. As you can see, clock_gettime provides much more accurate results and can register very short processing times too. Just remember that, as is the case with any profiling function, this function adds a little overhead to your program, so make sure you disable the profiling code, using preprocessor commands for example, in the production release.

26/9/2007 – Update

You may want to take a look at The Revised String Iteration Benchmark post for another, larger, example of using clock_gettime to time code performance.

60 thoughts on “Profiling Code Using `clock_gettime`”

Isa says:

June 25, 2008 at 18:55

Guy,

Are you sure that the following diff routine is correct?

timespec diff(timespec start, timespec end)
{
timespec temp;
if ((end.tv_nsec-start.tv_nsec)<0) {
temp.tv_sec = end.tv_sec-start.tv_sec-1;
temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;
} else {
temp.tv_sec = end.tv_sec-start.tv_sec;
temp.tv_nsec = end.tv_nsec-start.tv_nsec;
}
return temp;
}

What is your ratoionale for adding 1000000000, According to me what you need to take into accunt LONG_INT_MAX rather than 1000000000. Thereby

if ((end.tv_nsec-start.tv_nsec)<0) {
temp.tv_sec = end.tv_sec-start.tv_sec-1;
temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;
}

Needs to be replaced by

if ((end.tv_nsec < start.tv_nsec) ) {
temp.tv_sec = end.tv_sec-start.tv_sec-1;
temp.tv_nsec = LONG_INT_MAX+ (end.tv_nsec-start.tv_nsec);
}

Cheers!

Isa
Isa says:

June 25, 2008 at 19:06

Whoops. I meant LONG_MAX instead of LONG_INT_MAX.

Cheers!
Isa
Guy says:

June 25, 2008 at 21:00

Hi Isa,

To understand why the code works we first need to understand how the timespec struct is structured and what the code tries to achieve. The timespec stores the time since certain point using two fields, tv_sec stores the seconds and tv_nsec stores the nanoseconds the represent the remaining part of the second. The if condition handles the case when the end time nanosecond count is smaller then the start-time’s. What we want to do is to subtract the timespecs field by field, however doing so in this case leads to negative value in the nanoseconds field, which will result in an overflow and undesired behavior.

The solution for this is the same method used when doing long subtraction in elementary school, borrow one from the next more important field. In this case I borrow a second from the tv_sec counter, so this is 1,000,000,000 nanoseconds (and hence the constant), now after borrowing I no longer have negative value in the nanoseconds counter.

As this is the same principle as in regular long subtraction, using LONG_MAX, is wrong as it doesn’t represent 1 second in nanoseconds.

You intention would have worked if we only had one field, that counted time as nanoseconds, and utilized the whole range of values in long data-type, and in this case the “if” check for overflow. But in this way you can’t distinguish between one overflow and more, it is still flawed. Luckily for us timespec keeps two fields, seconds, and nanoseconds, which allows us to overcome this pitfall and implement the timespec diff as I did.

Regards,
Guy
Isa says:

June 27, 2008 at 08:47

Guy,

Thanks for the explanation.
Ozan says:

August 26, 2008 at 12:16

hi
thanks for that good piece of code and explanation.
i have a question which is => is that code thread safe or not? CLOCK_THREAD_CPUTIME_ID if i give this parameter is it thread safe?
Guy says:

August 26, 2008 at 14:39

Hi Ozan,

The diff function is thread safe. You should make sure the time1,time2 variable to which you save the data aren’t overwritten by other threads (possible solution is to store them in Thread Local Storage (see Boost Threads library).

Using CLOCK_THREAD_CPUTIME_ID isn’t relevant as we are only using the clock for measuring relative time (not the time passed since the clock got started).

I hope this makes the issue clearer.
Ozan says:

August 26, 2008 at 14:57

thank you Guy. I have made some tryings with sys/times.h and ctime, these didnt help me. So again i want to say thank you for this nice example. I also want to make clearer that: Although all my threads have their own time variables, sys/times.h–>times function returned 0 most of time. i checked your method with single process and will try with my thread function soon.
Beulah says:

October 23, 2008 at 01:31

Good for people to know.
Pingback: Sympatoche: პროგრამირება
Neil says:

March 27, 2009 at 21:51

Hi Guy,

great article.
Do you know a way to do the same thing on Windows?
I have to measure the CPU-Time (not only elapsed time) of several processes at µs-precision.

Regards,

Neil
Tim says:

March 28, 2009 at 00:05

This is a nice tutorial.. Can we use CLOCK_MONOTONIC? when i tried to use it in a loop with 3 seconds sleep i always get the same time..

int main()
{
struct timespec now;
while(1)
{
void clock_gettime(CLOCK_MONOTONIC, &now);

printf(“Seconds is %u\n”, now.tv_sec);
printf(“Nano Seconds is %u\n”, now.ntv_sec);

sleep(3);
}

return 0;
}
Guy says:

March 28, 2009 at 20:51

@Neil: Take a look at QueryPerformanceCounter and QueryPerformanceFrequency it should be fairly close to clock_gettime. I don’t know a way in Windows to get the actual CPU time spent on a process (or several).

@Tim: Are you sure your system implements CLOCK_MONOTONIC? According to the posix specification, support for CLOCK_MONOTONIC is optional. It looks to me that you use clock_gettime correctly, maybe try to make some real work instead of sleep to see if it changes something.
Pingback: Profiling Code Using clock_gettime « Look here first!
soji says:

July 7, 2009 at 19:03

very helpful.
Chris says:

July 9, 2009 at 03:14

Hello to all,
I am trying to profile a code with different tests but they are similar to each other.I run clock_gettime repeatly and all tests have around 0.3 sec and sometimes I get 3.3 sec which is not normal.And when I try the same tests without the loops I get the normal around 0.3 sec.I tried to put sleep in the loop but nothing happens.May someone explain this to me.

Thanks in advance.
Chris says:

July 9, 2009 at 03:40

OK I got it I have the negative one case….
lordamit says:

August 8, 2009 at 20:20

Guy 🙂 Been searching for something like this for the last 3 days.
you are a life saver 😀
And once again, thanks a lot.
(please remove the previous comment, i gave the wrong link for website)
Guy says:

August 9, 2009 at 22:13

@lordamit, I’ve deleted the previous comment, you’re welcomed.
mezzo says:

August 19, 2009 at 14:42

Hi Guy,
I found your tutorial very useful,thanks a lot,.
I have a problem, I’am implementing a timers on embedded system,first a implemented it using system Time, but this was not good since systme time can be changed a t any moment and this may fail all timer.. So I decided to use CPU time based on your tutorial with clock_gettime, but first pb ::: CLOCK_PROCESS_CPUTIME_ID undeclared , I’am using C language on FREEBSD5.3 ,have you any idea ?!
thanks a lot !
Mostofa says:

June 3, 2010 at 11:26

Hi,

How to disable the profiling code?

regards,
Mostofa
Guy says:

June 3, 2010 at 20:14

@mezzo: There are some more clocks documented in “man clock_gettime”,you may want to have a look and try them.

@Mostofa: Delete it? put in a conditional block?
Lucks says:

July 30, 2010 at 08:09

Hi Guy,
I am using times() and clock() function my linux and windows applications respectively. clock() and times() functions are not getting updated with proper values so at some times it results in wrong values. so I would like to know is there any OS call or any system call which updates the clock() and times() functions. is clock_gettime() can be used in both windows and linux applications? whether the clock_ids are part of time.h header or we need to add some other header or any library?
Guy says:

July 30, 2010 at 20:14

Hi Lucks,
1. They should be updated automatically.
2. clock_gettime() doesn’t exist on windows. See this comment for alternatives for windows.
3. The ids are part of time.h. See the example program in the post.
Bogdan says:

November 23, 2010 at 18:31

Hello Guy,

Very nice and useful code. Thank you!

I have just one remark regarding the output:
For example: if a process needs 1 second and 3.000 nsec, the output will be 1:3000, and if the process needs 1 second and 300.0000 nsec the output will be 1:300000, which is practically identical. For the total time, I am suggesting something like this:
diff(time1,time2).tv_sec + (long double)diff(time1,time2).tv_nsec/1000000000

What do think ?

Best regards,
Bogdan
Albert says:

December 15, 2010 at 17:59

Hi, it seems like CLOCK_PROCESS_CPUTIME_ID timer summarizes times spent by each thread. So that I can compute the
scalability = time(CLOCK_PROCESS_CPUTIME_ID)/
time(CLOCK_THREAD_CPUTIME_ID).
E.g.:
clock_gettime(CLOCK_PROCESS_CPUTIME_ID …);
clock_gettime(CLOCK_THREAD_CPUTIME_ID …);
#pragma omp parallel
{
// do mt-job here
}
clock_gettime(CLOCK_PROCESS_CPUTIME_ID …);
clock_gettime(CLOCK_THREAD_CPUTIME_ID …);

Is that a right conclusion?
At least on my Fedora13-64 I guess it is true.
Data says:

December 20, 2010 at 13:06

tv_sec might be a 32 bit field, is it possible that this field will overflow? (i.e. reach its largest value and reset?) Shouldn’t you take this into consideration in your “diff” function?
Guy says:

December 23, 2010 at 22:26

It is possible. However, it’s very unlikely to happen to you. The exact time it will happen depends on the clock you choose. For the CLOCK_REALTIME you should expect it to happen in 2038, and for the other two clocks, CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID, it’s even less likely, as it will take 138 years.
Jomat says:

February 11, 2011 at 04:20

If you are using Linux (Eg Ubuntu) and programming in C (GCC Compiler) you need to perform the following changes:

static struct timespec*
clock_diff(struct timespec *start, struct timespec *end) {
struct timespec *diff;

if(!(diff = malloc(sizeof(struct timespec)))) {
return NULL;
} else if((end->tv_nsec – start->tv_nsec) tv_sec = end->tv_sec – start->tv_sec – 1;
diff->tv_nsec = 1000000000 + end->tv_nsec – start->tv_nsec;
} else {
diff->tv_sec = end->tv_sec – start->tv_sec;
diff->tv_nsec = end->tv_nsec – start->tv_nsec;
}
return diff;
}

// Where you want to use it
struct timespec start, stop;

// Start Counter
if(clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start) < 0) {
goto return_failed;
}

// Put here whatever you want to test here

// Where you want to stop
if(clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &stop) diff = clock_diff(&start, &stop))) {
goto return_failed;
}

return_failed:
// Free whatever you need to free in case it crashes…
return NULL;
Chandim says:

March 15, 2011 at 09:02

Guy,

Your example given for the clock_gettime(CLOCK_PROCESS_CPUTIME_ID) is not working when the sec count is increased. Is is something with the function call. I always get the sec count as 0 even I put a sleep between the start and end of clock_getttime functions.
Can you explain this?

Thanks in advance.
JRG says:

July 31, 2011 at 00:06

Jomat, your version contains needless sophistries and your sample as written will leak memory. Your code demonstrates very poor programming practice by forcing memory to be freed at a different level from which it was allocated.

One assumes you are attempting to optimize Mr. Rutenberg’s original code. Your attempt is both unnecessary and ineffective. gcc has automatically optimized out the temporaries in both the parameters and the return value for many years now. Your modifications are not only unnecessary, but they failed to achieve your aim, since the malloc() call consumes many more cycles than your conversion to pointers saves. Mr. Rutenberg’s code is better than yours.
Mike says:

August 20, 2011 at 04:21

Hi,

I am trying to use your code with gcc. I did add -lrt in the command line, however it doesnt seem like its recognizing timespec structure. As a result its erroring out. Any clue? Also, where can I find my time.h and sys/time.h files? I’m running on my company server.

thanks
-M.
newbee says:

October 3, 2011 at 15:11

hi,
I’m very new in c,
while try to understand you explanation, I got this

fatal error: iostream: No such file or directory

I’m use mingw and codeblock to run you example,
would you like to explain to me what wrong ?

thanks in advance.
newbee says:

October 3, 2011 at 17:53

hi again,
After research, I found that iosream compile on g++,
but new problem came.

I got error look like below :

‘timespec’ does not name of a type
in function ‘int main()’;
timespec,CLOCK_PROCESS_CPUTIME_ID’, time1, clock_gettime, time2, diff was not declare on this scope

thanks in advance.
jg4545 says:

November 13, 2011 at 23:08

Hi,
Like Chandim I tried the code but the second count is not incrementing even if I use sleep(3).
Can anyone help please?
The program works fine if I use CLOCK_MONOTONIC.

Thanks.
Nelson says:

March 12, 2012 at 09:33

Hi Guy,
Thanks for this useful article. Keep up the good work.
Is this thread safe ?
Guy says:

March 12, 2012 at 22:42

Basically, yes. Use CLOCK_THREAD_CPUTIME_ID and you should be all right. Of course you would have to use thread-safe io to print the results and not just plain cout as I did.
Pingback: Andy's blog » RDTSC 与 clock_gettime
Pingback: issue with clock_gettime
Tony says:

November 2, 2012 at 23:55

I see a lot of people commenting about calling sleep and it not showing up in their times. The time spent sleeping is not counted in a process-specific or thread-specific clock. Think abou it, if you are sleeping, your process is not running, therefore there is no time-accrual.
Pingback: Controlar el tiempo de ejcucion de un programa con C++ - Jesusnoseq
Sumi says:

December 10, 2012 at 23:11

Hi Guy,

Shouldn’t the following line in calculating the difference-

temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;

instead be-

temp.tv_nsec = 1000000000-end.tv_nsec+start.tv_nsec;

?

For e.g., if you have start time of 1 hr : 00 mins and end time of 2 hrs : 05 mins, the difference is:

0 hrs : (60-05 + 0) = 55 mins.
Guy says:

December 14, 2012 at 18:04

@Sumi: The difference between 1:00 and 2:05 is 65 minutes and not 55 minutes. The calculation is using trivial carry. Furthermore, in the case you pointed, there is not need for carry at all.
Sumi says:

December 14, 2012 at 19:03

Gosh! I must’ve been sleepy. I meant 1:05 and 2:00 hrs, and the carry works out right anyway. Sorry about that.
Breno says:

January 24, 2013 at 15:36

Hi, where may I use -lrt, cause I’m using code block. Thanks a lot.
Guy says:

February 15, 2013 at 18:49

I’m not familiar with code::blocks, but it should be where you specify linker flags.
novice says:

October 3, 2013 at 13:50

Thanks Guy.

If my Process has many threads (30 threads total, lets say 18 threads on Linux Core0, and 12 threads on Linux core1 ), and I use clock_gettime(CLOCK_PROCESS_CPUTIME_ID , &proc_ts)

will clock_gettime give me the correct Process Time inside proc_ts? In other words: Will clock_gettime( ) correctly account for all the thread execution times across two Linux cores.

In my code I am doing a sampling every 5 seconds. I do a clock_gettime(CLOCK_MONOTONIC, &sys_ts) in the beginning and at the end of the iteration to make sure 5 secs of system time elapsed. Since I have two Linux cores, I would expect my multi-threaded-process to have executed a theoretical max of upto 10 seconds.

For the most part I seem to be getting sane results for my process time. Once in a rare while, clock_gettime(clk_id, &proc_ts) is telling me that my process spent 13 seconds. I dont understand how clock_gettime( ) can produce a result greater than the theoretical MAX [5 sec * 2 cores = 10 secs].

Thanks in Advance for your help 🙂
Pingback: Explanation wanted: log10 faster than log and log2, but only with O2 and greater | Ask Programming & Technology
John says:

May 6, 2014 at 01:36

Thanks Guy,

I was using the above code to time measure how long it takes for my program to run.
I use CLOCK_PROCESS_CPUTIME_ID as clock id. After I run the code with profiling, it said it took 15 sec. But when I actually measure using a stop watch, it is 30 sec. Why the time return by diff() is 1/2 of what it should be?

Thank you.
Guy says:

May 9, 2014 at 14:39

CLOCK_PROCESS_CPUTIME_ID measures actual CPU time not wall-time, hence the difference you see. When there is a context switch, the time spent on the other process isn’t counted in the the timer you’re using, however, your stopwatch keeps on going.
Bharathi says:

December 9, 2014 at 07:26

perfFrequencyLSL = m_util.getQueryPerformanceFrequency(perfFrequencyLSL);
perfCPUTicksStart = m_util.getQueryPerformanceCounter(perfCPUTicksStart);
startTime = m_util.getTime(perfCPUTicksStart, perfFrequencyLSL);

while(currentTime < (startTime + 1))
{
/// do some operation
perfCPUTicksCurrent = m_util.getQueryPerformanceCounter(perfCPUTicksCurrent);
currentTime = m_util.getTime(perfCPUTicksCurrent, perfFrequencyLSL);
}

How should I implement this functionality in Linux using this clock_gettime() function.
Here is the function definition of getTime().
float Utilities::getTime(LARGE_INTEGER time, LARGE_INTEGER frequency)
{
return time.QuadPart/frequency.QuadPart;
}

26/9/2007 – Update

Share this:

60 thoughts on “Profiling Code Using clock_gettime”

Leave a Reply

60 thoughts on “Profiling Code Using `clock_gettime`”