Profiling Code Using clock_gettime

After raising the issue of the low resolution problem of the timer provided by clock() in Resolution Problems in clock(), I’ve ended the post by mentioning to two more functions that should provide high-resolution timers suitable for profiling code. In this post I will discuss one of them, clock_gettime().

The clock_gettime() provides access to several useful timers with the resolution of nanoseconds. First, the prototype for the function is as follows:

int clock_gettime(clockid_t clk_id, struct timespect *tp);

The clk_id allows us to select a specific clock from the several
offered by the system, which includes:

  • CLOCK_REALTIME, a system-wide realtime clock.
  • CLOCK_PROCESS_CPUTIME_ID, high-resolution timer provided by the CPU for each process.
  • CLOCK_THREAD_CPUTIME_ID, high-resolution timer provided by the CPU for each of the threads.

Usually, there are more clocks provided, but I find these three the most useful as they allow to get the execution time spent in the system level, process level and the thread level.

The current clock time, for the chosen clock is stored int the struct provided by the *tp pointer. The timespec struct is defined as follows:

struct timespec {
	time_t tv_sec; /* seconds */
	long tv_nsec; /* nanoseconds */
};

To time the processing time some function took, one should call clock_gettime() twice. Once before the function call and once right after it and subtract the returned timings to get the actual runtime.

Getting the difference between two timespec structs isn’t very complicated and can be acheived using the function diff() defined bellow:

timespec diff(timespec start, timespec end)
{
	timespec temp;
	if ((end.tv_nsec-start.tv_nsec)<0) {
		temp.tv_sec = end.tv_sec-start.tv_sec-1;
		temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;
	} else {
		temp.tv_sec = end.tv_sec-start.tv_sec;
		temp.tv_nsec = end.tv_nsec-start.tv_nsec;
	}
	return temp;
}

Now let’s move to some real example:

#include <iostream>
#include <time.h>
using namespace std;

timespec diff(timespec start, timespec end);

int main()
{
	timespec time1, time2;
	int temp;
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time1);
	for (int i = 0; i< 242000000; i++)
		temp+=temp;
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time2);
	cout<<diff(time1,time2).tv_sec<<":"<<diff(time1,time2).tv_nsec<<endl;
	return 0;
}

timespec diff(timespec start, timespec end)
{
	timespec temp;
	if ((end.tv_nsec-start.tv_nsec)<0) {
		temp.tv_sec = end.tv_sec-start.tv_sec-1;
		temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;
	} else {
		temp.tv_sec = end.tv_sec-start.tv_sec;
		temp.tv_nsec = end.tv_nsec-start.tv_nsec;
	}
	return temp;
}

To use clock_gettime you need to include time.h and to link to librt.a. If you use gcc just make sure you add -lrt to your list of arguments.

Play a bit with the length of the for loop. As you can see clock_gettime provides much more accurate results and can register very short processing time too. Just remember that as the case with any profiling functions, this function adds a little overhead to your program, so make sure you disable the profiling code, using preprocessor commands for example, in the production release.

26/9/2007 – Update

You may want to take a look at The Revised String Iteration Benchmark post for another, larger, example of using clock_gettime to time performance of code.

60 thoughts on “Profiling Code Using clock_gettime

  1. Guy,

    Are you sure that the following diff routine is correct?

    timespec diff(timespec start, timespec end)
    {
    timespec temp;
    if ((end.tv_nsec-start.tv_nsec)<0) {
    temp.tv_sec = end.tv_sec-start.tv_sec-1;
    temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;
    } else {
    temp.tv_sec = end.tv_sec-start.tv_sec;
    temp.tv_nsec = end.tv_nsec-start.tv_nsec;
    }
    return temp;
    }

    What is your ratoionale for adding 1000000000, According to me what you need to take into accunt LONG_INT_MAX rather than 1000000000. Thereby

    if ((end.tv_nsec-start.tv_nsec)<0) {
    temp.tv_sec = end.tv_sec-start.tv_sec-1;
    temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;
    }

    Needs to be replaced by

    if ((end.tv_nsec < start.tv_nsec) ) {
    temp.tv_sec = end.tv_sec-start.tv_sec-1;
    temp.tv_nsec = LONG_INT_MAX+ (end.tv_nsec-start.tv_nsec);
    }

    Cheers!

    Isa

  2. Hi Isa,

    To understand why the code works we first need to understand how the timespec struct is structured and what the code tries to achieve. The timespec stores the time since certain point using two fields, tv_sec stores the seconds and tv_nsec stores the nanoseconds the represent the remaining part of the second. The if condition handles the case when the end time nanosecond count is smaller then the start-time’s. What we want to do is to subtract the timespecs field by field, however doing so in this case leads to negative value in the nanoseconds field, which will result in an overflow and undesired behavior.

    The solution for this is the same method used when doing long subtraction in elementary school, borrow one from the next more important field. In this case I borrow a second from the tv_sec counter, so this is 1,000,000,000 nanoseconds (and hence the constant), now after borrowing I no longer have negative value in the nanoseconds counter.

    As this is the same principle as in regular long subtraction, using LONG_MAX, is wrong as it doesn’t represent 1 second in nanoseconds.

    You intention would have worked if we only had one field, that counted time as nanoseconds, and utilized the whole range of values in long data-type, and in this case the “if” check for overflow. But in this way you can’t distinguish between one overflow and more, it is still flawed. Luckily for us timespec keeps two fields, seconds, and nanoseconds, which allows us to overcome this pitfall and implement the timespec diff as I did.

    Regards,
    Guy

  3. hi
    thanks for that good piece of code and explanation.
    i have a question which is => is that code thread safe or not? CLOCK_THREAD_CPUTIME_ID if i give this parameter is it thread safe?

  4. Hi Ozan,

    The diff function is thread safe. You should make sure the time1,time2 variable to which you save the data aren’t overwritten by other threads (possible solution is to store them in Thread Local Storage (see Boost Threads library).

    Using CLOCK_THREAD_CPUTIME_ID isn’t relevant as we are only using the clock for measuring relative time (not the time passed since the clock got started).

    I hope this makes the issue clearer.

  5. thank you Guy. I have made some tryings with sys/times.h and ctime, these didnt help me. So again i want to say thank you for this nice example. I also want to make clearer that: Although all my threads have their own time variables, sys/times.h–>times function returned 0 most of time. i checked your method with single process and will try with my thread function soon.

  6. Hi Guy,

    great article.
    Do you know a way to do the same thing on Windows?
    I have to measure the CPU-Time (not only elapsed time) of several processes at µs-precision.

    Regards,

    Neil

  7. This is a nice tutorial.. Can we use CLOCK_MONOTONIC? when i tried to use it in a loop with 3 seconds sleep i always get the same time..

    int main()
    {
    struct timespec now;
    while(1)
    {
    void clock_gettime(CLOCK_MONOTONIC, &now);

    printf(“Seconds is %u\n”, now.tv_sec);
    printf(“Nano Seconds is %u\n”, now.ntv_sec);

    sleep(3);
    }

    return 0;
    }

  8. @Neil: Take a look at QueryPerformanceCounter and QueryPerformanceFrequency it should be fairly close to clock_gettime. I don’t know a way in Windows to get the actual CPU time spent on a process (or several).

    @Tim: Are you sure your system implements CLOCK_MONOTONIC? According to the posix specification, support for CLOCK_MONOTONIC is optional. It looks to me that you use clock_gettime correctly, maybe try to make some real work instead of sleep to see if it changes something.

  9. Hello to all,
    I am trying to profile a code with different tests but they are similar to each other.I run clock_gettime repeatly and all tests have around 0.3 sec and sometimes I get 3.3 sec which is not normal.And when I try the same tests without the loops I get the normal around 0.3 sec.I tried to put sleep in the loop but nothing happens.May someone explain this to me.

    Thanks in advance.

  10. Guy 🙂 Been searching for something like this for the last 3 days.
    you are a life saver 😀
    And once again, thanks a lot.
    (please remove the previous comment, i gave the wrong link for website)

  11. Hi Guy,
    I found your tutorial very useful,thanks a lot,.
    I have a problem, I’am implementing a timers on embedded system,first a implemented it using system Time, but this was not good since systme time can be changed a t any moment and this may fail all timer.. So I decided to use CPU time based on your tutorial with clock_gettime, but first pb ::: CLOCK_PROCESS_CPUTIME_ID undeclared , I’am using C language on FREEBSD5.3 ,have you any idea ?!
    thanks a lot !

  12. @mezzo: There are some more clocks documented in “man clock_gettime”,you may want to have a look and try them.

    @Mostofa: Delete it? put in a conditional block?

  13. Hi Guy,
    I am using times() and clock() function my linux and windows applications respectively. clock() and times() functions are not getting updated with proper values so at some times it results in wrong values. so I would like to know is there any OS call or any system call which updates the clock() and times() functions. is clock_gettime() can be used in both windows and linux applications? whether the clock_ids are part of time.h header or we need to add some other header or any library?

  14. Hi Lucks,
    1. They should be updated automatically.
    2. clock_gettime() doesn’t exist on windows. See this comment for alternatives for windows.
    3. The ids are part of time.h. See the example program in the post.

  15. Hello Guy,

    Very nice and useful code. Thank you!

    I have just one remark regarding the output:
    For example: if a process needs 1 second and 3.000 nsec, the output will be 1:3000, and if the process needs 1 second and 300.0000 nsec the output will be 1:300000, which is practically identical. For the total time, I am suggesting something like this:
    diff(time1,time2).tv_sec + (long double)diff(time1,time2).tv_nsec/1000000000

    What do think ?

    Best regards,
    Bogdan

  16. Hi, it seems like CLOCK_PROCESS_CPUTIME_ID timer summarizes times spent by each thread. So that I can compute the
    scalability = time(CLOCK_PROCESS_CPUTIME_ID)/
    time(CLOCK_THREAD_CPUTIME_ID).
    E.g.:
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID …);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID …);
    #pragma omp parallel
    {
    // do mt-job here
    }
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID …);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID …);

    Is that a right conclusion?
    At least on my Fedora13-64 I guess it is true.

  17. tv_sec might be a 32 bit field, is it possible that this field will overflow? (i.e. reach its largest value and reset?) Shouldn’t you take this into consideration in your “diff” function?

  18. It is possible. However, it’s very unlikely to happen to you. The exact time it will happen depends on the clock you choose. For the CLOCK_REALTIME you should expect it to happen in 2038, and for the other two clocks, CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID, it’s even less likely, as it will take 138 years.

  19. If you are using Linux (Eg Ubuntu) and programming in C (GCC Compiler) you need to perform the following changes:

    static struct timespec*
    clock_diff(struct timespec *start, struct timespec *end) {
    struct timespec *diff;

    if(!(diff = malloc(sizeof(struct timespec)))) {
    return NULL;
    } else if((end->tv_nsec – start->tv_nsec) tv_sec = end->tv_sec – start->tv_sec – 1;
    diff->tv_nsec = 1000000000 + end->tv_nsec – start->tv_nsec;
    } else {
    diff->tv_sec = end->tv_sec – start->tv_sec;
    diff->tv_nsec = end->tv_nsec – start->tv_nsec;
    }
    return diff;
    }

    // Where you want to use it
    struct timespec start, stop;

    // Start Counter
    if(clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start) < 0) {
    goto return_failed;
    }

    // Put here whatever you want to test here

    // Where you want to stop
    if(clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &stop) diff = clock_diff(&start, &stop))) {
    goto return_failed;
    }

    return_failed:
    // Free whatever you need to free in case it crashes…
    return NULL;

  20. Guy,

    Your example given for the clock_gettime(CLOCK_PROCESS_CPUTIME_ID) is not working when the sec count is increased. Is is something with the function call. I always get the sec count as 0 even I put a sleep between the start and end of clock_getttime functions.
    Can you explain this?

    Thanks in advance.

  21. Jomat, your version contains needless sophistries and your sample as written will leak memory. Your code demonstrates very poor programming practice by forcing memory to be freed at a different level from which it was allocated.

    One assumes you are attempting to optimize Mr. Rutenberg’s original code. Your attempt is both unnecessary and ineffective. gcc has automatically optimized out the temporaries in both the parameters and the return value for many years now. Your modifications are not only unnecessary, but they failed to achieve your aim, since the malloc() call consumes many more cycles than your conversion to pointers saves. Mr. Rutenberg’s code is better than yours.

  22. Hi,

    I am trying to use your code with gcc. I did add -lrt in the command line, however it doesnt seem like its recognizing timespec structure. As a result its erroring out. Any clue? Also, where can I find my time.h and sys/time.h files? I’m running on my company server.

    thanks
    -M.

  23. hi,
    I’m very new in c,
    while try to understand you explanation, I got this

    fatal error: iostream: No such file or directory

    I’m use mingw and codeblock to run you example,
    would you like to explain to me what wrong ?

    thanks in advance.

  24. hi again,
    After research, I found that iosream compile on g++,
    but new problem came.

    I got error look like below :

    ‘timespec’ does not name of a type
    in function ‘int main()’;
    timespec,CLOCK_PROCESS_CPUTIME_ID’, time1, clock_gettime, time2, diff was not declare on this scope

    thanks in advance.

  25. Hi,
    Like Chandim I tried the code but the second count is not incrementing even if I use sleep(3).
    Can anyone help please?
    The program works fine if I use CLOCK_MONOTONIC.

    Thanks.

  26. Hi Guy,
    Thanks for this useful article. Keep up the good work.
    Is this thread safe ?

  27. Basically, yes. Use CLOCK_THREAD_CPUTIME_ID and you should be all right. Of course you would have to use thread-safe io to print the results and not just plain cout as I did.

  28. I see a lot of people commenting about calling sleep and it not showing up in their times. The time spent sleeping is not counted in a process-specific or thread-specific clock. Think abou it, if you are sleeping, your process is not running, therefore there is no time-accrual.

  29. Hi Guy,

    Shouldn’t the following line in calculating the difference-

    temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;

    instead be-

    temp.tv_nsec = 1000000000-end.tv_nsec+start.tv_nsec;

    ?

    For e.g., if you have start time of 1 hr : 00 mins and end time of 2 hrs : 05 mins, the difference is:

    0 hrs : (60-05 + 0) = 55 mins.

  30. @Sumi: The difference between 1:00 and 2:05 is 65 minutes and not 55 minutes. The calculation is using trivial carry. Furthermore, in the case you pointed, there is not need for carry at all.

  31. Gosh! I must’ve been sleepy. I meant 1:05 and 2:00 hrs, and the carry works out right anyway. Sorry about that.

  32. Thanks Guy.

    If my Process has many threads (30 threads total, lets say 18 threads on Linux Core0, and 12 threads on Linux core1 ), and I use clock_gettime(CLOCK_PROCESS_CPUTIME_ID , &proc_ts)

    will clock_gettime give me the correct Process Time inside proc_ts? In other words: Will clock_gettime( ) correctly account for all the thread execution times across two Linux cores.

    In my code I am doing a sampling every 5 seconds. I do a clock_gettime(CLOCK_MONOTONIC, &sys_ts) in the beginning and at the end of the iteration to make sure 5 secs of system time elapsed. Since I have two Linux cores, I would expect my multi-threaded-process to have executed a theoretical max of upto 10 seconds.

    For the most part I seem to be getting sane results for my process time. Once in a rare while, clock_gettime(clk_id, &proc_ts) is telling me that my process spent 13 seconds. I dont understand how clock_gettime( ) can produce a result greater than the theoretical MAX [5 sec * 2 cores = 10 secs].

    Thanks in Advance for your help 🙂

  33. Thanks Guy,

    I was using the above code to time measure how long it takes for my program to run.
    I use CLOCK_PROCESS_CPUTIME_ID as clock id. After I run the code with profiling, it said it took 15 sec. But when I actually measure using a stop watch, it is 30 sec. Why the time return by diff() is 1/2 of what it should be?

    Thank you.

  34. CLOCK_PROCESS_CPUTIME_ID measures actual CPU time not wall-time, hence the difference you see. When there is a context switch, the time spent on the other process isn’t counted in the the timer you’re using, however, your stopwatch keeps on going.

  35. perfFrequencyLSL = m_util.getQueryPerformanceFrequency(perfFrequencyLSL);
    perfCPUTicksStart = m_util.getQueryPerformanceCounter(perfCPUTicksStart);
    startTime = m_util.getTime(perfCPUTicksStart, perfFrequencyLSL);

    while(currentTime < (startTime + 1))
    {
    /// do some operation
    perfCPUTicksCurrent = m_util.getQueryPerformanceCounter(perfCPUTicksCurrent);
    currentTime = m_util.getTime(perfCPUTicksCurrent, perfFrequencyLSL);
    }

    How should I implement this functionality in Linux using this clock_gettime() function.
    Here is the function definition of getTime().
    float Utilities::getTime(LARGE_INTEGER time, LARGE_INTEGER frequency)
    {
    return time.QuadPart/frequency.QuadPart;
    }

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.