Guy Rutenberg

Keeping track of what I do

Profiling Code Using clock_gettime

with 45 comments

After raising the issue of the low resolution problem of the timer provided by clock() in Resolution Problems in clock(), I’ve ended the post by mentioning to two more functions that should provide high-resolution timers suitable for profiling code. In this post I will discuss one of them, clock_gettime().

The clock_gettime() provides access to several useful timers with the resolution of nanoseconds. First, the prototype for the function is as follows:

int clock_gettime(clockid_t clk_id, struct timespect *tp);

The clk_id allows us to select a specific clock from the several
offered by the system, which includes:

  • CLOCK_REALTIME, a system-wide realtime clock.
  • CLOCK_PROCESS_CPUTIME_ID, high-resolution timer provided by the CPU for each process.
  • CLOCK_THREAD_CPUTIME_ID, high-resolution timer provided by the CPU for each of the threads.

Usually, there are more clocks provided, but I find these three the most useful as they allow to get the execution time spent in the system level, process level and the thread level.

The current clock time, for the chosen clock is stored int the struct provided by the *tp pointer. The timespec struct is defined as follows:

struct timespec {
	time_t tv_sec; /* seconds */
	long tv_nsec; /* nanoseconds */
};

To time the processing time some function took, one should call clock_gettime() twice. Once before the function call and once right after it and subtract the returned timings to get the actual runtime.

Getting the difference between two timespec structs isn’t very complicated and can be acheived using the function diff() defined bellow:

timespec diff(timespec start, timespec end)
{
	timespec temp;
	if ((end.tv_nsec-start.tv_nsec)<0) {
		temp.tv_sec = end.tv_sec-start.tv_sec-1;
		temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;
	} else {
		temp.tv_sec = end.tv_sec-start.tv_sec;
		temp.tv_nsec = end.tv_nsec-start.tv_nsec;
	}
	return temp;
}

Now let’s move to some real example:

#include <iostream>
#include <time.h>
using namespace std;
 
timespec diff(timespec start, timespec end);
 
int main()
{
	timespec time1, time2;
	int temp;
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time1);
	for (int i = 0; i< 242000000; i++)
		temp+=temp;
	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time2);
	cout<<diff(time1,time2).tv_sec<<":"<<diff(time1,time2).tv_nsec<<endl;
	return 0;
}
 
timespec diff(timespec start, timespec end)
{
	timespec temp;
	if ((end.tv_nsec-start.tv_nsec)<0) {
		temp.tv_sec = end.tv_sec-start.tv_sec-1;
		temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;
	} else {
		temp.tv_sec = end.tv_sec-start.tv_sec;
		temp.tv_nsec = end.tv_nsec-start.tv_nsec;
	}
	return temp;
}

To use clock_gettime you need to include time.h and to link to librt.a. If you use gcc just make sure you add -lrt to your list of arguments.

Play a bit with the length of the for loop. As you can see clock_gettime provides much more accurate results and can register very short processing time too. Just remember that as the case with any profiling functions, this function adds a little overhead to your program, so make sure you disable the profiling code, using preprocessor commands for example, in the production release.

26/9/2007 – Update

You may want to take a look at The Revised String Iteration Benchmark post for another, larger, example of using clock_gettime to time performance of code.

Written by Guy

September 22nd, 2007 at 11:55 am

Posted in C/C++,Tutorials

Tagged with

45 Responses to 'Profiling Code Using clock_gettime'

Subscribe to comments with RSS or TrackBack to 'Profiling Code Using clock_gettime'.

  1. Guy,

    Are you sure that the following diff routine is correct?

    timespec diff(timespec start, timespec end)
    {
    timespec temp;
    if ((end.tv_nsec-start.tv_nsec)<0) {
    temp.tv_sec = end.tv_sec-start.tv_sec-1;
    temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;
    } else {
    temp.tv_sec = end.tv_sec-start.tv_sec;
    temp.tv_nsec = end.tv_nsec-start.tv_nsec;
    }
    return temp;
    }

    What is your ratoionale for adding 1000000000, According to me what you need to take into accunt LONG_INT_MAX rather than 1000000000. Thereby

    if ((end.tv_nsec-start.tv_nsec)<0) {
    temp.tv_sec = end.tv_sec-start.tv_sec-1;
    temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;
    }

    Needs to be replaced by

    if ((end.tv_nsec < start.tv_nsec) ) {
    temp.tv_sec = end.tv_sec-start.tv_sec-1;
    temp.tv_nsec = LONG_INT_MAX+ (end.tv_nsec-start.tv_nsec);
    }

    Cheers!

    Isa

    Isa

    25 Jun 08 at 18:55

  2. Whoops. I meant LONG_MAX instead of LONG_INT_MAX.

    Cheers!
    Isa

    Isa

    25 Jun 08 at 19:06

  3. Hi Isa,

    To understand why the code works we first need to understand how the timespec struct is structured and what the code tries to achieve. The timespec stores the time since certain point using two fields, tv_sec stores the seconds and tv_nsec stores the nanoseconds the represent the remaining part of the second. The if condition handles the case when the end time nanosecond count is smaller then the start-time’s. What we want to do is to subtract the timespecs field by field, however doing so in this case leads to negative value in the nanoseconds field, which will result in an overflow and undesired behavior.

    The solution for this is the same method used when doing long subtraction in elementary school, borrow one from the next more important field. In this case I borrow a second from the tv_sec counter, so this is 1,000,000,000 nanoseconds (and hence the constant), now after borrowing I no longer have negative value in the nanoseconds counter.

    As this is the same principle as in regular long subtraction, using LONG_MAX, is wrong as it doesn’t represent 1 second in nanoseconds.

    You intention would have worked if we only had one field, that counted time as nanoseconds, and utilized the whole range of values in long data-type, and in this case the “if” check for overflow. But in this way you can’t distinguish between one overflow and more, it is still flawed. Luckily for us timespec keeps two fields, seconds, and nanoseconds, which allows us to overcome this pitfall and implement the timespec diff as I did.

    Regards,
    Guy

    Guy

    25 Jun 08 at 21:00

  4. Guy,

    Thanks for the explanation.

    Isa

    27 Jun 08 at 08:47

  5. hi
    thanks for that good piece of code and explanation.
    i have a question which is => is that code thread safe or not? CLOCK_THREAD_CPUTIME_ID if i give this parameter is it thread safe?

    Ozan

    26 Aug 08 at 12:16

  6. Hi Ozan,

    The diff function is thread safe. You should make sure the time1,time2 variable to which you save the data aren’t overwritten by other threads (possible solution is to store them in Thread Local Storage (see Boost Threads library).

    Using CLOCK_THREAD_CPUTIME_ID isn’t relevant as we are only using the clock for measuring relative time (not the time passed since the clock got started).

    I hope this makes the issue clearer.

    Guy

    26 Aug 08 at 14:39

  7. thank you Guy. I have made some tryings with sys/times.h and ctime, these didnt help me. So again i want to say thank you for this nice example. I also want to make clearer that: Although all my threads have their own time variables, sys/times.h–>times function returned 0 most of time. i checked your method with single process and will try with my thread function soon.

    Ozan

    26 Aug 08 at 14:57

  8. Good for people to know.

    Beulah

    23 Oct 08 at 01:31

  9. [...] კოდის პროფიალინგი C-ში [...]

  10. Hi Guy,

    great article.
    Do you know a way to do the same thing on Windows?
    I have to measure the CPU-Time (not only elapsed time) of several processes at µs-precision.

    Regards,

    Neil

    Neil

    27 Mar 09 at 21:51

  11. This is a nice tutorial.. Can we use CLOCK_MONOTONIC? when i tried to use it in a loop with 3 seconds sleep i always get the same time..

    int main()
    {
    struct timespec now;
    while(1)
    {
    void clock_gettime(CLOCK_MONOTONIC, &now);

    printf(“Seconds is %u\n”, now.tv_sec);
    printf(“Nano Seconds is %u\n”, now.ntv_sec);

    sleep(3);
    }

    return 0;
    }

    Tim

    28 Mar 09 at 00:05

  12. @Neil: Take a look at QueryPerformanceCounter and QueryPerformanceFrequency it should be fairly close to clock_gettime. I don’t know a way in Windows to get the actual CPU time spent on a process (or several).

    @Tim: Are you sure your system implements CLOCK_MONOTONIC? According to the posix specification, support for CLOCK_MONOTONIC is optional. It looks to me that you use clock_gettime correctly, maybe try to make some real work instead of sleep to see if it changes something.

    Guy

    28 Mar 09 at 20:51

  13. [...] Code Using clock_gettime By chris Find a good explanation here written by Guy [...]

  14. very helpful.

    soji

    7 Jul 09 at 19:03

  15. Hello to all,
    I am trying to profile a code with different tests but they are similar to each other.I run clock_gettime repeatly and all tests have around 0.3 sec and sometimes I get 3.3 sec which is not normal.And when I try the same tests without the loops I get the normal around 0.3 sec.I tried to put sleep in the loop but nothing happens.May someone explain this to me.

    Thanks in advance.

    Chris

    9 Jul 09 at 03:14

  16. OK I got it I have the negative one case….

    Chris

    9 Jul 09 at 03:40

  17. Guy :) Been searching for something like this for the last 3 days.
    you are a life saver :D
    And once again, thanks a lot.
    (please remove the previous comment, i gave the wrong link for website)

    lordamit

    8 Aug 09 at 20:20

  18. @lordamit, I’ve deleted the previous comment, you’re welcomed.

    Guy

    9 Aug 09 at 22:13

  19. Hi Guy,
    I found your tutorial very useful,thanks a lot,.
    I have a problem, I’am implementing a timers on embedded system,first a implemented it using system Time, but this was not good since systme time can be changed a t any moment and this may fail all timer.. So I decided to use CPU time based on your tutorial with clock_gettime, but first pb ::: CLOCK_PROCESS_CPUTIME_ID undeclared , I’am using C language on FREEBSD5.3 ,have you any idea ?!
    thanks a lot !

    mezzo

    19 Aug 09 at 14:42

  20. Hi,

    How to disable the profiling code?

    regards,
    Mostofa

    Mostofa

    3 Jun 10 at 11:26

  21. @mezzo: There are some more clocks documented in “man clock_gettime”,you may want to have a look and try them.

    @Mostofa: Delete it? put in a conditional block?

    Guy

    3 Jun 10 at 20:14

  22. Hi Guy,
    I am using times() and clock() function my linux and windows applications respectively. clock() and times() functions are not getting updated with proper values so at some times it results in wrong values. so I would like to know is there any OS call or any system call which updates the clock() and times() functions. is clock_gettime() can be used in both windows and linux applications? whether the clock_ids are part of time.h header or we need to add some other header or any library?

    Lucks

    30 Jul 10 at 08:09

  23. Hi Lucks,
    1. They should be updated automatically.
    2. clock_gettime() doesn’t exist on windows. See this comment for alternatives for windows.
    3. The ids are part of time.h. See the example program in the post.

    Guy

    30 Jul 10 at 20:14

  24. Hello Guy,

    Very nice and useful code. Thank you!

    I have just one remark regarding the output:
    For example: if a process needs 1 second and 3.000 nsec, the output will be 1:3000, and if the process needs 1 second and 300.0000 nsec the output will be 1:300000, which is practically identical. For the total time, I am suggesting something like this:
    diff(time1,time2).tv_sec + (long double)diff(time1,time2).tv_nsec/1000000000

    What do think ?

    Best regards,
    Bogdan

    Bogdan

    23 Nov 10 at 18:31

  25. Hi, it seems like CLOCK_PROCESS_CPUTIME_ID timer summarizes times spent by each thread. So that I can compute the
    scalability = time(CLOCK_PROCESS_CPUTIME_ID)/
    time(CLOCK_THREAD_CPUTIME_ID).
    E.g.:
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID …);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID …);
    #pragma omp parallel
    {
    // do mt-job here
    }
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID …);
    clock_gettime(CLOCK_THREAD_CPUTIME_ID …);

    Is that a right conclusion?
    At least on my Fedora13-64 I guess it is true.

    Albert

    15 Dec 10 at 17:59

  26. tv_sec might be a 32 bit field, is it possible that this field will overflow? (i.e. reach its largest value and reset?) Shouldn’t you take this into consideration in your “diff” function?

    Data

    20 Dec 10 at 13:06

  27. It is possible. However, it’s very unlikely to happen to you. The exact time it will happen depends on the clock you choose. For the CLOCK_REALTIME you should expect it to happen in 2038, and for the other two clocks, CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID, it’s even less likely, as it will take 138 years.

    Guy

    23 Dec 10 at 22:26

  28. If you are using Linux (Eg Ubuntu) and programming in C (GCC Compiler) you need to perform the following changes:

    static struct timespec*
    clock_diff(struct timespec *start, struct timespec *end) {
    struct timespec *diff;

    if(!(diff = malloc(sizeof(struct timespec)))) {
    return NULL;
    } else if((end->tv_nsec – start->tv_nsec) tv_sec = end->tv_sec – start->tv_sec – 1;
    diff->tv_nsec = 1000000000 + end->tv_nsec – start->tv_nsec;
    } else {
    diff->tv_sec = end->tv_sec – start->tv_sec;
    diff->tv_nsec = end->tv_nsec – start->tv_nsec;
    }
    return diff;
    }

    // Where you want to use it
    struct timespec start, stop;

    // Start Counter
    if(clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start) < 0) {
    goto return_failed;
    }

    // Put here whatever you want to test here

    // Where you want to stop
    if(clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &stop) diff = clock_diff(&start, &stop))) {
    goto return_failed;
    }

    return_failed:
    // Free whatever you need to free in case it crashes…
    return NULL;

    Jomat

    11 Feb 11 at 04:20

  29. Guy,

    Your example given for the clock_gettime(CLOCK_PROCESS_CPUTIME_ID) is not working when the sec count is increased. Is is something with the function call. I always get the sec count as 0 even I put a sleep between the start and end of clock_getttime functions.
    Can you explain this?

    Thanks in advance.

    Chandim

    15 Mar 11 at 09:02

  30. Jomat, your version contains needless sophistries and your sample as written will leak memory. Your code demonstrates very poor programming practice by forcing memory to be freed at a different level from which it was allocated.

    One assumes you are attempting to optimize Mr. Rutenberg’s original code. Your attempt is both unnecessary and ineffective. gcc has automatically optimized out the temporaries in both the parameters and the return value for many years now. Your modifications are not only unnecessary, but they failed to achieve your aim, since the malloc() call consumes many more cycles than your conversion to pointers saves. Mr. Rutenberg’s code is better than yours.

    JRG

    31 Jul 11 at 00:06

  31. Hi,

    I am trying to use your code with gcc. I did add -lrt in the command line, however it doesnt seem like its recognizing timespec structure. As a result its erroring out. Any clue? Also, where can I find my time.h and sys/time.h files? I’m running on my company server.

    thanks
    -M.

    Mike

    20 Aug 11 at 04:21

  32. hi,
    I’m very new in c,
    while try to understand you explanation, I got this

    fatal error: iostream: No such file or directory

    I’m use mingw and codeblock to run you example,
    would you like to explain to me what wrong ?

    thanks in advance.

    newbee

    3 Oct 11 at 15:11

  33. hi again,
    After research, I found that iosream compile on g++,
    but new problem came.

    I got error look like below :

    ‘timespec’ does not name of a type
    in function ‘int main()’;
    timespec,CLOCK_PROCESS_CPUTIME_ID’, time1, clock_gettime, time2, diff was not declare on this scope

    thanks in advance.

    newbee

    3 Oct 11 at 17:53

  34. Hi,
    Like Chandim I tried the code but the second count is not incrementing even if I use sleep(3).
    Can anyone help please?
    The program works fine if I use CLOCK_MONOTONIC.

    Thanks.

    jg4545

    13 Nov 11 at 23:08

  35. Hi Guy,
    Thanks for this useful article. Keep up the good work.
    Is this thread safe ?

    Nelson

    12 Mar 12 at 09:33

  36. Basically, yes. Use CLOCK_THREAD_CPUTIME_ID and you should be all right. Of course you would have to use thread-safe io to print the results and not just plain cout as I did.

    Guy

    12 Mar 12 at 22:42

  37. [...] Code Using clock_gettime http://www.guyrutenberg.com/2007/09/22/profiling-code-using-clock_gettime/ int clock_gettime(clockid_t clk_id, struct timespect [...]

  38. [...] Also, there is a good example (I think) that discusses profiling code using clock_gettime at http://www.guyrutenberg.com/2007/09/…clock_gettime/ that may provide some insight as well as other examples from Google clock_gettime example that may [...]

  39. I see a lot of people commenting about calling sleep and it not showing up in their times. The time spent sleeping is not counted in a process-specific or thread-specific clock. Think abou it, if you are sleeping, your process is not running, therefore there is no time-accrual.

    Tony

    2 Nov 12 at 23:55

  40. Hi Guy,

    Shouldn’t the following line in calculating the difference-

    temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;

    instead be-

    temp.tv_nsec = 1000000000-end.tv_nsec+start.tv_nsec;

    ?

    For e.g., if you have start time of 1 hr : 00 mins and end time of 2 hrs : 05 mins, the difference is:

    0 hrs : (60-05 + 0) = 55 mins.

    Sumi

    10 Dec 12 at 23:11

  41. @Sumi: The difference between 1:00 and 2:05 is 65 minutes and not 55 minutes. The calculation is using trivial carry. Furthermore, in the case you pointed, there is not need for carry at all.

    Guy

    14 Dec 12 at 18:04

  42. Gosh! I must’ve been sleepy. I meant 1:05 and 2:00 hrs, and the carry works out right anyway. Sorry about that.

    Sumi

    14 Dec 12 at 19:03

  43. Hi, where may I use -lrt, cause I’m using code block. Thanks a lot.

    Breno

    24 Jan 13 at 15:36

  44. I’m not familiar with code::blocks, but it should be where you specify linker flags.

    Guy

    15 Feb 13 at 18:49

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>