Copyright (c) 2003 Toomas Karmo. Revision history: 20040224T162804Z/version_0001.0002 (modified some terminology: not Linux, but GNU/Linux); 20030623T231346Z/version_0001.0000 (wrote base version). Permission is granted to copy, distribute, and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2, or any later version published by the Free Software Foundation. In the terminology of the License, this document has no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. The definitive machine-readable copy of this document is in the "Literary" section of
http://www.metascientia.com. A copy of the License is included in a hyperlinked section, entitled GNU Free Documentation License, of the machine-readable copy.

No-Frills GNU/Linux: Timekeeping, Timestamping, Timelogging

Setting the Clock with NTP

Today's technology demands precise timekeeping. The Global Positioning System (GPS), for instance, has to allow for the fact that the rate at which time passes depends on such subtle physical factors as the strength of the local gravitational field. The complex physics has a simple upshot: when exactly one day, of 86400 (that's 24 times 60 times 60) seconds, passes on the surface of Earth, a little more than 86400 seconds, in fact an extra 38 or so millionths of a second, passes on a GPS satellite. Small though the discrepancy between the two timeflows is, GPS needs to include it in its computations if it is to give us accurate navigation coordinates. Happily, today's chronometry rises to such challenges. It is expected that the Bureau international des poids et mesures (BIPM) "TAI" timescale, based on the joint output from atomic-clock timing centres in around 30 countries, would take not a day, but rather (to one significant figure) 400 years before it erred by as much as 38 millionths of a second.

Although the cheap quartz oscillators that track time in personal computers can drift by several seconds in a single day, GNU/Linux has ways of keeping our own chronometry reasonably exact. As I'll now explain, simple procedures are enough to keep us - at any rate in the hour or so immediately following a clock correction - within a few hundreds of milliseconds of BIPM.

Basic GNU/Linux chronometry starts with our synchronizing our software clock, and optionally also our hardware clock, with some "Stratum 2" time server under Network Time Protocol (NTP).

A "Stratum 1" server is a device linked directly, rather than by TCP/IP network, to a reference clock - in other words, to a clock itself directly regulated by GPS, by a WWV shortwave transmitter, or by some other closely BIPM-compliant system. A "Stratum 2" server is a device synchronized under NTP with a Stratum 1 server.

We, as ordinary members of the public, are not encouraged to point our workstations to Stratum 1 servers. We are, on the other hand, welcome to use those Stratum 2 servers whose administrators have set up a policy of open access. (How severely does accuracy degrade once NTP enters the chain? Whereas a Stratum 1 server can be expected to stay within 1 millisecond of the BIPM standard, a Stratum 2 server can be expected to deviate by on the order of 10 to 100 milliseconds.)

Computer scientist David L. Mills of the University of Delaware maintains a list of Stratum 2 NTP servers, including many with open access, at http://www.eecis.udel.edu/~mills/ntp/clock2a.html. He supplies e-mail addresses of administrators, remarking that contact should be made upon establishment of "regular operations" with any open-access server. In practice, "regular" here means that we should e-mail the person maintaining a Stratum 2 open-access NTP server if we are proposing to connect more than once in every, say, xxxxTO_BE_FILLED_IN----HUNDRED? THOUSAND???xxxx minutes.

My own use of NTP shows what can be done with the crudest of setups, namely PPP dialup on the current, or "Woody", stable branch of Debian, specifically Debian 3.0r1 for Intel x86. I used a similarly crude setup in 2001 and 2002 under Mandrake 7.2. In 2000 or the late 1990s, I invoked the now obsolescent rdate under the RFC 868 protocol, a predecessor of today's NTP.

Under my present, Debian, setup, I consider my working day incomplete unless I somehow find a minute to launch my basic maintenance script, written in bash, and scrutinize its effects. As is standard good practice, I (of course!) use my script to check the version numbers of all my installed packages, downloading necessary updates from a central security server. But more to the point, in that same script I interrogate a certain NTP server, physically housed on my local university campus. For enhanced readability, I'll break the pertinent script line into several lines here, and will also respect the confidentiality of my chosen server by altering its name to the meaningless "roohar.goozarcollege.ca":

      date; ntpdate roohar.goozarcollege.ca; 
      date;
      hwclock --systohc --utc --debug;
      hwclock --show

When the long script line executes, it first synchronizes my system clock (the "software clock", which runs only when the workstation itself does) with roohar.goozarcollege.ca, then synchronizes my real time clock (the "hardware clock", which runs as long as the CMOS battery functions) with the software clock.

It is the software clock that gets consulted by processes curious about the time of day, such as the shell-prompt command date. The hardware clock, on the other hand, gets consulted only at boot time, when the software clock is started up. By keeping software and hardware clocks separate, GNU/Linux allows us, should we so choose, to correct our software clock, and therefore to change the time presented to processes such as date, without perturbing the underlying hardware.

A cautious system administrator might not like my habit of perturbing the hardware clock once a day through hwclock --systohc. That mildly dirty habit can allegedly cause trouble if clock corrections get large. You, for your part, might find it prudent to run hwclock --systohc more often than I do, or alternatively not to run it at all.

Further, you might find it prudent to use the command ntpdate -B roohar.goozarcollege.ca in place of my plain ntpdate roohar.goozarcollege.ca. The -B switch makes the system clock slew smoothly and slowly to the correct time instead of jumping. I myself do like the jump, since it lets me get a visceral appreciation of clock drift as I eyeball the second hand of an xclock on my desktop while my script executes. A five-second jump is not unusual in my setup. So far so good: in several years of operation, I've yet to get any complaints from processes confused by system-clock discontinuities.

My daily clock-maintenance routine - appropriate, as I say, for the free-standing PPP dialup workstation - marks pretty well the lowest level in sophistication. A LAN might, more ambitiously, incorporate a time server foo.yourdomain.com, synchronized every six or eight hours with roohar.goozarcollege.ca by way of a cron job invoking ntpdate. The other machines in the LAN could keep their clocks reasonably precise by running ntpdate nightly, under cron, to interrogate foo.
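A cron-driven arrangement of this kind might be sketched as follows. This is only a sketch under stated assumptions: the script name sync-clock, its install path, the DRY_RUN variable, and the choice of minute 17 are all hypothetical, and roohar.goozarcollege.ca remains the meaningless placeholder used above. By default the script merely prints the commands it would run, so that it can be tried without root privileges or a network connection.

```shell
#!/bin/bash
# sync-clock: hypothetical cron-driven wrapper around ntpdate and hwclock.
# Assumed root crontab entry (every six hours, at minute 17):
#   17 */6 * * * DRY_RUN=0 /usr/local/sbin/sync-clock

NTP_SERVER="${NTP_SERVER:-roohar.goozarcollege.ca}"   # placeholder name

sync_clock() {
    # Dry run is the default: commands are printed rather than executed.
    local run=(echo)
    if [ "${DRY_RUN:-1}" = "0" ]; then
        run=()
    fi
    # Set the software clock, then copy it to the hardware clock.
    "${run[@]}" ntpdate "$NTP_SERVER" &&
        "${run[@]}" hwclock --systohc --utc
}

sync_clock
```

Run with no environment settings, the script just echoes the ntpdate and hwclock command lines. Only with DRY_RUN=0, as in the assumed crontab entry, does it touch the clocks.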

What if all the machines on your LAN need to have their software clocks achieving centisecond or millisecond tolerances at all instants of the day? In that case, according to the gurus, you may reach for ntpd in place of ntpdate.

Further, if you are running a free-standing workstation with PPP dialup, you could consider using Richard Curnow's chrony in place of ntpdate. Among the various appealing features of chrony - available in Debian Woody as one of the "Xtra" packages, but not (yet?) running as production-grade machinery on my own workstation - is a provision for automatically interrogating an NTP server whenever you connect to your ISP.

Timestamping

Once we have our software clock maintained in some appropriate way by NTP, we are ready to set up a rigorous system of timestamping. People around the world write timestamps in different ways, with "02/03/04", for instance, meaning "February 3, 2004" in the United States, but "2 March 2004" in the United Kingdom. The Geneva-based International Organization for Standardization, or ISO, brought order out of this chaos with its standard ISO 8601:1988, recently revised as ISO 8601:2000. (That "ISO", by the way, is not strictly an acronym, but a mnemonic for the classical Greek "isos", "equal".) Although the standards are not readily available for free, many Web authors present their essence. The work of this able community of authors is in turn summed up by a well-maintained hyperlink bibliography on a Directory Mozilla (DMOZ) page. To find that page, you can point your browser either at DMOZ itself, http://dmoz.org, or at the DMOZ-derived search area under the "Directory" tab on http://www.google.com. You can then input some such search string as ISO 8601. If things have not changed at DMOZ since 20030613T003120Z, when I last did the experiment, you'll find the ISO 8601 page in the Google presentation of DMOZ on the category path Science > Reference > Standards > Individual Standards > ISO 8601.

And there we have it - the very sentence you just read uses one of the ISO 8601-kosher timestamp formats! Since, as I say, the DMOZ-linked authors present the essence of the system, it is enough for me to present here just the essence of the essence.

Loosely speaking, the "Z" in this particular format is a reminder that we are taking the standard time of the zero, or Greenwich, meridian, as in the archaic astronomy-defined "Greenwich Mean Time". In strict accuracy, the "Z" means that we are using atomic-clock-defined Universal Coordinated Time (UTC), a system based on the BIPM TAI. It is UTC, with its leap seconds inserted or deleted against the unvarying beat of TAI, so as to allow for slight year-to-year irregularities in Earth's rotation, that underlies common civil timekeeping. The Eastern Standard Time of winter civil life in Toronto, Boston, or New York, for example, is defined as the system that lags UTC by exactly five hours - as in the familiar e-mail header line Date: Mon, 17 Mar 2003 15:48:32 -0500, for a mail transmitted at 15:48:32 EST, or 20:48:32 UTC. It is likewise UTC that my little ntpdate-invoking script uses, thanks to the --utc switch in the clause that initiates an update of the hardware clock.
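The five-hour arithmetic can be checked at the shell prompt. The following is a minimal sketch, assuming the GNU version of date, whose -d option parses an e-mail-style date string; the function name to_utc_stamp is made up for illustration.

```shell
#!/bin/bash
# to_utc_stamp: hypothetical helper; converts any date string that GNU date
# understands into the compact ISO 8601 UTC form CCYYMMDDThhmmssZ.
to_utc_stamp() {
    date -u -d "$1" "+%Y%m%dT%H%M%SZ"
}

to_utc_stamp "Mon, 17 Mar 2003 15:48:32 -0500"   # prints 20030317T204832Z
```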

This is the point at which to remark on a minor scandal in contemporary chronometry. As UTC is currently defined, an incurable lack of specificity infects the timestamping of future events. We literally do not know what second we are referring to when we ask what will be happening at, say, 20531225T121314Z! That's because we cannot predict upcoming irregularities in Earth's rotation, and so cannot predict when leap seconds will get interpolated or deleted, against the invariant TAI, by the authorities in upcoming decades. Metrologists have grappled with such issues for some years. Their next big discussion of UTC, under the aegis of the International Astronomical Union Commission 31 (Time) at the IAU General Assembly in Australia, is set for July of 2003. Maybe some core UTC concepts will change in the months or years following that discussion, maybe not.

The rest of the ISO 8601 format is almost self-evident. We write the year first, using all four digits, followed by a two-digit month and a two-digit day, in order to ensure that the ASCII sort order of timestamps matches the chronological order of the instants the timestamps denote. The T, separating the two day digits from the two hours digits, is the one part of the ISO prescription that could be called purely cosmetic.
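The sort-order claim is easy to test. Here is a small sketch that feeds three stamps from this article, deliberately out of order, to sort; the variable names are arbitrary, and LC_ALL=C is set on the assumption that a plain byte-by-byte comparison is what we want to demonstrate.

```shell
#!/bin/bash
# Three stamps from this article, deliberately listed out of order.
stamps="20030613T003120Z
20030207T153646Z
20030318T043915Z"

# LC_ALL=C forces a plain byte-by-byte (ASCII) comparison.
sorted=$(printf '%s\n' "$stamps" | LC_ALL=C sort)
printf '%s\n' "$sorted"
```

The output lists the February stamp first, then the March one, then the June one: chronological order, with no date parsing at all.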

Among the further notations made available by ISO 8601 is a not-quite-so-compressed year-month-day-hour-minute-second format with hyphens and colons:

2003-06-13T00:31:20Z

Moreover, ISO 8601 supplies formats useful in special branches of commerce, where we have to refer explicitly to the seventeenth week of a year or to the two hundred fifth day of a year.
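GNU date can produce these further notations too. The sketch below assumes GNU date (for the -d option and the %G, %V, %u, and %j conversions); the function name iso_forms is hypothetical.

```shell
#!/bin/bash
# iso_forms: hypothetical helper; prints one instant in three ISO 8601
# notations. Relies on the -d option of GNU date.
iso_forms() {
    local instant="$1"
    date -u -d "$instant" "+%Y-%m-%dT%H:%M:%SZ"   # extended calendar form
    date -u -d "$instant" "+%G-W%V-%u"            # ISO year, week, weekday (1 = Monday)
    date -u -d "$instant" "+%Y-%j"                # ordinal form: year and day of year
}

iso_forms "2003-06-13 00:31:20 UTC"
```

For this instant the three lines printed are 2003-06-13T00:31:20Z, 2003-W24-5 (the Friday of the twenty-fourth week), and 2003-164 (the hundred and sixty-fourth day of the year).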

When I want to produce a short-form timestamp at the command-line prompt, I invoke /bin/date with a tiny shell script, which I call utc:

#! /bin/bash
TIMESTAMP=`date -u "+%Y%m%dT%H%M%SZ"`
echo "${TIMESTAMP}"

When I wish to produce a more verbose display, I invoke /bin/date with a different script, which I call utcv:

#! /bin/bash
TIMESTAMP=`date -u "+%Y%m%dT%H%M%SZ"`
echo "Universal Coordinated Time (= UTC = EST+5 = EDT+4): ${TIMESTAMP}"

In practice, I seldom use the utc script. But I do find myself using utcv many times a day, while composing e-mails within vi under mutt. Here the convenient syntax (with vi in command, as opposed to insert, mode) is :r !utcv. That puts a verbose timestamp, such as

Universal Coordinated Time (= UTC = EST+5 = EDT+4): 20030318T043915Z

into my file, in a line of its own right below whatever line my cursor happens to be on.

Although I don't often cause my shell to display a plain-vanilla UTC timestamp at the command line, I do make constant use of a backing-up shell script that copies foo.bar to foo.bar____BAKCCYYMMDDThhmmssZ, for the appropriate CC, YY, MM, DD, hh, mm, and ss. (That CCYYMMDDThhmmssZ, incidentally, is the ISO-approved schema for talking about timestamps in general, as when we find ourselves writing the documentation for some timestamping software.) I call my script b. Cosmetics aside, the script consists of the single line

cp "$1" "${1}____BAK$(date -u "+%Y%m%dT%H%M%SZ")"

A typical invocation is b .bashrc.
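A slightly more defensive variant of the same idea might look like the following. This is a sketch, not the b script itself: the function name backup_stamped, the usage message, and the use of cp -p are all assumptions added for illustration.

```shell
#!/bin/bash
# backup_stamped: hypothetical, slightly more defensive variant of the
# one-line b script; copies FILE to FILE____BAK<UTC timestamp>.
backup_stamped() {
    local src="$1"
    if [ ! -f "$src" ]; then
        echo "usage: backup_stamped FILE (FILE must exist)" >&2
        return 1
    fi
    local stamp
    stamp=$(date -u "+%Y%m%dT%H%M%SZ")
    # -p preserves the original's modification time and permissions.
    cp -p -- "$src" "${src}____BAK${stamp}"
}
```

Once the function is defined, say in .bashrc, an invocation such as backup_stamped .bashrc behaves like b .bashrc, except that it refuses to proceed when the named file does not exist.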

The script leaves a clean audit trail, letting anyone see on what dates I modified a file. For example, here is a part of the result of listing my numerous backup copies of .bashrc:

.bashrc____BAK20030207T153646Z .bashrc____BAK20030207T155409Z .bashrc____BAK20030220T150959Z

Timelogging

It is natural to combine timestamping with timelogging. I started timelogging with pen and paper as a young university student in the 1970s. Since 1997, however, I've settled on a lean, mean GNU/Linux formalism. My timelogs reside in three files.

(a) My log of days is a day-by-day review of an entire week, with various categories of activity for each day. My own particular categories (but no two people will think alike when they devise categories) are the following:

My file is organized as a piece of very plain ASCII. Here's how the file looks for one particular, embarrassingly unproductive, week in the late northern-hemisphere winter of 2003:

2003A_WK09 (2003-03-02 TO 2003-03-08)

    CHURC | MATPH AS-L0 AS-HI | ASTVO ESTBK | MAINT $LING $$$$$ || TOTL+
======================================================================
SUN 00h00 |                   |             |                   || 03h00
MON 00h17 | 01h08             | 00h40       |             00h13 || 03h36
TUE 00h16 | 02h25 02h36       |             |                   || 05h20
WED 01h18 | 01h01             |             | 00h20             || 06h19
THU 00h21 | 02h23 00h39       |             |             00h20 || 07h30
FRI 00h22 |       00h10       |             | 02h09             || 06h55
SAT 00h24 |       00h35       |             | 02h02             || 07h48
=======================================================================
TOT 02h58 | 06h57 04h00 00h00 | 00h40 00h00 | 04h31 00h00 00h33 || 40h28

I find it useful to fill in each day's row early in the morning of the following day, and at the end of the week to run a small Perl script which adds up the seven day rows to produce a weekly-totals row. The script is a little too long to reproduce here. You can get it, if you're interested, from the "Technical" section of my site,
http://www.metascientia.com.
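For readers who just want the flavour of the totalling step, here is a much reduced sketch in bash and awk. It is not the Perl script mentioned above: it adds up only the TOTL+ column, taken to be the last field of each day row, and the function name week_total is hypothetical.

```shell
#!/bin/bash
# week_total: hypothetical helper; reads a log of days on standard input and
# prints the sum of the TOTL+ column (the last field of each day row).
week_total() {
    awk '
        /^(SUN|MON|TUE|WED|THU|FRI|SAT) / {
            split($NF, hm, "h")          # "03h36" -> hm[1]="03", hm[2]="36"
            minutes += hm[1] * 60 + hm[2]
        }
        END { printf "%02dh%02d\n", int(minutes / 60), minutes % 60 }
    '
}

# Two day rows from the sample week; prints 08h56.
week_total <<'EOF'
MON 00h17 | 01h08             | 00h40       |             00h13 || 03h36
TUE 00h16 | 02h25 02h36       |             |                   || 05h20
EOF
```

Fed all seven day rows of the sample week, the same function yields 40h28, agreeing with the TOT row.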

(b) Whereas I start a new log of days every week, I keep a single big log of weeks. That file can be thought of as a sequential autobiography. Here's an extract:

2003A_WK07 (2003-02-16 TO 2003-02-22)

    CHURC | MATPH AS-L0 AS-HI | ASTVO ESTBK | MAINT $LING $$$$$ || TOTL+
    03h41 | 00h00 02h20 00h00 | 00h00 00h00 | 00h04 04h18 19h04 || 41h46
======================================================================         
2003A_WK08 (2003-02-23 TO 2003-03-01)

    CHURC | MATPH AS-L0 AS-HI | ASTVO ESTBK | MAINT $LING $$$$$ || TOTL+
    02h03 | 07h54 09h06 00h31 | 00h00 00h00 | 01h23 01h08 15h32 || 40h03
====================================================================== 
2003A_WK09 (2003-03-02 TO 2003-03-08)

    CHURC | MATPH AS-L0 AS-HI | ASTVO ESTBK | MAINT $LING $$$$$ || TOTL+
    02h58 | 06h57 04h00 00h00 | 00h40 00h00 | 04h31 00h00 00h33 || 40h28

(c) To help me monitor my efforts on many projects over many years, I keep another large single file - a log of projects. In my formalism, every project is associated with a unique UTC timestamp, in most cases denoting the instant that work on the project started. If, for instance, the project is commercial, then the magic instant is liable to be the instant at which my client started exploring, whether by phone or by e-mail, the possibility of my undertaking that particular project. For each project, I track the year, month, and day of fresh activity, indicating for each day the amount of time invested, the cumulative time investment, and the nature of the work done on that day. Here are excerpts, with some xxxxxxxx overwriting to maintain confidentiality, for two simple projects (a piece of ultimately unsuccessful journalism on Iraq, and a tiny reading project on the rudiments of radio astronomy):

WRITING__20030226T230203Z____iraq_meeting
20030226: 03h30 -> 0003h30  # attended mtg, did rough notes at xxxx
20030227: 04h37 -> 0008h07  # wrote, polished; submitted to xxxxxxxx        
STUDY____20030228T223000Z____radio_astronomy
20030228: 00h31 -> 0000h31  # started Rohlfs-Wilson
20030301: 04h05 -> 0004h36  # read {f.graham.smith} Pelican 
20030306: 00h39 -> 0005h15  # ditto
20030310: 00h02 -> 0005h17  # made bare beginning on French book

As I say, the formalism is lean and mean. No software tools are needed, apart from the Perl script which generates weekly totals. I'd be willing to bet that my formalism makes timelogging as efficient as elaborate software does, since what I lose in sophistication I gain in ease of maintenance.

It goes almost (but not quite) without saying that I have fancy aliases in my ever-so-frequently revised .bashrc script, to pull up any one of these logs on an xterm in a split second. So, for instance, to revise the log of projects, which tracks invested time project by project, I type just the three letters inv. That's a three-letter alias for an invocation of vi on a file rather deeply buried in my byzantine workstation. My .bashrc implements the alias with a single line, which I break up into several lines here for readability:

alias inv='vi /home/verbum/ANNN____maintenance/
              RNNN____journals_etc/QNNN____diaries/
              ZNNN____multiyear_analyses_etc/invested_time.txt'

Byzantine? "Anal-retentive" might be an apter characterization of my system for the rational nesting of directories. (Everything, said Einstein, is to be made as simple as possible, and no simpler.) But rational directory nesting involves management of something like space, rather than of time, and is therefore a suitable topic for another essay.