
Finding file creation times, fixing a C shell script,
and a public-domain program for converting numbers to
words.
By Ray Swartz
This month Joe Walker requests a way to track when a file was
created. Because the Unix file system doesn't store this time,
I've provided a script that stores the file's birth date in a
simple database. Jase Wells asks for help with a C shell script
that doesn't process all the desired files. The problem is how
the command-line arguments are referenced. In our feedback item,
Tilman Schmidt identifies a public-domain program that implements
a simple translator of digits to words in various languages, an
extension of the simple digit-to-English-word script that I
presented in my August 1993 column and have reproduced here in Listing 1.
So, What's New?
Question: Is there some way to find out when a file
was created?
Joe Walker / Carson City, Nev.
Answer: The Unix file system maintains three times
for each file stored on the system: the last time the file was
modified (written to), the last time it was accessed (read from),
and the last time its inode entry was changed (for example, by
chmod, chown, or ln, as well as by any write to the file).
Use the ls command
with the appropriate option to display these times:
ls -l lists the file's modification time,
ls -lu its access time, and ls -lc the
inode change time.
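A quick way to see all three times side by side is to run the three ls variants on the same file. The helper below is a minimal sketch; the name show_times and its output labels are hypothetical, chosen only for this illustration.

```shell
#!/bin/sh
# Sketch: print the three time stamps Unix keeps for each named file.
# show_times is a hypothetical helper written for this example.
show_times() {
    for f in "$@"; do
        echo "modified: $(ls -l "$f")"    # mtime: last write to the data
        echo "accessed: $(ls -lu "$f")"   # atime: last read
        echo "changed:  $(ls -lc "$f")"   # ctime: last inode change
    done
}
```

For example, show_times /etc/passwd prints three ls lines for that file, one per time stamp.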
When the file is first created, all three times are set to its
creation time. However, as soon as a process writes to the file,
the modification and inode change times are updated. Similarly,
the access time no longer reflects the creation time after the
file is read.
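The following sketch shows that a single write moves a file's modification time past the moment it was created. It works in a scratch directory with two hypothetical file names (sample and marker) and relies on find -newer, which compares modification times. The sleep calls keep the time stamps at least one second apart on file systems with coarse time resolution.

```shell
#!/bin/sh
# Sketch: a write moves a file's modification time past its creation time.
# find -newer (POSIX) is true when the file's mtime is later than the
# reference file's mtime.
mtime_moves_on_write() {
    dir=$(mktemp -d) || return 1
    : > "$dir/sample"                   # create the file under test
    sleep 1
    : > "$dir/marker"                   # reference made after sample's creation
    sleep 1
    echo "more data" >> "$dir/sample"   # a write updates mtime (and ctime)
    if [ -n "$(find "$dir/sample" -newer "$dir/marker")" ]; then
        echo "sample was written after it was created"
    else
        echo "sample still shows its creation time"
    fi
    rm -rf "$dir"
}
```

Without the echo line, sample's mtime would stay earlier than marker's and the function would report that the creation time survived.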
Because the file system doesn't record when the file was
created, there is no Unix command that will tell you when a file
first appeared in the file system. As a result, you will have to
create your own data file and program to provide this
information.
One solution, named
cmpfiles, was presented in the May 1993
"Wizard's Grabbag" column. This program is a large script with
many options that maintains a database of file-modification
times. To create its data, cmpfiles runs
ls -l on all the files located by the find command. This approach
requires a good deal of system resources.
Also, cmpfiles searches the entire file system, and today many
systems have directories mounted over the network. My solution,
find.new.files [see Listing
2A], searches only for files actually stored on this machine,
using find's -fstype 4.2 option. The 4.2 argument works on my Sun
Microsystems workstation and should work on other BSD-based
systems. Your system may require a different option to restrict
find to files located on your local disks.
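If your find does not recognize a 4.2 file-system type, one common alternative is to prune network-mounted directories instead. This is a minimal sketch, not the listing's code: it assumes a find that supports -fstype (GNU and BSD versions do; POSIX does not require it) and that your network mounts report the type nfs. The function name list_local_files is hypothetical.

```shell
#!/bin/sh
# Sketch: list regular files under a directory (default /), skipping NFS.
# Assumes find supports -fstype and that network mounts report "nfs".
list_local_files() {
    # -prune stops descent into any NFS directory; -o prints everything
    # else that is a regular file.
    find "${1:-/}" -fstype nfs -prune -o -type f -print
}
```

Where -fstype is unavailable, POSIX find's -xdev keeps the search on the file system of the starting directory. That excludes network mounts, but it also skips any other local file systems mounted below the starting point.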
The task is much simpler if you are willing to settle for
knowing the day the file's name first appeared in the file
system. This solution can be implemented with a shell script
that is run daily by cron. [See Listing 2B for a sample entry.]
The 20-line find.new.files script locates new
files by comparing entries in a file
(/usr/local/data/file.list) with the output of the
find command. Existing files will be in both
listings. New files will be listed by find, but
won't be in /usr/local/data/file.list. Deleted files
appear in /usr/local/data/file.list but not in
find's output.
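The same three-way comparison can be expressed with comm, which is a different tool from the sort and uniq combination that find.new.files uses. This sketch assumes each list holds one path name per line; the function name compare_lists is hypothetical. Because comm requires sorted input, the sketch sorts temporary copies first.

```shell
#!/bin/sh
# Sketch: report new and deleted names by comparing two lists with comm(1).
# $1 = saved list (yesterday), $2 = today's list; one path per line.
compare_lists() {
    a=$(mktemp) && b=$(mktemp) || return 1
    sort "$1" > "$a"        # comm needs both inputs in the same sort order
    sort "$2" > "$b"
    echo "new:"
    comm -13 "$a" "$b"      # suppress columns 1 and 3: lines only in today's list
    echo "deleted:"
    comm -23 "$a" "$b"      # suppress columns 2 and 3: lines only in the saved list
    rm -f "$a" "$b"
}
```

Names that appear in both lists (comm's third column) are the existing files, which this sketch simply does not print.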
The find.new.files script employs a common
shell-programming trick. It combines two lists into one, sorts the
result, and then uses uniq to find
the duplicates (files still on the system) and the unique lines
(files deleted or created today). To distinguish between deleted
and created files, find.new.files uses a date flag
of 99/99/99. Any unique line that carries this marker belongs to
a new file.
After setting some variables, find.new.files
checks whether its data file exists. If not, it creates one
(lines 6-8). If its data file exists, find.new.files
makes a copy of the data file for archival purposes (line 10).
The find command (line 11) is run to locate all files
stored on this machine. These file names are piped to awk for formatting
(line 12). This output needs to be combined with the current data
file (line 13) so that new and deleted files can be found.
The parentheses surrounding lines 11-13 tell the shell to run
these commands in a subshell. The effect is to send the
output of both the find pipeline and the cat command into the
sort pipeline that follows.
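The subshell grouping is easy to see in isolation. In this sketch the printf line stands in for the find and awk stages, and the 99/99/99 lines and /tmp paths are hypothetical data; the function name merge_sources is also hypothetical. Both commands' output reaches sort as one stream.

```shell
#!/bin/sh
# Sketch: the parentheses run both commands in one subshell, so sort
# receives their combined output. Without them, only cat's output would
# be piped into sort; printf would write straight to standard output.
merge_sources() {
    saved=$1    # saved data file, hypothetical layout: date<TAB>path
    (
        printf '99/99/99\t%s\n' /tmp/new1 /tmp/new2   # stands in for find | awk
        cat "$saved"                                  # existing entries
    ) | sort
}
```

A brace group, { ...; }, would merge the output the same way without starting a subshell.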
The sort on line 14
orders the files by path name. (The character inside the single
quotes is a tab.) The output of sort will contain two adjacent
entries for each existing file and only one entry for each new or
deleted file.
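Here is a sketch of sorting by path name, assuming each line has the hypothetical layout mm/dd/yy<TAB>path; sort_by_path is a hypothetical name. When two lines share a path, sort falls back to comparing the whole line, so a real date such as 01/05/94 lands ahead of the 99/99/99 marker. In this sketch that ordering means a later uniq step keeps the line carrying the real date. LC_ALL=C is set so the order does not depend on local collation rules.

```shell
#!/bin/sh
# Sketch: order "date<TAB>path" lines by path name. -t sets the field
# separator to a tab; -k 2 sorts on field 2 through the end of the line.
# Lines with equal paths are then compared as whole lines, which puts a
# real mm/dd/yy date ahead of the 99/99/99 marker.
tab=$(printf '\t')
sort_by_path() {
    LC_ALL=C sort -t "$tab" -k 2
}
```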
The uniq command is used to count the number of
entries for each file (line 15). The -c -3 options tell
uniq to place the number of entries in front of each
line and to ignore the three words in front of the path name
when comparing lines.
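In current syntax the article's uniq -c -3 is written uniq -c -f 3; the -number form is the older spelling of the same field-skipping option. The sketch below assumes each line has the hypothetical layout mm/dd/yy<TAB>path, so it skips only one field; count_by_path is a hypothetical name.

```shell
#!/bin/sh
# Sketch: count how many adjacent lines share each path. -f 1 makes uniq
# ignore the first field (the date) when comparing lines; -c prefixes
# each output line with its count. uniq prints the first line of each
# group, and its input must already be sorted by path.
count_by_path() {
    uniq -c -f 1
}
```

Only the date field is skipped, so path names containing spaces are still compared in full.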
Each line of uniq's output falls into one of three categories:
Lines beginning with "2" are existing files that need to be
put into the data file along with their previous creation time.
Lines beginning with "1" but containing "99/99/99" are new
files.
Lines beginning with "1" without "99/99/99" are files that
have been deleted.
The egrep in line 16
searches this output for existing files and new files, throwing
away those no longer on the file system. (Note that there is a
tab inside the single quotes at the end of the first
argument.)
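A sketch of the filtering step, assuming uniq's output has the hypothetical form count, space, mm/dd/yy, tab, path; keep_current is a hypothetical name. grep -E is the current spelling of egrep. The pattern keeps count-2 lines and count-1 lines that still carry the 99/99/99 marker, which discards the deleted files. The tab is produced with printf rather than typed between the quotes.

```shell
#!/bin/sh
# Sketch: keep existing files (count 2) and new files (count 1 with the
# 99/99/99 marker); count-1 lines with a real date are deleted files and
# are dropped. Leading blanks allow for uniq's padded counts.
keep_current() {
    grep -E '^ *2 |^ *1 99/99/99'"$(printf '\t')"
}
```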
The sed command
in line 17 removes the counts inserted by uniq and
replaces the new-file marker (99/99/99) with today's date. When
the data file /usr/local/data/file.list is created
for the first time, all the files currently on the system will be
listed as being created on that day.
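Putting the steps together, here is a minimal sketch of the daily update. It follows the stages described above but is not a copy of Listing 2A: the function name update_file_list, the mm/dd/yy<TAB>path layout, the .old and .new file names, and the use of -xdev in place of -fstype 4.2 are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of the daily update (hypothetical; not Listing 2A).
# $1 = data file, $2 = directory to search. Data lines: mm/dd/yy<TAB>path.
update_file_list() {
    list=$1 root=$2
    tab=$(printf '\t')
    today=$(date +%m/%d/%y)
    [ -f "$list" ] || : > "$list"     # first run: start with an empty list
    cp "$list" "$list.old"            # keep yesterday's copy as an archive
    (
        find "$root" -xdev -type f -print |
            awk '{ print "99/99/99\t" $0 }'   # tag every name found today
        cat "$list"                           # plus the saved entries
    ) |
        LC_ALL=C sort -t "$tab" -k 2 |        # group by path; real dates first
        uniq -c -f 1 |                        # count entries per path
        grep -E "^ *2 |^ *1 99/99/99$tab" |   # drop deleted files
        sed -e 's/^ *[0-9]* //' \
            -e "s|^99/99/99|$today|" > "$list.new" &&
        mv "$list.new" "$list"
}
```

Note that with -xdev and a starting directory of /, this sketch scans only the root file system; other local file systems would need their own runs.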
The find.new.files program is designed to be run
once a day. The best method is to have cron run it
when file-system usage is minimal, say in the middle of the
night. Be careful with find.new.files: because it
searches the entire directory hierarchy, it may consume a good
deal of your system's capacity when it executes.
Sending Him for a Loop
Question: I've written a simple shell script (called
g2j) that automates the process of converting files
from GIF
format to JPEG format,
calling on the cjpeg program. I call it with the
command line: g2j *.gif [see Listing 3].