Binary files: Exif image info
Images taken by digital cameras are usually stored in JPEG format.
Besides the main image, such files also contain a thumbnail and various information
about how the image was taken, in a format called Exif (Exchangeable image file format).
Write a program that takes a JPEG file named on the command line, and if
it has an Exif segment, prints out the date the photo was taken and
writes the thumbnail to a file named name_thumb.jpg, where
name is the original name without extension.
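As a small illustration of the naming rule, the thumbnail file name could be derived as sketched below. The helper name `thumb_name` is hypothetical and not part of the assignment.

```python
import os

def thumb_name(path):
    """Return name_thumb.jpg, where name is `path` without its extension."""
    base, _ext = os.path.splitext(path)  # splits off only the last extension
    return base + "_thumb.jpg"
```

With this sketch, `thumb_name("photo.jpg")` yields `photo_thumb.jpg`.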
A JPEG file starts with bytes 0xFF 0xD8, and then contains a variable
number of segments. Each segment starts with a two-byte marker:
0xFF and a second byte identifying the segment type. Exif information
is stored in the first segment, which must have the marker 0xFF 0xE1
(application data, APP1).
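A minimal sketch of the marker check described above, assuming the file contents have already been read into a `bytes` object. The function name `has_exif_marker` is made up for illustration.

```python
def has_exif_marker(data: bytes) -> bool:
    """True if data starts with 0xFF 0xD8 and the first segment marker is 0xFF 0xE1."""
    starts_as_jpeg = data[:2] == b"\xff\xd8"     # JPEG start bytes
    first_is_app1 = data[2:4] == b"\xff\xe1"     # APP1 marker right after them
    return starts_as_jpeg and first_is_app1
```

Slicing (rather than indexing) keeps the check safe on files shorter than four bytes.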
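All 2-byte and 4-byte integer values mentioned in the following are unsigned
values stored in binary. For this assignment, assume that the values inside the
IFDs are in little-endian byte order. In general, the first two bytes of the
TIFF header ("II" or "MM") select little-endian or big-endian order, and the
JPEG segment size is always stored big-endian; the program does not need the
segment size.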
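Unsigned little-endian values can be decoded with Python's standard `struct` module, where `<H` is a 2-byte and `<I` a 4-byte unsigned integer. The helper names `u16` and `u32` below are hypothetical; this is only a sketch under the little-endian assumption stated in the text.

```python
import struct

def u16(data, pos):
    """Unsigned 2-byte little-endian integer at index pos."""
    return struct.unpack_from("<H", data, pos)[0]

def u32(data, pos):
    """Unsigned 4-byte little-endian integer at index pos."""
    return struct.unpack_from("<I", data, pos)[0]
```

For example, the byte pair `69 87` decodes to the tag value 0x8769, since the least significant byte comes first.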
The APP1 segment has the following structure:
- an unsigned 2-byte segment size
- the Exif header, 6 bytes with the string "Exif" and two null bytes \0 \0
- 8 bytes for a TIFF header, which we ignore
- several fragments called IFD (Image File Directory), described below.
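The layout above fixes where each part sits in the file: marker bytes at 0 to 3, the segment size at 4 to 5, the Exif header at 6 to 11, and the TIFF header at 12 to 19. The sketch below checks the Exif header and returns the resulting positions; the name `locate_tiff_header` is an illustrative assumption, and it takes the first IFD to start right after the 8-byte TIFF header, as the list suggests.

```python
EXIF_HEADER = b"Exif\x00\x00"   # "Exif" followed by two null bytes

def locate_tiff_header(data: bytes):
    """Return (tiff_pos, first_ifd_pos) if the Exif header is present, else None."""
    if data[6:12] != EXIF_HEADER:
        return None
    tiff_pos = 12                 # 4 marker bytes + 2 size bytes + 6 header bytes
    first_ifd_pos = tiff_pos + 8  # assumption: IFDs begin after the TIFF header
    return tiff_pos, first_ifd_pos
```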
An Image File Directory has the following structure:
- a 2-byte number of directory entries
- the entries themselves, 12 bytes each. The first two bytes of each entry
are a tag identifying the entry type. The last 4 bytes are, for the entries
of interest to us, either an unsigned data value, or the offset of the data
pointed to by the entry
- a 4-byte offset of the next IFD
- a data area pointed to by the entries
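The directory structure above could be read roughly as follows. This is a sketch, not a reference implementation: `read_ifd` is a hypothetical name, `pos` is a position counted from the start of the file, and only the tag and the last 4 bytes of each entry are decoded, since those are the fields the text uses.

```python
import struct

def read_ifd(data, pos):
    """Parse the IFD at file position pos.

    Returns ({tag: value_or_offset}, next_ifd_offset).
    """
    (count,) = struct.unpack_from("<H", data, pos)        # number of entries
    entries = {}
    for i in range(count):
        entry_pos = pos + 2 + 12 * i                      # entries are 12 bytes each
        (tag,) = struct.unpack_from("<H", data, entry_pos)
        (value,) = struct.unpack_from("<I", data, entry_pos + 8)  # last 4 bytes
        entries[tag] = value
    (next_ifd,) = struct.unpack_from("<I", data, pos + 2 + 12 * count)
    return entries, next_ifd
```

Returning a dictionary makes the later tag lookups one-liners, at the cost of keeping only the last entry if a tag were repeated.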
All offsets are measured from the start of the TIFF header, which is 12
bytes from the start of the file (2 x 2 bytes marker, 2 bytes size, 6
bytes Exif header). Thus, adding 12 to the offset gives the position
of the data from file start.
- In the first IFD (called IFD0) we look for a directory entry
with 2-byte tag 0x8769. The 4-byte offset of this entry points to the
Exif IFD.
- Also from the first IFD, we extract the 4-byte offset to the next IFD,
called IFD1, which holds the thumbnail image, as described below.
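The two lookups in IFD0 can be combined in one pass. The sketch below assumes IFD0 sits at offset 8 (directly after the TIFF header) and uses `struct.iter_unpack` with the format `<H6xI`, which reads the 2-byte tag, skips 6 bytes, and reads the final 4-byte value. The function name `ifd0_links` is hypothetical.

```python
import struct

TIFF_START = 12       # file position of the TIFF header (from the text)
IFD0_OFFSET = 8       # assumption: IFD0 directly follows the 8-byte TIFF header
EXIF_IFD_TAG = 0x8769

def ifd0_links(data):
    """Return (exif_ifd_offset or None, ifd1_offset), both relative to the TIFF header."""
    pos = TIFF_START + IFD0_OFFSET
    (count,) = struct.unpack_from("<H", data, pos)
    first = pos + 2
    end = first + 12 * count
    exif_offset = None
    for tag, value in struct.iter_unpack("<H6xI", data[first:end]):
        if tag == EXIF_IFD_TAG:
            exif_offset = value
    (ifd1_offset,) = struct.unpack_from("<I", data, end)  # follows the last entry
    return exif_offset, ifd1_offset
```

Both returned values are offsets, so 12 still has to be added before using them as file positions.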
In the Exif IFD, we look for a directory entry with tag 0x9003. This is
the tag for the date and time when the original image was taken. The
4-byte offset in this entry points to a 20-byte null-terminated string
with the date in YYYY:MM:DD HH:MM:SS format, which the program should print.
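Once the offset from the 0x9003 entry is known, extracting the string is short. In this sketch `read_date` is a hypothetical helper; it takes the offset relative to the TIFF header and cuts the 20-byte field at the first null byte.

```python
def read_date(data, offset, tiff_start=12):
    """Return the date/time string stored at the given offset."""
    pos = tiff_start + offset             # offsets count from the TIFF header
    raw = data[pos:pos + 20]              # 19 characters plus the terminating null
    return raw.split(b"\x00", 1)[0].decode("ascii")
```

The result is a string such as `2013:03:21 04:45:00`, ready to be printed as is.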
In IFD1, we look for two directory entries: one entry with tag 0x201,
whose 4-byte offset points to the JPEG thumbnail data; and another entry
with tag 0x202, whose 4-byte value gives the size of the thumbnail data in bytes.
We then read that many bytes starting at the given offset and
write them to the thumbnail file.
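A possible sketch of the final step, given the values of the 0x201 and 0x202 entries. The names `thumbnail_bytes` and `write_thumbnail` are assumptions made for illustration; the file is opened in binary mode so the bytes are written unchanged.

```python
def thumbnail_bytes(data, offset, size, tiff_start=12):
    """Return the `size` bytes of thumbnail data located at `offset`."""
    pos = tiff_start + offset             # offsets count from the TIFF header
    return data[pos:pos + size]

def write_thumbnail(data, offset, size, out_path):
    """Write the thumbnail data to out_path."""
    with open(out_path, "wb") as f:       # "wb": binary mode, no newline translation
        f.write(thumbnail_bytes(data, offset, size))
```

Since the thumbnail is itself a JPEG image, checking that the extracted bytes start with 0xFF 0xD8 is a cheap sanity test.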
For more information see this page at MIT, some other explanations with figures or the standard.
Marius Minea
Last modified: Fri Mar 21 4:45:00 EET 2013