Computer programming - Lab 11

1. Write a simple version of the Unix strings program that prints all strings from a binary file that contain at least 4 consecutive printable characters except space (as checked by isgraph()).
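One possible way to structure the scan is sketched below: the first few bytes of a run are held back until the run is known to be long enough, after which bytes are written directly. This is only a sketch; the name print_strings and the MIN_RUN constant are made up for illustration, and the choice of one output line per run is an assumption.

```c
#include <ctype.h>
#include <stdio.h>

#define MIN_RUN 4   /* minimum run length, taken from the exercise text */

/* Copy every run of at least MIN_RUN isgraph() bytes from `in` to `out`,
   one run per line. Returns the number of runs printed.
   print_strings is a hypothetical name, not part of any library. */
unsigned long print_strings(FILE *in, FILE *out)
{
    char pending[MIN_RUN];  /* first bytes of a run that may still be too short */
    size_t len = 0;         /* bytes seen in the current run, capped at MIN_RUN */
    unsigned long runs = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (isgraph(c)) {
            if (len < MIN_RUN) {
                pending[len++] = (char)c;
                if (len == MIN_RUN) {      /* run just became long enough */
                    fwrite(pending, 1, MIN_RUN, out);
                    runs++;
                }
            } else {
                putc(c, out);              /* already printing this run */
            }
        } else {
            if (len == MIN_RUN)
                putc('\n', out);           /* terminate the printed run */
            len = 0;
        }
    }
    if (len == MIN_RUN)
        putc('\n', out);                   /* run that ends at end of file */
    return runs;
}
```

A main function would open the file named on the command line in binary mode ("rb") and call print_strings(f, stdout).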

2. Write a simple version of the Unix grep program that prints all lines from a file (second command-line argument) that contain a given string (first command-line argument). Keep the common case simple but try to avoid size limitations, such as a fixed maximum line length. Variant: consider only occurrences of the string as a separate word (like grep -w).
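For the word variant, a minimal sketch of the matching test could look as follows. It assumes, as grep -w does, that word characters are letters, digits and the underscore; the function names are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Word characters as used by grep -w: letters, digits, underscore. */
static int is_word_char(int c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Return 1 if `word` occurs in `line` with no word character directly
   before or after it, 0 otherwise. contains_word is a hypothetical name. */
int contains_word(const char *line, const char *word)
{
    size_t n = strlen(word);
    const char *p;

    if (n == 0)
        return 0;
    /* Try every occurrence, since an earlier one may be part of a longer word. */
    for (p = line; (p = strstr(p, word)) != NULL; p++) {
        int left_ok = (p == line) || !is_word_char(p[-1]);
        int right_ok = !is_word_char(p[n]);   /* p[n] may be the final '\0' */
        if (left_ok && right_ok)
            return 1;
    }
    return 0;
}
```

The plain version of the exercise needs only strstr(line, word) != NULL for each line read.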

3. Write a simple version of the Unix cut program that accepts a command-line argument of the form M-N, with M and N nonnegative integers, and prints fields M through N from each line of a file (also named on the command line). Use the comma as the field separator.
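Parsing the M-N argument is easy to get subtly wrong, because strtoul also accepts leading whitespace and signs. The sketch below checks for a digit before each conversion; parse_range is a hypothetical name, and rejecting M > N is an assumption about the desired behavior.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse "M-N" into *first and *last. Returns 1 on success, 0 if the
   argument is malformed, out of range, or has M > N (an assumption).
   parse_range is a hypothetical name. */
int parse_range(const char *arg, unsigned long *first, unsigned long *last)
{
    char *end;

    if (!isdigit((unsigned char)arg[0]))      /* reject signs and whitespace */
        return 0;
    errno = 0;
    *first = strtoul(arg, &end, 10);
    if (errno != 0 || *end != '-' || !isdigit((unsigned char)end[1]))
        return 0;
    *last = strtoul(end + 1, &end, 10);
    if (errno != 0 || *end != '\0')           /* overflow or trailing junk */
        return 0;
    return *first <= *last;
}
```

With the range known, each line can be processed by counting commas and printing only the characters that fall in fields M through N.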

4. Write a simple version of the Unix split program that splits a file named on the command line into equally sized chunks (with the size also given on the command line). The last chunk should contain the remainder. Output files are named xaa, xab, ..., xba, xbb, ... . Report an error if the available names run out. (Alternative: name the output files part1.bin, part2.bin, etc.)
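The naming scheme can be sketched as a mapping from a chunk index to a name. The sketch assumes exactly two suffix letters, so at most 26 * 26 names exist; chunk_name, LETTERS and MAX_CHUNKS are hypothetical names.

```c
/* Letters used for the two-character suffix; a lookup string avoids
   relying on the letters being contiguous in the character set. */
static const char LETTERS[] = "abcdefghijklmnopqrstuvwxyz";

#define MAX_CHUNKS (26u * 26u)   /* xaa .. xzz, assuming a two-letter suffix */

/* Write the name of chunk number `index` (starting at 0) into `name`.
   Returns 1 on success, 0 if no names are left.
   chunk_name is a hypothetical name. */
int chunk_name(unsigned index, char name[4])
{
    if (index >= MAX_CHUNKS)
        return 0;
    name[0] = 'x';
    name[1] = LETTERS[index / 26];
    name[2] = LETTERS[index % 26];
    name[3] = '\0';
    return 1;
}
```

The splitting loop would call chunk_name before opening each output file and report an error when it returns 0.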

5. Write a program that receives on the command line an argument of the form -Dstr1=str2 and a file name, and creates a processed version of the file in which each occurrence of the string str1 is replaced by str2. (This is how the cpp preprocessor does replacements in addition to the ones specified with #define.) The name of the output file is the name of the input file with .pp appended.
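Splitting the -Dstr1=str2 argument can be done without copying, by pointing into the argument itself. The sketch below is one way to do it; parse_define is a hypothetical name, and treating an empty str1 as an error is an assumption.

```c
#include <string.h>

/* Split "-Dstr1=str2" in place. On success returns 1 and sets *from to the
   start of str1, *from_len to its length, and *to to the start of str2
   (which may be empty). Returns 0 for a malformed argument or an empty
   str1 (an assumption). parse_define is a hypothetical name. */
int parse_define(const char *arg, const char **from, size_t *from_len,
                 const char **to)
{
    const char *eq;

    if (strncmp(arg, "-D", 2) != 0)
        return 0;
    eq = strchr(arg + 2, '=');        /* first '=' separates the two strings */
    if (eq == NULL || eq == arg + 2)
        return 0;
    *from = arg + 2;
    *from_len = (size_t)(eq - (arg + 2));
    *to = eq + 1;
    return 1;
}
```

Since str1 is not null-terminated inside the argument, comparisons against the input should use strncmp or memcmp with from_len.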

6. A BMP file contains in its header (54 bytes) several pieces of information about image and file size, all stored as 4-byte little-endian integers.

Each line of pixels in the image occupies space rounded up to a multiple of 4 bytes.
Verify that all these data are consistent, and report any inconsistencies. You can find out the actual file size by seeking to its end, and then getting the current position.
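The helpers below sketch the three ingredients of the check: decoding a little-endian integer, computing the padded row size, and finding the file size by seeking. The offsets in the enum are those of the common layout with a 14-byte file header followed by a 40-byte info header; they are given here as an assumption and should be checked against the field list used in the lab. All names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Byte offsets in the 54-byte header (common 14 + 40 byte layout, assumed). */
enum {
    BMP_FILE_SIZE   = 2,   /* total file size in bytes */
    BMP_DATA_OFFSET = 10,  /* where the pixel data starts */
    BMP_WIDTH       = 18,  /* image width in pixels */
    BMP_HEIGHT      = 22,  /* image height in pixels */
    BMP_IMAGE_SIZE  = 34   /* size of the pixel data in bytes */
};

/* Decode a 4-byte little-endian integer, independent of host byte order. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Bytes occupied by one line of pixels, rounded up to a multiple of 4. */
uint32_t row_bytes(uint32_t width, uint32_t bits_per_pixel)
{
    uint64_t bits = (uint64_t)width * bits_per_pixel;
    return (uint32_t)((bits + 31) / 32 * 4);
}

/* Actual file size, found by seeking to the end; restores the position.
   Returns -1 on error. The file should be opened in binary mode. */
long file_size(FILE *f)
{
    long pos = ftell(f), size;

    if (pos < 0 || fseek(f, 0, SEEK_END) != 0)
        return -1;
    size = ftell(f);
    if (fseek(f, pos, SEEK_SET) != 0)
        return -1;
    return size;
}
```

With the header read into an array hdr, one consistency check would compare row_bytes(width, bpp) * height against read_le32(hdr + BMP_IMAGE_SIZE), and another would compare read_le32(hdr + BMP_FILE_SIZE) against file_size(f).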

7. Write a program that processes a bitmap file named on the command line and writes a cropped file (name given as second argument) that contains only the left half of the image.

8. Write a program that processes a .jpg file named on the command line and identifies its parts. A JPEG file starts with the bytes 0xFF 0xD8. These are followed by a variable number of segments. Each segment starts with a two-byte marker: 0xFF, then a second byte giving the segment type. Next comes the segment length, a two-byte integer stored in big-endian format. The length includes the two bytes of the length field itself, but not the two bytes of the marker.
Your program should print out for each segment the byte for the segment type (in hex) and the segment length. Stop at either the end of the image (marker 0xD9) or the start of the image stream (marker 0xDA).
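The traversal described above can be sketched as a loop that reads a marker, reads the length, and skips the rest of the segment. The sketch follows the simplified description in this exercise only (real files may contain cases it does not cover); list_segments and the output format are hypothetical.

```c
#include <stdio.h>

/* Print type and length of each segment of the JPEG stream `in` to `out`.
   Stops at marker 0xD9 or 0xDA. Returns the number of segments listed,
   or -1 if the file does not follow the simplified structure.
   list_segments is a hypothetical name. */
long list_segments(FILE *in, FILE *out)
{
    long count = 0;

    if (getc(in) != 0xFF || getc(in) != 0xD8)   /* start-of-image bytes */
        return -1;
    for (;;) {
        int ff, type, hi, lo;
        long len;

        ff = getc(in);
        type = getc(in);
        if (ff != 0xFF || type == EOF)
            return -1;
        if (type == 0xD9 || type == 0xDA)       /* end of image / image stream */
            return count;
        hi = getc(in);                          /* big-endian: high byte first */
        lo = getc(in);
        if (hi == EOF || lo == EOF)
            return -1;
        len = (long)hi << 8 | lo;
        if (len < 2)                            /* length counts its own 2 bytes */
            return -1;
        fprintf(out, "type 0x%02X, length %ld\n", (unsigned)type, len);
        count++;
        if (fseek(in, len - 2, SEEK_CUR) != 0)  /* skip the segment payload */
            return -1;
    }
}
```

The file must be opened in binary mode ("rb") for getc and fseek to see the bytes unchanged.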

9. The .zip file format is defined in PKWARE's APPNOTE.TXT specification, whose section numbers are used below. We will discuss a simplified case. An archive first contains all files as a sequence of segments, each consisting of a local file header (sec. 4.3.7) followed by the actual file data. Then comes a sequence of directory entries describing (again) each file (sec. 4.3.12), and finally an end of central directory record (sec. 4.3.16).
a) Write a program that tries to identify the end of central directory record (sec. 4.3.16). Assuming the zip file has no comment, the record should occupy the last 22 bytes of the file, starting with the signature, and its last two bytes (the comment length) should be zero. Determine the starting offset of the central directory and check that it indeed has the right signature (sec. 4.3.12).
b) Write a program that traverses this structure and checks that it is consistent. Identify each block by its 4-byte signature. For file headers and directory entries, read the lengths of the variable-size parts (file name, compressed size, extra field, comment) and add them up to get the total length of each segment. Check that the number of file headers is the same as the number of directory entries. Add up the sizes of all directory entries and check that the sum matches the value given in the end of central directory record.
Optional. Try to do more. For instance, remove a file from an archive. You will have to: a) delete the segment containing the file header and data; b) delete the directory entry; c) update the offsets in the other directory entries (not needed if it is the last file, so try that case first); d) update the end of central directory record (number of entries and size of directory).
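For parts a) and b), the sketch below decodes the end of central directory record from its last-22-bytes form and computes the total length of a local segment and of a directory entry. The field offsets follow the record layouts in the sections cited above. It assumes the simplified case from the exercise, in particular that the compressed size stored in the local file header is the real one. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

enum { EOCD_SIZE = 22, LOCAL_FIXED = 30, DIR_FIXED = 46 };  /* fixed parts */

static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

struct eocd {
    uint16_t entries;     /* total number of directory entries */
    uint32_t dir_size;    /* size of the central directory in bytes */
    uint32_t dir_offset;  /* file offset where the central directory starts */
};

/* Decode the 22 bytes in `rec`. Returns 1 if the signature "PK\5\6" is
   present and the comment length is zero, 0 otherwise. */
int parse_eocd(const unsigned char *rec, struct eocd *out)
{
    static const unsigned char sig[4] = {'P', 'K', 5, 6};

    if (memcmp(rec, sig, 4) != 0 || le16(rec + 20) != 0)
        return 0;
    out->entries = le16(rec + 10);
    out->dir_size = le32(rec + 12);
    out->dir_offset = le32(rec + 16);
    return 1;
}

/* Local segment (signature "PK\3\4"): fixed header + file name + extra
   field + compressed data. `hdr` points at the 30-byte fixed part. */
uint64_t local_segment_size(const unsigned char *hdr)
{
    return (uint64_t)LOCAL_FIXED + le16(hdr + 26) + le16(hdr + 28) +
           le32(hdr + 18);
}

/* Directory entry (signature "PK\1\2"): fixed part + file name + extra
   field + comment. `hdr` points at the 46-byte fixed part. */
uint32_t dir_entry_size(const unsigned char *hdr)
{
    return (uint32_t)DIR_FIXED + le16(hdr + 28) + le16(hdr + 30) +
           le16(hdr + 32);
}
```

A traversal for part b) could start at offset 0, advance by local_segment_size while the local signature matches, then advance by dir_entry_size through the directory, and compare the counts and the summed directory size with the values from parse_eocd.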


Marius Minea
Last modified: Thu Dec 10 0:20:00 EET 2015