Computer programming - Lab 7

Strings

1. Ints in string. Write a function that prints out all numbers (strings of digits with optional sign) from a string given as parameter. Consider only standalone numbers, i.e. numbers that do not have any non-whitespace characters adjacent to them.

2. Span of characters. Implement the functions (a) strspn, (b) strcspn (read the class notes, man page or standard for their specifications).
Then, implement a function size_t pieces(const char *s, const char *allow) which counts how many separate substrings consisting only of characters from allow are found in s.
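
To make the specifications concrete, here is a small sketch that calls the standard library versions from <string.h> (not the implementations the exercise asks for). The helper names skip_allowed and skip_until are hypothetical, introduced only for illustration.

```c
#include <string.h>

/* Hypothetical helper: returns a pointer just past the leading run of
   characters of s that all belong to allow (strspn gives its length). */
const char *skip_allowed(const char *s, const char *allow)
{
    return s + strspn(s, allow);
}

/* Hypothetical helper: returns a pointer to the first character of s
   that belongs to reject, or to the terminating '\0' if there is none
   (strcspn gives the length of the prefix free of reject characters). */
const char *skip_until(const char *s, const char *reject)
{
    return s + strcspn(s, reject);
}
```

For instance, strspn("2017abc", "0123456789") is 4 and strcspn("abc,def", ",") is 3. Alternating the two kinds of skip is one possible way to walk over a string piece by piece.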

Input/Output

3. The Nth field. Write a program that reads from input lines that contain numbers separated by commas (and possibly also whitespace), and prints the Nth number from each line. N is a constant defined in the program. Numbers may be missing between two commas, or there may be fewer than N numbers on a line. In either of these cases, print 0.
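
One possible approach for a single line already held in memory is sketched below. It assumes the line fits in a buffer, and it passes the field index as a parameter n rather than using a program constant; the function name nth_field is hypothetical. It relies on strtol skipping leading whitespace and reporting "no conversion" by leaving the end pointer equal to the start pointer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the n-th (1-based) comma-separated number
   on line, or 0 if that field is empty or the line has fewer than n fields. */
long nth_field(const char *line, int n)
{
    /* Skip n-1 commas to reach the start of the n-th field. */
    for (int i = 1; i < n; i++) {
        line = strchr(line, ',');
        if (line == NULL)
            return 0;          /* fewer than n fields on this line */
        line++;                /* step past the comma */
    }
    char *end;
    long value = strtol(line, &end, 10);
    /* end == line means strtol found no digits: the field is missing. */
    return end == line ? 0 : value;
}
```

This sketch does not check what follows the number, so a field such as 12abc yields 12; a stricter solution may want to verify that only whitespace precedes the next comma.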

4. Searching for words. Write a program that counts the number of nonoverlapping occurrences of a word in the input.
The word is a string constant in the program (or you may run the program with a command-line argument: define main as int main(int argc, char *argv[]), check that argc == 2 and use argv[1]).
Count also occurrences as substrings in other words.
Handle the case where the words in the input may be arbitrarily long.
An oft-asked question is "what does nonoverlapping mean?". Answer: The string "ababa" contains two occurrences of the string "aba": one starting at index 0, the other one at index 2. They overlap: the character 'a' at index 2 is common. Nonoverlapping occurrences share no characters, so only one of these two can be counted.
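
The counting rule can be sketched for a string already in memory as follows. This is only an illustration under that assumption: it does not deal with reading arbitrarily long input, and the function name count_nonoverlapping is hypothetical. After each match found with strstr, the search resumes after the end of the match, so characters are never reused.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts occurrences of word in text, scanning left
   to right and resuming the search just past each match found. */
size_t count_nonoverlapping(const char *text, const char *word)
{
    size_t len = strlen(word);
    size_t count = 0;

    if (len == 0)
        return 0;              /* avoid an endless loop on an empty word */

    while ((text = strstr(text, word)) != NULL) {
        count++;
        text += len;           /* skip the whole match, not just one char */
    }
    return count;
}
```

With this rule, "ababa" contains one occurrence of "aba", and "abababa" contains two (at indices 0 and 4).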

5. Words with line numbers. Write a program that prints all lines from input that contain a given word. Only the part of the line starting with the word must be printed. Prefix it with the line number.
a) Consider only standalone words.
b) Count also occurrences as substrings in other words.
Handle arbitrarily long words.

6. HTML character entities. In HTML, some reserved and special characters are written as &name;, for instance &amp; for &, &lt; for <, &gt; for >, &euro; for € etc.
a) Write a program that prints all entities of the form &name; (where name is formed of letters) appearing in the input, separated by one space.
b) The same, for entities of the form &#number; (with nonnegative numbers)
c) Sometimes, people forget the trailing semicolon, resulting in invalid HTML which browsers don't display well; here is an example (with interesting logic problems). Write a program that prints out the input, adding a semicolon ; after each &name where it is missing.

7. XML tags. Write a program that reads input and prints out the names of all empty-element tags of the form <name some other chars/> where name is assumed to be formed of alphanumeric characters.

8. Running times. Write a program that reads from input a list of finishing times in a race (given in increasing order), in the form h:mm:ss, one per line. Print the winning time on a line, and for each of the other finishing times, the difference with respect to the previous time, and with respect to the winning time. For example, for the input

3:51:38
4:05:47
4:11:28
the program should print
3:51:38
+0:14:09  0:14:09
+0:05:41  0:19:50
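
The arithmetic behind this example is easiest when times are converted to seconds. A minimal sketch, assuming each time is available as a string; the names parse_time and format_time are hypothetical:

```c
#include <stdio.h>

/* Hypothetical helper: converts "h:mm:ss" to a number of seconds.
   Returns -1 if the text does not have that form. */
int parse_time(const char *s)
{
    int h, m, sec;
    if (sscanf(s, "%d:%d:%d", &h, &m, &sec) != 3)
        return -1;
    if (h < 0 || m < 0 || m > 59 || sec < 0 || sec > 59)
        return -1;
    return h * 3600 + m * 60 + sec;
}

/* Hypothetical helper: writes secs as h:mm:ss into buf (at most size bytes),
   padding minutes and seconds to two digits. */
void format_time(int secs, char *buf, size_t size)
{
    snprintf(buf, size, "%d:%02d:%02d",
             secs / 3600, secs / 60 % 60, secs % 60);
}
```

For the second line above, 4:05:47 is 14747 seconds and 3:51:38 is 13898 seconds; the difference of 849 seconds is formatted as 0:14:09.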

9. Structured text. The .srt format for subtitles has the following structure. Each subtitle group is formed of four parts, on separate lines:

  1. the group number (increasing)
  2. start and ending times for displaying subtitles, in the format hh:mm:ss,ttt --> hh:mm:ss,ttt (with thousandths of seconds).
  3. the actual subtitles, possibly on several lines
  4. a blank line as separator from the next group.
Write a program that reads input in .srt format and at the end prints the number of subtitle groups and the total time for which subtitles are displayed.
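
The timing line (part 2 of a group) can be read with a single sscanf call. A minimal sketch, assuming the line is already in a string; the function name srt_duration_ms is hypothetical:

```c
#include <stdio.h>

/* Hypothetical helper: for a line "hh:mm:ss,ttt --> hh:mm:ss,ttt" returns
   the display duration in milliseconds. Returns -1 if the line does not
   have this form or if the end time precedes the start time. */
long srt_duration_ms(const char *line)
{
    int h1, m1, s1, t1, h2, m2, s2, t2;

    if (sscanf(line, "%d:%d:%d,%d --> %d:%d:%d,%d",
               &h1, &m1, &s1, &t1, &h2, &m2, &s2, &t2) != 8)
        return -1;

    /* Convert both time stamps to milliseconds since 00:00:00,000. */
    long start = ((h1 * 60L + m1) * 60 + s1) * 1000 + t1;
    long end   = ((h2 * 60L + m2) * 60 + s2) * 1000 + t2;

    return end >= start ? end - start : -1;
}
```

For example, the line 00:00:01,000 --> 00:00:04,500 gives 3500. Since subtitle text could in principle look like a timing line, it is safer to decide which part of a group a line belongs to from its position in the group rather than from its contents alone.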

10. Conjunctive normal form. The DIMACS format for a logic formula in conjunctive normal form contains:

  1. any number (including 0) of comment lines starting with c
  2. a line defining the problem: p cnf followed by the number V of propositions (represented as numbers from 1 to V) and the number C of clauses
  3. the clauses, each given as a whitespace-separated list of numbers from -V to V (negated or positive variables) ending with a 0. A line should not simultaneously have opposite numbers (n and -n) and thus should have at most V nonzero numbers.
a) Read from input a formula in DIMACS CNF format, store it in an array of CxV numbers and then print each line (excluding zeroes).
b) Simplify the formula, considering that we try to make each clause true, according to the following rules.
A number (of any sign) alone in a clause represents a true literal. Thus, we may simplify clauses as follows:
  1) Any clause containing that number is true, and can be deleted.
  2) The negated number represents the negated literal, which is false, and can thus be deleted from any other clause.
Rule 2 shortens clauses, and could lead to another clause with a single literal (number), thus simplifications can be done repeatedly. Print out the literals (numbers) known to be true and then the simplified clauses.
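
One step of these rules, applied to a single clause, might look like the sketch below. It assumes each clause is stored as a zero-terminated row of integers, as in part a); the function name apply_true_literal is hypothetical, and the repeated application over all clauses is left out.

```c
/* Hypothetical helper: simplifies one zero-terminated clause given that
   the literal lit is known to be true.
   Returns 1 if the clause contains lit (rule 1: the clause is true and the
   caller can delete it). Otherwise removes every occurrence of -lit from
   the clause in place (rule 2) and returns 0. */
int apply_true_literal(int *clause, int lit)
{
    int *p;

    /* Rule 1: a clause containing the true literal is itself true. */
    for (p = clause; *p != 0; p++)
        if (*p == lit)
            return 1;

    /* Rule 2: drop the false literal -lit, compacting the clause. */
    int *out = clause;
    for (p = clause; *p != 0; p++)
        if (*p != -lit)
            *out++ = *p;
    *out = 0;                  /* restore the terminating zero */

    return 0;
}
```

For instance, with literal 2 known to be true, the clause 1 -2 3 0 becomes 1 3 0, while the clause 2 3 0 is reported as true.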


Marius Minea
Last modified: Tue Nov 7 15:40:00 EET 2017