Obfuscated C Tools Anthony Howe Mortice Kern Systems, Inc. ABSTRACT "Everything should be made as simple as possible, but not simpler." [Albert Einstein] "Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius -- and a lot of courage -- to move in the opposite direction." [E.F. Schumacher] Writing for the International Obfuscated C Code Contest is an educational exercise in programming style, data structures, algorithms, and portability. Three time winner of the IOCCC's Best Utility Award, Howe discusses the contest rules, peep-hole obfuscation, and a review of the concepts used for his winning editor, program maintence utility, and regular expression text search utility. ABOUT THE SPEAKER Anthony Howe is a U. of Waterloo graduate. He has been working with Mortice Kern Systems for the past five years doing standards based work for MKS's InterOpen Source Code product, in particular Curses and Vi. Copyright 1995 by Anthony Howe. All rights reserved. No warranty. Portions Copyright (c) Landon Curt Noll & Larry Bassel, 1994. All Rights Reserved. Permission for personal, education or non-profit use is granted provided this this copyright and notice are included in its entirety and remains unaltered. All other uses must receive prior permission in writing from both Landon Curt Noll and Larry Bassel. GOALS OF THE CONTEST o To write the most Obscure/Obfuscated C program under the rules below. o To show the importance of programming style, in an ironic way. o To illustrate some of the subtleties of the C language. o To stress C compilers with unusual code. o To provide a safe forum for poor C code. THE RULES 1. Your entry must be a complete program. 2. Your entry must be of original work and in the public domain. 3. The gross-size of your entry must be less than or equal to 3217 bytes. The net-size must be less than or equal to 1536 bytes after excluding all whitespace, and excluding all semi- colon or braces followed by either whitespace or end-of-file. 4. Each person may submit at most eight entries per contest year. 5. Entries requiring human interaction to be built are not allowed (spinelli.c 1988). An entry must produce a file (or files) which may be executed. 6. Programs that require special privileges such as setuid, setgid, super-user, special owner or group are not allowed. 7. Legal abuse of flaws in the rules is encouraged, except for the entry submittal format. An entry that, in the opinion of the judges, violates the rules will be disqualified. Entries that attempt to abuse the rules must try to justify why the rule abuse is legal in the Remark section. THE JUDGING PROCESS 1. Entries are judged by Larry Bassel and Landon Curt Noll. 2. Entries are unpacked and the Judges endeavor to kept the entries anonymous until the judging is complete and awards assigned. 3. There are several elimination rounds. Each entry receives at least two readings before being eliminated. A reading consists of : - Reading the Entry section. - Reading the uudecoded Build, Program, and Info sections. - C pre-processing the source, skipping include files. - "Beautify" the source and pass the result through the C pre-processor, skipping include files, checking to see if it still remains obfuscated. In later rounds : - The source is linted for portability and possible compiler problems. (The cleaner, the better.) - The source is compiled, linked, and executed. - Miscellaneous tests are performed on the source and executable, such as does it work, does it give the correct answers, does it handle edge conditions, etc. Entries are judged on individual merit until the stack of entries is reduced to about 25. Award categories begin to form and entries will compete with each other for awards, often in several categories. THE JUDGING PROCESS (continued) Award categories vary depending on the entries received. Traditional award categories have been : - Best game. - Best utility. - Best X client. - Best one liner. - Best small program. - Worst abuse of the rules. - Strangest or most creative source layout. - Best over all. The Judges welcome suggestions for new and/or amusing categories. Sometimes the Judges will invent a completely new award for unusual winners that don't fit under any one category. 4. The prize for winning is solely fame and flames. Losers remain anonymous and all go to the bit bucket, with all related paper shredded. Losing entries can always be submitted again in subsequent years. By tradition the number of entries received is never disclosed, only the amount of double sized two-up paper shredded. Once winners are announced at Usenix, the winning authors are sent e-mail. The authors have a chance to comment on distribution and presentation of their winning programs, correct any mistakes and/or bugs before the winners are posted to the world. THE JUDGES' LIKE o Assorted levels and types of obfuscation. Use localised obfuscation for "compression", the data structures and algorithms should provide the core obfuscation. o The program must compile using an ANSI C compiler. K&R C can be used provided that the source does not cause compile or link errors for ANSI C compilers. Use when functions require variable arguments. Declare all local and global variables. Don't rely on K&R implicit type declaration. The leading hash (#) for pre-processor directives must be the first character on the line. o Programs should try to be instructional. Supply any references that might be of interest to other people. Write clear and useful documentation especially for utilities and games. There is no restriction on the length of documentation, so use it to your advantage. o Quality portable programs. The more people who can compile these programs with the least amount of fuss, the better. Test your program on as many platforms and compilers as possible, like BSD Unix and System V Unix. If at all possible also compile for PC boxes. All this will help shake down your code improving both portability and quality. If at all possible, write a test suite. This will help come the 11th hour debugging. ACCEPTABLE USE o ANSI terminal sequences that are NOT terminal specific and work with a standard Xterm (see tromp.c 1989). o Curses libraries are considered generally available on all systems, but remember to use a common subset available on both BSD and System V systems. o X clients can use Xlib and the Athena widget set (libX11.a, libXaw.a, libXmu.a, and libXt.a). Avoid dependencies on items in .Xdefaults. THE JUDGES' DISLIKES o Don't violate the entry submittal format. o Don't make your entry hardware or OS specific. o Don't use masses of #define or -D compiler options. Use #define sparingly and use -D for configuration. o Don't encrypt the documentation in any manner. The obfuscation should be in the program, NOT the documentation. o Avoid using writable strings. char *T = "So many primes, so little times!"; ... T[14] = ';'; o Avoid arranging your C source as a compact blob of characters, unless really pressed for space. The size rule was changed to be more accommodating and permit programs to be formatted at least like a typical C program, though more creative layouts are encouraged. WHERE TO BEGIN o Get an idea. Read technical magazines, papers, and books. Review past winners, examine the rules for loop-holes. Some suggestions : - cpp - bit-grep - mini rogue - mini zoo or lharc o Write the program _normally_. Use LOTS of comments to document your data structure, tricks, and algorithm. These will be for your benefit and any one else who you give the "solution" to. o Make sure your program works! Document known problems in the code. o Use a program like "ioccc.c" to apply the net-size rule and strip comments. o Use Awk, Sed, or Ex/Vi to apply a set of transformations on the comment stripped source to "compress" identifiers and common expressions (see ag.so). Keep total number of global variables, functions, and #define's below 53 so that single letter identifiers (_, A-Z, a-z) can be used. Keep local variables to a minimum or replace with globals. Always use the same local variable names in each function defined and attempt to maintain their type. Simplifies script replacement. o The rules and guidelines are updated each year. Read them before submitting an entry. Note that the rules are open to creativity interpretation much like interpreting standards or a customer specification. Typically the rules change to prevent previous abuses of the rules. PORTABILITY ISSUES o Use K&R function style, but ANSI C libraries. o Write the program _correctly_. Aim to compile with NO warnings at the default warning level of the compiler. o Include ANSI C standard headers to get the proper function declarations. Avoid using your own declariations for the standard library, because they will be non-portable. o Assume that an "int" is no bigger than a "short" and that a "short" is no more than 16 bits. o Think about 8 vs 9 bit bytes. Handle 9 bytes if at all possible. o Remember that it is compiler specific as to whether a "char" is defined signed or unsigned. Some compilers do provide a switch. o Not all machines use two's-complement integers. o Use character constants and escaped character constants. Avoid being character set dependant by not using escaped octal constants (eg. \033). STYLE ISSUES o Be consistent with style. Easier to identify common expressions and replace with a regular expression when "compressing". o Minimise the number of different data types required. For example use "int" and "char", or "long" and "char". o Avoid structures and unions. Expensive to code using member operators (. and ->). "Paired" and/or multidimensional arrays are good substitute. For example : typedef struct { int a[100], b[100]; int a, b; char c[100][10]; char c[10]; } x, y[100]; x.a a y->b *b y[i].a a[i] y[i].c[j] c[i][j] o Avoid switch-case structures, because the case and break keywords are expensive to use when the number of cases are few. o Use one loop style. The "while" loop is best because if there are enough of them you can #define it. Also the "while" keyword can also be used for the occasional "do-while" loop. o Common expressions can be replaced with a #define, which costs at least eight bytes plus the length of the phrase plus the number of occurences of the phrase. TRICKS o Use for all I/O if at all possible. - Stdio handles CR-LF vs LF issues for text data files. - Provides a useful constant, BUFSIZ, which is always a good size for declaring buffers. o always includes . o Use Boolean and integer algebraic properties to your advantage : - swap(a, b), where a and b are integers, is a ^= b, b ^= a, a ^= b. - To set least significant zero bit in N N |= N+1 - To clear least significant one bit in N N ^= N & -N However, the above relies on 2's complement, instead N ^= N & (~N + 1) TRICKS (continued) o Make good use of logical operators for if-then-else statements. - if C then T --> C && T - if not C then T --> C || E - if C then T else E --> (C && T) || E - if not C then E else T --> (C || E) && E - if A == B then T --> A - B || T Refer to a good symbolic logic text for more identies and properties that can be used. o Learn and make good use of C precedence and associativity rules to reduce unnecessary parenthesises. 1 == 2 && 3 < 4 (1 == 2) && (3 < 4) 1 + 2 << 3 * 4 (1 + 2) << (3 * 4) 1 | 2 & 3 ^ 4 1 | ((2 & 3) ^ 4) TRICKS (continued) o Loop unrolling using "Duff's Device" by Tom Duff of Bell Labs. switch (count & 7) { do { *tar++ = *src++; case 7: *tar++ = *src++; case 6: *tar++ = *src++; case 5: *tar++ = *src++; case 4: *tar++ = *src++; case 3: *tar++ = *src++; case 2: *tar++ = *src++; case 1: *tar++ = *src++; case 0: } while (0 <= (n -= 8)); } BUFFER GAP SCHEME o AE uses a "buffer gap scheme". The edit buffer is divided into two sections with a gap in between representing the cursor point. The buffer .............simple gap scheme ^ ^ ^ ^ buf gap egap ebuf Inserting characters at the front of the gap, in combination with the display manager, has the affect of push-ahead insert. The buffer uses a ......simple gap scheme ^ ^ ^ ^ buf gap egap ebuf To backspace, or delete left of the cursor, remove from the front of the gap. The buffer .............simple gap scheme ^ ^ ^ ^ buf gap egap ebuf To delete the character under the cursor remove from the end of the gap. The buffer ....................gap scheme ^ ^ ^ ^ buf gap egap ebuf To move the cursor, characters are shuffled across the gap. The buf....................fer gap scheme ^ ^ ^ ^ buf gap egap ebuf The gap is ment to be the cursor position, but to avoid shuffling characters while the cursor moves, it is easier to just move a pointer and when something serious has to be done then you move the gap to the cursor. WHERE TO FIND STUFF o Contest Opening Announcement and Winners Announcement are made in comp.lang.c comp.unix.wizards alt.sources o The Judges can be contacted by e-mail at judges@toad.com or Landon Curt Noll chongo@toad.com Larry Bassel lab@sun.com o Archives of previous winners can be obtained by anonymous ftp from : ftp.uu.net pub/ioccc o Recommended reading : Obfuscated C and Other Mysteries Don Libes, Wiley, 1993 ISBN 0-471-57805-3 o Anthony's Obfuscated Tools March 95 can be obtained via WWW from http://www.mks.com/ I can be reached by e-mail at ant@mks.com