Hi there, I am new here, so I wish to take a second to introduce myself and what I am working on, then ultimately why I am posting on this list. Also I will apologise if this is the wrong place to post this query. If this is so, perhaps you could point me in the right direction.
I have just started a PhD on reverse engineering executable code at the University of Kent in England. I arrived at this position after taking an keen interest in operating systems and programming languages during my undergraduate degree, where I eventually wrote an experimental compiler. So basically I am quite new to compiler technology, but not completely clueless. One of the first things that became apparent when reviewing literature on reverse engineering executable code, was that computer scientists consider the code generated by switch statements to be awkward to draw CFG's for. Following on from this, I have started to disassemble some simple mock up programs. I have discovered: * GCC will sometimes just use simple conditional jumps, notably with -O0. * at -O1 and greater, depending upon the size of the case table, GCC sometimes embeds a indirect jump table in the data segment of the binary and uses arithmetic to lookup the address to jump to. Next i scripted the creation of a sparse switch statement with little clusters of cases next to each other, but with nothing in-between. The exact test case was a switch statement (comparing an integer) with a (differing) printf and break statement on each arm at positions: 0-9 300-319 9000-9040 What I expected was one indirect jump table for each of the 3 clusters and some form of conditional as to which table to look in. What I got was very different. In-fact no indirect jump tables were used at-all. A series of conditional jumps were used. What I am confused about was the selection of comparisons used to construct the switch. I started to draw a tree for these findings, but obviously the tree is huge, so I only did a bit (I need to make a tool to draw these trees really): http://students.dec.bournemouth.ac.uk/ebarrett/files/switch-tree.png I would be really interested to know how GCC: * Decides whether or not to embed tables in the data segment of the binary. * Selects the comparisons in the above tree. Perhaps someone knows of a good paper/book on this topic which could be of use, or a relevant section of code in the GCC sources (at the moment I am overwhelmed by the source tree). For the record, all of my tests have switched on a variable which can not be known at compile time (argv[1]). This was to stop the optimiser being clever and removing the switch table altogether. My environment: OpenBSD-4.6-current/i386 GCC-4.2.4 Sorry for the huge email. -- Best Regards Edd Barrett http://students.dec.bournemouth.ac.uk/ebarrett