-
Notifications
You must be signed in to change notification settings - Fork 5
/
bf.sed
executable file
·105 lines (89 loc) · 4.42 KB
/
bf.sed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
#!/bin/sed -f
########################################################
## bf.sed ##
## ##
## an optimising compiler for brainfuck ##
## produces x86 Linux ELF binaries ##
## written entirely in sed ##
## ##
## usage: bf.sed file.b > prog; chmod +x prog; ./prog ##
########################################################
# slurp in the entire file
:a;$!N;$!ba;
# remove comments (all characters other than []<>+-.,)
s/[^][,.<>+-]//g;
# add a dodgy ELF header and some startup/initialisation code
# see http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html
# since sed inserts newlines whenever there's a linebreak in the s///
# command, the program start address is moved to 0x080a8000, which
# contains a newline character (0x0a).
s/^/\x7fELF\x01\x01\x01zzzzzzzzz\x02z\x03z\x01zzz\x54\x80\
\x08\x34zzzzzzzzzzz\x34z\x20z\x01zzzzzzz\x01zzzzzzzz\x80\n\x08z\x80\
\x08zz\x01zzz\x01z\x05zzzz\x10zz\xfc\x31\xdb\x31\xd2\x42\xb9\x10\x27zz\x04\
\x29\xcc\x89\xe7\x31\xc0\xf3\xaa\x89\xe1/;
# call exit() at the end of the program. use a direct syscall, it's
# easier than finding libc.
s/$/\xb0\x01\xb3z\xcd\x80/
# the registers used are:
# ecx - current cell
# edx - enable bit (see below)
# eax - scratch
# the memory for the cells is allocated on the stack by the startup code
# above. sed can't count, so it's difficult to get [ and ] (looping)
# right. we can't assemble jumps to the end of the loop, so instead of
# jumping over the loop we set edx to 0 and run the loop's code. all of
# the instructions used do nothing if edx is zero, so this gets the same
# effect as a jump over the loop.
# optimisation: [-] (loop until current cell is zero) is optimised to a
# "mov byte [ecx], 0" instruction
s/\[-\]/\xc6\x01z/g;
# +/- come out to add/sub byte [ecx], dl so they inc or dec when edx = 1,
# and are no-ops when edx = 0
s/[+-]/&\x11/g;
# optimisation: handle <+>, <->, >+< and >-< specially. these sequences
# increment or decrement the next or previous cell, and leave the pointer
# where it is. it's faster to use an addressing mode for this than to move
# the pointer back and forth. e.g. <+> is compiled to "add [ecx - 1], dl".
s/<\([+-]\)\x11>/\1\x51\xff/g;
s/>\([+-]\)\x11</\1\x51\x01/g;
# </> come out as add/sub ecx, edx
s/[<>]/&\xd1/g;
# we optimise sequences of 5 +, -, < or > to perform a single add/sub of
# edx*5. why edx*5? because multiplication by 5 can be done in an
# addressing mode (lea eax, [edx + edx * 4]).
s/\(\([<>]\)\xd1\)\1\1\1\1/5\2\xc1/g;
s/\(\([+-]\)\x11\)\1\1\1\1/5\2\x01/g;
s/5/\x8d\x04\x92/g;
# i/o: , and . invoke the read and write syscalls respectively. due to our
# not-entirely-random choice of registers, most of the arguments are
# already in the right locations. in particular, ecx is the buffer to read
# or write (i.e. points to the current cell), and edx is the length of the
# buffer (so if edx = 0, the system call's a no-op)
s/[,.]/\xb3&\x8d\x43\x03\xcd\x80/g;
# looping. [ saves the current edx, pushes its address to the stack, and
# sets edx to 0 if the current cell is zero. this will run until it hits
# the corresponding ], which will either jump to the pushed address or
# restore the old edx and continue, depending on whether the current cell
# is zero.
s/\[/\x52\x38\x31\x0f\x95\xc2\xe8zzzz/g;
s/\]/\x38\x31\x74\x03\xff\x24\x24\x5a\x5a/g;
# the above was done treating +/-, </> and ,/. identically, by just
# setting up addressing modes and parameters. the next line sets up the
# opcodes to distinguish between add/sub and read/write.
y/+-<>,./z()\x01z\x01/;
# a bug in gnu sed means y/// can't handle ascii NUL, so we use s///
s/z/\x00/g;
# the ELF header at the start must include the length of the
# program. there's no arithmetic in sed, so we can't report this
# accurately. instead, we append 64k of garbage to the program and mark
# its length as 64k. as long as the program's actual size is below 64k,
# the system ignores any data past the end (and it never gets run, because
# exit() is called first). this won't work if you write a brainfuck
# program longer than 64k, but if you do that you're even more nuts than
# me.
p
z
s/.*/aaaaaaaaaaaaaaaa/;
s/.*/&&&&&&&&&&&&&&&&/;
s/.*/&&&&&&&&&&&&&&&&/;
s/.*/&&&&&&&&&&&&&&&&/;