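Today I spent half an hour writing a one-line web page in BrainFuck (the same thing would take about 5 minutes in C and only 20 seconds in PHP). I doubt anyone is more bored than I am. The page is at http://qing.su/cgi-bin/brainfuck.cgi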
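BrainFuck is one of the most minimal Turing-complete programming languages in the world (its original compiler was only 240 bytes). It consists of just eight characters, <>+-.,[], which respectively move the data pointer left and right, increment and decrement the current cell, output and input a byte, and open and close a loop. Such a tiny instruction set makes programs tedious to write, verbose, and nearly unreadable, so BrainFuck is hardly usable as a real production language. Still, writing a BrainFuck program once in a while is a nice way to give your brain a workout. Below I describe how to write a web page in BrainFuck.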
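To make the eight commands concrete, here is a minimal Python sketch (not from the original post) that translates BrainFuck into C using the conventional one-to-one mapping. `bf_to_c` and `BF_TO_C` are hypothetical names, and the 30000-cell tape is only an assumption that mirrors the `arraysize` used by the compiler later in this post.

```python
# Conventional one-to-one mapping from each BrainFuck command to C,
# where p points into a zero-initialised array of unsigned char cells.
BF_TO_C = {
    ">": "++p;",
    "<": "--p;",
    "+": "++*p;",
    "-": "--*p;",
    ".": "putchar(*p);",
    ",": "*p = getchar();",  # on end of input this stores (unsigned char)EOF
    "[": "while (*p) {",
    "]": "}",
}


def bf_to_c(program: str, tape_size: int = 30000) -> str:
    """Translate BrainFuck source into an equivalent C program (a sketch)."""
    lines = [
        "#include <stdio.h>",
        f"static unsigned char tape[{tape_size}];",
        "int main(void) {",
        "    unsigned char *p = tape;",
    ]
    depth = 1  # indentation level inside main()
    for ch in program:
        if ch not in BF_TO_C:
            continue  # any other character is a comment
        if ch == "]":
            if depth == 1:
                raise ValueError("unmatched ']'")
            depth -= 1
        lines.append("    " * depth + BF_TO_C[ch])
        if ch == "[":
            depth += 1
    if depth != 1:
        raise ValueError("unmatched '['")
    lines += ["    return 0;", "}"]
    return "\n".join(lines)


print(bf_to_c("++[>+<-]>."))
```

Each "[" becomes a `while` loop that runs as long as the current cell is nonzero, which is why BrainFuck loops are usually closed by counting a cell down to zero.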
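Before writing the page, you need to understand the CGI convention. A program written in any language can serve a web page. Programs in languages such as PHP or JSP are run by their interpreter, which produces HTML that the browser displays directly. For less mainstream languages, such as C, which I covered earlier (see http://qing.su/article/93.html), or the BrainFuck used here, the page can be served through CGI: the web server runs the program and forwards whatever it writes to standard output to the client's browser. CGI requires the program to print its header first, for example Content-type: text/html, followed by an empty line. As long as a program follows this convention, it can be written in any language.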
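The header rule is easier to see in a language with strings. Here is a minimal Python sketch of the same response the BrainFuck program has to produce; `cgi_response` is a hypothetical helper, not part of any CGI library.

```python
import sys


def cgi_response(body: str, content_type: str = "text/html") -> bytes:
    """Build a minimal CGI response: a header line, an empty line, then the body."""
    header = f"Content-type: {content_type}\n"
    # The empty line (the "\n" right after the header) tells the web server
    # where the headers end and the document body begins.
    return (header + "\n" + body).encode("utf-8")


if __name__ == "__main__":
    sys.stdout.buffer.write(cgi_response("HAPPY NEW YEAR!"))
```

Placed in cgi-bin with execute permission and a `#!/usr/bin/env python3` first line, such a script would be served the same way as the BrainFuck one.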
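First, write the following BrainFuck program. Every character other than the eight commands is ignored, so plain text can serve as comments; the comments below therefore avoid the characters <>+-.,[] entirely, since any of them would be executed as a command.

```brainfuck
++++++++++                                        * cell0 = 10 (loop counter)
[>+++++++++++                                     * each pass adds 11 to cell1 so 10 x 11 = 110 = 'n'
>++++++++++++>+++>++++++>+++++++>++++++++>+>++++  * likewise cells 2 to 8 get 12 3 6 7 8 1 4 per pass
<<<<<<<<-]                                        * back to cell0 and decrement it; repeat until it is 0
>>>>>---.                                         * cell5 = 70 minus 3 = 67 = 'C'
<<<<+.                                            * cell1 = 110 plus 1 = 111 = 'o'
-.                                                * cell1 back to 110 = 'n'
>----.                                            * cell2 = 120 minus 4 = 116 = 't'
<---------.                                       * cell1 = 110 minus 9 = 101 = 'e'
+++++++++.>.>>>>>>+++++.
<<<<<<.+++++.<++.-----------.
>>>--.<++.<-----.<.>++++.----.
>>>>>>++.<<<<<<<+++.>.<+++++.-.
>>>>>>..<<+++++.--<+++++++.>>..
+++++++++.<<<.>>++++++++.--<++++.++++++++++++++++++.
------------------>>--<<<.
>>>++.<<.----.>++++++.<<+.                        * the remaining characters of the header and the message
```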
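The setup loop is the usual trick for keeping BrainFuck output code short: instead of typing 110 plus signs for 'n', cell0 acts as a counter and each of its 10 passes adds a fixed amount to cells 1 to 8. The Python snippet below is only an illustration of that arithmetic; it computes the values the cells hold when the loop exits.

```python
# Per-pass increments added to cells 1..8 by the setup loop
increments = [11, 12, 3, 6, 7, 8, 1, 4]
passes = 10  # cell0 starts at 10 and the loop decrements it once per pass

cells = [0] + [passes * k for k in increments]  # cell0 ends at 0

for index, value in enumerate(cells):
    print(f"cell{index} = {value:3d}  {chr(value)!r}")
```

Every printed character is then a small step from one of these bases, e.g. 'C' = 70 minus 3 = 67 from cell5, ':' = 60 minus 2 = 58 from cell4, and cell7 = 10 is already the newline needed after the header.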
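This program does two things: 1. it outputs the header Content-type: text/html followed by an empty line (\n\n), which the server passes on; 2. it outputs the sentence to be displayed in the browser, HAPPY NEW YEAR!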
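Before compiling, it can help to check the program's output with an interpreter. The following is a minimal Python sketch (a hypothetical `run_bf`, not part of the original post). It assumes 8-bit wrapping cells and leaves a cell unchanged at end of input, which is also what happens in the compiler used below when its `read` system call returns 0 bytes.

```python
def run_bf(program: str, stdin: bytes = b"", tape_size: int = 30000) -> bytes:
    """Interpret BrainFuck and return everything it writes to output.

    Assumes 8-bit wrapping cells; at end of input "," leaves the cell unchanged.
    """
    code = [c for c in program if c in "<>+-.,[]"]  # drop comments

    # Pre-compute the matching bracket for every "[" and "]".
    jump, stack = {}, []
    for i, c in enumerate(code):
        if c == "[":
            stack.append(i)
        elif c == "]":
            if not stack:
                raise SyntaxError(f"unmatched ']' at command {i}")
            j = stack.pop()
            jump[i], jump[j] = j, i
    if stack:
        raise SyntaxError("unmatched '['")

    tape = bytearray(tape_size)
    out = bytearray()
    ptr = pc = pos = 0
    while pc < len(code):
        c = code[pc]
        if c == ">":
            ptr += 1
        elif c == "<":
            ptr -= 1
        elif c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".":
            out.append(tape[ptr])
        elif c == ",":
            if pos < len(stdin):  # at end of input the cell is left unchanged
                tape[ptr] = stdin[pos]
                pos += 1
        elif c == "[" and tape[ptr] == 0:
            pc = jump[pc]  # skip the loop body
        elif c == "]" and tape[ptr] != 0:
            pc = jump[pc]  # go back to the matching "["
        if not 0 <= ptr < tape_size:
            raise IndexError(f"data pointer out of range: {ptr}")
        pc += 1
    return bytes(out)
```

For example, `print(run_bf(open("brainfuck.bf").read()).decode())` should print the Content-type header, an empty line, and HAPPY NEW YEAR!.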
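Once the program is written, compile it into an executable on the server. The compiler is Brian Raiter's tiny BrainFuck compiler, distributed as x86 assembly source (http://www.muppetlabs.com/~breadbox/software/tiny/bf.asm.txt), which nasm can assemble into an executable; the programs it produces are 32-bit x86 Linux ELF binaries. Create a file named bf.asm and save the source in it: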
```nasm
;; bf.asm: Copyright (C) 1999 Brian Raiter <breadbox@muppetlabs.com>
;; Licensed under the terms of the GNU General Public License, either
;; version 2 or (at your option) any later version.
;;
;; To build:
;;      nasm -f bin -o bf bf.asm && chmod +x bf
;; To use:
;;      bf < foo.b > foo && chmod +x foo

BITS 32

;; This is the size of the data area supplied to compiled programs.

%define arraysize       30000

;; For the compiler, the text segment is also the data segment. The
;; memory image of the compiler is inside the code buffer, and is
;; modified in place to become the memory image of the compiled
;; program. The area of memory that is the data segment for compiled
;; programs is not used by the compiler. The text and data segments of
;; compiled programs are really only different areas in a single
;; segment, from the system's point of view. Both the compiler and
;; compiled programs load the entire file contents into a single
;; memory segment which is both writeable and executable.

%define TEXTORG         0x45E9B000
%define DATAOFFSET      0x2000
%define DATAORG         (TEXTORG + DATAOFFSET)

;; Here begins the file image.

                org     TEXTORG

;; At the beginning of the text segment is the ELF header and the
;; program header table, the latter consisting of a single entry. The
;; two structures overlap for a space of eight bytes. Nearly all
;; unused fields in the structures are used to hold bits of code.

;; The beginning of the ELF header.

                db      0x7F, "ELF"             ; ehdr.e_ident

;; The top(s) of the main compiling loop. The loop jumps back to
;; different positions, depending on how many bytes to copy into the
;; code buffer. After doing that, esi is initialized to point to the
;; epilog code chunk, a copy of edi (the pointer to the end of the
;; code buffer) is saved in ebp, the high bytes of eax are reset to
;; zero (via the exchange with ebx), and then the next character of
;; input is retrieved.

emitputchar:    add     esi, byte (putchar - decchar) - 4
emitgetchar:    lodsd
emit6bytes:     movsd
emit2bytes:     movsb
emit1byte:      movsb
compile:        lea     esi, [byte ecx + epilog - filesize]
                xchg    eax, ebx
                cmp     eax, 0x00030002         ; ehdr.e_type    (0x0002)
                                                ; ehdr.e_machine (0x0003)
                mov     ebp, edi                ; ehdr.e_version
                jmp     short getchar

;; The entry point for the compiler (and compiled programs), and the
;; location of the program header table.

                dd      _start                  ; ehdr.e_entry
                dd      proghdr - $$            ; ehdr.e_phoff

;; The last routine of the compiler, called when there is no more
;; input. The epilog code chunk is copied into the code buffer. The
;; text origin is popped off the stack into ecx, and subtracted from
;; edi to determine the size of the compiled program. This value is
;; stored in the program header table, and then is moved into edx.
;; The program then jumps to the putchar routine, which sends the
;; compiled program to stdout before falling through to the epilog
;; routine and exiting.

eof:            movsd                           ; ehdr.e_shoff
                xchg    eax, ecx
                pop     ecx
                sub     edi, ecx                ; ehdr.e_flags
                xchg    eax, edi
                stosd
                xchg    eax, edx
                jmp     short putchar           ; ehdr.e_ehsize

;; 0x20 == the size of one program header table entry.

                dw      0x20                    ; ehdr.e_phentsize

;; The beginning of the program header table. 1 == PT_LOAD, indicating
;; that the segment is to be loaded into memory.

proghdr:        dd      1                       ; ehdr.e_phnum & phdr.p_type
                                                ; ehdr.e_shentsize
                dd      0                       ; ehdr.e_shnum & phdr.p_offset
                                                ; ehdr.e_shstrndx

;; (Note that the next four bytes, in addition to containing the first
;; two instructions of the bracket routine, also comprise the memory
;; address of the text origin.)

                db      0                       ; phdr.p_vaddr

;; The bracket routine emits code for the "[" instruction. This
;; instruction translates to a simple "jmp near", but the target of
;; the jump will not be known until the matching "]" is seen. The
;; routine thus outputs a random target, and pushes the location of
;; the target in the code buffer onto the stack.

bracket:        mov     al, 0xE9
                inc     ebp
                push    ebp                     ; phdr.p_paddr
                stosd
                jmp     short emit1byte

;; This is where the size of the executable file is stored in the
;; program header table. The compiler updates this value just before
;; it outputs the compiled program. This is the only field in the two
;; headers that differs between the compiler and its compiled
;; programs. (While the compiler is reading input, the first byte of
;; this field is also used as an input buffer.)

filesize:       dd      compilersize            ; phdr.p_filesz

;; The size of the program in memory. This entry creates an area of
;; bytes, arraysize in size, all initialized to zero, starting at
;; DATAORG.

                dd      DATAOFFSET + arraysize  ; phdr.p_memsz

;; The code chunk for the "." instruction. eax is set to 4 to invoke
;; the write system call. ebx, the file handle to write to, is set to
;; 1 for stdout. ecx points to the buffer containing the bytes to
;; output, and edx equals the number of bytes to output. (Note that
;; the first byte of the first instruction, which is also the least
;; significant byte of the p_flags field, encodes to 0xB3. Having the
;; 2-bit set marks the memory containing the compiler, and its
;; compiled programs, as writeable.)

putchar:        mov     bl, 1                   ; phdr.p_flags
                mov     al, 4
                int     0x80                    ; phdr.p_align

;; The epilog code chunk. After restoring the initialized registers,
;; eax and ebx are both zero. eax is incremented to 1, so as to invoke
;; the exit system call. ebx specifies the process's return value.

epilog:         popa
                inc     eax
                int     0x80

;; The code chunks for the ">", "<", "+", and "-" instructions.

incptr:         inc     ecx
decptr:         dec     ecx
incchar:        inc     byte [ecx]
decchar:        dec     byte [ecx]

;; The main loop of the compiler continues here, by obtaining the next
;; character of input. This is also the code chunk for the ","
;; instruction. eax is set to 3 to invoke the read system call. ebx,
;; the file handle to read from, is set to 0 for stdin. ecx points to
;; a buffer to receive the bytes that are read, and edx equals the
;; number of bytes to read.

getchar:        mov     al, 3
                xor     ebx, ebx
                int     0x80

;; If eax is zero or negative, then there is no more input, and the
;; compiler proceeds to the eof routine.

                or      eax, eax
                jle     eof

;; Otherwise, esi is advanced four bytes (from the epilog code chunk
;; to the incptr code chunk), and the character read from the input is
;; stored in al, with the high bytes of eax reset to zero.

                lodsd
                mov     eax, [ecx]

;; The compiler compares the input character with ">" and "<". esi is
;; advanced to the next code chunk with each failed test.

                cmp     al, '>'
                jz      emit1byte
                inc     esi
                cmp     al, '<'
                jz      emit1byte
                inc     esi

;; The next four tests check for the characters "+", ",", "-", and
;; ".", respectively. These four characters are contiguous in ASCII,
;; and so are tested for by doing successive decrements of eax.

                sub     al, '+'
                jz      emit2bytes
                dec     eax
                jz      emitgetchar
                inc     esi
                inc     esi
                dec     eax
                jz      emit2bytes
                dec     eax
                jz      emitputchar

;; The remaining instructions, "[" and "]", have special routines for
;; emitting the proper code. (Note that the jump back to the main loop
;; is at the edge of the short-jump range. Routines below here
;; therefore use this jump as a relay to return to the main loop;
;; however, in order to use it correctly, the routines must be sure
;; that the zero flag is cleared at the time.)

                cmp     al, '[' - '.'
                jz      bracket
                cmp     al, ']' - '.'
relay:          jnz     compile

;; The endbracket routine emits code for the "]" instruction, as well
;; as completing the code for the matching "[". The compiler first
;; emits "cmp dh, [ecx]" and the first two bytes of a "jnz near". The
;; location of the missing target in the code for the "[" instruction
;; is then retrieved from the stack, the correct target value is
;; computed and stored, and then the current instruction's jmp target
;; is computed and emitted.

endbracket:     mov     eax, 0x850F313A
                stosd
                lea     esi, [byte edi - 8]
                pop     eax
                sub     esi, eax
                mov     [eax], esi
                sub     eax, edi
                stosd
                jmp     short relay

;; This is the entry point, for both the compiler and its compiled
;; programs. The shared initialization code sets ecx to the beginning
;; of the array that is the compiled program's data area, and edx to
;; one. (This also clears the zero flag for the relay jump below.) The
;; registers are then saved on the stack, to be restored at the end.

_start:         mov     ecx, DATAORG
                inc     edx
                pusha

;; At this point, the compiler and its compiled programs diverge.
;; Although every compiled program includes all the code in this file
;; above this point, only the three instructions directly above are
;; actually used by both. This point is where the compiler begins
;; storing the generated code, so only the compiler sees the
;; instructions below. This routine first modifies ecx to contain
;; TEXTORG, which is stored on the stack, and then offsets it to point
;; to filesize. edi is set equal to codebuf, and then the compiler
;; enters the main loop.

codebuf:        mov     ch, (TEXTORG >> 8) & 0xFF
                push    ecx
                mov     cl, filesize - $$
                lea     edi, [byte ecx + codebuf - filesize]
                jmp     short relay

;; Here ends the file image.

compilersize    equ     $ - $$
```
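The comments above describe how the ELF header and the single program header overlap with code. As an illustration only, the hypothetical helper below (not part of bf.asm) reads the standard ELF32 little-endian field offsets from a file such as bf_compiler or brainfuck.cgi. Because most header bytes double as instructions here, only the fields the loader relies on are meaningful. For example, the four bytes at file offset 52 (p_vaddr) are 00 B0 E9 45, i.e. `db 0` followed by `mov al, 0xE9` and `inc ebp`, which read as the little-endian value 0x45E9B000 = TEXTORG.

```python
import struct


def read_elf32_headers(data: bytes) -> dict:
    """Decode selected ELF32 (little-endian) header fields and the first program header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    e_type, e_machine = struct.unpack_from("<HH", data, 16)  # 2 = ET_EXEC, 3 = EM_386
    e_entry, e_phoff = struct.unpack_from("<II", data, 24)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 42)
    # First ELF32 program header: p_type, p_offset, p_vaddr, p_paddr,
    # p_filesz, p_memsz, p_flags (p_align is not read)
    (p_type, p_offset, p_vaddr, _p_paddr,
     p_filesz, p_memsz, p_flags) = struct.unpack_from("<7I", data, e_phoff)
    return {
        "e_type": e_type, "e_machine": e_machine, "e_entry": e_entry,
        "e_phoff": e_phoff, "e_phentsize": e_phentsize, "e_phnum": e_phnum,
        "p_type": p_type, "p_offset": p_offset, "p_vaddr": p_vaddr,
        "p_filesz": p_filesz, "p_memsz": p_memsz,
        "p_flags": p_flags & 0x7,  # keep only PF_X (1), PF_W (2), PF_R (4)
    }


# The four bytes at offset 52 in bf.asm: "db 0", then "mov al, 0xE9", "inc ebp"
assert struct.unpack("<I", bytes([0x00, 0xB0, 0xE9, 0x45]))[0] == 0x45E9B000
```

Called as `read_elf32_headers(open("bf_compiler", "rb").read())`, it should report e_phoff = 44 (the offset of proghdr), p_vaddr = 0x45E9B000, and p_memsz = DATAOFFSET + arraysize = 0x2000 + 30000 = 38192.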
Then run:
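```shell
yum install nasm -y                   # install the assembler (CentOS/RHEL)
nasm -f bin -o bf_compiler bf.asm     # flat binary output; the ELF headers are written by hand in bf.asm
chmod +x ./bf_compiler                # make the compiler executable
```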
Save the BrainFuck program above as brainfuck.bf and run the following over SSH:
```shell
./bf_compiler < brainfuck.bf > brainfuck.cgi   # compile the BrainFuck source
chmod +x ./brainfuck.cgi
./brainfuck.cgi                                 # run it locally to check the output
```
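If you see the output described above (the Content-type: text/html header, an empty line, and HAPPY NEW YEAR!), the page works. Then copy the file into cgi-bin and it can be accessed from a browser. If you get an HTTP 500 error, check the Apache error log.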
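When the page returns HTTP 500, a frequent cause is output that does not begin with valid headers followed by an empty line. The Python sketch below is a hypothetical checker (not an Apache tool) that splits a CGI program's output according to that rule and raises an error when the separation is missing. It assumes, following RFC 3875, that a normal document response carries a Content-Type header, while a redirect uses Location instead.

```python
import re


def split_cgi_output(raw: bytes) -> tuple[dict[str, str], bytes]:
    """Split CGI output into (headers, body) at the first empty line.

    Accepts LF or CRLF line endings and raises ValueError when the
    header block is missing or malformed.
    """
    match = re.search(rb"\r?\n\r?\n", raw)
    if match is None:
        raise ValueError("no empty line after the CGI headers")
    headers: dict[str, str] = {}
    for line in raw[: match.start()].splitlines():
        name, colon, value = line.decode("latin-1").partition(":")
        if not colon:
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    if "content-type" not in headers and "location" not in headers:
        raise ValueError("expected a Content-type (or Location) header")
    return headers, raw[match.end():]
```

Fed with `subprocess.run(["./brainfuck.cgi"], capture_output=True).stdout`, it should return `{'content-type': 'text/html'}` and `b'HAPPY NEW YEAR!'`.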
Since this is rather tedious, I won't go on to build pages with more features in BrainFuck. If you have any questions, feel free to leave a comment below.
This article was written by 香菇肥牛 (http://qing.su/article/119.html). Please credit the original link when reposting. Thanks.
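Awesome, awesome, awesome!
LOL
Impressive!
Just looking at this code makes my head spin.
That's why I could only manage one sentence, haha.
Dropping by to give a thumbs-up.
Thanks~~
Does the blogger use an RSS reader?
Nope, I don't.
Not at all.
A whole pile of plus signs and angle brackets, you win. And it even has to be compiled. With mighty PHP you'd only need an echo, haha.
Haha, it's fun to go through the hassle once in a while.
You really know how to have fun. Respect.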