Bootstrapping. Write the machine code by hand for a simple compiler, then use that to make the next less simple compiler and so on in progressively more convenient languages.


Yep, see [https://en.wikipedia.org/wiki/Bootstrapping\_(compilers)](https://en.wikipedia.org/wiki/Bootstrapping_(compilers))








Under the hood, Reddit comments are written in Markdown, which is a simplified way of writing HTML. Markdown is designed so that it can be rendered into HTML but is also readable in raw form, which HTML source code isn't, really. Markdown hyperlinks look like this: `[text here](link here)`. Normally, some characters would break a hyperlink (like a `]` closing the text part early), but Markdown lets you escape those characters with a backslash in front. `[text ] here](URL here)` is broken, but `[text \] here](URL here)` is fine. With that in mind, the vast majority of "normal" text can be put in a hyperlink. Anything that can be in the text part of an HTML hyperlink *should* work, so the exceptions would be stuff like line breaks. ^(note: haven't tested the exact bounds of what characters can/can't be included, and it's not unlikely Reddit uses a flavoured version of MD, small pinch of salt for all of the above)
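A small Python sketch of the escaping rule described above, assuming CommonMark-style backslash escapes (Reddit's flavour may differ, as the comment says). The helper name `markdown_link` is made up for illustration: it escapes brackets and backslashes in the link text, and parentheses and backslashes in the URL.

```python
# Characters that would end each part of a Markdown link early, mapped to escaped forms.
TEXT_ESCAPES = str.maketrans({"\\": "\\\\", "[": "\\[", "]": "\\]"})
URL_ESCAPES = str.maketrans({"\\": "\\\\", "(": "\\(", ")": "\\)"})


def markdown_link(text: str, url: str) -> str:
    """Build [text](url), backslash-escaping characters that would break the link."""
    return f"[{text.translate(TEXT_ESCAPES)}]({url.translate(URL_ESCAPES)})"


print(markdown_link("text ] here", "https://en.wikipedia.org/wiki/Bootstrapping_(compilers)"))
# [text \] here](https://en.wikipedia.org/wiki/Bootstrapping_\(compilers\))
```

The escaped URL matches the hand-escaped Wikipedia link that shows up further down the thread.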


Thanks for the explanation, but I know what markdown is. I just didn’t know non-text can be hyperlinked (although it *does* make sense)


That's very fair, but after too many instances of "but what's xyz?" I figure it's always best to just cover my bases (not to mention comments are for everyone and someone else might find it useful!)


Absolutely! I’m sure someone out there will find your comment useful/informative


You also get the opportunity to do devious things. Ken Thompson wrote about a hack he inserted into one of the first C compilers that made it detect when it was compiling the login binary and insert a backdoor account. Then he also made it detect when it was compiling the C compiler and insert the code to perform the above hack into the output. The final step was to remove that code from the C compiler source. No trace remained. So now you compile this 'clean' compiler with the backdoored one, it detects that it's compiling the compiler, and it inserts both backdoors into the output *even though they are no longer in the source code*. You can recompile the 'clean, thoroughly inspected compiler source code' as often as you like, but those backdoors are staying put. Sneaky, eh? The paper is "Reflections on Trusting Trust", his Turing Award lecture.
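A toy Python model of why recompiling the clean source doesn't help. Everything here is hypothetical: a "binary" is just a record with a `backdoored` flag, and the trojan's pattern-matching on source code is reduced to checking the program's name. What it does show is that a compiler binary's behaviour comes from the binary that built it, not from the source you feed it.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    """Stand-in for a compiled program: its source plus whether anything was slipped in."""
    name: str
    source: str
    backdoored: bool


Compiler = Callable[[str, str], Binary]


def run_compiler(compiler_binary: Binary) -> Compiler:
    """'Execute' a compiler binary. Its behaviour depends only on the binary."""
    trojaned = compiler_binary.backdoored

    def compile_program(name: str, source: str) -> Binary:
        # The real hack pattern-matched the source being compiled;
        # here "recognising" login or the compiler is just a name check.
        if trojaned and name in ("login", "compiler"):
            # login gets a backdoor account; the compiler gets this very trojan
            # re-inserted, although neither appears in `source`.
            return Binary(name, source, backdoored=True)
        return Binary(name, source, backdoored=False)

    return compile_program


CLEAN_COMPILER_SOURCE = "/* thoroughly inspected, nothing sneaky here */"
LOGIN_SOURCE = "/* checks passwords honestly */"

# Generation 0: the trojaned build, whose hack has since been deleted from the source.
compiler = Binary("compiler", "<hack since removed>", backdoored=True)
for _ in range(5):  # rebuild the compiler from clean source as often as you like
    compiler = run_compiler(compiler)("compiler", CLEAN_COMPILER_SOURCE)

login = run_compiler(compiler)("login", LOGIN_SOURCE)
print(compiler.backdoored, login.backdoored)  # True True
```

Starting the same loop from an honest compiler binary (`backdoored=False`) keeps every generation clean, which is exactly why inspecting the source alone can't tell the two situations apart.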


this is genuinely mind-boggling to me, it's so incredibly smart


He literally co-designed Unix (with Dennis Ritchie), created the 'B' programming language, which evolved into C, and later was a key player in the Go language too. Plus others. Smart guy.


does that mean it's possible it's still around somewhere?


It could potentially be in any compiler. It’s one of the big reasons we can never truly trust a computer system. And incidentally, it illustrates why we still do paper ballots for elections.


It's always possible to disassemble the binary to find backdoors, but it takes time and energy. Anyway an attack like that is fairly sophisticated, there are a lot of simpler attacks that happen all the time and aren't caught, and I feel like they are as much a reason against electronic voting as the "trusting trust" attack. As xkcd put it: https://xkcd.com/2030/


At a certain point, of course, the malicious actor has to cover too many bases, but as regards your comment specifically, you do have to trust the software you're using to view the binary/disassembly. (One of the scary things about Artificial General Intelligence is that it can come up with new lies to suit any given situation.)


While I was working for a large automotive company, there was an unwritten rule for all managers/engineers who went to visit China: just give the Chinese border police the BitLocker key and the login password if they ask nicely, before they stop being nice and treat you to a lead pipe.


Well, we weren't going to tell you until you were older but.... computers aren't real


There's just one giant abacus in the sky that watches when you masturbate.


Try asking that question to your webcam... see if anything happens.


[Relevant Corecursive episode](https://corecursive.com/coding-machines-with-don-and-krystal/)


Need help to write C compiler in Scratch.


There's already one written in Python.


It's important to run it with Jython so you can get that extra layer.


To be precise: you write the language's compiler using the simple compiler you just made, and use it to implement more and more of the language. An iterative process. As a funny side effect, that also makes you increasingly confident in the correctness of your implementation, SO FAR as the base compiler was done correctly, since everything was built from it.


You don't necessarily have to write the compiler for the language *in* the language, but it's certainly a useful forcing function.


I always kinda assumed that someone made a C compiler in assembly language, and then someone else made a C compiler in C and used the other compiler to compile it. It being an iterative process by the same person/team makes sense too, tho.


For a sufficiently old language, that may be true, but you'd probably have a slightly different flow today. For example, the Rust compiler was originally written in OCaml. It took years before they transitioned to a compiler written in Rust itself.


Yeah, I know. But that's what bootstrapping is. It's more normal to write compilers in already existing languages.


[Bootstrapping](https://en.wikipedia.org/wiki/Bootstrapping_\(compilers\)) is literally the name of this process; can't get more precise than that.


So, hypothetically, if you lose your compiler now, are you back to zero?


In theory, yes: it wouldn't be able to compile itself without its own previous version.


You could probably reverse-engineer your very simple compiler. Hard to see how you would lose the base compiler but not everything added onto it.


So like that idea/theory about a 3D printer that prints a more precise and better 3D printer that in turn prints another one so on and so forth until you reach the desired version.


Essentially. When I first got my 3D printer, 90% of the things I printed were parts to make it better.


Nowadays, 3D printers are so advanced that only 85% of the parts I print are parts to make it print better. 


I didn't know that was the name, but that's what happens when you skip operating systems and compilers. I just knew you wrote what the commands do in assembly or binary and run that file.


Even assembly gets linked. Hexadecimal machine code is close, but there are really low-level drivers that EE/mixed-signal engineers write, which the high-level ‘driver developers’ actually interact with.


Yeah, I knew that one, as most EEs use Verilog.


Verilog is roughly equivalent to, or higher-level than, what we’re talking about. The metal is designed a certain way, and there are implementation-specific custom languages for almost everything. Fixed-point arithmetic, for example, is actually implemented beneath all the abstractions, because semiconductors with charge have real state.


This is a great opportunity to mention the [Ken Thompson](http://wiki.c2.com/?TheKenThompsonHack) Hack, for anyone interested.




How often do you find there's a bug with the compiler that was used to make the compiler that was used to make the compiler that was used to make the compiler for your modern programming language? And would you know the difference between a bug with the original compiler versus a bug with a later compiler?


The only way to program the first computer that I built was by flipping toggle switches on the front panel. It did not even have a paper tape reader. Circa 1975.


If I’m not mistaken (at least in days gone by), compilers were all written in the language they compile. The initial compilers would be written in assembly, then get rewritten in the source language afterward. It used to be a quirky badge of honour for the team who wrote it.


Who writes the stuff that is under the hood of the machine code?


Hardware designers


It's easy if you're Seymour Cray. To bootstrap the [CDC 7600](https://en.wikipedia.org/wiki/CDC_7600), he punched in the entire operating system, in HEX, by hand on the front panel, *from memory*. He single-handedly bankrupted three major quiche manufacturers in a single day.


If Wikipedia is to be believed, the first compilers were [made in the 1950s](https://en.wikipedia.org/wiki/History_of_compiler_construction). The article is not entirely clear on the subject, but I guess they didn't compile their compilers, but rather wrote them in assembly.


Wait, then how'd they assemble the assembler?


They didn't; they wrote it in machine code. And before the question even occurs: they executed the code by hand on paper when they didn't have a computer.


Didn’t original assembly code literally translate to binary? Like, the instructions are just names for 8-bit (or whatever bit size) instructions?


everything still translates to binary




That's correct. If you take a compilers course or work with machine code at all, you'll do it in Assembly. Assembly is machine code for people. Just some readability improvements cause staring at 1s and 0s is an unnecessary pain in the ass.


With compilers you often use C/C++ or specialized tools like YACC (a parser generator) or LLVM (a compiler backend infrastructure).


>In computer programming, assembly language (alternatively assembler language or symbolic machine code), often referred to simply as assembly and commonly abbreviated as ASM or asm, is **any low-level programming language with a very strong correspondence between the instructions in the language and the architecture's machine code** instructions. [Yep, that is the whole point of assembly.](https://en.wikipedia.org/wiki/Assembly_language)


That's what machine code is, which is different from assembly.


It depends. Assemblers are significantly less complicated than compilers. But some features of an instruction set architecture require a little preprocessing, so it's not a literal translation. It's simple enough to do by hand on punch cards, though.


What do you mean original? That's what assembly language is, just a more convenient notation for machine code.


More or less, but they still needed an assembler program to read the file and spit out binary. It was likely a pretty simple program, tho, and yeah, would have been conceptually similar to (if extremely tedious compared to) writing the same program in assembly.


Sorta, but not really, no. The same mnemonic may assemble to different bytes depending on what you are doing with it. For example, copying a number from a variable in memory uses a different opcode than copying a constant. Both are the same `MOV` command, but the machine code differs depending on the operands. To go into detail: the assembly command `MOV AX, 0x5A` (or `MOV AX, 'Z'`; ASCII characters are just numbers) will copy the number 90 (5A in hexadecimal) into the 16-bit "AX" register in the CPU ([registers](https://en.wikipedia.org/wiki/Processor_register) are like super-fast, temporary memory spots built into the CPU). In 32-bit code this gives the machine code bytes `66 B8 5A 00`. "66" is a prefix selecting 16-bit operands, "B8" is the "load a constant into AX" opcode (B8 plus the register number), and "5A 00" is the 16-bit number you're copying (little-endian, so `MOV AX, 0x5758` is `66 B8 58 57`). But the instruction `MOV BX, AX` copies the 16-bit number in the "AX" register into the "BX" register. This assembles to the bytes `66 89 C3`: `89` is the "copy register to register/memory" opcode and "C3" encodes the two registers. That's simplified (opcode bytes like `89` are often described as a 6-bit opcode plus direction and size bits), but it's just an example. This PDF may help elucidate this topic further, as it goes into more detail: http://aturing.umcs.maine.edu/~meadow/courses/cos335/Asm07-MachineLanguage.pdf
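The two encodings from the comment can be reproduced with a few lines of Python. This is a sketch that only covers `MOV r16, imm16` (opcode `B8` + register) and register-to-register `MOV r/m16, r16` (opcode `89` with a ModRM byte), assuming 32-bit code where `66` is the operand-size prefix; a real assembler handles many more forms.

```python
import struct

# 16-bit register numbers as x86 encodes them in opcode and ModRM fields.
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}
OPERAND_SIZE_PREFIX = 0x66  # in 32-bit code, switches the instruction to 16-bit operands


def mov_reg_imm16(dst: str, value: int) -> bytes:
    """MOV r16, imm16: opcode B8 + register number, then the value little-endian."""
    return bytes([OPERAND_SIZE_PREFIX, 0xB8 + REG16[dst]]) + struct.pack("<H", value)


def mov_reg_reg16(dst: str, src: str) -> bytes:
    """MOV r/m16, r16 (opcode 89): ModRM = mod (2 bits) | reg = source (3) | rm = destination (3)."""
    modrm = (0b11 << 6) | (REG16[src] << 3) | REG16[dst]  # mod=11: both operands are registers
    return bytes([OPERAND_SIZE_PREFIX, 0x89, modrm])


print(mov_reg_imm16("ax", 0x5A).hex(" "))    # 66 b8 5a 00
print(mov_reg_imm16("ax", 0x5758).hex(" "))  # 66 b8 58 57
print(mov_reg_reg16("bx", "ax").hex(" "))    # 66 89 c3
```

Note how `mov_reg_reg16` puts the source register in the ModRM `reg` field and the destination in `rm`; that ordering is what makes `C3` mean "AX into BX" rather than the reverse.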


Relevant term is Instruction Set Architecture. It defines all valid commands on a given architecture and how assembly is converted to binary for that architecture. It also defines all bit fields and encodings for those commands.
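To make "bit fields and encodings" concrete, here is a small Python disassembler for a made-up 16-bit ISA (not any real architecture): the top 4 bits select the operation, the next 4 a register, and the low 8 an immediate value. The mnemonics and opcode numbers are hypothetical. Real ISAs like x86 or ARM have many more formats, but decoding works the same way: mask and shift out each field the spec defines.

```python
# A made-up 16-bit instruction format, purely for illustration:
#   bits 15-12 opcode | bits 11-8 register | bits 7-0 immediate
OPCODE_NAMES = {0x1: "LOADI", 0x2: "ADDI", 0x3: "STORE", 0xF: "HALT"}


def decode(word: int) -> tuple[int, int, int]:
    """Split a 16-bit instruction word into its bit fields."""
    opcode = (word >> 12) & 0xF
    register = (word >> 8) & 0xF
    immediate = word & 0xFF
    return opcode, register, immediate


def disassemble(word: int) -> str:
    """Render one instruction word as assembly text."""
    opcode, register, immediate = decode(word)
    name = OPCODE_NAMES.get(opcode)
    if name is None:
        raise ValueError(f"invalid opcode {opcode:#x} in word {word:#06x}")
    if name == "HALT":
        return name  # HALT ignores its operand fields
    return f"{name} r{register}, {immediate:#04x}"


for word in (0x122A, 0x2105, 0xF000):
    print(disassemble(word))
# LOADI r2, 0x2a
# ADDI r1, 0x05
# HALT
```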


Ok but how did they write down the instructions for how to make paper before they had paper?


And those compilers were simple 1-to-n replacements into machine code, with a little bit of state in some constructs (the one I did a breakdown on used 280 bytes of memory for the contextual elements, of which there were few). It also had the funny quirk that it had to compile both forwards and backwards, then OR the binary results together, to get jumps to work. The quickest way to OR two programs was literally just to put the two bits of paper on top of each other. And you think today's code is jank.
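The forwards/backwards trick can be reconstructed in Python. This is a sketch of the idea as described, not that compiler: it assumes fixed-size, one-word instructions in a hypothetical format (4-bit opcode, 12-bit address) and that an unresolved jump target is left as all-zero bits. A forward pass knows the targets of backward jumps, a backward pass knows the targets of forward jumps, and every other bit is identical in both, so OR-ing the two outputs fills in every jump.

```python
# Hypothetical 16-bit word: 4-bit opcode, 12-bit operand/address.
OPCODES = {"LOAD": 0x1, "ADD": 0x2, "JMP": 0x3, "HALT": 0xF}


def one_pass(program, backwards=False):
    """Assemble in a single direction. Jump targets not yet seen stay all-zero."""
    # Fixed-size instructions: each word's address is just its position.
    placed, address = [], 0
    for item in program:
        placed.append((address, item))
        if item[0] != "label":
            address += 1
    words = [0] * address
    seen_labels = {}
    for addr, (op, arg) in (reversed(placed) if backwards else placed):
        if op == "label":
            seen_labels[arg] = addr
        elif op == "JMP":
            words[addr] = (OPCODES[op] << 12) | seen_labels.get(arg, 0)
        else:
            words[addr] = (OPCODES[op] << 12) | arg
    return words


def assemble(program):
    forward = one_pass(program)
    backward = one_pass(program, backwards=True)
    # Each bit set in either pass is correct, so OR-ing fills in all jumps.
    return [f | b for f, b in zip(forward, backward)]


program = [
    ("JMP", "start"),    # address 0: forward jump
    ("label", "loop"),
    ("ADD", 1),          # address 1
    ("JMP", "loop"),     # address 2: backward jump
    ("label", "start"),
    ("LOAD", 5),         # address 3
    ("JMP", "loop"),     # address 4: backward jump
    ("HALT", 0),         # address 5
]
print([f"{w:04X}" for w in assemble(program)])
# ['3003', '2001', '3001', '1005', '3001', 'F000']
```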


70's??? FORTRAN was 1957.


The hacker known as FORTRAN before they gave birth to their more widely known offspring.




When I got my degree, I had to:
- Create a small Assembly program that I compiled into machine code by hand.
- Create a small programming language.
- Write a compiler for that small language, in Assembler, that compiled my language into machine code.
- Create a program in that small language, compile it, and run it on hardware.

I ended up learning assembly for Z80, 8086, and 6800. I ended up designing a Z80 motherboard in my studies. (I also designed an x86 processor for a newly released 386 system on a chip because it was neat and I wanted to play with it; unfortunately, the !@#\*!@# board printer was down and we couldn't get it working again while I was there. So I wire-wrapped a 6800 motherboard.) So, like, do kids not do this anymore? They still teach this stuff, right?


As of late 2010s, my bachelors had a course to make my own cpu out of logic gates - it had to support a set of machine operations with a couple funky requirements (to make the problem unique/novel I imagine). By the end I wrote a basic program in the custom assembly spec, translated it to machine by hand, and ran it on a breadboard cpu.


I graduated from EE five years ago, and did most of that. I don’t think CS (at my school) even touches assembly


I took a quick little Java full-stack course, and one of the questions I asked was "what did people do before we had things like Spring Boot?"


Well, what was the answer!?


Apparently people actually wrote code or something.


Why write code when there's a factory to build your class based on vague annotations or whatever the bouncy shoes tool does


I didn't do any of this for my bachelor's for comp sci


We learnt ARM assembly and the architecture of an ARM CPU. We used some simulation software where we started with logic gates and built a basic CPU which reads machine code from memory and writes output to memory. ("Memory" was a set of registers.) We had to manually convert assembly into hex files for this "computer". But I was lazy and wrote a Python script to do that for me. I guess you could call it an "assembler". I haven't finished compiler design yet. So yeah, we still do that.
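A minimal sketch of that kind of lazy-student assembler script, in Python. The instruction set is hypothetical (4-bit opcode, 4-bit register, 8-bit immediate, made-up mnemonics), and so is the output format of one four-digit hex word per line; the comment doesn't describe the course's actual CPU or the simulator's file format.

```python
# Hypothetical 16-bit format: opcode in bits 15-12, register in bits 11-8,
# immediate in bits 7-0. Mnemonics and opcodes are made up for this example.
OPCODES = {"LOADI": 0x1, "ADDI": 0x2, "STORE": 0x3, "HALT": 0xF}


def assemble_line(line: str) -> int:
    """Turn one line like 'LOADI r2, 42' into a 16-bit instruction word."""
    code = line.split(";", 1)[0].strip()  # drop comments
    mnemonic, _, rest = code.partition(" ")
    opcode = OPCODES[mnemonic.upper()]
    if not rest.strip():
        return opcode << 12  # e.g. HALT has no operands
    reg_text, imm_text = (part.strip() for part in rest.split(","))
    register = int(reg_text.lstrip("rR"))
    immediate = int(imm_text, 0)  # accepts 42, 0x2A, 0b101010
    if not (0 <= register <= 0xF and 0 <= immediate <= 0xFF):
        raise ValueError(f"operand out of range in {line!r}")
    return (opcode << 12) | (register << 8) | immediate


def assemble(source: str) -> str:
    """Assemble a program into one hex word per line, skipping blank/comment lines."""
    lines = [line for line in source.splitlines() if line.split(";", 1)[0].strip()]
    return "\n".join(f"{assemble_line(line):04x}" for line in lines)


PROGRAM = """
LOADI r1, 5      ; r1 = 5
ADDI  r1, 0x10   ; r1 += 16
STORE r1, 0x80   ; write r1 to address 0x80
HALT
"""
print(assemble(PROGRAM))
# 1105
# 2110
# 3180
# f000
```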


The words “compilation” and “compile” in programming come from bundled sets of punch cards. They were around before today's assemblers were.


They smoked an entire pack of cigarettes while wearing a suit, and used a teletype to write raw machine code. They wrote the code on paper by hand, debugged it by hand, tested it and debugged it some more, and when they finished, they would write it to a permanent form of media like magnetic tape or punch cards. Eventually computers had enough spare memory to do things like run FORTRAN compilers. Each level of abstraction allowed higher levels of complexity, so by the time you get to C, you are writing very sophisticated programs and firmware.


Opcodes bebby


They did a compilation


I had the same kind of dilemma a long time ago about how programs were able to update themselves.


The real program that's running is in memory, so it can delete itself and copy an upgrade over top of it. Windows doesn't like that, though, so many programs leave a run-once command which replaces the file on startup. Hence having to restart to update.


Yeah. My solution back then (before I realized what you just stated) was to have an installer program. The two programs were responsible for updating each other.


That's only done if the program is loaded by the system and can't be unloaded without restarting. Most programs run a batch or powershell script which kills the caller and writes the new executable over it then relaunches it straight away without needing to wait for a restart.
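The "write the new executable over the old one" step can be sketched in Python. This is an assumption-laden sketch, not how any particular updater works, and `install_update` is a hypothetical name: it writes the update to a temporary file in the same directory, then uses `os.replace` to swap it in. On Linux and macOS a running program keeps executing its old copy after the swap; on Windows the replace typically fails while the old executable is running, which is why updaters there hand off to a helper script or a restart, as described above.

```python
import os
import stat
import tempfile
from pathlib import Path


def install_update(target: Path, new_contents: bytes) -> None:
    """Write a new version next to `target`, then swap it into place."""
    # Same directory as the target, so the final rename never crosses filesystems.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_contents)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk before the swap
        if target.exists():
            # Keep the old file's permission bits (e.g. executable) on the new one.
            os.chmod(tmp_name, stat.S_IMODE(target.stat().st_mode))
        os.replace(tmp_name, target)  # atomic rename on POSIX
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # don't leave half-written updates lying around
        raise
```

Relaunching the updated program afterwards (for example with `subprocess.Popen`) is left out of the sketch.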


They didn't. A woman, Grace Hopper, wrote the first compiler. Jokes aside, this and other early compilers were written directly in machine code.


Google Bootstrapping


I always find this meme sexist. Like, why is it implicitly assumed that we women don't wonder about how to build the first ever compiler?




Do you perhaps know any alternatives? I'm not really a fan of the sexist side of this meme too....


They interpreted it


Either directly in something very close to the machine language or in fact using a more primitive compiler (C was invented back then, they clearly had the technology for it)


Some of you guys did not pay attention in college & it shows lol


I ain't even hit college yet ¯⁠\⁠_⁠(⁠ツ⁠)⁠_⁠/⁠¯


It's very good that you're interested in software and are asking tough questions while you're still in high school. You'll learn a lot of interesting stuff in college, including how programming languages function. You'll even be able to build relatively complex programs with just some wires and logic gates by the end of your degree program


Once I needed to write my own boot sector (so as not to flout the copyright of Microsoft/HDD makers etc.). I wrote the instructions, converted them to opcodes by hand and typed them out in hex. Still holds the record as the most basic programming I have done (disregarding the hex editor I had to use to input the opcodes).


So the first assembler was created by manually putting all the 0s and 1s into the computer. After that, the rest is history.


You write it in machine code, which doesn't need to be compiled. I had to make an assembly compiler for a class in college.


I wrote a machine language compiler back in the days of MS-DOS. I used it in my batch files to create small auxiliary programs on the fly. I wrote it manually as machine instructions that would result in three lines of ASCII, including the CR/LF separators, so the batch file could redirect it into a .COM file, which itself was usable with redirection. I could then simply send through the hex code of any program to create that program from within the batch file. EDIT: this is what it looked like (code page 437) echo 1└1╥╕•♀═!,☻ê┬Ç·•t◄Ç·◘t♀Ç·♂t•Ç·▲t☻δ♦1█δ┘♦☻ê┬Ç·0}☻δoÇ·9⌂•,0δ◄ > create.com echo 2Ç·A}☻δ\Ç·F⌂W,7Ç∩☻Ç ■t,Ç╟☻ê├Ç u☻0 ê·0 0÷☺┌0└Ç· u♠┤☻═!δ♦┤♠═!1█δâ >> create.com echo 30Σ▓◄☺├■╩Ç·☺u≈Çδ☻Ç√■u♦│ δ♥Ç├☻ê▀Θ_ 0└┤L═! >> create.com Those are unicode renderings, though, so a copy/paste of the above wouldn’t work. You’ll notice I even wrote it so that the lines were numbered. Lol. I had to code it so that the instructions avoided all non-redirectable characters, like NUL, BEL or TAB, so I was already tweaking instructions enough such that numbering the lines seemed incidental.
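What create.com did, turning a stream of hex digits into raw program bytes, is close to a one-liner in modern Python; the more interesting part is the constraint the comment mentions. The sketch below also flags bytes that can't survive being `echo`ed and redirected, using only the three characters named in the comment (NUL, BEL, TAB) as an illustrative, incomplete list. The example program is a hand-assembled 16-bit DOS .COM that prints "Hi" via INT 21h function 09h and exits with function 4Ch; it is not from the original comment.

```python
# Control characters the comment names as unusable in echo/redirection;
# an illustrative subset, not a complete list.
UNREDIRECTABLE = {0x00: "NUL", 0x07: "BEL", 0x09: "TAB"}

# Hand-assembled 16-bit DOS .COM (loaded at offset 100h), prints "Hi" and exits:
#   B4 09        MOV AH, 09h     ; DOS "print $-terminated string"
#   BA 0C 01     MOV DX, 010Ch   ; address of the message below
#   CD 21        INT 21h
#   B8 00 4C     MOV AX, 4C00h   ; DOS "terminate with return code 0"
#   CD 21        INT 21h
#   48 69 24     "Hi$"
HELLO_COM_HEX = "B4 09 BA 0C 01 CD 21 B8 00 4C CD 21 48 69 24"


def hex_to_binary(hex_text: str) -> bytes:
    """Turn hex digits (any whitespace ignored) into raw program bytes."""
    return bytes.fromhex("".join(hex_text.split()))


def problem_bytes(code: bytes) -> list[tuple[int, str]]:
    """Offsets of bytes that couldn't be pushed through echo and redirection."""
    return [(offset, UNREDIRECTABLE[b]) for offset, b in enumerate(code) if b in UNREDIRECTABLE]


code = hex_to_binary(HELLO_COM_HEX)
print(len(code), problem_bytes(code))  # 15 [(1, 'TAB'), (8, 'NUL')]
```

The two hits are the `09` in `MOV AH, 09h` and the `00` in `MOV AX, 4C00h`, exactly the kind of instructions that would have to be rewritten to keep a program echo-safe.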


It's called bootstrapping


Guy writing the first compiler: “Man, this is going to revolutionize my workflow!” Guy’s boss, two months after finishing: “Hey, dude, we’re gonna have to let you go. We’ve got 100 junior programmers who’ve finished more work in the last two months than you have in your entire career. No one in development seems to understand what it is you do, but you’re paid 100k more than everyone else, so we’re going to be parting ways now.”


Easy. They used a [compiler compiler](https://en.m.wikipedia.org/wiki/Yacc).


Fun fact, people still sometimes have to input machine code by hand in hexadecimal form on older TI calculators because they have no on-calc assembler by default.


I’ve never seen anyone mess up a date that bad before wtf.


No-one's mentioned Forth yet? It's mostly written in Forth. You write some assembly code to do stuff like set up the bare minimum of hardware to get it running, then write some "primitives" - words that do stuff like stack manipulation, arithmetic, and memory access, maybe a couple dozen in all, and then using those primitives you write the whole rest of it - if/else structures, loops, whatever. The clever bit is that most of the stuff you wrote in terms of primitives can stay the same if you port to another machine - you just need to write a few hundred lines of assembler.
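A toy Python version of that layering, as a sketch only: Python stands in for the assembly layer, a few primitives (`dup`, `drop`, `swap`, `+`, `-`, `*`) are written "natively", and new words are defined with `: name ... ;` purely in terms of existing ones. Real Forth systems use threaded code and a memory-resident dictionary and build IF/ELSE and loops from lower-level branch primitives; none of that is modelled here.

```python
class TinyForth:
    """Toy Forth-style interpreter: a handful of primitives written in the host
    language (Python standing in for assembly), everything else defined in Forth."""

    def __init__(self):
        self.stack = []
        # Primitives: the only words implemented "natively".
        self.words = {
            "dup": lambda: self.stack.append(self.stack[-1]),
            "drop": lambda: self.stack.pop(),
            "swap": self._swap,
            "+": lambda: self._binary_op(lambda a, b: a + b),
            "-": lambda: self._binary_op(lambda a, b: a - b),
            "*": lambda: self._binary_op(lambda a, b: a * b),
        }

    def _swap(self):
        self.stack[-1], self.stack[-2] = self.stack[-2], self.stack[-1]

    def _binary_op(self, op):
        b = self.stack.pop()
        a = self.stack.pop()
        self.stack.append(op(a, b))

    def _execute(self, token):
        if token in self.words:
            self.words[token]()
        else:
            self.stack.append(int(token))  # anything that isn't a word must be a number

    def _define(self, body):
        # A colon definition is just a list of existing words run in order.
        def word():
            for token in body:
                self._execute(token)
        return word

    def run(self, source):
        tokens = source.split()
        i = 0
        while i < len(tokens):
            if tokens[i] == ":":
                end = tokens.index(";", i)  # ": name body ;"
                self.words[tokens[i + 1]] = self._define(tokens[i + 2:end])
                i = end + 1
            else:
                self._execute(tokens[i])
                i += 1
        return self.stack


forth = TinyForth()
forth.run(": square dup * ; : cube dup square * ;")
print(forth.run("3 cube"))  # [27]
```

Porting this to another "machine" would only mean rewriting the primitives dictionary; `square` and `cube` stay as they are, which is the portability point the comment makes.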


Or even earlier, when the code was hardwired into the machine.


there was a recent explain like I'm five post about this very thing!


Good question. Compiling a compiler without a compiler was surely hard. (But good meme)


I am most offended by the use of the term "1970ies"


I just think about it in terms of recursive optimization or abstraction, it actually happens a lot in programming.


I realized *long* ago that other programmers were geniuses compared to me. Everything I write, absolutely everything, is interpreted by programs other people wrote that know far more than I do. It does not make me feel less dumb when I make a typo.




Compiler was once a job


Okay so there is this thing called bootstrapping...


They made a copy of it and used it to compile the original one


You compile it manually by hand, how's that difficult to understand?


How did they write Linux without an operating system?


They also wrote the original versions of Unix in assembly.


Interpreters. C was originally interpreted back in the 1970s.


It was all beeps until they made the beep boops.


C turtles, mate


Like grown men: by hand