Why are there so many programming languages?
This was a question I was recently asked, and to which I failed to give much of an answer, being rather engrossed in using one at the time. Thinking about it later, however, it is a question worth answering, especially for people looking at programming from the outside, for whom it often seems a rather mysterious and ill-understood subject. This article begins with a little context as to what a programming language really is, before outlining a few of the things that differentiate them, in an attempt to answer the question. To keep this at a reasonable length, not every detail will be included (definitely nothing to do with me not knowing them), but hopefully enough to get a feel for the ideas.
Computer processors are exceptionally good at a few simple things - loading numbers from various bits of memory, doing basic arithmetic and logical operations on those numbers, storing the results in other bits of memory, and choosing which branch of instructions to follow next, based on simple attributes of the number currently loaded - namely, whether it is positive, negative, or zero. A program is a sequence of instructions for a processor to execute on given input data. Coding a computer to do any significant work by writing these primitive instructions directly - known as ‘machine code’ - would be a herculean task, especially given that modern processors are designed primarily for performance rather than for ease of programming. To leverage the power of computers, despite their stupidity, we instead use ‘higher-level’ languages.
‘Higher-level’ in this context does not mean more difficult - it means more abstract. From pretty much the very beginning of computing, it was realised that it is possible to write programs which operate on other programs. In particular, a program could be written which takes a piece of textual data, written with a strictly defined grammar and syntax, and outputs a program that the machine can understand - this transformation program is known as a ‘compiler’, in that it ‘compiles’ the higher-level program into machine code. The rules about what input is valid, and can be compiled, are what define a programming language.
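To make that idea a little more concrete, here is a minimal sketch of the concept - a Python toy, not any real compiler, with instruction names invented purely for illustration - that reads a tiny ‘language’ of single-digit additions and emits instructions for an imaginary stack machine:

```python
# A toy 'compiler': it takes text obeying a tiny, strict grammar
# (single-digit additions like "1+2+3") and turns it into instructions
# for an imaginary stack machine. The instruction names are made up;
# real machine code is far less readable.

def compile_expression(source: str) -> list[str]:
    tokens = source.replace(" ", "")
    if not tokens or not tokens[0].isdigit():
        raise ValueError("invalid program")   # breaking the language's rules
    instructions = [f"PUSH {tokens[0]}"]
    for i in range(1, len(tokens), 2):
        if tokens[i] != "+" or i + 1 >= len(tokens) or not tokens[i + 1].isdigit():
            raise ValueError("invalid program")
        instructions.append(f"PUSH {tokens[i + 1]}")
        instructions.append("ADD")
    return instructions

print(compile_expression("1+2+3"))
# ['PUSH 1', 'PUSH 2', 'ADD', 'PUSH 3', 'ADD']
```

A real compiler does the same basic job at vastly greater scale: the grammar covers a whole language, and the output is genuine machine code rather than made-up instruction names.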
This is a wonderfully powerful idea. Whereas when writing directly for the machine, each program had to be written specifically for the CPU it was going to run on, now the compiler can take care of that. Feed in a higher-level program, specify the type of CPU you want it to run on, and out comes a piece of machine code for that CPU. The same input program can then be compiled targeting a different architecture, and you get an equivalent program ready to run on a CPU that may be completely different. The other great benefit of higher-level languages is the extra level of abstraction they can provide for the programmer. For example, the fact that output onto a screen is actually achieved by loading data into particular memory locations can be hidden. The primitive operation goes from ‘load this number into a particular memory location’ to ‘set the value of a pixel at a coordinate’. This idea, applied to every possible thing we want computers to do, makes understanding and writing programs many orders of magnitude simpler, enabling programmers to write more and more complex programs without necessarily understanding exactly how the idiot savant that is the computer actually runs them. Of course, this applies to compiler programs themselves - many languages are not, in fact, turned directly into machine code, but go through an intermediate step in another language, still high-level but not quite as high, which is in turn compiled into lower-level code.
So why so many? We’ve had the best part of a century of active research and massive usage to figure this out, so how come we haven’t come up with ‘the best’ yet?
Firstly, there is the massive range of things we now use computers for. This piece of text sits atop a teetering mountain of code. It has come to you from a computer somewhere on the nebulous internet, was transmitted through many devices which routed it to the correct place, was transferred to your computer’s memory via several pieces of hardware, and is being interpreted and presented to you by an application, which itself sits on top of an operating system. Every one of these components is complex in itself, and each in different ways, so it is perhaps natural that there is no single language which would be ‘the best’ for expressing all of them. A language like C is designed to make it (relatively) easy to work with very low-level primitives in the machine, enabling programmers to (pretty much) directly manipulate the memory allocated to their programs. The code running your computer’s network driver, which deals with communication in very simple packets of data and has to work very quickly, was likely written in C, as this is the kind of problem its level of abstraction is good for. On the other hand, this paragraph turns green if you click it because of a piece of JavaScript. That JavaScript was executed by your browser, which has a kind of ‘live’ compiler, called an interpreter, built into it - another example of the abstraction that higher-level languages enable. JavaScript doesn’t include any way of moving pieces of data into specific memory locations, but it does have built-in components for making ‘elements’ of a ‘page’ react to ‘events’, and so is clearly more useful than C for creating interactive web pages - but the overhead of all these add-ons makes it much worse for code which must be ultra high performance.
Programs in some languages, such as Java, Rust, or Haskell, must be specified with strict ‘types’ for each piece of data. The things you can do with that piece of data are then determined by its type, and if you try to do something outside of these operations, the compiler will reject your program as invalid. Languages like Python or JavaScript, on the other hand, play fast and loose. Add a string of characters to a number? Python won’t care - until it tries to actually run that line, at which point an error occurs. JavaScript will happily turn the whole thing into a string of characters, with the number becoming digits. Small, simple programs are often much quicker to write in these dynamic languages, whilst the stricter languages require lots of extra code to make sure everything is good and proper. However, the safety and explicit understanding they put into the code can be invaluable once the program starts getting large and is being worked on by multiple people, as the types form a ‘contract’ of sorts.
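As a small illustration of the Python side of this (a hypothetical snippet, not from any real program), nothing complains when the dubious code is written - only when it is actually run:

```python
def label(item):
    # Python accepts this definition without checking what 'item' is;
    # a strictly typed language would demand to know up front.
    return "Item: " + item

print(label("apple"))   # fine: prints "Item: apple"
print(label(7))         # only now does it fail: a TypeError at runtime
```

In a language like Java or Rust, the equivalent of that second call would be rejected by the compiler before the program ever ran.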
This last distinction is one of many that programmers use to judge languages. Personal preferences are a major factor in these decisions. Strict types, or dynamic? Data that can be changed in place, or data that must be copied to alter it? A preference for long, descriptive variable names, or concise ones? These and a thousand more considerations play into the judgements programmers make about languages. Some are almost purely cosmetic, others concern the structural foundations of a language - interestingly, many people seem to make the same amount of fuss about both. Language design is full of tradeoffs, and it’s impossible to have everything. As each programmer has a unique history, set of skills, and use cases for their programs, so each will have their own preferences, and a language that seems perfect to one is deemed unusable by another.
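The ‘changed in place or copied to alter it’ question, for instance, looks like this in Python (again a hypothetical snippet for illustration), where lists can be mutated but strings cannot:

```python
numbers = [1, 2, 3]
numbers.append(4)         # the list itself is changed in place
print(numbers)            # [1, 2, 3, 4]

name = "ada"
shouted = name.upper()    # strings can't be altered; a new one is made
print(name, shouted)      # ada ADA - the original is untouched
```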
Because of these differing preferences, and the ever-evolving types of software being created, languages regularly fall out of fashion. A handful have lives measured in decades, many become popular for a few years before being discarded for the next new thing, and most are never used by anyone but their creator and perhaps a small group of others. The corpses of dead and dying languages litter the internet, further adding to the confusion.
New languages come about for many reasons, and creating them is, for different people, some combination of hard, interesting, and useful. This leads to languages and their compilers being created out of pure curiosity and learning, for humour, for specific use cases, and as attempts to do better than older languages. There’s nothing stopping anyone with access to a computer from writing their own programming language (or at least trying), and thus many people do. Then, for all the aforementioned reasons and many more, some become popular and communities form around them. Tools are created to help write programs in them, and pieces of software common to many programs are created and neatly parcelled up for reuse. Rabid fans evangelise the language at any opportunity. Haters bring up its faults whenever and wherever they can. And another language joins the pantheon - at least for a while.