The web is transmitted as text: sequences of characters, not raw binary data. It also keeps repeating information that is redundant. HTML is written in what is called markup, hence the name “HyperText Markup Language”. Most elements state their name twice, once in the opening tag at the beginning of their “section” and once in the closing tag at the end. Add the angle brackets and the slash, and the overhead on each name is more than a factor of 2.
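To give a rough feel for that repetition, here is a minimal sketch that counts how many bytes of a fragment are spent on element names alone. The `name_overhead` helper and the sample fragment are my own illustration, picked to make the effect visible; real pages will show a different ratio.

```python
import re

# Matches opening and closing tags and captures the element name.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

def name_overhead(html: str) -> tuple[int, int]:
    """Return (bytes spent on element names, total bytes) for an HTML string."""
    name_bytes = sum(len(m.group(2)) for m in TAG.finditer(html))
    return name_bytes, len(html.encode("utf-8"))

# Hypothetical sample fragment, not taken from any real page.
sample = "<blockquote><strong>Hi</strong></blockquote>"
names, total = name_overhead(sample)
print(f"{names} of {total} bytes are element names")
```

The count ignores attributes and the brackets themselves, so it understates how much of the fragment is markup rather than content.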
It would be relatively easy to provide a binary format that is totally transparent at both the browser and the server end. By this I mean that users and web authors would not be aware of the difference, and old browsers would still work.
This could be achieved using techniques similar to those pioneered years ago in a notation called ASN.1 (Abstract Syntax Notation One). Basically, it allows the structure of a data stream to be described formally, and its encoding rules define how those streams are encoded to and decoded from binary.
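As a minimal sketch of the idea, the code below encodes a small element tree as tag-length-value records, in the spirit of ASN.1's encoding rules but not actual BER or DER. The one-byte code table, the 2-byte length field, and the function names are all assumptions of mine for illustration. Attributes and escaping are left out.

```python
import struct

# Hypothetical one-byte codes for element names; 0 is reserved for text.
CODES = {"#text": 0, "p": 1, "em": 2, "blockquote": 3}
NAMES = {code: name for name, code in CODES.items()}

def encode(node) -> bytes:
    """Encode a node as 1 tag byte + 2-byte big-endian length + payload.

    A node is either a str (text) or a (name, children) tuple.
    The 2-byte length limits each payload to 65535 bytes.
    """
    if isinstance(node, str):
        tag, payload = CODES["#text"], node.encode("utf-8")
    else:
        name, children = node
        tag, payload = CODES[name], b"".join(encode(c) for c in children)
    return struct.pack(">BH", tag, len(payload)) + payload

def decode(data: bytes, pos: int = 0):
    """Decode one node starting at pos; return (node, next position)."""
    tag, length = struct.unpack_from(">BH", data, pos)
    start, end = pos + 3, pos + 3 + length
    if tag == CODES["#text"]:
        return data[start:end].decode("utf-8"), end
    children = []
    while start < end:
        child, start = decode(data, start)
        children.append(child)
    return (NAMES[tag], children), end

def to_html(node) -> str:
    """Render the same tree as text markup, for comparison."""
    if isinstance(node, str):
        return node
    name, children = node
    return f"<{name}>" + "".join(to_html(c) for c in children) + f"</{name}>"

tree = ("blockquote", [("p", ["Hello ", ("em", ["world"])])])
binary = encode(tree)
print(len(to_html(tree)), "bytes as text,", len(binary), "bytes as binary")
```

Because `decode(encode(tree))` gives back the same tree, a server could in principle hand the text form from `to_html` to old browsers and the binary form to new ones, which is the kind of transparency I have in mind.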
So, in a nutshell, HTML is a very inefficient way of both transmitting web pages and rendering them, since they have to be parsed from text into a binary representation before the computer can work with them.
XML, HTML's cousin for the storage and transmission of data, suffers from exactly the same problems.