HTML and the verbosity of Markup languages

Web pages are transmitted as text: sequences of characters, not raw binary data. They also keep repeating information that is actually redundant. HTML is written in what is called markup, hence the name “HyperText Markup Language”. Most markup elements repeat their name at the beginning and the end of their “section”, so each element name is sent twice, and once the angle brackets and closing slash are counted the overhead is more than a factor of two compared with naming the element once.
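To put a rough number on that overhead, here is a minimal sketch in Python that splits a fragment of HTML into characters spent on tags and characters spent on the text a reader actually sees. The `MarkupCounter` class, the `markup_overhead` function and the sample list are my own illustration, not a standard measurement, and a three-item list is far too small to say anything about real pages.

```python
from html.parser import HTMLParser


class MarkupCounter(HTMLParser):
    """Counts the characters of text content (everything that is not a tag)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_chars = 0

    def handle_data(self, data):
        # Called for each run of text found between tags.
        self.text_chars += len(data)


def markup_overhead(html):
    """Return (markup_chars, text_chars) for an HTML fragment.

    Assumes the fragment contains no character references such as &amp;,
    which would make the decoded text shorter than the source text.
    """
    counter = MarkupCounter()
    counter.feed(html)
    counter.close()  # flush any text still buffered by the parser
    return len(html) - counter.text_chars, counter.text_chars


sample = "<ul><li>one</li><li>two</li><li>three</li></ul>"
print(markup_overhead(sample))  # → (36, 11)
```

In this small sample, 36 of the 47 characters are tags, and each `li` is spelled out twice per item.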

HTML has no macro facility for generating common parameterized sequences of markup, and no inheritance of things like page headers. All of this has to be done either on the server, with the result retransmitted each time, or in JavaScript, which still requires retransmission or caching in the client browser.

It would be relatively easy to provide a binary format that is totally transparent at both the browser and the server end. By this I mean that users and web authors would not be aware of the difference, and old browsers would still work.

This could be achieved using techniques similar to those pioneered years ago by a notation called ASN.1 (Abstract Syntax Notation One). Basically, it allows data streams to be described abstractly, and its encoding rules provide the binary encoding and decoding of those streams.
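As a rough illustration of the idea, here is a minimal sketch in Python of a tag-length-value encoding for a tree of elements. It is only in the spirit of ASN.1's encoding rules and is not real ASN.1 or BER: the `ELEMENT_CODES` table, the one-byte tag and the two-byte length are all assumptions made up for this example, and attributes are ignored entirely.

```python
import struct

# Hypothetical table giving each element name a one-byte code.
# It is invented for this sketch and is not part of any standard.
ELEMENT_CODES = {"ul": 1, "li": 2, "p": 3}
ELEMENT_NAMES = {code: name for name, code in ELEMENT_CODES.items()}
TEXT = 0  # code reserved for a run of plain text


def encode(node):
    """Encode a str (text) or a (name, children) pair as tag, length, value."""
    if isinstance(node, str):
        code, body = TEXT, node.encode("utf-8")
    else:
        name, children = node
        code = ELEMENT_CODES[name]
        body = b"".join(encode(child) for child in children)
    # 1-byte tag, 2-byte big-endian length (so a body is at most 65535 bytes),
    # followed by the value itself.
    return struct.pack(">BH", code, len(body)) + body


def decode(data, offset=0):
    """Decode one node starting at offset and return (node, next_offset)."""
    code, length = struct.unpack_from(">BH", data, offset)
    start = offset + 3
    end = start + length
    if code == TEXT:
        return data[start:end].decode("utf-8"), end
    children = []
    pos = start
    while pos < end:
        child, pos = decode(data, pos)
        children.append(child)
    return (ELEMENT_NAMES[code], children), end


# The list <ul><li>one</li><li>two</li><li>three</li></ul> as a tree.
tree = ("ul", [("li", ["one"]), ("li", ["two"]), ("li", ["three"])])
encoded = encode(tree)
print(len(encoded))  # → 32
```

The element name is never repeated, because the length field says where each element ends. The same list written as HTML text takes 47 characters, so the saving in this toy case is real but modest.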

In a nutshell, HTML is a very inefficient way of both transmitting web pages and rendering them, because the text has to be parsed into a binary in-memory representation before the computer can work with it.

XML, the cousin of HTML used for the storage and transmission of data, suffers from exactly the same problems.

