The NLP revolution has started — My BERT implementation

As you know, AI is all the rage these days, with industry after industry poised to be affected by it.

AI can be broadly divided into three data modalities: numbers, text, and images (including video). While dealing with numbers has been going on for quite some time, dealing with text (NLP) and with video are both very hot right now.

To be clear, NLP is a very old concept, but we are now seeing massive interest in the field, thanks to the convergence of data, algorithms, and computing power.

We are now collecting tons and tons of data, our CPUs and GPUs are getting insanely cheap and powerful, and, last but not least, there has been a tremendous explosion of interest in this area, resulting in many advanced word-embedding algorithms such as Word2Vec, GloVe, BERT, ELMo, and Context2Vec.

Word2Vec and GloVe utilize the co-occurrence of a target word and its context words, as defined by a context window. But the order of words in a sentence is not taken into account.
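To see what a context window means in practice, here is a minimal sketch of how Word2Vec-style training extracts (target, context) pairs; the sentence and window size are illustrative, not from the original post:

```python
# Sketch: a symmetric context window turns a sentence into (target, context)
# pairs. Note the pairs carry no word-order information, which is exactly
# the limitation described above.

def context_pairs(tokens, window=2):
    """Return (target, context) pairs within a symmetric window."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target itself
                pairs.append((target, tokens[j]))
    return pairs

sentence = "the cat sat on the mat".split()
print(context_pairs(sentence, window=1)[:4])
# → [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```

Because each pair is unordered with respect to position, a scrambled sentence can yield the same multiset of pairs, which is why these models miss word order.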

Context2Vec takes the sequential relationships between words into account. Each sentence is modeled by a bidirectional RNN, and each target word obtains a contextual embedding from the hidden states of the RNN, capturing information from the words before and after it.
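The idea can be sketched in a few lines of NumPy: run a simple RNN forward and backward over the sentence and concatenate the two hidden states at each position. This is a toy illustration of the bidirectional mechanism, not the actual Context2Vec architecture; all dimensions and weights are made up:

```python
import numpy as np

# Toy bidirectional RNN: the forward pass summarizes the words before a
# position, the backward pass summarizes the words after it, and the
# concatenation serves as that word's contextual embedding.
rng = np.random.default_rng(0)
T, d_in, d_h = 5, 8, 4              # sentence length, input dim, hidden dim
x = rng.normal(size=(T, d_in))      # stand-in word vectors for one sentence

W_f, U_f = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))
W_b, U_b = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))

def run_rnn(inputs, W, U):
    h, states = np.zeros(d_h), []
    for x_t in inputs:
        h = np.tanh(W @ x_t + U @ h)   # simple tanh RNN cell
        states.append(h)
    return states

fwd = run_rnn(x, W_f, U_f)                # left-to-right states
bwd = run_rnn(x[::-1], W_b, U_b)[::-1]    # right-to-left states, re-aligned

# Contextual embedding at position t: [forward state; backward state]
context = np.stack([np.concatenate([f, b]) for f, b in zip(fwd, bwd)])
print(context.shape)  # → (5, 8): one 2*d_h vector per word
```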

ELMo is quite similar to Context2Vec. The main difference is that ELMo uses language modelling to train the word embeddings, whereas Context2Vec adopts the Word2Vec fashion, building a mapping between target word and context words. Also, ELMo is a little deeper than Context2Vec.

BERT benefits from the invention of the Transformer. Though ELMo proves effective at extracting context-dependent embeddings, BERT argues that ELMo captures context from only two directions (i.e., a bidirectional RNN). BERT instead adopts the encoder of the Transformer, which is composed of attention networks, so it can capture context from all positions at once (fully connected).
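The "all possible directions" claim comes from self-attention: every position attends to every other position in a single layer. Here is a minimal NumPy sketch of the scaled dot-product self-attention inside a Transformer encoder, with illustrative random projections:

```python
import numpy as np

# Sketch of scaled dot-product self-attention: each output row is a
# weighted mix of *all* token values, so context flows between every
# pair of positions, unlike an RNN's strictly sequential flow.
rng = np.random.default_rng(1)
T, d = 4, 8                         # sequence length, model dim
x = rng.normal(size=(T, d))         # stand-in token embeddings

W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

scores = Q @ K.T / np.sqrt(d)       # (T, T) pairwise similarity scores
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions

out = weights @ V                   # each token mixes in every other token
print(weights.shape, out.shape)     # → (4, 4) (4, 8)
```

Each row of `weights` sums to 1 and spans the whole sequence, which is the sense in which a single attention layer is "fully connected" across positions.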

Here's a very basic example of the superpower that BERT possesses.
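A masked-word prediction along these lines can be reproduced with the Hugging Face `transformers` library; this is a hedged stand-in sketch (the model name and input sentence are my assumptions, not the post's original code or its fantasy-novel-trained encoder):

```python
from transformers import pipeline

# Illustrative sketch: ask a pretrained BERT to fill in a masked word.
# "bert-base-uncased" and the sentence are assumptions for demonstration.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

predictions = fill_mask("The world will [MASK] because of AI.")
for p in predictions[:3]:
    print(p["token_str"], round(p["score"], 3))
```

The pipeline returns the highest-probability replacements for the `[MASK]` token, each with a score; the actual words depend on what corpus the model was trained on.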

The result of this is as follows.

The BERT encoder in this example has been trained on a bunch of fantasy novels, hence the slightly dystopian view. But this is crazy: computers sounding like humans will, without a doubt, change the world.

Originally published at



