Unleashing Text Generation: The Magic of N-Grams and Padding in Natural Language Processing

Naveen Mathews Renji
5 min read · Jun 8, 2023

Welcome to the sixth installment in our in-depth series on Natural Language Processing (NLP). We’ve journeyed through essential concepts such as tokenization, the corpus, padding, and embedding, and explored the dynamics of Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks. Today, we dive into the fascinating world of text generation, focusing on a potent tool in the NLP toolkit: n-gram sequences.

Image source: https://images.deepai.org/glossary-terms/867de904ba9b46869af29cead3194b6c/8ARA1.png

Text Generation: The Whys and Wherefores

Text generation is the automated creation of written content by a machine. It’s a crucial aspect of NLP with diverse applications, from auto-completing your sentences in Gmail to writing poetry, generating song lyrics, creating news articles, and even crafting entire books! Using algorithms and language models, machines can produce readable content that often appears as though a human wrote it. But remember the famous saying in data science: “Garbage in, garbage out.” You’ll see why shortly.

Decoding Text Generation

The heart of text generation lies in understanding the structure of language and predicting the sequence of words that can follow a given input. It’s a lot like how we humans anticipate the next word when someone pauses mid-sentence: given everything said so far, some continuations are far more likely than others.
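To make this concrete, here is a minimal sketch of how n-gram sequences are typically prepared for a next-word-prediction model, using the TensorFlow/Keras Tokenizer and pad_sequences utilities covered earlier in this series. The toy corpus and variable names are illustrative, not from a real dataset:

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# A tiny illustrative corpus; in practice this would be song lyrics,
# articles, or any body of text you want the model to imitate.
corpus = [
    "the quick brown fox jumps over the lazy dog",
    "the lazy dog sleeps all day",
]

# Fit a tokenizer so every word in the corpus gets an integer index.
tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
total_words = len(tokenizer.word_index) + 1  # +1 for the reserved 0 index

# Build n-gram sequences: for each line, take the first 2 tokens,
# then the first 3, and so on, so every prefix becomes a training example.
input_sequences = []
for line in corpus:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(2, len(token_list) + 1):
        input_sequences.append(token_list[:i])

# Pre-pad so all sequences share the same length. After padding, the last
# token of each sequence is the label and everything before it is the input.
max_len = max(len(seq) for seq in input_sequences)
input_sequences = np.array(
    pad_sequences(input_sequences, maxlen=max_len, padding="pre")
)

xs, labels = input_sequences[:, :-1], input_sequences[:, -1]
print(xs.shape, labels.shape)
```

Every prefix of every line becomes a training example, and pre-padding aligns them so that the final token of each sequence can serve as the label a model learns to predict.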
