What is Topic Modeling?

2 March 2021

Written by David Gulajan

Are you in need of more clarity in terms of the ever-developing field of Natural Language Processing? Good news! Avineon is launching a series of Natural Language Processing (NLP) blog posts to outline our success with Machine Learning and other Deep Learning endeavors.

 

For your benefit, the list of covered topics is illustrated here below:

 

 

These topics are geared towards those with little to no prior knowledge in the field but will aim to engage those of all knowledge levels. Whether you know only a few terms or whether you have built your own model, we look forward to embarking on this series with you!

 

Website Premium Content Banner_NLP Brochure_Tekengebied 1

 

 

What is Topic Modeling?

 

Word embeddings, described in the last blog, are not the end of how computers process language. In fact, it is just the beginning! If you have ever used a Venn diagram before, you have the foundation to understand topic modeling. Topic modeling attempts to group some amount of text into groups (called topics) even if the groupings are hard to identify. Visually, these topics appear as bubbles, like the circles in a Venn diagram, that can overlap. Each bubble represents a shared topic and where they overlap represents where the topics themselves are similar and contain text which fits multiple topics.

Let us look at a simple example using a basic visualization of the first paragraph:

 

Topig Modeling_V2_Tekengebied 1

 

As you can see from the visualization, each sentence fits into one or more circles, which represent the topics generated by the model. Now that we have this topic model, we can use it to interact with our text.

 

Say that we would need to find two related sentences about one single topic. We now have a clear relationship (each circle) that maps our text for us. And not only that, the relative distance, in addition, represents how "close" the topics are to each other, just like in the "slider example" with word embeddings. Keep in mind, this visualization is a two dimensional representation of a much more complex mapping!

 

 

The Value of Topic Modeling

 

Why would we want to know when sentences are closely related? When we are working off a simple paragraph, it may be hard to see the value. But the value is more easily seen when you consider this same process extended across all of your unstructured data. Now you have a way to interact with your data to answer important and interesting questions.

 

Assume for a minute that you need to find all of the text in your document repository that contains addresses, but your documents are not in a standard format. Normally, you would need a human reader to sift through the documents, extract the addresses, and then input that data into a structured format or you could use a Robotic Process Automation (RPA) tool. Human readers are going to be clearly slower than automation, but even an RPA solution needs specific formatting  to operate. Topic Modeling is often used as a part of a larger machine learning solution, where topics are used to reduce computing time and increase efficiency for text processing.

 

 

Natural Language Processing

 

In this way, the topic model would greatly reduce the amount of text to sift through to find address information. However, it does not find the address information precisely. That is why you can layer NLP solutions on top of each other. Topic modeling is just one tool in a large toolkit to tackle difficult unstructured data problems. We will explore these other layers and Avineon’s developed solutions in future blogs.

 

NLP Webinar Recording_Website Visual_Tekengebied 1