OCTIS — The Future of Topic Modeling

Nicolas Pogeant
8 min read · Mar 7, 2023

In this blog post, I introduce a framework that changes the way topic modeling is performed, whatever algorithm is used to obtain the results. To illustrate it, I worked on a side project in which I used its Python package to build a topic analysis of song lyrics.

Generated on Lexica.art

Topic modeling is a useful task in almost any context or industry that deals with text. To explain how it works, I wrote an article on the subject here, covering the main algorithms and methods used in natural language processing today.

In short, it consists of extracting the main topics present in a corpus of documents, which can be articles, books, research papers, tweets… Topic modeling can be a way to understand what a set of texts contains, for example to get a better grasp of what is discussed in a subreddit, or to improve the clustering of a blog's articles.
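To make the idea concrete, here is a minimal sketch of what "extracting topics from a corpus" can look like in Python. It is not OCTIS and not the code from my project: it is a toy collapsed Gibbs sampler for LDA written with the standard library only, and the tiny corpus, the `fit_lda` function and all its parameters are made up for illustration.

```python
import random
from collections import Counter

# Hypothetical toy corpus: each document is already tokenized.
docs = [
    ["love", "heart", "night", "kiss", "love"],
    ["heart", "kiss", "dream", "love", "night"],
    ["money", "street", "car", "city", "money"],
    ["city", "car", "street", "money", "night"],
]

def fit_lda(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0, top_n=3):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not optimized)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables updated during sampling.
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per topic in each doc
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # tokens per topic overall

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Unnormalized probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Keep the most frequent words of each topic, skipping zero counts.
    top_words = [
        [w for w, c in counts.most_common(top_n) if c > 0]
        for counts in topic_word
    ]
    return top_words, doc_topic

topics, doc_topic = fit_lda(docs)
for k, words in enumerate(topics):
    print(f"Topic {k}: {words}")
```

Each topic comes out as a short list of words, and `doc_topic` tells how many tokens of each document were assigned to each topic. On a corpus this small the result is only indicative; a real project would rely on an established implementation and a much larger dataset.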

The project I worked on to illustrate OCTIS

I wanted to implement topic modeling on an interesting subject: song lyrics.

What is in these lyrics? Could I find interesting topics expressed by the artists? Does genre determine which topics appear in songs?
