OCTIS — The Future of Topic Modeling
In this blog post, I will introduce a framework that changes the way topic modeling is performed, whatever the algorithm used to obtain the results. To illustrate it, I worked on a side project in which I used the OCTIS Python package to build a topic analysis of song lyrics.
Topic modeling is an important task that can be applied to text data in almost any industry. To explain how it works, I wrote an article on the subject here, covering the main algorithms and methods used in natural language processing today.
In short, it consists of extracting the main topics present in a corpus of documents, which can be articles, books, research papers, tweets… Topic modeling can be a way to understand what is inside a set of texts: for example, to get a better grasp of what is discussed in a subreddit, or to improve the clustering of a blog's articles.
The project I worked on to illustrate OCTIS
I wanted to implement topic modeling on an interesting subject: song lyrics.
What is in these lyrics? Could I find interesting topics expressed by the artists? Does genre determine which topics appear in songs?