There are so many new text analytics tools today. If you are confused about text analytics terminology, check out our FAQs to clear things up!
What's new in text analytics tools?
There has been a revolution in text analytics recently. With the advent of cloud computing and powerful language models (ChatGPT, Claude, Flan-T5, etc.), we can now analyze hundreds of thousands of words. However, there is some confusion about what text analytics is and how to take advantage of these recent advances. We have assembled a set of questions and answers to help people understand text analytics and some important terminology.
What does text analysis enable you to do?
Text analysis can involve summarizing large amounts of text, organizing text into useful categories, or mining text from multiple sources such as emails, documents, the web, and social media. Done well, text analysis gives you greater insight into what people are talking about and how they feel about it.
What is an LLM?
LLM stands for Large Language Model. You might be familiar with ChatGPT from OpenAI. There are many LLMs, and they vary in size from millions of parameters to hundreds of billions. When choosing an LLM, consider the task (summarization, topic modeling, translation, etc.) and the size of the model.
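To make "size" more concrete, here is a rough back-of-the-envelope sketch, assuming the common rule of thumb that the memory needed just to hold a model's weights is the parameter count multiplied by the bytes used per parameter. The model sizes below are hypothetical round numbers, and the estimate ignores activations, caches, and framework overhead, so real memory use will be higher.

```python
# Rule of thumb (an assumption for illustration): weight memory is roughly
# parameter count x bytes per parameter. Activations, caches, and framework
# overhead are NOT included, so actual requirements are higher.
BYTES_PER_PARAM = {
    "fp32": 4,  # 32-bit floating point
    "fp16": 2,  # 16-bit floating point
    "int8": 1,  # 8-bit quantized weights
}


def weight_memory_gb(num_params: int, precision: str = "fp16") -> float:
    """Approximate gigabytes (10**9 bytes) needed to store the model weights."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to show the scale difference.
    for name, params in [("80M-parameter model", 80_000_000),
                         ("7B-parameter model", 7_000_000_000)]:
        for precision in ("fp32", "fp16", "int8"):
            gb = weight_memory_gb(params, precision)
            print(f"{name} at {precision}: {gb:.2f} GB")
```

A 7-billion-parameter model at 16-bit precision needs about 14 GB for its weights alone, while an 80-million-parameter model fits in well under 1 GB. That gap is one reason a smaller model can be the practical choice when it handles the task well enough.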
What is a topic model?
A topic model is a way to organize text data into useful categories. For example, a topic model might organize customer reviews by price, value, and features, or organize information about the natural world into a taxonomy of species, climate, and physical features. You can let the computer choose the categories (called unsupervised learning), or, if you already have the categories, you can train a model to place text into them (called supervised learning).
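As an illustration of the supervised case, the sketch below places reviews into categories you have already chosen (price, value, features) using multinomial Naive Bayes, a classic text classifier that we swap in here purely as an example. The training reviews are hypothetical, and a real model would need far more labeled examples. For the unsupervised case, algorithms such as Latent Dirichlet Allocation (available, for example, as `LatentDirichletAllocation` in scikit-learn) discover the categories themselves.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters as words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCategorizer:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesCategorizer":
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter(labels)        # category -> number of examples
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Start from the log prior: how common this category is in training.
            score = math.log(self.doc_counts[label] / total_docs)
            denominator = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled reviews (the categories are chosen in advance).
TRAIN_TEXTS = [
    "The price is too high",
    "Too expensive, the cost keeps going up",
    "Great value for the money",
    "Worth every penny, good value",
    "The camera and battery are excellent",
    "Love the screen and the battery life",
]
TRAIN_LABELS = ["price", "price", "value", "value", "features", "features"]

if __name__ == "__main__":
    model = NaiveBayesCategorizer().fit(TRAIN_TEXTS, TRAIN_LABELS)
    for review in ["Battery drains fast", "The price keeps going up", "Good value"]:
        print(review, "->", model.predict(review))
```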
What is sentiment analysis?
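Sentiment analysis measures how people feel about what they are writing or talking about. There are typically three types of sentiment: positive (good), negative (bad), and neutral. Some text contains mixed sentiment, for example a review that praises a product's features but complains about its price.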
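One of the simplest ways to see these categories in action is a lexicon-based scorer: count words from a list of positive terms and a list of negative terms, then label the text. The sketch below assumes a tiny hypothetical word list and a simple rule (any positive plus any negative word means "mixed"). It ignores negation ("not good") and sarcasm, which is why production systems usually rely on trained models rather than word lists.

```python
import re

# Tiny hypothetical lexicons for illustration; real lexicons hold thousands of terms.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "expensive"}


def sentiment(text: str) -> str:
    """Label text as positive, negative, neutral, or mixed by counting lexicon hits."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    if pos and neg:
        return "mixed"      # both kinds of feeling appear
    if pos:
        return "positive"
    if neg:
        return "negative"
    return "neutral"        # no opinion words found


if __name__ == "__main__":
    for text in ["I love the screen",
                 "The battery is terrible",
                 "The box arrived on Tuesday",
                 "Great camera, but far too expensive"]:
        print(f"{text!r}: {sentiment(text)}")
```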
How do you turn text into structured data to analyze?
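Turning text into structured data starts with raw text, whether it is a list of comments, a sentence, or a paragraph, and converts it into a table like a spreadsheet. Typically each row holds one piece of text (such as a single review) and each column holds something about it (such as its ID or word count). Once text is in a table, you can begin to analyze it.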
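Here is a minimal sketch of that conversion using only Python's standard `csv` module. It assumes the raw input has one comment per line, and the columns (`id`, `text`, `word_count`) are hypothetical examples; in practice you would add whatever columns your analysis needs, such as source, date, topic, or sentiment.

```python
import csv
import io

# Hypothetical raw input: free-form comments, one per line.
RAW_TEXT = """Love the screen, but the battery dies by noon.
Shipping was fast.
Too expensive for what you get."""


def text_to_rows(text: str) -> list[dict]:
    """Turn each non-empty line into one table row with a few simple columns."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return [
        {"id": i, "text": line, "word_count": len(line.split())}
        for i, line in enumerate(lines, start=1)
    ]


def rows_to_csv(rows: list[dict]) -> str:
    """Write the rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "text", "word_count"])
    writer.writeheader()
    writer.writerows(rows)  # text containing commas is quoted automatically
    return buffer.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(text_to_rows(RAW_TEXT)))
```

Saving the output to a `.csv` file gives you a spreadsheet with one review per row, ready for sorting, filtering, or counting.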
What is labeled data and why do you need it?
Labeled data is text data that a person has reviewed and labeled, for example by tagging each item with a topic or a sentiment rating. We need labeled data to check the accuracy of our text models. Without it, we might have a model with a good statistical score that is not very useful to the people who need to rely on it.
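The sketch below compares model predictions against human labels. It uses hypothetical data to show one way a "good statistical score" can mislead: when most items are neutral, a model that always answers "neutral" scores 80% accuracy yet misses every negative comment. Per-class recall (the share of items with a given human label that the model also gave that label) exposes the problem.

```python
def accuracy(predictions: list[str], human_labels: list[str]) -> float:
    """Fraction of items where the model agrees with the human label."""
    if len(predictions) != len(human_labels) or not human_labels:
        raise ValueError("need the same, non-zero number of predictions and labels")
    correct = sum(p == h for p, h in zip(predictions, human_labels))
    return correct / len(human_labels)


def recall(predictions: list[str], human_labels: list[str], label: str) -> float:
    """Of the items humans gave `label`, the fraction the model also labeled `label`."""
    relevant = [p for p, h in zip(predictions, human_labels) if h == label]
    if not relevant:
        return float("nan")  # undefined: humans never used this label
    return sum(p == label for p in relevant) / len(relevant)


# Hypothetical example: 8 neutral comments and 2 negative ones, labeled by people.
HUMAN_LABELS = ["neutral"] * 8 + ["negative"] * 2
# A lazy model that always predicts "neutral".
PREDICTIONS = ["neutral"] * 10

if __name__ == "__main__":
    print("accuracy:", accuracy(PREDICTIONS, HUMAN_LABELS))
    print("recall (negative):", recall(PREDICTIONS, HUMAN_LABELS, "negative"))
```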
What does it mean to have "Human in the loop"?
"Human in the loop" (HITL) is an approach in machine learning (ML) where people collaborate with models to improve their performance. People can create training data or give feedback on model output, and they can review a model's results for bias and errors. Ultimately, people get the final say on whether AI recommendations are useful. In most cases, people are involved in every step of the analytics process to make sense of the task, the model, and the final product.
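One common HITL pattern, shown here as a sketch rather than a prescribed workflow, is to accept confident model predictions automatically and send uncertain ones to a person. It assumes the model reports a confidence score between 0 and 1; the `Prediction` type, the 0.8 threshold, and the example reviews are all hypothetical. The reviewer's corrections can then be added to the training data.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    text: str
    label: str
    confidence: float  # assumed model score in [0, 1]


def route(predictions: list[Prediction], threshold: float = 0.8):
    """Split predictions into auto-accepted items and a human review queue."""
    accepted, review_queue = [], []
    for p in predictions:
        if p.confidence >= threshold:
            accepted.append(p)
        else:
            review_queue.append(p)  # uncertain: a person decides
    return accepted, review_queue


def apply_reviews(review_queue: list[Prediction],
                  reviewer_labels: dict[str, str]) -> list[tuple[str, str]]:
    """Pair each queued text with the reviewer's label, e.g. as new training data."""
    return [(p.text, reviewer_labels[p.text]) for p in review_queue]


# Hypothetical model output.
EXAMPLES = [
    Prediction("Love the screen", "positive", 0.97),
    Prediction("It's fine, I guess", "positive", 0.55),
    Prediction("Arrived broken", "negative", 0.91),
]

if __name__ == "__main__":
    accepted, queue = route(EXAMPLES)
    print("auto-accepted:", [p.text for p in accepted])
    print("needs review:", [p.text for p in queue])
    print("corrected:", apply_reviews(queue, {"It's fine, I guess": "neutral"}))
```

Raising the threshold sends more items to people, which costs more review time but catches more mistakes; lowering it does the opposite.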
What are the main challenges in text analytics?
There are a handful of challenges when working with text data. The first is how to generate labeled data; we have been using generative AI (GenAI) to create text on the specific topics we are studying. A second challenge is how much text you have to work with for a particular task. Language models need context to be useful, so too few examples can be a problem. Another challenge is human labeling: it takes time and resources to label data, and you might need special expertise in a domain such as finance, law, or medicine to understand the text as you label it. A final challenge is measuring the accuracy of your models.
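When several people label the same items, it helps to check how consistently they agree before trusting their labels to measure a model. One widely used measure, swapped in here as an example, is Cohen's kappa, which compares the observed agreement between two labelers with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The labels below are hypothetical. If you already use scikit-learn, `sklearn.metrics.cohen_kappa_score` computes the same statistic.

```python
from collections import Counter


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two labelers who each labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both labelers must label the same, non-zero number of items")
    n = len(labels_a)
    # Observed agreement: share of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: for each label, P(A uses it) * P(B uses it).
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum((counts_a[k] / n) * (counts_b[k] / n) for k in counts_a)
    if p_e == 1:
        raise ValueError("kappa is undefined when both labelers use one identical label")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical sentiment labels from two people for the same 10 comments.
LABELER_A = ["pos", "pos", "neg", "neg", "neu", "pos", "neg", "neu", "pos", "neg"]
LABELER_B = ["pos", "neg", "neg", "neg", "neu", "pos", "neg", "pos", "pos", "neg"]

if __name__ == "__main__":
    print(f"kappa = {cohen_kappa(LABELER_A, LABELER_B):.3f}")
```

Here the two people agree on 8 of 10 comments, but after accounting for chance the kappa is about 0.68. A low kappa can signal unclear labeling guidelines or a need for more domain expertise.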
Do you have a question about text analytics?
Reach out to us and we can help you answer it!
© 2025 ORRO AI. All rights reserved.