The Three C's
FUELING YOUR TEXT CLASSIFICATION MODEL WITH GENAI CREATED DATA
Working with generative AI tools can be fun and challenging. See how we are using the Three C's to manage the creativity of GenAI and create useful supplemental text data.
When you want to build a small language model that sorts text into predefined categories, accuracy can suffer if your dataset is small or if one category has few samples. In that case, you can use generative AI such as ChatGPT or Claude to obtain additional examples and build a stronger model.
To generate these additional examples, follow the Three C's.
Create
Use one or more GenAI models whose terms give you the right to use the output as sample data for model building. Llama is an openly available model whose license generally allows you to do this. (You must assess the rights issue for your use case.)
Constrain
Provide constraints in your directions to the GenAI model (called a prompt) to push the model to be creative. For example, you can require the GenAI model to take on various personas when creating the data.
Suppose you want to generate sample post-conference survey data. You can direct the GenAI model to write conference complaints as though it were a ballerina who was bored at a choreography conference, and then as though it were a French professor experiencing ennui at a Proust conference.
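The persona idea can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the category names, the prompt wording, and the `build_prompts` helper are all assumptions made for the example, and the sketch only builds the prompts without calling any particular GenAI service.

```python
from itertools import product

# Hypothetical survey categories, chosen only for illustration.
CATEGORIES = ["venue", "speakers", "catering"]

# Personas taken from the examples in the text.
PERSONAS = [
    "a ballerina who was bored at a choreography conference",
    "a French professor experiencing ennui at a Proust conference",
]


def build_prompts(categories, personas, per_prompt=5):
    """Return one (intended_category, prompt) pair per category/persona pair."""
    prompts = []
    for category, persona in product(categories, personas):
        prompt = (
            f"You are {persona}. Write {per_prompt} short post-conference "
            f"survey complaints about the {category}. "
            "Stay on that topic and put each complaint on its own line."
        )
        # Keep the intended category next to the prompt so it can be
        # compared against a second model's label in the Cull step.
        prompts.append((category, prompt))
    return prompts


prompts = build_prompts(CATEGORIES, PERSONAS)
print(len(prompts))  # 3 categories x 2 personas = 6 prompts
```

Each prompt would then be sent to whichever GenAI model you selected in the Create step, and every returned complaint would be stored together with its intended category.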
With this creativity, GenAI will develop well-rounded examples. That’s the good news. The bad news is that GenAI can be chatty and veer into topics you didn’t ask about. However, the next step can solve that problem.
Cull
To eliminate off-topic samples, you can again look to GenAI. Give each generated example to a different GenAI model and instruct it to pick which of your data categories the example fits within. If it doesn’t pick the category you intended in the Create stage, then eliminate that example. Repeat the process if you need more examples.
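The Cull step amounts to a simple agreement filter, sketched below under stated assumptions. The `cull` function and the sample data are hypothetical, and `keyword_classifier` is only a stand-in so the sketch runs on its own; in practice, `classify` would wrap a call to your second GenAI model.

```python
def cull(samples, classify):
    """Keep only the samples whose predicted category matches the intended one.

    samples:  iterable of (intended_category, text) pairs from the Create step.
    classify: any callable that maps a text to a category name.
    """
    kept = []
    for intended, text in samples:
        if classify(text) == intended:
            kept.append((intended, text))
    return kept


def keyword_classifier(text):
    """Stand-in for the second GenAI model (an assumption for this sketch)."""
    lowered = text.lower()
    if "coffee" in lowered or "lunch" in lowered:
        return "catering"
    if "room" in lowered or "chairs" in lowered:
        return "venue"
    return "other"


# Hypothetical generated samples; the third one drifted off its intended topic.
samples = [
    ("catering", "The coffee ran out before the first session."),
    ("venue", "The chairs were an affront to the human spine."),
    ("venue", "Proust would have wept at the lunch queue."),
]

kept = cull(samples, keyword_classifier)
print(len(kept))  # 2: the off-topic "venue" sample is dropped
```

If too few examples survive for a category, you can generate another batch and run the same filter again.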
If you follow the Three C's, you will be well on your way to generating helpful text for your text classification model!
© 2025 ORRO AI. All rights reserved.