the real bottleneck is the
unstructured mess of human language in those logs. we started using a simple regex script to tag specific
patterns in our Zendesk exports before pushing them to our vector db. if you don't clean the noise first, you're just feeding the model
garbage, and it ends up hallucinating solutions
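
for anyone wondering what that kind of regex pass might look like, here's a rough sketch. it is not our actual script: the tag names, the patterns (error codes, order ids, emails), the noise rules, and the `clean_and_tag` helper are all made up for illustration, so swap in whatever shows up in your own exports.

```python
import re

# Hypothetical tag patterns; the real ones depend on what your tickets contain.
TAG_PATTERNS = {
    "error_code": re.compile(r"\b(?:ERR|E)-?\d{3,5}\b"),
    "order_id": re.compile(r"\border\s*#?\d{5,}\b", re.IGNORECASE),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

# Assumed noise: quoted reply lines and mobile signature footers.
NOISE_PATTERNS = [
    re.compile(r"^>.*$", re.MULTILINE),
    re.compile(r"^sent from my \w+.*$", re.IGNORECASE | re.MULTILINE),
]


def clean_and_tag(text):
    """Strip noise lines, collapse whitespace, and list which patterns matched."""
    for pattern in NOISE_PATTERNS:
        text = pattern.sub("", text)
    # Collapse runs of whitespace left behind by the removed lines.
    text = re.sub(r"\s+", " ", text).strip()
    tags = sorted(
        name for name, pattern in TAG_PATTERNS.items() if pattern.search(text)
    )
    return {"text": text, "tags": tags}


if __name__ == "__main__":
    ticket = (
        "Hi, order #123456 fails with ERR-4012.\n"
        "> earlier quoted reply\n"
        "Sent from my iPhone"
    )
    print(clean_and_tag(ticket))
```

the cleaned text is what you'd embed, and the tags can go along as metadata so you can filter on them at query time, assuming your vector db supports metadata filters.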