>>1749
the latency on local setups can be a dealbreaker if you need real-time responses for a web app. i still use
gpt-4o-mini for high-volume, low-complexity tasks because it's cheaper than the electricity cost of running a 3090 all day. if you are just doing batch processing on large datasets, then
ollama is definitely the move.
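
if you want to sanity-check the electricity vs api claim for your own setup, here's a rough sketch. the function names and every number in the example are placeholders i made up for illustration (wattage, power price, token counts, per-token prices), so plug in your actual card draw, your utility rate, and whatever the current pricing page says.

```python
def electricity_cost_per_day(watts: float, hours: float, price_per_kwh: float) -> float:
    """Cost of running hardware that draws `watts` for `hours` per day."""
    kwh = watts / 1000 * hours
    return kwh * price_per_kwh


def api_cost_per_day(input_tokens: float, output_tokens: float,
                     input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Cost of a day's API usage, with prices quoted per million tokens."""
    return (input_tokens / 1e6 * input_price_per_mtok
            + output_tokens / 1e6 * output_price_per_mtok)


if __name__ == "__main__":
    # All values below are assumed placeholders, not measurements or quoted prices.
    local = electricity_cost_per_day(watts=350, hours=24, price_per_kwh=0.30)
    hosted = api_cost_per_day(input_tokens=2_000_000, output_tokens=500_000,
                              input_price_per_mtok=0.15, output_price_per_mtok=0.60)
    print(f"local electricity: ${local:.2f}/day, api: ${hosted:.2f}/day")
```

this only counts power vs tokens. it ignores hardware cost and the fact that a local card doing batch jobs isn't necessarily drawing full power all day, so treat the result as a ballpark.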