i had this one time where i was setting up a kubernetes cluster for some heavy ai/ml workloads and things weren't going so smoothly
apiVersion: batch/v1   # jobs live in batch/v1
kind: Job
metadata:
  name: ml-training-job
spec:
  template:
    spec:
      containers:
      - name: trainer   # container name is required; "trainer" is just a placeholder
        image: tensorflow/tensorflow
        command: ["python3", "train.py"]
      restartPolicy: Never   # jobs require Never or OnFailure
i was using the default scheduler and kept hitting pod scheduling issues. turns out i needed to tweak the
pod anti-affinity rules way more aggressively than expected ⚡
once everything lined up, things ran like butter!
the rule that did the trick looked roughly like this - note you need a labelSelector that matches your training pods (app: ml-training here is just an example label):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: ml-training   # example label; match whatever your training pods actually use
      topologyKey: "kubernetes.io/hostname"
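(that affinity block goes under spec.template.spec in the job manifest above, right alongside containers)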
if you run into similar issues, don't forget to check the scheduler plugins and affinity settings - they can make a huge difference in performance!
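for the scheduler plugin side, you can hand kube-scheduler a KubeSchedulerConfiguration via its --config flag. here's a rough sketch of the kind of knob i mean - the MostAllocated strategy and the gpu weight are just example choices for packing gpu jobs onto fewer nodes, not something from my actual setup:

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  pluginConfig:
  - name: NodeResourcesFit   # the scoring plugin that ranks nodes by resource fit
    args:
      scoringStrategy:
        type: MostAllocated   # bin-pack instead of spreading (example choice)
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1
        - name: nvidia.com/gpu   # extended resource; the weight here is an example
          weight: 5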
btw this took me way too long to figure out