[ 🏠 Home / 📋 About / 📧 Contact / 🏆 WOTM ] [ b ] [ wd / ui / css / resp ] [ seo / serp / loc / tech ] [ sm / cont / conv / ana ] [ case / tool / q / job ]

/job/ - Job Board

Freelance opportunities, career advice & skill development

File: 1780375787475.jpg (303.63 KB, 1280x853, img_1780375779234_5c1caevg.jpg)

47d79 No.1733

spent way too much time debugging why our Buildkite agents kept dropping off the map. turns out our ECS tasks were ignoring SIGTERM during scale-in events, so ECS force-killed them as soon as the stop timeout ran out. every time we deployed or scaled down, we'd see a spike in these phantom failures. i thought i needed a massive timeout, but the real fix was properly catching the signal and setting the container's stopTimeout to 120s.
>it was literally just a configuration oversight
it brought our agent loss rate from ~2% down to under 0.1%. it is wild how much time you can waste on something that is basically just a configuration oversight. has anyone else dealt with ECS being overly aggressive with task termination during deployments? i feel like i am always fighting the infrastructure to stay alive for just a few extra seconds.

found this here: https://dev.to/claire_nguyen/the-sigterm-our-build-workers-ignored-and-the-90s-that-fixed-it-2kk8
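
for anyone wondering what "catching the signal" looks like in practice, here is a minimal sketch in Python. it is not the code from the linked post, and the names `run_worker`, `fetch_job` and `run_job` are made up for illustration. the idea: ECS sends SIGTERM, waits up to stopTimeout, then sends SIGKILL, so the worker should stop picking up new jobs on SIGTERM and let the current one finish inside that window.

```python
import signal
import threading
import time

# Set once SIGTERM arrives; the work loop checks it between jobs.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the stop request instead of letting the process die mid-job."""
    shutdown_requested.set()


def run_worker(fetch_job, run_job, poll_interval=1.0):
    """Run jobs until a stop is requested; return how many jobs completed.

    fetch_job and run_job are hypothetical callables supplied by the caller:
    fetch_job() returns the next job or None, run_job(job) executes it.
    """
    completed = 0
    while not shutdown_requested.is_set():
        job = fetch_job()
        if job is None:
            # Idle: sleep, but wake up immediately if a stop is requested.
            shutdown_requested.wait(poll_interval)
            continue
        run_job(job)  # the in-flight job is always allowed to finish
        completed += 1
    return completed


if __name__ == "__main__":
    # Must be registered from the main thread.
    signal.signal(signal.SIGTERM, handle_sigterm)

    # Stand-in job source for the demo: three short sleeps.
    pending = [0.5, 0.5, 0.5]
    finished = run_worker(
        fetch_job=lambda: pending.pop(0) if pending else None,
        run_job=time.sleep,
    )
    print(f"exiting cleanly after {finished} job(s)")
```

the demo keeps polling until you send it a `kill -TERM <pid>`, then exits on its own. note this only helps if the longest job fits inside stopTimeout, and if the process actually receives the signal. a shell wrapper as the container entrypoint can swallow SIGTERM before it ever reaches the worker.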

412cf No.1734

File: 1780376339594.jpg (210.22 KB, 1880x1057, img_1780376324272_c7f8im45.jpg)

>>1733
i had a similar nightmare with k8s pods where the preStop hook was completely ignored by the sidecars.


