been thinking about how many people treat redshift like a bottomless pit for every single dataset. you really don't need to load five-year transaction histories directly into local tables if they aren't being queried constantly. i've been playing around with an architecture using apache iceberg on s3 combined with redshift spectrum to keep the warehouse lean. it lets you move the heavy, cold data out of the cluster while still keeping it queryable through the same sql interface.
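here's a rough python sketch of the "same interface" part, under a few assumptions. it splits data by date: recent partitions stay in a local redshift table, and older ones live in an iceberg table exposed through an external schema. a late-binding view then unions the two (redshift requires `WITH NO SCHEMA BINDING` for views that reference external tables). the names `public.transactions_hot`, `lake.transactions_cold`, `txn_date` and the 90-day window are all hypothetical, and the sketch assumes both tables have identical columns.

```python
from datetime import date, timedelta


def split_hot_cold(partition_dates, today, hot_days=90):
    """Classify daily partitions as hot (keep in local redshift tables) or
    cold (candidates to move to iceberg on s3). hot_days is a hypothetical policy."""
    cutoff = today - timedelta(days=hot_days)
    hot = [d for d in partition_dates if d >= cutoff]
    cold = [d for d in partition_dates if d < cutoff]
    return hot, cold, cutoff


def unified_view_sql(view, local_table, external_table, date_col, cutoff):
    """Build a late-binding view so queries hit one name whether rows are
    local or read through spectrum. Table names must be schema-qualified."""
    c = cutoff.isoformat()
    return (
        f"CREATE OR REPLACE VIEW {view} AS\n"
        f"SELECT * FROM {local_table} WHERE {date_col} >= '{c}'\n"
        f"UNION ALL\n"
        f"SELECT * FROM {external_table} WHERE {date_col} < '{c}'\n"
        f"WITH NO SCHEMA BINDING;"
    )


if __name__ == "__main__":
    today = date(2024, 6, 30)
    days = [today - timedelta(days=i) for i in range(0, 200, 50)]
    hot, cold, cutoff = split_hot_cold(days, today)
    print(len(hot), "hot partitions,", len(cold), "cold partitions")
    # all names below are hypothetical placeholders
    print(unified_view_sql("public.transactions", "public.transactions_hot",
                           "lake.transactions_cold", "txn_date", cutoff))
```

since the cutoff is baked into the view text, you'd regenerate the view whenever you roll another batch of partitions out to s3.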
it basically turns your warehouse into a managed layer over your data lake. moving that bulk storage to s3 saves a lot on duplicated storage costs and keeps the cluster fast for the real-time workloads that actually need it. has anyone else moved towards this hybrid approach, or are you still loading everything into purely local tables?
full read:
https://dzone.com/articles/stop-loading-everything-into-redshift-a-spectrum-i