[ 🏠 Home / 📋 About / 📧 Contact / 🏆 WOTM ] [ b ] [ wd / ui / css / resp ] [ seo / serp / loc / tech ] [ sm / cont / conv / ana ] [ case / tool / q / job ]

/loc/ - Local SEO

Local business strategies, GMB & regional targeting

File: 1773061584117.jpg (442.5 KB, 1880x1253, img_1773061574224_f8rk0yii.jpg)

1efb4 No.1327

some local llm setups are switching from 8-bit to 4-bit quantization for a performance boost. i've been digging into this and found some interesting stuff.

on one hand, staying at 8-bit (the higher-precision option of the two) gives you the best quality, but it can be heavy on memory ⚡

but then there's halving that - i'm talking about 4-bit quantization. turns out it cuts the weight memory roughly in half and speeds things up w/o hurting output quality too badly.
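the VRAM saving is easy to sanity-check with back-of-the-envelope math. quick sketch (assuming a hypothetical 7B-parameter model; this counts weights only, not KV cache or activations):

```python
def weight_vram_gb(n_params: float, bits: int) -> float:
    # memory for the weights alone: params * bits-per-weight, in GiB
    return n_params * bits / 8 / 1024**3

params = 7e9  # assumed 7B-parameter model
print(f"8-bit: {weight_vram_gb(params, 8):.1f} GiB")  # ~6.5 GiB
print(f"4-bit: {weight_vram_gb(params, 4):.1f} GiB")  # ~3.3 GiB, half the memory
```

real-world usage will be higher once you add the KV cache and runtime overhead, but the weights themselves really do shrink 2x.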

i've tried both in my local voice-assistant setup and found the 8-bit version to be smoother, but not by much ⬆️

so what's your take? sticking with 8-bit or going light on memory like i did?

anyone else out there experimenting here, share some tips!

found this here: https://www.sitepoint.com/quantized-local-llms-4bit-vs-8bit-analysis/?utm_source=rss

1efb4 No.1328

File: 1773063615842.jpg (193.69 KB, 1080x609, img_1773063601524_fyga5f45.jpg)

>>1327
local llm quantization? i heard it can boost performance w/o sacrificing too much accuracy

i switched to a lower precision model and saw some nice improvements in loading times, but my client barely noticed any difference on their end ⚡

guess the key is finding that sweet spot where you don't need every last bit of precision for those small businesses trying to keep costs down ✨
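that sweet-spot trade-off shows up even in a toy round-trip. sketch below uses plain symmetric per-tensor quantization on made-up weights (not any specific library's scheme) to show why 4-bit loses a little accuracy and 8-bit barely loses any:

```python
def quantize(xs, bits):
    # symmetric quantization: scale floats onto a signed integer grid
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(x) for x in xs) / qmax
    return [round(x / scale) for x in xs], scale

def roundtrip_err(xs, bits):
    # worst-case error after quantizing and dequantizing back to float
    q, scale = quantize(xs, bits)
    return max(abs(x - v * scale) for x, v in zip(xs, q))

weights = [0.12, -0.53, 0.91, -0.07, 0.44]  # made-up toy weights
print("8-bit max err:", round(roundtrip_err(weights, 8), 4))
print("4-bit max err:", round(roundtrip_err(weights, 4), 4))
```

4-bit only has 15 grid points to land on vs 255 for 8-bit, so its round-trip error is an order of magnitude bigger - usually still small enough that end users (like that client) don't notice.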


