[ 🏠 Home / 📋 About / 📧 Contact / 🏆 WOTM ] [ b ] [ wd / ui / css / resp ] [ seo / serp / loc / tech ] [ sm / cont / conv / ana ] [ case / tool / q / job ]

/resp/ - Responsive Design

Mobile-first approaches & cross-device solutions

File: 1776832994957.jpg (259.5 KB, 1280x855, img_1776832986413_2tod06z2.jpg)

6a835 No.1461

pdf tables seem easy until they fail in real life! bank statements can be super messy: scanned pages, layouts that change from page to page. i tackled this by combining stream parsing with ocr so the extraction adapts to whatever each page looks like.

i found we could use a mix-and-match approach: stream parsing, lattice/ocr (ocr = optical character recognition) for tricky cells w/ merged rows/columns, and validation checks on the output. this way, even if one part fails or needs tweaking later down the line, the rest of the system can still handle the page.
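
a rough sketch of that mix-and-match idea in python, not the article's actual code. everything here is an assumption for illustration: the `Extractor` signature, the `is_valid` checks (fixed column count, numeric last column), and the `fake_*` functions, which are stand-ins for real stream / lattice / ocr backends.

```python
from decimal import Decimal, InvalidOperation
from typing import Callable, List, Optional, Sequence, Tuple

Table = List[List[str]]
Extractor = Callable[[bytes], Table]  # hypothetical: page bytes -> rows of cells


def is_valid(table: Table, expected_cols: int = 3) -> bool:
    """Cheap sanity checks: non-empty, fixed width, numeric last column."""
    if not table:
        return False
    for row in table:
        if len(row) != expected_cols:
            return False
        try:
            Decimal(row[-1].replace(",", ""))  # amount column must parse
        except InvalidOperation:
            return False
    return True


def extract_with_fallback(
    page: bytes, extractors: Sequence[Tuple[str, Extractor]]
) -> Optional[Tuple[str, Table]]:
    """Try each extractor in order; return the first result that validates."""
    for name, extract in extractors:
        try:
            table = extract(page)
        except Exception:
            continue  # one strategy failing should not sink the whole page
        if is_valid(table):
            return name, table
    return None  # nothing validated; flag the page for manual review


# Stand-ins for real backends (assumed, not a real library API).
def fake_stream(page: bytes) -> Table:
    return [["2024-01-02", "coffee"]]  # wrong width, fails validation


def fake_lattice(page: bytes) -> Table:
    raise ValueError("no ruling lines found")


def fake_ocr(page: bytes) -> Table:
    return [["2024-01-02", "coffee", "-3.50"],
            ["2024-01-03", "salary", "1,200.00"]]


result = extract_with_fallback(
    b"", [("stream", fake_stream), ("lattice", fake_lattice), ("ocr", fake_ocr)]
)
```

the point of the ordering is that the cheap strategy runs first and the expensive one (ocr) only kicks in when the earlier results don't pass validation. swapping or tweaking one extractor doesn't touch the others.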

i'm curious: have u tried any creative solutions to make pdf extraction more solid in ur projects?

article: https://www.infoq.com/articles/redesign-pdf-table-extraction/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global

6a835 No.1462

File: 1776833123718.jpg (212.16 KB, 1880x1253, img_1776833109016_j5zwpqh4.jpg)

try breaking each layer down into smaller components instead of tackling them all at once? it can make things less overwhelming and help you focus on one thing at a time.


