YarnScope
Issue № 001 · Spring / 2026 · Klaipėda · A stash tracker for serious crafters
04 · The Engine · OCR Database

OCR for yarn labels — the database, not just the scanner.

An OCR engine without a label database guesses at brand names. YarnScope has a curated database of ball-band layouts from Drops, Madelinetosh, Knit Picks, Cascade, Brooklyn Tweed, Quince, Malabrigo, and dozens more — so the OCR knows where to look.

Why OCR alone isn't enough

A generic OCR model can read every word on a ball band, eventually. The problem is that ball bands are not press releases. They are a dense, brand-specific layout: the colour code lives in one corner, the fibre percentages in another, the dye lot in a third. Where exactly depends on the brand.

Without a database, OCR returns a wall of unstructured text and the parser has to guess. With a database, the parser knows that on a Drops Karisma band the meterage sits to the right of the weight in grams, in a smaller font, and that "100%" reliably prefixes the fibre line. That gain in parsing accuracy is the difference between "yes, save" and "fix the dye lot, the yardage, and the brand line".
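A minimal sketch of what "knowing where to look" can mean in code, assuming the OCR step returns words with normalised centre coordinates. The `Token` and `Region` types, the field names, and the coordinates in `EXAMPLE_LAYOUT` are hypothetical and purely illustrative; they are not measured from a real Drops band or taken from YarnScope's database.

```python
from dataclasses import dataclass


# Hypothetical types for illustration; not YarnScope's actual schema.
@dataclass(frozen=True)
class Token:
    text: str
    x: float  # word centre, 0..1 across the band
    y: float  # word centre, 0..1 down the band


@dataclass(frozen=True)
class Region:
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, t: Token) -> bool:
        return self.x0 <= t.x <= self.x1 and self.y0 <= t.y <= self.y1


# Illustrative layout only: made-up coordinates, one region per field.
EXAMPLE_LAYOUT: dict[str, Region] = {
    "brand": Region(0.0, 0.0, 0.5, 0.2),
    "fibre": Region(0.0, 0.3, 1.0, 0.5),
    "weight_g": Region(0.0, 0.6, 0.5, 0.8),
    "meterage": Region(0.5, 0.6, 1.0, 0.8),
    "dye_lot": Region(0.5, 0.85, 1.0, 1.0),
}


def fill_fields(tokens: list[Token], layout: dict[str, Region]) -> dict[str, str]:
    """Put each OCR word into the first layout field whose region contains its centre."""
    fields: dict[str, list[str]] = {name: [] for name in layout}
    # Sort top-to-bottom, then left-to-right, so multi-word fields read in order.
    for tok in sorted(tokens, key=lambda t: (t.y, t.x)):
        for name, region in layout.items():
            if region.contains(tok):
                fields[name].append(tok.text)
                break  # words outside every region are simply dropped
    return {name: " ".join(words) for name, words in fields.items() if words}
```

Given words such as "DROPS" at (0.2, 0.1) and "100%" and "Wool" on the fibre line, the result is a dictionary of named fields (`"brand": "DROPS"`, `"fibre": "100% Wool"`, and so on) rather than a flat transcript, which is what lets the review card show one value per box.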

Brands in the database (selected)

  • Multilingual: Drops Design / Garnstudio (the largest single corpus)
  • North American mainstream: Knit Picks / WeCrochet, Lion Brand, Cascade, Berroco, Plymouth
  • Indie / luxury: Madelinetosh, Malabrigo, Quince & Co., Brooklyn Tweed, Manos del Uruguay
  • Nordic: Sandnes Garn, Rauma, Isager, Holst, Pickles, Du Store Alpakka
  • UK: Rowan, West Yorkshire Spinners, Jamieson & Smith
  • Continental EU: Schachenmayr, Lana Grossa, Lang Yarns, Adriafil, Plassard, Phildar, Bergère de France
  • D2C / online: We Are Knitters, Wool and the Gang, Scheepjes, Durable
  • Cross-stitch floss: DMC, Anchor, Madeira (skein number recognition)

New brands are added weekly via two channels: explicit indie-dyer requests and the correction loop below.

The correction loop — one tap teaches the engine

When OCR misreads a field, the review card highlights the suspect cell in orange. You tap it, edit the value, and tap Save. That correction does two things: it fixes the value in your stash entry, and it sends an anonymised correction record back to YarnScope. Once thirty users have corrected the same field on the same brand, the OCR engine is re-trained on that pattern, and the next person who scans that brand sees the field parsed correctly on the first pass.
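The thirty-user rule can be sketched as a counter of distinct users per brand and field. This is a hypothetical illustration: `CorrectionRecord`, `CorrectionAggregator`, and the opaque `anon_user_id` are assumptions, not YarnScope's actual schema or training pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass

RETRAIN_THRESHOLD = 30  # "thirty users", from the text


@dataclass(frozen=True)
class CorrectionRecord:
    # Hypothetical anonymised record: an opaque ID, no name, no photo.
    anon_user_id: str
    brand: str
    field: str
    ocr_value: str
    corrected_value: str


class CorrectionAggregator:
    """Counts distinct users per (brand, field) and flags patterns ready for re-training."""

    def __init__(self, threshold: int = RETRAIN_THRESHOLD) -> None:
        self.threshold = threshold
        self._users: dict[tuple[str, str], set[str]] = defaultdict(set)
        self._queued: set[tuple[str, str]] = set()

    def add(self, rec: CorrectionRecord) -> bool:
        """Record a correction; return True only the first time a key reaches the threshold."""
        key = (rec.brand, rec.field)
        self._users[key].add(rec.anon_user_id)  # a set, so repeat users count once
        if key not in self._queued and len(self._users[key]) >= self.threshold:
            self._queued.add(key)  # queue each pattern for re-training once
            return True
        return False
```

Counting distinct users, rather than raw corrections, means one person re-correcting the same band thirty times does not trigger re-training on their own.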

Corrections are shared only with your consent: the OCR feedback setting is on by default and can be turned off entirely in Settings → Privacy → OCR feedback. Turning it off does not degrade your own scans; it only stops your corrections from improving the engine for others.
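The behaviour described above (your own entry is always fixed, while the shared record is gated by the setting) can be sketched as follows. `Settings`, `StashEntry`, `save_correction`, and the `send` callback are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Settings:
    # Stands in for Settings → Privacy → OCR feedback; on by default per the text.
    ocr_feedback: bool = True


@dataclass
class StashEntry:
    fields: dict[str, str] = field(default_factory=dict)


def save_correction(
    entry: StashEntry,
    brand: str,
    field_name: str,
    ocr_value: str,
    corrected_value: str,
    settings: Settings,
    send: Callable[[dict[str, str]], None],
) -> bool:
    """Apply a user's correction; share it only if OCR feedback is enabled.

    Returns True if a correction record was handed to `send`.
    """
    # The user's own stash entry is always fixed, whatever the setting.
    entry.fields[field_name] = corrected_value
    if not settings.ocr_feedback:
        return False
    # Hypothetical anonymised record: field values only, no photo.
    send({
        "brand": brand,
        "field": field_name,
        "ocr_value": ocr_value,
        "corrected_value": corrected_value,
    })
    return True
```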

What we capture, and what we don't

What we capture, transiently: the camera frame, which the OCR engine on our server decodes to text. The frame is held in memory only for the duration of the parse (under one second) and is then discarded. The decoded text fields land on your stash entry.
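A minimal sketch of transient frame handling, assuming the frame arrives as a mutable byte buffer and the OCR engine is any callable that returns text fields. `parse_frame_transiently` and the `ocr` callable are hypothetical. Overwriting the buffer shows the intent; in Python it cannot guarantee that no other copy exists, for example inside the OCR engine itself.

```python
from typing import Callable


def parse_frame_transiently(
    frame: bytearray,
    ocr: Callable[[bytearray], dict[str, str]],
) -> dict[str, str]:
    """Decode one camera frame to text fields, then wipe the frame buffer."""
    try:
        # Only the decoded text fields leave this function.
        return ocr(frame)
    finally:
        # Runs whether OCR succeeds or raises: overwrite the pixels in place.
        frame[:] = bytes(len(frame))
```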

What we never capture: faces, hands, the surface behind the ball band, or anything else outside the framed band. The camera flash never fires automatically, and the camera is requested only when you tap Scan.

Questions about the OCR engine

How is OCR with a database different from generic OCR?
Generic OCR reads the text of a photo. A database tells the engine where each field lives on a given brand's band: for example, brand top-left, dye lot bottom-right, fibre composition under the yarn name. Knowing where to look turns 'read everything' into 'fill the right boxes'.
Where do you get the ball-band layouts?
From public ball-band photography (brand marketing imagery, Ravelry-licensed user uploads with consent, the YarnScope sample stash). No yarn brand has shared private data with us. Indie dyers can request inclusion at start@djump.io.
What if I scan a vintage Rowan band from 1996?
Out-of-print bands are recognised partially. Brand and fibre usually parse; yardage may need a manual correction. We add vintage bands as users contribute corrections.
Does YarnScope read barcodes on yarn shop tags?
Not yet, and we deliberately scoped the OCR to the ball band itself rather than the local yarn shop (LYS) price tag. The ball band travels with the yarn; the price tag stays at the shop.
Are my scanned photos used to train the OCR?
Only when you explicitly correct a misread field do we send anonymised corrections back to improve the engine. Photos themselves are never kept or used for training: the camera frame is discarded as soon as it has been parsed, unless you attach the photo to a stash entry.