OCR for yarn labels — the database, not just the scanner.
An OCR engine without a label database guesses at brand names. YarnScope has a curated database of ball-band layouts from Drops, Madelinetosh, Knit Picks, Cascade, Brooklyn Tweed, Quince, Malabrigo, and dozens more — so the OCR knows where to look.
Why OCR alone isn't enough
A generic OCR model can read every word on a ball band, eventually. The problem is that ball bands are not press releases. Each one is a dense, brand-specific layout: the colour code lives in one corner, the fibre percentages in another, the dye lot in a third. Where exactly depends on the brand.
Without a database, OCR returns a wall of unstructured text and the parser has to guess. With a database, the parser knows that on a Drops Karisma band the meterage is to the right of the weight in grams, in a smaller font, and that "100%" reliably prefixes the fibre line. That parsing accuracy is the difference between "yes, save" and "fix the dye lot, the yardage, and the brand line".
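As a rough illustration of what "knowing where to look" could mean, the sketch below applies the three Karisma-style rules mentioned above to OCR tokens that carry a position and a glyph height. It is not YarnScope's actual parser: the `Token` shape, the function name `parse_karisma_like`, the regular expressions, and the sample coordinates are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One OCR word with its position on the band (hypothetical shape)."""
    text: str
    x: float       # left edge, in pixels
    y: float       # top edge, in pixels
    height: float  # glyph height, used as a rough proxy for font size


def parse_karisma_like(tokens: list[Token]) -> dict[str, str]:
    """Apply illustrative layout rules in the style described above."""
    fields: dict[str, str] = {}

    # Rule 1: the weight is a token such as "50g".
    weight = next((t for t in tokens if re.fullmatch(r"\d+\s?g", t.text)), None)
    if weight is not None:
        fields["weight"] = weight.text
        # Rule 2: the meterage sits to the right of the weight, on roughly
        # the same line, and is set in a smaller font.
        candidates = [
            t for t in tokens
            if t.x > weight.x
            and abs(t.y - weight.y) < weight.height
            and t.height < weight.height
            and re.fullmatch(r"\d+\s?m", t.text)
        ]
        if candidates:
            # Take the nearest candidate to the right of the weight.
            fields["meterage"] = min(candidates, key=lambda t: t.x).text

    # Rule 3: the fibre line starts with "100%"; collect the rest of that line.
    start = next((t for t in tokens if t.text.startswith("100%")), None)
    if start is not None:
        line = sorted(
            (t for t in tokens
             if t.x >= start.x and abs(t.y - start.y) < start.height / 2),
            key=lambda t: t.x,
        )
        fields["fibre"] = " ".join(t.text for t in line)

    return fields


# Made-up token positions, for illustration only.
sample = [
    Token("DROPS", 10, 5, 20), Token("Karisma", 80, 5, 20),
    Token("50g", 10, 40, 16), Token("100m", 60, 43, 10),
    Token("100%", 10, 70, 10), Token("Wool", 50, 70, 10),
    Token("Dyelot", 120, 100, 8), Token("4417", 170, 100, 8),
]
parsed = parse_karisma_like(sample)
```

Without the position and font-size checks, "100m" and "100%" are both just numbers in a stream of text. With them, each lands in a named field.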
Brands in the database (selected)
- Multilingual: Drops Design / Garnstudio (the largest single corpus)
- North American mainstream: Knit Picks / WeCrochet, Lion Brand, Cascade, Berroco, Plymouth
- Indie / luxury: Madelinetosh, Malabrigo, Quince & Co., Brooklyn Tweed, Manos del Uruguay
- Nordic: Sandnes Garn, Rauma, Isager, Holst, Pickles, Du Store Alpakka
- UK: Rowan, West Yorkshire Spinners, Jamieson & Smith
- Continental EU: Schachenmayr, Lana Grossa, Lang Yarns, Adriafil, Plassard, Phildar, Bergère de France
- D2C / online: We Are Knitters, Wool and the Gang, Scheepjes, Durable
- Cross-stitch floss: DMC, Anchor, Madeira (skein number recognition)
New brands are added weekly via two channels: explicit indie-dyer requests and the correction loop below.
The correction loop — one tap teaches the engine
When OCR misreads a field, the review card highlights the suspect cell in orange. You tap it, edit the value, and tap save. That correction does two things: it fixes the value in your stash entry, and it sends an anonymised correction record back to YarnScope. Once thirty users have corrected the same field on the same brand, the OCR engine retrains on that pattern. The next person scanning that brand sees it parsed correctly on the first pass.
No corrections are shipped without consent. The setting is on by default but can be turned off entirely in Settings → Privacy → OCR feedback. Turning it off does not degrade your own scans; it just stops your corrections from improving the engine for others.
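The loop above can be pictured as a counter keyed by brand and field. The sketch below is a minimal model under stated assumptions, not the production pipeline: the class `CorrectionLog`, its method names, and the idea of counting distinct anonymised user IDs are hypothetical, and the text does not say whether the thirty corrections must agree on the new value.

```python
from collections import defaultdict

RETRAIN_THRESHOLD = 30  # "thirty users", as stated in the text above


class CorrectionLog:
    """Counts distinct users who corrected each (brand, field) pair."""

    def __init__(self, threshold: int = RETRAIN_THRESHOLD) -> None:
        self.threshold = threshold
        self._users: dict[tuple[str, str], set[str]] = defaultdict(set)
        self.queued: list[tuple[str, str]] = []  # pairs waiting for a retrain

    def record(self, brand: str, field: str, anon_user_id: str,
               feedback_enabled: bool) -> bool:
        """Record one correction; return True if it just queued a retrain."""
        # With OCR feedback switched off, nothing leaves the device.
        if not feedback_enabled:
            return False
        key = (brand, field)
        users = self._users[key]
        before = len(users)
        users.add(anon_user_id)  # a repeat from the same user does not count twice
        # Fire exactly once, at the moment the threshold is crossed.
        if before < self.threshold <= len(users):
            self.queued.append(key)
            return True
        return False


log = CorrectionLog()
results = [log.record("Drops", "dye_lot", f"user-{i}", True) for i in range(30)]
```

In this model the thirtieth distinct user is the one whose correction queues the retrain. Corrections from users who have turned the setting off are never recorded.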
What we capture, and what we don't
What we capture, transiently: the camera frame, which the OCR engine on our server decodes to text. The frame is held in memory only for the duration of the parse (under one second) and then discarded. Only the decoded text fields are kept, and they land on your stash entry.
What we never capture: faces, hands, the surface behind the ball band, anything outside the framed band. The camera flash never fires automatically. The camera is requested only when you tap Scan.