For a long time, if you wanted software to reliably recognize a specific set of objects from photos, the playbook was: collect images, label them, train a model, repeat. It worked. It also took a long time.

We have been down this road before: a client wants a better way to verify devices in the field, sort a large library of walkthrough snapshots, or categorize an archive of images. Each time, there was a lot of setup involved.

As I was cleaning out my basement recently, digging through boxes of hard drives and old adapters, I kept thinking: what even is this thing? And then of course: have we not already built something with object recognition via a vision model?!

What is a Jaz Drive?!

How It Works

The idea I was curious to play with: skip the tedious training process. What can a vision model do with a little guidance? Not a pile of reference images uploaded over and over, just short written descriptions of visual cues kept in a small registry.

So we’re looking at two pieces:

Vision inference: an AI vision model looks at the photo and returns structured text (device type, logos, serial/IMEI if readable, install clues).
Local registry: supported devices, keywords, and short "what to look for" notes in a JSON file.
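
To make the registry half concrete, here is a rough sketch of what such a file could contain and how an app might load it. The field names (`id`, `category`, `keywords`, `cues`) and the `load_registry` helper are assumptions for illustration, not the schema we shipped.

```python
import json

# Hypothetical registry contents; every field name here is an assumption.
REGISTRY_JSON = """
{
  "devices": [
    {
      "id": "iomega-jaz",
      "category": "removable-disk-drive",
      "keywords": ["jaz", "iomega"],
      "cues": ["dark green Jaz housing"]
    },
    {
      "id": "iomega-zip",
      "category": "removable-disk-drive",
      "keywords": ["zip", "iomega"],
      "cues": ["blue Zip enclosure"]
    }
  ]
}
"""


def load_registry(text: str) -> list[dict]:
    """Parse the registry and check each entry has the fields we rely on."""
    devices = json.loads(text)["devices"]
    for device in devices:
        missing = {"id", "keywords", "cues"} - set(device)
        if missing:
            raise ValueError(f"registry entry missing fields: {sorted(missing)}")
    return devices


registry = load_registry(REGISTRY_JSON)
```

Because the whole thing is a text file, adding a device or tweaking a cue is a one-line edit rather than a retraining run.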

The Flow:

Upload a photo. The model returns JSON only.
The app scores that JSON against the registry.
You get supported/unsupported, confidence, extracted identifiers, and a recommended next step.
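
The scoring step in that flow can be sketched in a few lines. This is a minimal version under stated assumptions: the observation keys (`device_type`, `logos`, `notes`, `identifiers`), the keyword-overlap score, the `0.5` threshold, and the next-step strings are all illustrative choices, not the scoring we actually used.

```python
def score_device(observation: dict, device: dict) -> float:
    """Fraction of a device's keywords found in the model's text fields."""
    text = " ".join(
        str(observation.get(field, ""))
        for field in ("device_type", "logos", "notes")
    ).lower()
    keywords = device["keywords"]
    if not keywords:
        return 0.0
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


def match(observation: dict, registry: list[dict], threshold: float = 0.5) -> dict:
    """Pick the best-scoring registry entry and decide supported/unsupported."""
    scored = [(score_device(observation, d), d) for d in registry]
    best_score, best = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    supported = best is not None and best_score >= threshold
    return {
        "supported": supported,
        "device_id": best["id"] if supported else None,
        "confidence": round(best_score, 2),
        "identifiers": observation.get("identifiers", {}),
        "next_step": "proceed" if supported else "flag for manual review",
    }


# Hypothetical registry entries and a hypothetical model observation.
registry = [
    {"id": "iomega-jaz", "keywords": ["jaz", "iomega"]},
    {"id": "iomega-zip", "keywords": ["zip", "iomega"]},
]
observation = {"device_type": "Jaz drive", "logos": ["Iomega"], "identifiers": {}}
result = match(observation, registry)
```

Plain substring matching is crude, but it keeps every decision explainable, which matters more here than squeezing out accuracy.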

No training on uploads. No fine-tuning loop.

Notes from Building

1. Text cues did the disambiguation work, not a photo library.

Instead of storing reference photos per device, we store descriptions of what to look for. Take a Jaz and Zip drive you might find in an old plastic bin: they look a lot alike in a bad photo. But a line in the registry ("dark green Jaz housing" vs. "blue Zip enclosure") steers the model when it would otherwise guess wrong. A miss meant editing cues, not adding fifty reference photos to a dataset.
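
One way to get those cues in front of the model is to fold them into the instruction text sent with each photo. The sketch below assumes a hypothetical `build_prompt` helper and made-up prompt wording; the two cue strings are the ones from the example above.

```python
def build_prompt(registry: list[dict]) -> str:
    """Turn registry cues into instruction text sent alongside the photo."""
    lines = [
        "Identify the device in the photo.",
        "Respond with JSON only, using the keys: "
        "device_type, logos, identifiers, notes.",
        "Visual cues for devices we care about:",
    ]
    for device in registry:
        for cue in device["cues"]:
            lines.append(f"- {device['id']}: {cue}")
    return "\n".join(lines)


# Hypothetical registry entries; ids and field names are assumptions.
registry = [
    {"id": "iomega-jaz", "cues": ["dark green Jaz housing"]},
    {"id": "iomega-zip", "cues": ["blue Zip enclosure"]},
]
prompt = build_prompt(registry)
```

Fixing a misidentification then means editing a cue string and re-running the same photo.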

2. Vision is good at reading; the registry is good for categorizing.

The model’s job: “What do I see?” The app’s job: “Is that on our supported list, and what should you do next?” Splitting those concerns keeps the system auditable. You can see exactly why a device matched (keywords, cues) without opening a black-box classifier.
Sometimes the model even gives you more specificity than you need when the goal is just categorization, so the registry steers it back to the right category.
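
A small sketch of what "auditable" can mean in practice: return the matched keywords alongside the category, so the reason for a match is visible. The `explain_match` helper, the `category` field, and the sample description are assumptions for illustration.

```python
def explain_match(observed_text: str, device: dict) -> dict:
    """Report which registry keywords matched and the category they roll up to."""
    text = observed_text.lower()
    matched = [kw for kw in device["keywords"] if kw.lower() in text]
    return {"category": device["category"], "matched_keywords": matched}


# Hypothetical registry entry and an overly specific model description.
device = {
    "id": "iomega-jaz",
    "category": "removable-disk-drive",
    "keywords": ["jaz", "iomega"],
}
explanation = explain_match("Iomega Jaz 1GB external SCSI drive", device)
```

The model's extra detail (capacity, interface) is simply ignored; the registry decides which category the photo lands in.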

3. One-off > custom > API

After a few successful tests with nothing more than a prompt and an image dropped into ChatGPT or Claude, the next step was a simple frontend. Upload an image, get back a match, confidence score, extracted identifiers, and a recommendation.

At that point, the shape of the thing starts to become obvious. Accept an image and an instruction. Return repeatable JSON. Track confidence, match against a registry, auto-tag content, and keep costs and response times predictable.
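
"Repeatable JSON" mostly comes down to checking the reply before anything else touches it. Here is a minimal sketch, assuming a hypothetical set of required keys and assuming the model sometimes wraps its JSON in a Markdown code fence even when told not to.

```python
import json

# Assumed response contract; these key names are illustrative.
REQUIRED_KEYS = {"device_type", "logos", "identifiers"}
FENCE = "`" * 3  # a Markdown code fence marker


def parse_model_reply(reply: str) -> dict:
    """Parse a JSON-only reply, tolerating a code fence wrapped around it."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (it may carry a language tag),
        # then everything from the closing fence onward.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"reply missing keys: {sorted(missing)}")
    return data


# A hypothetical reply, fenced the way chat models often format JSON.
reply = (
    FENCE
    + 'json\n{"device_type": "Jaz drive", "logos": ["Iomega"], "identifiers": {}}\n'
    + FENCE
)
parsed = parse_model_reply(reply)
```

Rejecting malformed replies at this boundary keeps the registry matching and tagging code from having to guess what it was handed.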

Suddenly it's not just a way to identify mystery adapters from my basement. It’s a pattern for categorizing photos, verifying equipment, and building lightweight image-recognition workflows without training a model.

Want a thing?

Start with a Township sprint and get your own Thing in as little as 2 weeks.

I want a thing!

Built with ❤️ by Township

You’re receiving this because you signed up for Township’s “We Built a Thing” newsletter. We'll only send these when we, well, build things.
