Notes from Building
1. Text cues did the disambiguation work, not a photo library.
Instead of storing reference photos per device, we store descriptions of what to look for. Take a Jaz drive and a Zip drive you might find in an old plastic bin: in a bad photo they look a lot alike. But a line in the registry ("dark green Jaz housing" vs. "blue Zip enclosure") steers the model when it would otherwise guess wrong. Fixing a miss meant editing a cue, not adding fifty reference photos to a dataset.
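To make the cue idea concrete, here is a minimal sketch of a text-only registry and the prompt it produces. The names `RegistryEntry`, `REGISTRY`, and `build_prompt` are hypothetical and not taken from the actual app; the two cue strings are the ones quoted above, and the prompt wording is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """One supported device, described in words rather than reference photos."""
    name: str
    cues: list[str] = field(default_factory=list)


# Hypothetical registry; only the cue strings come from the notes above.
REGISTRY = [
    RegistryEntry("Jaz drive", ["dark green Jaz housing"]),
    RegistryEntry("Zip drive", ["blue Zip enclosure"]),
]


def build_prompt(registry: list[RegistryEntry]) -> str:
    """Fold every entry's cues into the instruction sent alongside the image."""
    lines = ["Identify the device in the photo. Visual cues for known devices:"]
    for entry in registry:
        lines.append(f"- {entry.name}: {'; '.join(entry.cues)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(REGISTRY))
```

With this shape, fixing a misidentification is a one-line change to a `cues` list, and the next request picks it up with no retraining or new photos.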
2. Vision is good at reading; the registry is good for categorizing.
The model’s job: “What do I see?” The app’s job: “Is that on our supported list, and what should you do next?” Splitting those concerns keeps the system auditable. You can see exactly why a device matched (keywords, cues) without opening a black-box classifier.
Sometimes the model is even more specific than you need when the goal is just categorization; the registry steers that answer back to the right category.
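The split between "what do I see?" and "is that on our list?" could look something like the sketch below. It assumes the model returns a free-text description and that each category carries a few keywords. `Category`, `Match`, and `categorize` are hypothetical names, and the sample description is made up for illustration. Plain substring matching is a simplification; a real registry would probably want word-boundary matching.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Category:
    name: str
    keywords: list[str]


@dataclass
class Match:
    category: Optional[str]       # None when nothing in the registry matched
    matched_keywords: list[str]   # the audit trail: why this category won


# Hypothetical categories for illustration.
CATEGORIES = [
    Category("Jaz drive", ["jaz"]),
    Category("Zip drive", ["zip"]),
]


def categorize(description: str, categories: list[Category]) -> Match:
    """Pick the category whose keywords appear most often in the description."""
    text = description.lower()
    best = Match(category=None, matched_keywords=[])
    for cat in categories:
        hits = [kw for kw in cat.keywords if kw.lower() in text]
        if len(hits) > len(best.matched_keywords):
            best = Match(cat.name, hits)
    return best


if __name__ == "__main__":
    # The model answers with more detail than the task needs;
    # the registry maps it back to a plain category.
    print(categorize("Iomega Zip 100 external drive, blue enclosure", CATEGORIES))
```

Because `matched_keywords` travels with the result, you can always answer "why did this match?" by reading the output rather than inspecting a classifier.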
3. One-off > custom > API
After a few successful tests with nothing more than a prompt and an image dropped into ChatGPT or Claude, the next step was a simple frontend. Upload an image, get back a match, confidence score, extracted identifiers, and a recommendation.
At that point, the shape of the thing starts to become obvious. Accept an image and an instruction. Return repeatable JSON. Track confidence, match against a registry, auto-tag content, and keep costs and response times predictable.
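As a rough sketch of what "repeatable JSON" might mean in practice, the snippet below validates a response before the rest of the app trusts it. The field names (`match`, `confidence`, `identifiers`, `recommendation`) mirror the list above, but the exact schema, the 0 to 1 confidence range, and the `parse_response` helper are assumptions rather than the app's real contract.

```python
import json
from dataclasses import dataclass


@dataclass
class Identification:
    match: str
    confidence: float
    identifiers: list[str]
    recommendation: str


# Assumed schema: these four keys must all be present.
REQUIRED = ("match", "confidence", "identifiers", "recommendation")


def parse_response(raw: str) -> Identification:
    """Parse the model's JSON reply, rejecting anything off-schema."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f"response missing fields: {missing}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return Identification(
        match=str(data["match"]),
        confidence=confidence,
        identifiers=[str(item) for item in data["identifiers"]],
        recommendation=str(data["recommendation"]),
    )


if __name__ == "__main__":
    # Made-up reply for illustration only.
    raw = (
        '{"match": "Zip drive", "confidence": 0.9, '
        '"identifiers": ["EXAMPLE-123"], "recommendation": "keep"}'
    )
    print(parse_response(raw))
```

Rejecting malformed replies at this boundary is what lets everything downstream (registry matching, auto-tagging, confidence tracking) assume a stable shape.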
Suddenly it’s not just a way to identify mystery adapters from my basement. It’s a pattern for categorizing photos, verifying equipment, and building lightweight image-recognition workflows without training a model.