AI for Inventors — Case Study

A Talented Person with a Clear Problem and an AI Partner

How an hour of focused collaboration — part description, part troubleshooting, part domain knowledge — produced a working Python automation tool from scratch.

~1 hr to first working test
4 major pivots made
958 lines of Python produced
0 lines written by hand
01 — Context

The Honest Starting Point

The person who built this tool is not a software engineer. But they are an inventor — someone with hands-on experience in image processing and OCR, familiar with how cameras and sensors interpret the physical world, and comfortable thinking in systems. They knew what they wanted. They just needed someone to write it.

The problem was genuinely tedious: the Sushi Station, a mini-game inside a mobile game, requires constant clicking, dragging, and grid management to play effectively. Boring to do manually. Interesting to automate.

The actual first message

"I would like to create a Sushi combiner application that is an advanced auto clicker based on visual items on the screen... A screenshot is supplied of the screen setup currently."

That was enough. A clear goal, a screenshot, and domain context. The AI did not need more than that to produce a first working skeleton.

Sushi Station board
The screenshot that started it all. A numbered tile grid, a flame button on the right panel. The AI measured cell sizes from pixel distances and began building.
Early GUI working
Early working build. The GUI running alongside the game. Pairs were being found. The first real test happened within an hour of the opening message.
02 — The Process

What the Collaboration Actually Looked Like

This was not "prompt in, perfect app out." It was a genuine back-and-forth — the inventor describing, correcting, testing, and redirecting. The AI generated code, explained decisions, diagnosed errors, and responded to pushback. Neither side could have done this alone.

REAL CONVERSATION EXCERPTS — Calibration Phase
Inventor "It found the flame button and was able to move some sushi, but it did not set the grid up correctly based on the pick grid top left corner box. It also did not identify the sushi as the correct numbers."
AI (diagnosis) "Grid X is wrong — the biggest problem. At Grid X=521, the capture starts at column 3 of the grid, missing the left columns entirely. That's why it found empty=43 — it's looking at the wrong area." → Provided specific corrected values.
Inventor "After tweaking it myself, these seem to be the correct values: grid_x: 469, grid_y: 89, cell_w: 87, cell_h: 87..."
AI "Good — those match what the debug grid image shows. Updating config."

The pixel coordinates were not auto-detected. The inventor eyeballed the game, compared the debug overlay image to the actual grid, and corrected the values manually. That domain judgment — knowing what "correct" looks like — came from the human, not the AI.
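The corrected values map directly onto simple grid arithmetic. A minimal sketch of that math, using the config numbers from the exchange above (the function names `cell_rect` and `cell_center` are illustrative, not the project's actual API):

```python
# Hypothetical sketch of the calibration math behind the config values.
# grid_x/grid_y anchor the top-left corner of cell (0, 0); every other
# cell is offset by whole multiples of the cell size.

GRID = {"grid_x": 469, "grid_y": 89, "cell_w": 87, "cell_h": 87}

def cell_rect(row: int, col: int, cfg: dict = GRID) -> tuple:
    """Pixel rectangle (left, top, width, height) of one tile."""
    left = cfg["grid_x"] + col * cfg["cell_w"]
    top = cfg["grid_y"] + row * cfg["cell_h"]
    return left, top, cfg["cell_w"], cfg["cell_h"]

def cell_center(row: int, col: int, cfg: dict = GRID) -> tuple:
    """Screen point to click or drag for one tile."""
    left, top, w, h = cell_rect(row, col, cfg)
    return left + w // 2, top + h // 2
```

This is why a wrong grid_x poisons everything downstream: every rectangle shifts by the same error, so the scanner reads the wrong pixels for all 45 cells at once.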

REAL CONVERSATION EXCERPTS — OCR Failure
Inventor "The 13 keeps appearing as a 3 and the 12 a 2. A pattern noticed for the font color is 1 to 9 is white, 10 to 19 is brownish in color, 20 to 29 is white again, 30 to 39 is yellow in color."
AI (root cause) "The strip was 87px wide but the number only lives in the rightmost ~25px. Tesseract was reading 62px of sushi rice texture as digit-shaped noise, stopping at the first word it found." → Cropped to bottom-right 42% only.
Inventor "I think there needs to be a way to have the image be simplified to scan better."
AI (pivot) Proposed template matching: capture one screenshot of each tile value, store it, match by pixel correlation. No OCR needed for known values.

The OCR approach — initially chosen by the AI — struggled with the game's styled fonts. The inventor's observation about color-coded tiers led directly to the three-strategy pipeline. The template matching idea was the AI's response to the inventor's push. The inventor did not need to know how template matching works. They needed to know that OCR was failing and say so clearly.
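The core of the template-matching idea fits in a few lines. This is a toy sketch, not the project's TemplateBank: images are plain 2D lists of grayscale values, and the matcher picks the banked value whose stored pixels differ least from the crop (a real implementation would compare screenshots with a library such as OpenCV):

```python
# Toy sketch of template matching: one reference image per known tile
# value, matched by mean pixel difference. All names are illustrative.

def pixel_distance(a, b):
    """Mean absolute pixel difference between two same-sized images."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))

def match_tile(crop, bank, max_distance=20):
    """Return the banked value closest to the crop, or None if nothing is close."""
    best_value, best_dist = None, None
    for value, template in bank.items():
        d = pixel_distance(crop, template)
        if best_dist is None or d < best_dist:
            best_value, best_dist = value, d
    return best_value if best_dist is not None and best_dist <= max_distance else None

# Tiny 2x2 "screenshots" standing in for captured tile images.
bank = {12: [[200, 200], [10, 10]], 13: [[10, 200], [200, 10]]}
```

Unlike OCR, this never hallucinates a digit: a crop that matches nothing in the bank comes back as unknown instead of a wrong number.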

03 — Pivots

Four Moments Where the Approach Changed

Good collaboration includes knowing when something isn't working and being willing to try a different angle. These were the four significant pivots in this session.

Pivot 1
Tried: Full-width cell OCR — read the entire bottom strip of each tile.
Worked: Right-justified crop — numbers sit in the bottom-right 40% only. Sprite art was poisoning the left side.

Pivot 2
Tried: Single Tesseract OCR pass with a fixed threshold (160).
Worked: Template matching first, then three OCR fallback strategies per tile color range.

Pivot 3
Tried: Scan → execute one pair → rescan → repeat (slow, one drag per full scan).
Worked: Scan once → execute all pairs → summon all empties → rescan (batch mode).

Pivot 4
Tried: Complex cascade algorithm (v3/v4) with staircase sorting and tier cycling.
Worked in practice: Simpler v1 loop — scan, find pairs, drag highest first, summon. Reliable and fast.
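The first pivot is a one-line image operation. A minimal sketch of the crop, assuming the ~42% fraction from the calibration discussion (the image is a 2D list standing in for real pixel data, and `right_crop` is an illustrative name):

```python
# Sketch of the crop pivot: the tier number lives in the rightmost part
# of each 87px-wide tile strip, so only that slice should reach OCR.
# The sprite art on the left was being read as digit-shaped noise.

def right_crop(strip, keep_frac=0.42):
    """Keep only the rightmost fraction of a tile's bottom strip."""
    w = len(strip[0])
    left = w - max(1, int(w * keep_frac))
    return [row[left:] for row in strip]
```

On an 87px strip this keeps the rightmost 36 pixels, comfortably covering the ~25px the digits actually occupy while discarding the 62px of sprite texture.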
Honest note on the cascade algorithm

The v3 and v4 cascade strategy — organizing the board into a descending staircase to chain 100% tier-ups — was architecturally correct and mathematically sound. But it required precise board state, reliable OCR of 45 cells, and timing coordination that introduced more failure points than it eliminated. The simpler loop from v1 remained the most reliable daily driver. The cascade work was genuinely valuable as design exploration; it just wasn't the final answer.
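The decision step of that simpler loop can be sketched in a few lines. This is a hedged reconstruction from the description above, not the project's code: the board is a dict of (row, col) → tile value, and the drag and summon calls are omitted because they need a live screen:

```python
# Sketch of the v1 decision step: pair up cells that share a tile value,
# highest tier first, so the most valuable merge happens before the
# board shifts. Names are illustrative.

from collections import defaultdict

def find_pairs(board):
    """Group cells by tile value and pair them, highest tier first."""
    by_value = defaultdict(list)
    for pos, value in board.items():
        if value is not None:          # skip unreadable / empty cells
            by_value[value].append(pos)
    pairs = []
    for value in sorted(by_value, reverse=True):
        cells = by_value[value]
        while len(cells) >= 2:
            pairs.append((value, cells.pop(), cells.pop()))
    return pairs
```

The robustness comes from what this omits: no board-state model, no ordering constraints between cycles, just "merge what you can see right now."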

04 — The Debug Layer

Making the Invisible Visible

One thing the AI built in from the start — without being asked — was a suite of debug image exports. These turned calibration from guesswork into evidence.

Grid debug overlay
dbg_grid.png — Green lines show where the scanner thinks the cell boundaries are. If these don't align with the actual game grid, every OCR read is wrong. The inventor compared this to the screen and adjusted accordingly.
Cell crop debug
dbg_cell_0_0.png — The exact pixels the scanner sees for a single tile. The number "13" is visible in the bottom-right. The sprite art to the left was the OCR failure source — clearly visible once cropped.

This is a pattern worth noting: the AI anticipated what would be hard to debug and built the tools to debug it before any problem was reported. Every time the inventor said "something is wrong," there was already an image file that showed exactly what the tool was seeing. That feedback loop cut troubleshooting time dramatically.
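What dbg_grid.png encodes is just the line positions implied by the config. A sketch of that computation, assuming the calibration values from earlier (drawing the lines over a screenshot with, say, Pillow's ImageDraw is then trivial; `grid_lines` is an illustrative name):

```python
# Sketch of the debug-overlay math: the vertical and horizontal line
# positions a calibration implies. If these lines do not sit on the
# game's tile borders, the config is wrong -- visibly, at a glance.

def grid_lines(grid_x, grid_y, cell_w, cell_h, cols, rows):
    """Pixel x-positions of vertical lines and y-positions of horizontal lines."""
    xs = [grid_x + c * cell_w for c in range(cols + 1)]
    ys = [grid_y + r * cell_h for r in range(rows + 1)]
    return xs, ys
```

For the 15 × 3 board this yields 16 vertical and 4 horizontal lines — a few dozen pixels of drawing that replaced hours of blind coordinate guessing.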

05 — The Human Contribution

What the Inventor Brought That the AI Could Not

It would be dishonest to say the AI did everything. The inventor's background in image processing and OCR shaped the entire session in ways that a complete beginner would have struggled to replicate.

What the inventor knew → How it shaped the session

OCR fails on stylized fonts → Immediately recognized the "13 reads as 3" problem as a classic OCR noise issue, not a bug. Described the font color pattern (white/amber/yellow by tier) — exactly the information the AI needed to design the multi-strategy pipeline.

What a grid overlay should look like → Knew that the green lines in dbg_grid.png should align with the tile borders. Could evaluate the image at a glance and report "off by one column to the right" instead of just "it doesn't work."

The cascade mechanic logic → Discovered through play that a perfectly descending staircase triggers 100% chain upgrades. Communicated the game mechanic precisely enough for the AI to turn it into a priority-based decision algorithm.

When to stop and simplify → Recognized that the sophisticated v4 cascade sorter introduced more fragility than it was worth. Made the call to run v1 in practice. That judgment call is engineering, not just coding.

Knowing what "correct" looks like → Every test run, the inventor could evaluate the output against the game state. "It merged the wrong pair" or "it scanned row 0 but missed rows 1 and 2" — that ground truth evaluation came entirely from the human.
06 — When It Worked

The Cascade in Action

Despite the complexity challenges, there were clear moments of success. The cascade behavior — drag one tile onto a matching tile and watch the whole board upgrade — worked exactly as predicted. This is what it looked like:

Before cascade
Before. T26 sits at col 0 next to T25, T24... A second T26 exists in the lower rows. Bonus: 2259Q.
After cascade
After one drag. Every cell upgraded by one tier. Bonus jumped from 2259Q to 2638Q. The algorithm predicted this outcome; the board delivered it.

The board state the inventor was working toward — shown below — required precise scanning across 45 cells and reliable tier recognition up to T33. That's where the scanning accuracy and template matching work paid off directly.

Ideal cascade board
The target board state. 15 columns × 3 rows forming a complete descending staircase from T27 down to T1. Six cascade triggers queued in the lower rows. Each one fires automatically when dragged to position (0,0).
07 — For Hardware Inventors

The Same Loop, Different Problem

The architecture of this tool — capture screen, read values, make a decision, act with the mouse — is not unique to games. It appears in almost every hardware development workflow that touches a computer.
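That loop has one recurring shape. A skeleton of it, with the four stages passed in as functions so the same structure can run against a game grid, a camera feed, or an instrument panel (all names here are illustrative, not from the project):

```python
# Skeleton of the recurring loop: capture, read, decide, act. Each
# stage is injected, so swapping the perception or action layer does
# not touch the loop itself.

def automation_loop(capture, read, decide, act, max_cycles=1):
    """Run the perceive-decide-act cycle; return each cycle's actions."""
    history = []
    for _ in range(max_cycles):
        frame = capture()          # screenshot or camera frame
        state = read(frame)        # OCR or template matching
        actions = decide(state)    # domain logic
        for a in actions:
            act(a)                 # mouse click, drag, or log entry
        history.append(actions)
    return history
```

The four scenarios below differ only in which functions get plugged into these slots.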

01

Visual inspection on a production line

A camera watches parts moving past. The tool scans for defect signatures using template matching — the same TemplateBank used here for sushi badges. When a match exceeds a confidence threshold, it flags the part and logs the image. No machine vision library expertise required to get the first prototype running.

02

Reading instrument panels without an API

A power supply shows voltage and current on an LCD. No USB data port. The same OCR pipeline used to read sushi tier numbers reads the instrument display — same crop, threshold, and Tesseract call. Values log to CSV every second. The FuelReader class from this project does exactly this for the game's resource counter.
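The logging half of that idea is a few lines of standard library. A sketch with the OCR read stubbed out (in the real tool the injected reader would call Tesseract on the panel crop; `read_display` and `log_readings` are illustrative names):

```python
# Hedged sketch of instrument logging: one CSV row per sample, fed by
# whatever "crop, threshold, OCR" reader is plugged in. The reader is
# stubbed here so the logging shape stays clear.

import csv
import io

def log_readings(read_display, samples, out):
    """Write (sample index, volts, amps) rows to a CSV target."""
    writer = csv.writer(out)
    writer.writerow(["t", "volts", "amps"])
    for t in range(samples):
        volts, amps = read_display()
        writer.writerow([t, volts, amps])

buf = io.StringIO()
log_readings(lambda: (12.01, 0.35), 3, buf)  # stub: a fixed reading
```

Swap `io.StringIO` for an open file and the stub for a real screen-crop reader, and the power supply has a data log without ever having had a data port.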

03

Automating a test rig's GUI

A piece-testing machine has a Windows UI with Start, Stop, and a Pass/Fail badge. Replacing manual clicks with the pyautogui layer used here means the rig runs unattended: reads the badge color, clicks Next on Pass, pauses for human review on Fail. The decision logic is simpler than the sushi combiner — two states instead of fifty tile values.

04

Rapid BOM extraction from datasheet tables

PDFs rendered to images. The grid-scan loop treats each table row as a "cell." Template matching identifies header formats. OCR reads the values. Output goes to a spreadsheet or directly to a KiCad BOM. The same "scan grid → read values → log" loop that runs 45 cells of sushi handles 20 rows of a pin-assignment table.

08 — Honest Assessment

What This Workflow Is and Isn't

✓ What it genuinely delivers

A working prototype of something non-trivial in under an hour. Code that you can read, modify, and extend without starting over. An architecture that separates config, perception, decision, and action — ready to swap pieces as needs change. A collaborative partner that writes boilerplate, suggests alternatives, and explains tradeoffs on demand.

⚠ What it requires from you

Domain knowledge. You must know what "correct" looks like in your problem space. You must be able to evaluate output — not as a programmer, but as someone who understands the domain. You must be willing to push back when something isn't working and describe what's wrong precisely enough for the AI to respond usefully.

✓ Where it saves the most time

Boilerplate: config systems, GUI scaffolding, file I/O, logging, threading. Library discovery: the AI knows which Python library handles screen capture, which handles OCR, which handles mouse control — and writes the glue code between them. Error diagnosis: paste a traceback and get a root cause, not a Stack Overflow link.

⚠ Where it still needs you

Calibration to your specific physical setup. Judgment calls about when a "theoretically better" approach is too complex to be reliable. Testing against real hardware. Recognizing when the AI's confident output is wrong because you're the only one who can see the screen.

The Takeaway for Inventors

This is not a story about AI replacing skilled people. It's a story about a skilled person with a clear problem getting a capable coding partner on demand. The inventor's background in image processing let them recognize OCR failures, diagnose alignment issues, and evaluate cascade behavior — none of that came from the AI.

What the AI provided: speed. The architecture that would have taken days to research and write took an hour of conversation. The debugging that would have required forum searches and documentation reading happened in real time. The pivot from OCR to template matching was a one-exchange decision.

Something worked in an hour. It wasn't perfect. It was real. That's the point.