App Store
Shipped
iOS Lead → Solo Engineer
Turning an AI menu translation prototype into a shipped iOS product
App Store-published iOS AI menu translation product built for real travel scenarios. I took it from a course project into a shipped, multi-region, cost-conscious system with OCR, streaming translation, and layout-preserving overlays.
Started in a course team, then independently pushed into a shipped product with backend convergence, multi-region deployment, and layered cost / abuse control.
MenuGenie is an App Store-published iOS menu translation product. It began as a prototype in a course collaboration; I pushed it into a product built for real travel scenarios, so users can understand a foreign-language menu quickly at the table instead of receiving a detached block of translated text.
The difficulty with foreign-language menus is not just translation. In a restaurant, users rarely have time to slowly decode a complex menu. Real menus often mix vertical and horizontal text, multi-column layouts, prices set tightly against dish names, and inconsistent typography. If the translated result loses the original layout, users still have to remap names, prices, and descriptions by hand, so the real cognitive cost does not drop. In practice, most users simply want to know what they can eat and what they should order.
A second problem is that existing tools often stop at literal translation. When a menu contains regional dishes, unusual ingredients, or non-obvious dish names, users still need to leave the screen to search for dish context, ingredients, or allergen information, then come back and re-orient themselves on the same menu. Some tools also misplace overlays or force users to re-upload the image after switching away. MenuGenie was built to close that broken flow so users could read, decide, and look deeper into a dish within one product.
This project had two stages. In the first, it took shape as a product prototype in a course collaboration, where I was the iOS frontend lead, responsible for turning camera capture, translation screens, overlay rendering, reading modes, and the core interaction flow into a usable iOS prototype.
I later continued pushing the project independently and turned it from a presentable prototype into a shipped, deployable, maintainable product. That later stage included not only feature extension, but also backend convergence, performance diagnosis, Cloudflare edge traffic handling, cost protection, and additional stability and troubleshooting work on the iOS side.
Early on, I turned the backend capabilities into a complete iOS usage flow: users move from capture into the translation screen and understand the menu through overlay rendering and multiple reading modes. I then pushed that flow toward a more productized experience by adding multi-page menu handling, history replay, map exploration, and dish context, turning MenuGenie from a one-off translation screen into a product flow that supports revisiting, comparing, and extending menu understanding.
On the system side, the earliest version used a mixed engineering structure built from Java Spring Boot, a standalone Python OCR service, Cloud SQL, and AWS S3. That setup worked during the prototype phase, but later exposed fragmented stacks, scattered deployment paths, and higher monthly cost. I then consolidated the system around GCP, converging it into a deployment model centered on FastAPI, Cloud Run, Firestore, and GCS, while also adding cloud cleanup and cost-control mechanisms so the side project could stay economical to operate.
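The cleanup and cost-control side can be sketched as a simple retention policy. This is a minimal sketch, not the actual service code: the `RETENTION_DAYS` value, the `(name, created_at)` pair shape, and the object names are all assumptions. In the real system the pairs would come from `bucket.list_blobs()` in the `google-cloud-storage` client (`blob.name` / `blob.time_created`), and stale names would then be deleted with `blob.delete()`.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention window for uploaded menu images / thumbnails.
RETENTION_DAYS = 30

def stale_objects(objects, now=None, retention_days=RETENTION_DAYS):
    """Return the names of objects older than the retention window.

    `objects` is an iterable of (name, created_at) pairs with timezone-aware
    datetimes, mirroring blob.name / blob.time_created from GCS listings.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [name for name, created in objects if created < cutoff]

if __name__ == "__main__":
    # Illustrative data only: one object well past retention, one recent.
    now = datetime(2025, 6, 1, tzinfo=timezone.utc)
    objs = [("menus/a.jpg", now - timedelta(days=45)),
            ("menus/b.jpg", now - timedelta(days=2))]
    print(stale_objects(objs, now=now))  # ['menus/a.jpg']
```

Keeping the policy decision as a pure function like this makes the deletion job easy to test without touching a real bucket.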
On backend efficiency, I also fixed several issues that directly affected waiting time. Breaking the OCR timeline down across frontend and backend logs confirmed that the bottleneck was not Document AI itself, but synchronous side work: GCS uploads, thumbnail generation, and Firestore writes. I reordered the pipeline and moved non-critical work behind the first visible result, which brought OCR processing time down from 10.21s to 6.36s (roughly −38%). To make later troubleshooting and interrupted flows more manageable, I also added OSLog structured logging, task cancellation and interruption cleanup, and CameraService modularization, strengthening the project's performance, stability, and maintenance base.
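The reordering idea can be sketched in a few lines of asyncio. This is a stand-in, not the production handler: `run_ocr` and `side_task` are hypothetical placeholders for the Document AI call and for the GCS upload / thumbnail / Firestore steps, and the timings are artificial. The point is only the ordering: the handler returns as soon as the OCR result is ready, and schedules the side work behind it (in a real FastAPI service, `BackgroundTasks` plays this role).

```python
import asyncio
import time

async def run_ocr(image: bytes) -> str:
    await asyncio.sleep(0.05)        # stands in for the Document AI call
    return "recognized text blocks"

async def side_task(name: str) -> None:
    await asyncio.sleep(0.05)        # stands in for an upload / write

async def handle_upload(image: bytes) -> str:
    """Return the first visible result immediately; defer side work."""
    text = await run_ocr(image)
    # Non-critical work is scheduled after the result the user waits on.
    # Keep references so the tasks are not garbage-collected mid-flight;
    # FastAPI's BackgroundTasks handles this lifetime in the real service.
    _pending = [asyncio.create_task(side_task(n))
                for n in ("gcs_upload", "thumbnail", "firestore_write")]
    await asyncio.sleep(0)           # yield once so side work starts
    return text

if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(handle_upload(b"menu.jpg")))
    print(f"first result in {time.perf_counter() - start:.2f}s")
```

Run sequentially, the three side tasks would add their full duration to the user-visible wait; deferred, the wait collapses to roughly the OCR time alone, which is the shape of the 10.21s → 6.36s change described above.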
The frontend is a native iOS app. The backend is built primarily on FastAPI, uses Google Document AI for OCR, and relies on Firestore and GCS for state and storage. To support multi-region usage, I deployed the services across Cloud Run regions in Australia and Singapore, and used a Cloudflare Worker for geographic routing so the app only has to talk to a single entry point.
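The Worker itself is JavaScript, but the routing decision is small enough to sketch in Python. Everything here is illustrative: the hostnames are hypothetical, and the AU/NZ split is an assumed policy, not the documented one. The real Worker would read the visitor country (Cloudflare exposes it on `request.cf.country`) and proxy the request to the closer Cloud Run deployment.

```python
# Hypothetical per-region origins; the region IDs are real Cloud Run
# regions (Sydney and Singapore), the hostnames are placeholders.
REGIONS = {
    "australia-southeast1": "https://api-au.example.com",
    "asia-southeast1": "https://api-sg.example.com",
}

# Assumed policy: AU/NZ traffic goes to the Australian deployment,
# everything else to Singapore.
AU_COUNTRIES = {"AU", "NZ"}

def pick_origin(country_code: str) -> str:
    """Map a two-letter country code to the backend origin to proxy to."""
    region = ("australia-southeast1" if country_code in AU_COUNTRIES
              else "asia-southeast1")
    return REGIONS[region]
```

The app never sees any of this: it talks to the single Cloudflare entry point, and the Worker decides which regional Cloud Run service actually serves the request.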
MenuGenie has moved from a course-stage prototype into an App Store-published iOS product with the basic conditions needed for continued deployment, maintenance, and cost control.
The project has now moved beyond proof of concept into a sustainably operating product state: the core flow is live, latency and cost have both been materially reduced, and the maintenance boundary is clear.
App Store: Shipped
OCR Pipeline: −38%
Infra Cost: ~−96%