Name: Audio 2 Text 2 Image Generation with Analog & Cloudflare Worker AI
Author: dalenguyen

Audio 2 Text 2 Image Generation with Analog & Cloudflare Worker AI

Dale Nguyen - Apr 4 - Dev Community

This is a submission for the Cloudflare AI Challenge.

What I Built

This is a simple app that generates images from text input, which you can type or dictate by voice.

Demo

Demo link: https://cloudflare-challange.pages.dev/

My Code

You can check my code at: https://github.com/dalenguyen/cloudflare-challenge

Journey

This was an interesting challenge because I hadn't used Cloudflare Pages to deploy web applications before. It turns out that the deployment process is really straightforward and can be done via the Cloudflare dashboard.

The app is also built with Analog, a full-stack Angular meta-framework, which means you can create an entire application with full backend support.

Here are the stack details:

Analog
Nx Workspace
GitHub
Cloudflare Pages
Workers AI
@cf/bytedance/stable-diffusion-xl-lightning for text-to-image generation
@cf/openai/whisper for audio-to-text
uform-gen2-qwen-500m for image-to-text

Multiple Models and/or Triple Task Types

I combined three models to handle different tasks that support image generation:

Audio to text: listens to a voice command and fills the input field with the transcript
Text to image: generates an image from the text input
Image to text: provides a further description of the generated image
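
To make the chaining of the three tasks above concrete, here is a minimal sketch in TypeScript. It is not the code from the repository. It assumes a Workers AI style binding with a `run(model, inputs)` method, and it assumes the input and output shapes shown in the comments (`audio` and `image` as arrays of byte values, `text` and `description` as result fields). The `AiBinding` interface, the `audioToImage` and `toBytes` helpers, the caption prompt, and the `max_tokens` value are all hypothetical names and values chosen for illustration. The full `@cf/unum/` prefix on the image-to-text model is also an assumption, since the stack list only gives the short name.

```typescript
// Minimal shape of an AI binding. Assumption: the real binding exposes run(model, inputs).
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

const MODELS = {
  audioToText: "@cf/openai/whisper",
  textToImage: "@cf/bytedance/stable-diffusion-xl-lightning",
  // Assumption: full catalog ID for the model the post calls uform-gen2-qwen-500m.
  imageToText: "@cf/unum/uform-gen2-qwen-500m",
};

// Hypothetical helper: normalise the image result to raw bytes.
async function toBytes(result: unknown): Promise<Uint8Array> {
  if (result instanceof Uint8Array) return result;
  if (result instanceof ArrayBuffer) return new Uint8Array(result);
  // Otherwise assume a byte stream that supports async iteration.
  const bytes: number[] = [];
  for await (const chunk of result as AsyncIterable<Uint8Array>) {
    for (let i = 0; i < chunk.length; i++) bytes.push(chunk[i]);
  }
  return Uint8Array.from(bytes);
}

interface PipelineResult {
  prompt: string;
  image: Uint8Array;
  description: string;
}

// Hypothetical pipeline: voice recording -> prompt -> image -> description.
async function audioToImage(ai: AiBinding, audio: Uint8Array): Promise<PipelineResult> {
  // 1. Audio to text: the transcript becomes the prompt.
  const speech = (await ai.run(MODELS.audioToText, {
    audio: Array.from(audio),
  })) as { text: string };

  // 2. Text to image: generate an image from the prompt.
  const image = await toBytes(
    await ai.run(MODELS.textToImage, { prompt: speech.text })
  );

  // 3. Image to text: ask for a description of the generated image.
  const caption = (await ai.run(MODELS.imageToText, {
    image: Array.from(image),
    prompt: "Describe this image",
    max_tokens: 256,
  })) as { description: string };

  return { prompt: speech.text, image, description: caption.description };
}
```

Keeping the binding behind a small interface means the same function can be exercised with a stub during development and handed the real binding in a server route. In the actual app the first step fills the input field so the user can review the prompt before generating, so the steps would likely be separate calls rather than one function.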