Audio 2 Text 2 Image Generation with Analog & Cloudflare Worker AI

Dale Nguyen - Apr 4 - Dev Community

This is a submission for the Cloudflare AI Challenge.

What I Built

This is a simple app that generates images from text input.

Demo

Demo link: https://cloudflare-challange.pages.dev/

My Code

You can check my code at: https://github.com/dalenguyen/cloudflare-challenge

Journey

This was an interesting challenge since I hadn't used Cloudflare Pages to deploy web applications before. It turns out that the deployment process is really straightforward and can be done via the Cloudflare dashboard.

Another thing is that this is built with Analog, a full-stack Angular meta-framework, which means that you can create an entire application with full backend support.

Here are the stack details:

  • Analog
  • Nx Workspace
  • GitHub
  • Cloudflare Pages
  • Workers AI
  • @cf/bytedance/stable-diffusion-xl-lightning for text-to-image generation
  • @cf/openai/whisper for audio-to-text
  • uform-gen2-qwen-500m for image-to-text

Multiple Models and/or Triple Task Types

I combined three models to perform different tasks that support image generation:

  • Audio to text: listen to a voice command and apply the transcript to the input field
  • Text to image: generate an image from the text input
  • Image to text: provide a further description of the generated image
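
The three steps above can be chained roughly as follows. This is a minimal sketch, not the code from the repository: the `AiBinding` interface, the `audioToImage` function, and the input and output shapes (`audio`, `text`, `prompt`, `image`, `description`) are assumptions made for illustration and should be checked against the Workers AI documentation for each model.

```typescript
// Simplified, assumed shape of a Workers AI-style binding.
// The real binding is provided by Cloudflare at runtime.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

const WHISPER = '@cf/openai/whisper';
const SDXL_LIGHTNING = '@cf/bytedance/stable-diffusion-xl-lightning';
// Short name as listed in the stack; the catalog ID may carry a vendor prefix.
const UFORM = 'uform-gen2-qwen-500m';

// Hypothetical result type for this sketch.
interface PipelineResult {
  prompt: string;
  image: Uint8Array;
  description: string;
}

async function audioToImage(ai: AiBinding, audio: Uint8Array): Promise<PipelineResult> {
  // 1. Audio to text: the transcript becomes the prompt (assumed output: { text }).
  const speech = (await ai.run(WHISPER, { audio: Array.from(audio) })) as { text: string };
  const prompt = speech.text.trim();

  // 2. Text to image: assumed here to resolve to raw image bytes.
  const image = (await ai.run(SDXL_LIGHTNING, { prompt })) as Uint8Array;

  // 3. Image to text: describe the generated image (assumed output: { description }).
  const caption = (await ai.run(UFORM, {
    image: Array.from(image),
    prompt: 'Describe this image',
  })) as { description: string };

  return { prompt, image, description: caption.description };
}
```

In a deployed app, `ai` would typically be the AI binding configured for the project. The real text-to-image call may return a stream rather than a byte array, in which case it needs to be read into bytes before being passed to the image-to-text step.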