Text-to-Speech Serverless App — Phase 3: AWS Polly, Translate & S3 Storage

·1 min read

by Frank Doka

Phase 3 adds the core processing: translate the text, synthesize speech, store the audio file, and update the database.

What I Built

  1. Processing Lambda — Subscribes to the SNS topic from Phase 2. When triggered, it reads the submission from DynamoDB, translates the text, synthesizes speech, uploads the audio to S3, and writes the result back to the same record.
  2. Amazon Translate — Converts the submitted text into the user's selected target language. The Lambda calls Translate before passing the result to Polly, so the synthesized speech matches the chosen language.
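  3. Amazon Polly — Converts the translated text into realistic spoken audio. The voice is chosen automatically to match the target language; because Polly's SynthesizeSpeech call requires an explicit VoiceId, that selection happens in the Lambda rather than in Polly itself.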
  4. S3 audio storage — The generated audio file is uploaded to S3 under an object key organized by user ID and timestamp, so a user's files can be listed and retrieved by prefix.
  5. DynamoDB update — After uploading to S3, the Lambda updates the original DynamoDB record with the S3 URL and changes the status from "processing" to "complete."
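
The small pieces of glue around these services can be sketched in plain Python. This is a minimal illustration, not the app's actual code: the `submission_id` field in the SNS message, the language-to-voice table, and the `audio/<user>/<timestamp>-<id>.mp3` key layout are all assumptions made for the example. The voice names are real Polly voice IDs, but which voices the app uses is not stated in the article.

```python
import json
from datetime import datetime, timezone

# Hypothetical mapping from target language to a Polly VoiceId.
VOICE_BY_LANGUAGE = {
    "en": "Joanna",
    "es": "Lucia",
    "fr": "Lea",
    "de": "Vicki",
}
DEFAULT_VOICE = "Joanna"


def submission_ids_from_sns(event):
    """Extract submission IDs from an SNS-triggered Lambda event.

    SNS delivers the published message as a string under
    Records[i]["Sns"]["Message"]; the JSON shape of that message
    (a "submission_id" field) is an assumption.
    """
    ids = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        ids.append(message["submission_id"])
    return ids


def pick_voice(language_code):
    """Choose a voice for a language code such as "es" or "es-MX"."""
    base = language_code.split("-")[0].lower()
    return VOICE_BY_LANGUAGE.get(base, DEFAULT_VOICE)


def audio_key(user_id, submission_id, when=None):
    """Build an S3 object key grouped by user ID and UTC timestamp."""
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    return f"audio/{user_id}/{stamp}-{submission_id}.mp3"
```

Putting the user ID first in the key means all of one user's audio shares a prefix, and the timestamp keeps the files in chronological order within that prefix.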

Processing Flow

SNS message → Lambda reads DynamoDB record
  → Amazon Translate (text → translated text)
  → Amazon Polly (translated text → audio stream)
  → S3 (store audio file)
  → DynamoDB (update status + S3 URL)
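
The flow above could look roughly like the function below. It is a hedged sketch under several assumptions: the record attributes (`submission_id`, `user_id`, `text`, `target_language`), the `audio_url` attribute, and the key layout are hypothetical names, and the AWS clients are passed in as arguments so the logic can be exercised without real AWS access. The calls themselves (`get_item`, `translate_text`, `synthesize_speech`, `put_object`, `update_item`) are the standard boto3 operations for these services.

```python
from datetime import datetime, timezone


def process_submission(submission_id, table, translate, polly, s3, bucket, voice_id):
    """Translate, synthesize, store, and mark one submission complete."""
    # 1. Read the submission written in Phase 2 (attribute names are assumptions).
    item = table.get_item(Key={"submission_id": submission_id})["Item"]

    # 2. Translate the text; "auto" asks Translate to detect the source language.
    translated = translate.translate_text(
        Text=item["text"],
        SourceLanguageCode="auto",
        TargetLanguageCode=item["target_language"],
    )["TranslatedText"]

    # 3. Synthesize speech; Polly returns the audio as a readable stream.
    speech = polly.synthesize_speech(
        Text=translated,
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    audio = speech["AudioStream"].read()

    # 4. Store the audio under a key grouped by user ID and timestamp.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    key = f"audio/{item['user_id']}/{stamp}-{submission_id}.mp3"
    s3.put_object(Bucket=bucket, Key=key, Body=audio, ContentType="audio/mpeg")

    # 5. Record the location and flip the status. "status" is a DynamoDB
    #    reserved word, so it is referenced through the #s placeholder.
    audio_url = f"s3://{bucket}/{key}"
    table.update_item(
        Key={"submission_id": submission_id},
        UpdateExpression="SET #s = :s, audio_url = :u",
        ExpressionAttributeNames={"#s": "status"},
        ExpressionAttributeValues={":s": "complete", ":u": audio_url},
    )
    return audio_url
```

In a real Lambda handler, `table` would typically come from `boto3.resource("dynamodb").Table(...)` and the other three from `boto3.client("translate")`, `boto3.client("polly")`, and `boto3.client("s3")`, created once outside the handler so warm invocations reuse them.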

What's Next

Phase 4 builds the retrieval flow — a Lambda that serves the results back to the React frontend for playback.
