Text-to-Speech Serverless App — Phase 3: AWS Polly, Translate & S3 Storage

November 15, 2023 · 1 min read

by Frank Doka

Phase 3 adds the core processing: translate the text, synthesize speech, store the audio file, and update the database.

What I Built

Processing Lambda — Subscribes to the SNS topic from Phase 2. When triggered, it reads the submission from DynamoDB, runs the translation, synthesizes speech, and writes everything back.
Amazon Translate — Converts the submitted text into the user's selected target language. The Lambda calls Translate before passing the result to Polly, so the synthesized speech matches the chosen language.
Amazon Polly — Converts the translated text into realistic spoken audio. Polly requires a voice on every synthesis request, so the Lambda chooses one that matches the target language.
S3 audio storage — The generated audio file is uploaded to S3 with a path organized by user ID and timestamp for clean retrieval.
DynamoDB update — After uploading to S3, the Lambda updates the original DynamoDB record with the S3 URL and changes the status from "processing" to "complete."
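
As a rough illustration of the two ends of this list, the sketch below pulls submission IDs out of an SNS-triggered Lambda event and builds an S3 key organized by user ID and timestamp. The `Records[].Sns.Message` layout is the standard SNS event shape; the JSON body with an `id` field, the `audio/` prefix, and the timestamp format are assumptions made for the example, not details taken from the project.

```python
import json
from datetime import datetime, timezone


def submission_ids(event):
    """Return the submission IDs carried by an SNS-triggered Lambda event.

    Assumes each SNS message body is JSON with an "id" field (hypothetical).
    """
    return [
        json.loads(record["Sns"]["Message"])["id"]
        for record in event["Records"]
    ]


def audio_key(user_id, when):
    """Build an S3 key grouped by user ID, then by UTC timestamp."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"audio/{user_id}/{stamp}.mp3"


# Example event with a single SNS record.
event = {"Records": [{"Sns": {"Message": json.dumps({"id": "abc-123"})}}]}
print(submission_ids(event))
# ['abc-123']
print(audio_key("user-42", datetime(2023, 11, 15, 9, 30, tzinfo=timezone.utc)))
# audio/user-42/20231115T093000Z.mp3
```

Putting the user ID first in the key means all of one user's audio files can be listed with a single S3 prefix.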

Processing Flow

SNS message → Lambda reads DynamoDB record
  → Amazon Translate (text → translated text)
  → Amazon Polly (translated text → audio stream)
  → S3 (store audio file)
  → DynamoDB (update status + S3 URL)
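
The flow above could look roughly like this in Python. This is a minimal sketch under stated assumptions, not the project's actual handler: the attribute names (`id`, `user_id`, `text`, `target_language`, `audio_url`), the `VOICES` table, and the URL format are hypothetical. The clients are passed in as arguments so the function can be exercised without AWS credentials; the calls it makes (`get_item`, `translate_text`, `synthesize_speech`, `put_object`, `update_item`) are the boto3 operations of the same names.

```python
from datetime import datetime, timezone

# Hypothetical language-to-voice table. The voice IDs themselves are Polly voices.
VOICES = {"en": "Joanna", "es": "Lucia", "fr": "Lea", "de": "Vicki", "ja": "Mizuki"}


def process_submission(submission_id, *, table, translate, polly, s3, bucket, now=None):
    """Translate, synthesize, store, and mark one submission complete."""
    now = now or datetime.now(timezone.utc)

    # 1. Read the submission record (attribute names are assumed).
    item = table.get_item(Key={"id": submission_id})["Item"]
    target = item["target_language"]

    # 2. Translate first so the speech is in the target language.
    translated = translate.translate_text(
        Text=item["text"],
        SourceLanguageCode="auto",
        TargetLanguageCode=target,
    )["TranslatedText"]

    # 3. Synthesize speech. VoiceId is required, so look it up by language.
    audio = polly.synthesize_speech(
        Text=translated, OutputFormat="mp3", VoiceId=VOICES[target]
    )["AudioStream"].read()

    # 4. Store the audio under a user ID / timestamp key.
    key = f"audio/{item['user_id']}/{now.strftime('%Y%m%dT%H%M%SZ')}.mp3"
    s3.put_object(Bucket=bucket, Key=key, Body=audio, ContentType="audio/mpeg")

    # 5. Record the location and flip the status. "status" is a DynamoDB
    #    reserved word, hence the #s placeholder.
    url = f"https://{bucket}.s3.amazonaws.com/{key}"
    table.update_item(
        Key={"id": submission_id},
        UpdateExpression="SET #s = :s, audio_url = :u",
        ExpressionAttributeNames={"#s": "status"},
        ExpressionAttributeValues={":s": "complete", ":u": url},
    )
    return url
```

Inside a real Lambda, the arguments would typically be a `boto3.resource("dynamodb").Table(...)` object and `boto3.client(...)` instances for `translate`, `polly`, and `s3`, created once outside the handler so warm invocations reuse them.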

What's Next

Phase 4 builds the retrieval flow — a Lambda that serves the results back to the React frontend for playback.
