Text-to-Speech Serverless App — Phase 2: Serverless Backend & Data Storage

by Frank Doka

Phase 2 builds the submission pipeline: user text enters through the API, gets persisted, and triggers the processing Lambda asynchronously.

What I Built

  1. API Gateway — REST endpoint integrated with the Cognito User Pool authorizer from Phase 1. Only requests with a valid JWT token pass through.
  2. Submission Lambda — Receives the user's text, language selection, user ID, and timestamp. Writes a new record to DynamoDB with a "processing" status.
  3. DynamoDB table — Stores each submission: user ID, original text, target language, processing status, and (later) the S3 URL for the generated audio.
  4. SNS topic — After writing to DynamoDB, the Lambda publishes a message to an SNS topic. A separate processing Lambda subscribes to this topic, decoupling submission from synthesis.
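
As a rough illustration of steps 2 to 4, here is a minimal sketch of what the submission Lambda could look like in Python. It is not the project's actual code: the attribute names (`submission_id`, `created_at`), the environment variables `TABLE_NAME` and `TOPIC_ARN`, the request body shape, the SNS message payload, and the 202 status code are all assumptions made for the example. The AWS clients are passed in as arguments so the core logic can be exercised without AWS.

```python
import json
import os
import uuid
from datetime import datetime, timezone


def build_record(user_id, text, language):
    """Build the DynamoDB item for a new submission (attribute names are assumed)."""
    return {
        "submission_id": str(uuid.uuid4()),  # hypothetical key, not from the article
        "user_id": user_id,
        "text": text,
        "language": language,
        "status": "processing",
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def handle_submission(event, table, sns, topic_arn):
    """Validate the request, persist it, then announce it on the SNS topic."""
    body = json.loads(event.get("body") or "{}")
    text = (body.get("text") or "").strip()
    language = (body.get("language") or "").strip()
    if not text or not language:
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "text and language are required"}),
        }

    # With a Cognito User Pool authorizer on a REST API (Lambda proxy
    # integration), the verified token claims arrive in the request context.
    user_id = event["requestContext"]["authorizer"]["claims"]["sub"]

    record = build_record(user_id, text, language)
    table.put_item(Item=record)

    # Assumed payload: just enough for the processing Lambda to find the record.
    sns.publish(
        TopicArn=topic_arn,
        Message=json.dumps(
            {"submission_id": record["submission_id"], "user_id": user_id}
        ),
    )
    return {
        "statusCode": 202,
        "body": json.dumps(
            {"submission_id": record["submission_id"], "status": "processing"}
        ),
    }


def lambda_handler(event, context):
    """Lambda entry point: wire real AWS clients into handle_submission."""
    import boto3  # imported here so the logic above stays testable without AWS

    table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])
    sns = boto3.client("sns")
    return handle_submission(event, table, sns, os.environ["TOPIC_ARN"])
```

Taking the user ID from the authorizer claims rather than from the request body is a design choice in this sketch: it means a caller cannot submit on behalf of another user.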

Why SNS for Decoupling

The submission Lambda returns as soon as it has written the record to DynamoDB and published to SNS, so the user gets a fast response. Translation and speech synthesis run asynchronously in a separate Lambda triggered by the SNS message, so a slow Polly call never blocks the API response.
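
To make the receiving side concrete, the sketch below shows how a processing Lambda might unpack an SNS-triggered event. The `Records[].Sns.Message` layout is the standard shape of an SNS event delivered to Lambda. The JSON payload inside the message (`submission_id`, `user_id`) and the `process` callback are assumptions for illustration, and the real translation and synthesis work belongs to Phase 3.

```python
import json


def extract_submissions(event):
    """Return the decoded JSON payload of every SNS record in the event."""
    return [
        json.loads(record["Sns"]["Message"])
        for record in event.get("Records", [])
    ]


def process_event(event, process):
    """Run the (possibly slow) processing step once per submission.

    `process` is a hypothetical callable standing in for the Translate and
    Polly work. It runs here, off the API request path.
    """
    return [process(submission) for submission in extract_submissions(event)]


def lambda_handler(event, context):
    """Lambda entry point: placeholder processing that echoes the IDs."""
    return process_event(event, lambda s: s["submission_id"])
```

Because SNS invokes this function asynchronously, however long `process` takes has no effect on how quickly the submission endpoint responds.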

What's Next

Phase 3 wires in Amazon Translate and Polly to convert submissions into spoken audio, stored in S3.
