Text-to-Speech Serverless App — Phase 2: Serverless Backend & Data Storage
by Frank Doka
Phase 2 builds the submission pipeline: user text enters through API Gateway, is persisted in DynamoDB, and asynchronously triggers the processing Lambda via SNS.
What I Built
- API Gateway: REST endpoint protected by the Cognito User Pool authorizer from Phase 1. Only requests carrying a valid JWT pass through.
- Submission Lambda: Receives the user's text, language selection, user ID, and timestamp, then writes a new record to DynamoDB with a "processing" status.
- DynamoDB table: Stores each submission's user ID, original text, target language, processing status, and (later) the S3 URL of the generated audio.
- SNS topic: After writing to DynamoDB, the submission Lambda publishes a message to an SNS topic. A separate processing Lambda subscribes to this topic, which decouples submission from synthesis.
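Below is a minimal sketch of the submission Lambda. It assumes API Gateway's Lambda proxy integration and a JSON request body with `text` and `language` fields. The following are hypothetical and do not come from the author's code:

- the attribute names (`submissionId`, `originalText`, `targetLanguage`, `processingStatus`, `createdAt`)
- the 202 response
- the `handle_submission` signature

The sketch also makes two assumptions the post does not confirm. It reads the user ID from the verified token's `sub` claim and generates the timestamp server-side, rather than accepting both from the request body. The DynamoDB table and SNS client are passed in, so the logic can run without AWS.

```python
import json
import uuid
from datetime import datetime, timezone

STATUS_PROCESSING = "processing"


def _response(status_code, payload):
    # Response shape expected by API Gateway's Lambda proxy integration.
    return {"statusCode": status_code, "body": json.dumps(payload)}


def handle_submission(event, table, sns_client, topic_arn):
    """Persist a submission and hand it off to the processing Lambda via SNS.

    `table` is assumed to be a boto3 DynamoDB Table resource and `sns_client`
    a boto3 SNS client; injecting them keeps the logic testable without AWS.
    """
    # The Cognito User Pool authorizer has already verified the JWT; API Gateway
    # forwards its claims here, and "sub" is the user's stable unique ID.
    claims = event["requestContext"]["authorizer"]["claims"]

    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "body must be valid JSON"})
    if not isinstance(body, dict):
        return _response(400, {"error": "body must be a JSON object"})

    text = body.get("text", "")
    language = body.get("language")
    if not isinstance(text, str) or not text.strip() or not language:
        return _response(400, {"error": "text and language are required"})

    # Hypothetical attribute names; the S3 audio URL is added later (Phase 3).
    item = {
        "submissionId": str(uuid.uuid4()),
        "userId": claims["sub"],
        "originalText": text,
        "targetLanguage": language,
        "processingStatus": STATUS_PROCESSING,
        "createdAt": datetime.now(timezone.utc).isoformat(),
    }
    table.put_item(Item=item)

    # Publish only the keys; the processing Lambda can read the full record.
    sns_client.publish(
        TopicArn=topic_arn,
        Message=json.dumps({"submissionId": item["submissionId"], "userId": item["userId"]}),
    )

    # 202 Accepted: the audio is produced asynchronously.
    return _response(202, {"submissionId": item["submissionId"], "status": STATUS_PROCESSING})
```

In a deployed function, you would create the dependencies once at module load. For example, use `boto3.resource("dynamodb").Table(...)` and `boto3.client("sns")`, then call `handle_submission` from the handler with the topic ARN. Publishing only the keys, rather than the full text, keeps the SNS message small. SNS caps a message at 256 KB, while a DynamoDB item can hold up to 400 KB.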
Why SNS for Decoupling
The submission Lambda returns as soon as it has written to DynamoDB and published to SNS, giving the user a fast response. Translation and speech synthesis happen asynchronously in a separate Lambda triggered by the SNS message. A slow Translate or Polly call therefore never blocks the API response.
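On the processing side, SNS invokes the subscribed Lambda with an event that carries the published string in `Records[].Sns.Message`. The sketch below adds a hypothetical `parse_sns_submissions` helper and a handler stub. It assumes the published message is a JSON object with a hypothetical `submissionId` key. The Phase 3 work is left as a placeholder comment.

```python
import json


def parse_sns_submissions(event):
    """Return the decoded message body of every SNS record in a Lambda event.

    SNS places the published string in Records[i]["Sns"]["Message"], so a
    JSON payload has to be decoded a second time here.
    """
    return [json.loads(record["Sns"]["Message"]) for record in event.get("Records", [])]


def lambda_handler(event, context):
    processed = []
    for submission in parse_sns_submissions(event):
        submission_id = submission["submissionId"]  # hypothetical key name
        # Phase 3 placeholder: fetch the DynamoDB record, run Translate and Polly,
        # store the audio in S3, then record the S3 URL and final status.
        processed.append(submission_id)
    # SNS invokes the function asynchronously and does not read this value;
    # it is returned only so the sketch is easy to test.
    return processed
```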
What's Next
Phase 3 wires in Amazon Translate and Polly to convert submissions into spoken audio, stored in S3.