Text-to-Speech Serverless App — Phase 3: AWS Polly, Translate & S3 Storage
by Frank Doka
Phase 3 adds the core processing: translate the text, synthesize speech, store the audio file, and update the database.
What I Built
- Processing Lambda — Subscribes to the SNS topic from Phase 2. When triggered, it reads the submission from DynamoDB, runs the translation, synthesizes speech, and writes everything back.
- Amazon Translate — Converts the submitted text into the user's selected target language. The Lambda calls Translate before passing the result to Polly, so the synthesized speech matches the chosen language.
- Amazon Polly — Converts the translated text into realistic spoken audio. Polly's SynthesizeSpeech call requires an explicit voice ID, so the Lambda picks a voice that matches the target language.
- S3 audio storage — The generated audio file is uploaded to S3 with a path organized by user ID and timestamp for clean retrieval.
- DynamoDB update — After uploading to S3, the Lambda updates the original DynamoDB record with the S3 URL and changes the status from "processing" to "complete."
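Two of the pieces above are small enough to sketch on their own: choosing a Polly voice for the target language, and building the S3 key from the user ID and timestamp. This is a minimal sketch, not the project's actual code. The mapping, the `audio/` prefix, the timestamp format, and the function names `pick_voice` and `build_audio_key` are all assumptions made for illustration; the voice IDs themselves are real Polly voices.

```python
from datetime import datetime, timezone

# Hypothetical mapping from a target language code to a Polly VoiceId.
# Which voices the app really uses is an assumption.
VOICE_BY_LANGUAGE = {
    "en": "Joanna",
    "es": "Lucia",
    "fr": "Lea",
    "de": "Vicki",
}
DEFAULT_VOICE = "Joanna"  # assumed fallback for unmapped languages


def pick_voice(language_code: str) -> str:
    """Return a voice for codes such as 'es' or 'es-MX'."""
    base = language_code.split("-")[0].lower()
    return VOICE_BY_LANGUAGE.get(base, DEFAULT_VOICE)


def build_audio_key(user_id: str, when: datetime) -> str:
    """Build an S3 key grouped by user ID, then by UTC timestamp."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"audio/{user_id}/{stamp}.mp3"
```

Grouping keys under a per-user prefix means a later retrieval step can list one user's audio files with a single prefix query.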
Processing Flow
SNS message → Lambda reads DynamoDB record
→ Amazon Translate (text → translated text)
→ Amazon Polly (translated text → audio stream)
→ S3 (store audio file)
→ DynamoDB (update status + S3 URL)
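The flow above could be sketched roughly as follows. This is an illustration under stated assumptions, not the article's actual Lambda: the attribute names (`submission_id`, `user_id`, `text`, `target_language`, `created_at`, `audio_url`), the `audio/` key prefix, the `s3://` URL form, and the default voice are all hypothetical. The clients are passed in as arguments so the function can be exercised without AWS; in a real handler they would likely come from `boto3.resource("dynamodb").Table(...)` and `boto3.client(...)`.

```python
import json


def process_submission(event, table, translate, polly, s3, bucket, voice_id="Joanna"):
    """Translate, synthesize, store, and update one submission."""
    # SNS delivers the published message as a string in Records[0].Sns.Message.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    submission_id = message["submission_id"]  # hypothetical field name

    # Read the original submission from DynamoDB.
    item = table.get_item(Key={"submission_id": submission_id})["Item"]

    # Translate first so the speech matches the chosen language.
    translated = translate.translate_text(
        Text=item["text"],
        SourceLanguageCode="auto",  # let Translate detect the source language
        TargetLanguageCode=item["target_language"],
    )["TranslatedText"]

    # Polly returns the audio as a stream; read it fully into memory.
    audio = polly.synthesize_speech(
        Text=translated, OutputFormat="mp3", VoiceId=voice_id
    )["AudioStream"].read()

    # Store the audio under a key organized by user ID and timestamp.
    key = f"audio/{item['user_id']}/{item['created_at']}.mp3"
    s3.put_object(Bucket=bucket, Key=key, Body=audio, ContentType="audio/mpeg")

    # "status" is a DynamoDB reserved word, so it is aliased as #s.
    table.update_item(
        Key={"submission_id": submission_id},
        UpdateExpression="SET #s = :s, audio_url = :u",
        ExpressionAttributeNames={"#s": "status"},
        ExpressionAttributeValues={":s": "complete", ":u": f"s3://{bucket}/{key}"},
    )
    return key
```

The status is updated only after the upload succeeds, so a record marked "complete" always points at an object that exists.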
What's Next
Phase 4 builds the retrieval flow — a Lambda that serves the results back to the React frontend for playback.