Simple Speech Synthesis with a DTS33A Module

18 Oct 2024

Introduction
Wiring
Protocol
Tones/Alerts
Python Code
Summary

Introduction

I bought a DTS33A module on a whim to try out, since it was cheap. It produces both Chinese and English speech output on demand, and is controlled over a UART connection.

The board is tiny (22 x 17 mm). It looks like the main chip might be a pre-programmed microcontroller. The two 8-pin chips on the board are an audio amplifier and a 64 Mbit Flash memory.

Wiring

I wired it up to a USB-UART adapter, and connected the supplied speaker to it.

Protocol

The interface is 3.3V UART (115200 baud).

The protocol is pretty basic. In brief, each stream that you transmit needs to start with the byte 0xFD, followed by a 2-byte length field (most significant byte first) which indicates the length of the remainder of the stream.

The stream typically contains one or more commands. Any text needs to be preceded with 0x01 0x04, which is a command that indicates UTF-8 text is to follow.

So, if you want to play out "BOO", then you'd need to transmit the following:

0xFD, 0x00, 0x05, 0x01, 0x04, 0x42, 0x4F, 0x4F
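The framing described above can be sketched in a few lines of Python. This is only an illustration of the byte layout, not the code from the demo; the function name build_frame is made up for this example.

```python
def build_frame(text: str) -> bytes:
    """Wrap text in a frame: 0xFD, 2-byte big-endian length, 0x01 0x04, UTF-8 text."""
    # The payload is the "text follows" command (0x01 0x04) plus the encoded text.
    payload = bytes([0x01, 0x04]) + text.encode("utf-8")
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a 2-byte length field")
    # The length field counts everything after itself, most significant byte first.
    return bytes([0xFD]) + len(payload).to_bytes(2, "big") + payload


frame = build_frame("BOO")
print(", ".join(f"0x{b:02X}" for b in frame))
# prints: 0xFD, 0x00, 0x05, 0x01, 0x04, 0x42, 0x4F, 0x4F
```

Note that the length is counted in bytes after UTF-8 encoding, so it can be larger than the number of characters when the text is Chinese.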

A useful command:

0x21 - status query. This single-byte command ought to return 0x4F if the DTS33A is idle and ready for input. If it returns anything else, then it's likely that speech is currently being played out.
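A polling loop built on the status query might look like the sketch below. Two things here are assumptions: that the 0x21 command is wrapped in the same 0xFD + length framing as any other stream, and that the port object offers write() and read() methods in the style of a pyserial Serial object. The function names are invented for this example.

```python
import time

# Assumption: the status query is framed like any other stream (0xFD, length 0x0001, 0x21).
STATUS_QUERY = bytes([0xFD, 0x00, 0x01, 0x21])
IDLE = 0x4F  # reply that indicates the module is idle and ready for input


def is_idle(port) -> bool:
    """Send one status query and report whether the reply was the idle byte."""
    port.write(STATUS_QUERY)
    reply = port.read(1)  # an empty reply (read timeout) is treated as "not idle"
    return len(reply) == 1 and reply[0] == IDLE


def wait_for_ready(port, timeout_s: float = 10.0, poll_s: float = 0.1) -> bool:
    """Poll until the module reports idle; return False if timeout_s expires first."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_idle(port):
            return True
        time.sleep(poll_s)
    return False
```

With pyserial, the port would be opened with a read timeout so that read(1) cannot block forever if the module stays silent.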

Some instructions are sent "in-band" within the text string. For instance, to speed up the speech, you could send the text "[s9]BOO". Values from s0 to s9 are accepted.

[s0] to [s9] - speed

[t0] to [t9] - pitch

[v1] to [v9] - volume (even volume 1 is loud enough! higher may distort).
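Since the in-band instructions are just text, a small helper can prepend them before the string is framed and sent. The sketch below is illustrative only: the helper name with_controls is made up, and the order in which the tags are emitted is an assumption (the module may not care).

```python
def with_controls(text: str, vol=None, speed=None, pitch=None) -> str:
    """Prefix text with in-band [vN], [sN] and [tN] tags for any values supplied."""
    tags = ""
    # (tag letter, value, lowest accepted value); volume starts at 1, the others at 0.
    for letter, value, lowest in (("v", vol, 1), ("s", speed, 0), ("t", pitch, 0)):
        if value is None:
            continue  # leave this setting untouched
        if not lowest <= value <= 9:
            raise ValueError(f"{letter} must be between {lowest} and 9")
        tags += f"[{letter}{value}]"
    return tags + text


print(with_controls("BOO", speed=9))
# prints: [s9]BOO
```

The returned string still needs the 0x01 0x04 prefix and the 0xFD + length framing described earlier before it goes out over the UART.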

Tones/Alerts

There are a few pre-recorded tones/alerts built-in. To use those, the text string needs to contain, at the beginning, one of the following pieces of text:

ring_1 to ring_5 - these are telephone ring type sounds

alert_1 to alert_5 - these are all warning type sounds

message_1 to message_5 - these are all 'e-mail message arrived' or 'chat message arrived' types of sounds.
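To avoid typos in the tone names, they can be generated rather than typed. The helper below is a hypothetical convenience, not part of the demo code; it only produces the text, which is then sent like any other string.

```python
TONE_FAMILIES = ("ring", "alert", "message")


def tone_text(family: str, number: int) -> str:
    """Return the text that selects a built-in tone, e.g. 'alert_3'."""
    if family not in TONE_FAMILIES:
        raise ValueError(f"family must be one of {TONE_FAMILIES}")
    if not 1 <= number <= 5:
        raise ValueError("number must be between 1 and 5")
    return f"{family}_{number}"


print(tone_text("alert", 3))
# prints: alert_3
```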

Python Code

I wrote a simple DTS33A Python demo, but C code for a microcontroller would be very feasible, too.

To use the code, edit the line containing portname = COM5 so that it names the serial port of your USB-UART adapter.

If you wish to modify the speech, you can edit the following lines:

    def maincode():
        dts33a_speak("Daisy Daisy", vol=1, speed=0, pitch=5)
        dts33a_speak("Give me your answer do", vol=1, speed=2, pitch=5)
        dts33a_speak("I am half crazy", vol=1, speed=0, pitch=3)
        dts33a_speak("All for the love of you", vol=1, speed=0, pitch=1)
        dts33a_wait_for_ready()
        print("Done")

See the 30-second video below for a demonstration.

Summary

It's quite easy to use a UART connection to control a DTS33A speech synthesis board. These things are a bit anachronistic now, since a microcontroller and stored speech files usually provide far better results. Plus, as you can see from the demo, it doesn't respond very quickly. However, it's very convenient to have everything ready-made in a tiny module, so it might still have occasional uses. See the GitHub repo for the source code.

Thanks for reading!

DAB 1 month ago

Very interesting.

A nice simple implementation.

Output would be acceptable for most applications.
shabaz 1 month ago in reply to beacon_dave

All Daisy's sound the same : ) except the last one it seems! Noise is only at the end.
beacon_dave 1 month ago in reply to shabaz

Wonder what happens to the waveform if you were to do something like "Daisy Daisy Daisy Daisy Daisy" ?

Does it only apply emphasis to the first "Daisy" or does it vary each occurrence in some way. Also does it still place the erroneous noise after the second "Daisy" or just after the last "Daisy" ?
shabaz 1 month ago in reply to beacon_dave

I just tried it, adding a space doesn't make any noticeable difference. However, adding a full-stop makes it sound as if the last part of the sound has been cut short. Comparing the two, I think I prefer it without a period if it's a single-sentence announcement.
beacon_dave 1 month ago in reply to shabaz

Does it make a difference if you add a space or full stop after the second Daisy ?