Speech Solutions

Datatang provides clients with complete speech synthesis technology. Datatang is committed to collecting corpora of the world’s languages and to processing voice content, including gender and emotion labeling and speech transcription. Datatang offers high-quality voice data for Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and related applications.

A customized speech dataset takes the following into account:
  • Device: mobile phone / laptop / high-fidelity microphone
  • Recording scenario: in a car / in a quiet room / on the street
  • Content uploaded over the Internet: personal information / audio data

Transcription/Annotation

Datatang is experienced in transcribing speech and annotating unstructured information in audio data.
A typical speech transcription and annotation procedure includes:

Step 1:
Machine transcription to generate text from speech.

Step 2:
An annotator proofreads the results from Step 1.

Step 3:
An annotator labels speech features (speaker’s gender, accent, timestamps, etc.).
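
The three steps above can be sketched as a small data flow. This is only an illustrative sketch: the `Segment` schema, its field names, and the `proofread` and `label` helpers are hypothetical and are not Datatang's actual tooling. The Step 1 machine transcript is hard-coded here where a real pipeline would call an ASR engine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One transcribed stretch of audio (hypothetical schema)."""
    start_s: float                 # timestamp: segment start, in seconds
    end_s: float                   # timestamp: segment end, in seconds
    text: str                      # Step 1: machine transcript
    proofread: bool = False        # Step 2: True once an annotator has checked the text
    gender: Optional[str] = None   # Step 3: speaker feature labels
    accent: Optional[str] = None


def proofread(seg: Segment, corrected_text: str) -> Segment:
    """Step 2: an annotator replaces the machine text with the corrected text."""
    seg.text = corrected_text
    seg.proofread = True
    return seg


def label(seg: Segment, gender: str, accent: str) -> Segment:
    """Step 3: an annotator attaches speaker features to a proofread segment."""
    if not seg.proofread:
        raise ValueError("label only after proofreading (Step 2)")
    seg.gender = gender
    seg.accent = accent
    return seg


# Step 1 output is hard-coded for illustration.
seg = Segment(start_s=0.0, end_s=2.4, text="turn on the light")
seg = proofread(seg, "Turn on the lights.")
seg = label(seg, gender="F", accent="Beijing")
```

Enforcing the step order in `label` is a design choice of this sketch, so that feature labels are never attached to an unchecked transcript.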

Datatang offers a solution package that includes professional devices, studios, and broadcasters in multiple languages to select from.

Case Study

Client Expectation: A client needs to collect speech data from 1000 native Chinese speakers at home for smart home research.

Datatang Solution
To deliver exactly the data expected, Datatang captures the essential features that our clients care most about, such as a variety of recorders (accent, age, gender, etc.), recording environments, microphone locations, and noise conditions. An explicit data collection plan includes:

Selection of Recorders

1. Total Number: 1000

2. Gender Balance: F/M=50%:50%

3. Accent Balance:

Recorders from 7 dialect areas, including Beijing, Tianjin, the Pearl River Delta, and the Yangtze River Delta

4. Age Balance:

16-30 (50%)

30-50 (40%)

50-70 (10%)
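
The gender and age percentages above translate into per-group headcounts. The sketch below assumes that gender and age are balanced independently of each other (the plan above does not say so), and the `quotas` helper is a hypothetical name. Accent is left out because the plan gives no per-area percentages.

```python
TOTAL = 1000                                       # total number of recorders
GENDER_PCT = {"F": 50, "M": 50}                    # gender balance, in percent
AGE_PCT = {"16-30": 50, "30-50": 40, "50-70": 10}  # age balance, in percent


def quotas(total, gender_pct, age_pct):
    """Headcount per (gender, age) cell, assuming the two balances are independent."""
    return {
        (g, a): total * gp * ap // 10_000   # integer arithmetic avoids float rounding
        for g, gp in gender_pct.items()
        for a, ap in age_pct.items()
    }


plan = quotas(TOTAL, GENDER_PCT, AGE_PCT)
for (gender, age), n in plan.items():
    print(f"{gender} {age}: {n}")
# F 16-30: 250
# F 30-50: 200
# F 50-70: 50
# M 16-30: 250
# M 30-50: 200
# M 50-70: 50
```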

Recording Device

1. Various microphone arrays (far or near field)

including the major microphone arrays currently in use:

6-microphone annular array and 6+1-microphone annular array

2. Mobile Phones (near field)

At least 8 phone models

Environment Set-Up

1. Room Selection

Three types of room: living room, bedroom, and kitchen.

Recorded in 10 houses.

Room size: small (15-20 m²), medium (20-30 m²), large (30-40 m²)

2. Microphone Location

Four recording points along the direction of speech, at the following distances from the source: 0.5 m, 1 m, 3 m, 5 m

3. Noise Collection

Quiet environment

Everyday household noise: human voices, TV, household appliances
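
The environment set-up can be enumerated as a grid of recording conditions. The sketch below assumes that every room type is combined with every microphone distance and every noise condition. The plan above does not state that a full cross is recorded, and the dictionary keys are hypothetical names. Room size and the individual houses are left out for brevity.

```python
from itertools import product

ROOMS = ["living room", "bedroom", "kitchen"]
DISTANCES_M = [0.5, 1, 3, 5]   # microphone distance from the speaker, in metres
NOISE = ["quiet", "human voice", "TV", "household appliance"]

# Assumption: a full cross of room type x distance x noise condition.
conditions = [
    {"room": room, "distance_m": dist, "noise": noise}
    for room, dist, noise in product(ROOMS, DISTANCES_M, NOISE)
]

print(len(conditions))  # 48
```

A listing like this makes it easy to check that each condition has recordings before delivery.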

