C S T - RO CornerSoft Technologies - Bucharest
 
 


General Purpose, Neural Networks Based, Text-to-Speech System (E-talk)

E-talk is a text-to-speech (TTS) system for the Romanian and English languages. Starting from arbitrary ASCII-encoded source text, it produces the spoken equivalent: a speech signal sampled at 16 kHz with a resolution of 16 bits per sample. E-talk currently has three voices: one male voice for Romanian, and one male and one female voice for English. We expect to add more voices as development continues. E-talk has a simple, intuitive user interface for editing and saving source text or loading previously edited text, and for controlling the main synthesis options: pitch, volume, speed, pause duration and the degree of stress emphasis. On a 200 MHz Pentium Pro PC, synthesis of a source text of 50 to 1000 words takes no more than 7.5 seconds. Synthesis is considerably faster on current mid-range and high-end machines.
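To make the output format concrete, here is a minimal sketch of how a 16 kHz, 16-bit signal could be sized and wrapped in a standard PCM WAV header. It is an illustration only: the text does not say that E-talk writes WAV files, the mono channel count is an assumption, and the names `pcm_bytes`, `wav_header` and `put_le` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define SAMPLE_RATE_HZ   16000UL  /* from the text: 16 kHz */
#define BITS_PER_SAMPLE  16UL     /* from the text: 16 bits per sample */
#define CHANNELS         1UL      /* assumption: mono speech */
#define WAV_HEADER_SIZE  44       /* size of a canonical PCM WAV header */

/* Store v at p as a little-endian integer of the given byte width. */
static void put_le(unsigned char *p, unsigned long v, int width)
{
    int i;
    for (i = 0; i < width; i++)
        p[i] = (unsigned char)((v >> (8 * i)) & 0xFFUL);
}

/* Bytes of raw PCM needed to hold num_samples samples. */
unsigned long pcm_bytes(unsigned long num_samples)
{
    return num_samples * CHANNELS * (BITS_PER_SAMPLE / 8UL);
}

/* Fill hdr with a canonical 44-byte PCM WAV header for num_samples samples. */
void wav_header(unsigned char hdr[WAV_HEADER_SIZE], unsigned long num_samples)
{
    unsigned long data_bytes = pcm_bytes(num_samples);
    unsigned long block_align = CHANNELS * (BITS_PER_SAMPLE / 8UL);

    assert(hdr != NULL);
    memcpy(hdr, "RIFF", 4);
    put_le(hdr + 4, 36UL + data_bytes, 4);             /* file size minus 8 */
    memcpy(hdr + 8, "WAVE", 4);
    memcpy(hdr + 12, "fmt ", 4);
    put_le(hdr + 16, 16UL, 4);                         /* fmt chunk size */
    put_le(hdr + 20, 1UL, 2);                          /* format tag 1 = PCM */
    put_le(hdr + 22, CHANNELS, 2);
    put_le(hdr + 24, SAMPLE_RATE_HZ, 4);
    put_le(hdr + 28, SAMPLE_RATE_HZ * block_align, 4); /* bytes per second */
    put_le(hdr + 32, block_align, 2);
    put_le(hdr + 34, BITS_PER_SAMPLE, 2);
    memcpy(hdr + 36, "data", 4);
    put_le(hdr + 40, data_bytes, 4);
}
```

Under these assumptions, one second of speech is 16,000 samples, or 32,000 bytes of raw audio, and one minute is 1,920,000 bytes.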

E-talk uses neural network models at several stages of the synthesis process: letter-to-phoneme conversion, stress recognition and prosody generation. This approach makes it very flexible to add new languages or voices. At the lowest level of the synthesis process, E-talk produces the speech signal by concatenating diphones (recorded units that each span the transition from one phoneme to the next).
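As an illustration of diphone concatenation, the sketch below turns a phoneme sequence into the list of diphone unit names that would have to be joined. It is not E-talk's code: the `_` silence symbol, the `left-right` naming scheme and the function name `phonemes_to_diphones` are assumptions made for the example, and the actual waveform joining is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SILENCE "_"          /* hypothetical symbol for silence at the utterance edges */
#define MAX_DIPHONE_NAME 16  /* buffer size for one diphone name, including the '\0' */

/* Build the names of the diphone units that cover a phoneme sequence.
 * With silence padded at both ends, n phonemes need n + 1 diphones.
 * Returns the number of names written, or -1 if they do not fit. */
int phonemes_to_diphones(const char *const *phonemes, int n,
                         char out[][MAX_DIPHONE_NAME], int max_out)
{
    int i;

    assert(out != NULL);
    if (n < 0 || n + 1 > max_out)
        return -1;
    if (n == 0)
        return 0;            /* nothing to say, nothing to join */

    for (i = 0; i <= n; i++) {
        const char *left  = (i == 0) ? SILENCE : phonemes[i - 1];
        const char *right = (i == n) ? SILENCE : phonemes[i];

        /* left + '-' + right + '\0' must fit in one name buffer */
        if (strlen(left) + strlen(right) + 2 > MAX_DIPHONE_NAME)
            return -1;
        sprintf(out[i], "%s-%s", left, right);
    }
    return n + 1;
}
```

For the three phonemes `k`, `a`, `t`, this yields the four units `_-k`, `k-a`, `a-t` and `t-_`. A synthesizer of this kind would then look up the recording for each unit and join them, applying the prosody produced by the earlier stages.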

E-talk is written entirely in ANSI C, which makes it highly portable. The current implementation runs on Windows (9x, NT, 2000). A terminal-oriented port for Linux x86 is also available.

Spoken E-mail System (SES)

SES is a system that lets users listen to their e-mail over the phone (fixed-line or mobile). SES manages a potentially large number of telephone line connections and accepts user requests on each of them, as DTMF (keypad) codes or spoken commands. For each incoming connection, SES retrieves the e-mail messages from the e-mail server(s) specified by the client, using standard mail access protocols such as POP3 or IMAP4. It then converts the messages to their spoken equivalent (a stream of speech samples) with the E-talk TTS technology and delivers them over the appropriate telephone line. Using simple commands, clients can control the speed or volume of the speech, cancel playback, or play or replay a chosen message. Before listening to a whole message, clients can browse through all their e-mail messages, hearing only the headers (sender name, date and subject).
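The keypad control described above can be pictured as a small per-call state machine. The following sketch is hypothetical throughout: the key assignments, the value ranges and the names `SesSession` and `ses_handle_key` are invented for illustration and are not taken from SES.

```c
#include <assert.h>
#include <stddef.h>

/* Per-call playback state (hypothetical layout). */
typedef struct {
    int count;    /* messages in the mailbox */
    int current;  /* selected message, 0-based */
    int playing;  /* 1 while a full message is being spoken */
    int speed;    /* percent of normal speaking rate, 50..200 */
    int volume;   /* 0..10 */
} SesSession;

static int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Select another message for header browsing and stop any playback. */
static void select_message(SesSession *s, int index)
{
    if (s->count > 0) {
        s->current = clamp(index, 0, s->count - 1);
        s->playing = 0;
    }
}

/* Apply one DTMF key to the session.
 * Assumed key map: 1 (re)play, 0 cancel, 4 previous, 6 next,
 * 7 slower, 9 faster, * quieter, # louder.
 * Returns 1 if the key is a known command, 0 otherwise. */
int ses_handle_key(SesSession *s, char key)
{
    assert(s != NULL);
    switch (key) {
    case '1': if (s->count > 0) s->playing = 1;        break;
    case '0': s->playing = 0;                          break;
    case '4': select_message(s, s->current - 1);       break;
    case '6': select_message(s, s->current + 1);       break;
    case '7': s->speed = clamp(s->speed - 10, 50, 200); break;
    case '9': s->speed = clamp(s->speed + 10, 50, 200); break;
    case '*': s->volume = clamp(s->volume - 1, 0, 10);  break;
    case '#': s->volume = clamp(s->volume + 1, 0, 10);  break;
    default:  return 0;  /* unknown key: leave the session unchanged */
    }
    return 1;
}
```

Keeping the state in one structure per call means that each telephone line can be handled independently, which suits a system that serves many connections at once.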

SES employs a scalable hardware and software architecture that we developed as a framework for any phone-based information delivery system. The hardware is built around a PC core fitted with an appropriate number of computer telephony interfaces (CTI). One or more network adapters are also provided, enabling SES to connect to e-mail servers anywhere on the Internet. The software relies essentially on the strong multithreading facilities of the host operating system (Windows NT 4.0 in our pilot system). SES also integrates a multithreaded version of the E-talk text-to-speech engine.

Voice Recognition Software

Over the last year we developed an Automatic Speech Recognition (ASR) system aimed at recognizing a small vocabulary of about 64 words (commands) in speaker-independent mode. The system uses a wavelet transform to analyze the speech signal and original neural network models at the decision stage. Because it learns from examples, the system adapts easily to different recording conditions. Two distinct operating modes are being considered: recognition of a word within continuous speech (keyword spotting), and recognition of isolated words (one or more spoken words separated by significant silence intervals).
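In the isolated-word mode, the input first has to be cut into words at the silence intervals. The sketch below shows one simple way this could be done, using the mean absolute amplitude of fixed-length frames. It is an assumption made for illustration, not the method used in the system described here (which relies on wavelet analysis and neural networks); the names `WordSpan` and `find_isolated_words`, the threshold and the frame length are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One detected word, as the sample range [start, end). */
typedef struct {
    size_t start;
    size_t end;
} WordSpan;

/* Split 16-bit PCM into words separated by silence.
 * A frame is "voiced" when its mean absolute amplitude reaches threshold.
 * A word ends once min_silence_frames consecutive frames are not voiced.
 * Returns the number of spans stored; words beyond max_spans are dropped.
 * frame_len should stay small enough that the per-frame sum fits in a long. */
size_t find_isolated_words(const short *pcm, size_t n, size_t frame_len,
                           long threshold, size_t min_silence_frames,
                           WordSpan *spans, size_t max_spans)
{
    size_t count = 0, silent_run = 0, pos;
    size_t word_start = 0, voiced_end = 0;
    int in_word = 0;

    assert(pcm != NULL && spans != NULL && frame_len > 0);

    for (pos = 0; pos + frame_len <= n; pos += frame_len) {
        long sum = 0;
        size_t i;

        for (i = 0; i < frame_len; i++)
            sum += labs((long)pcm[pos + i]);

        if (sum / (long)frame_len >= threshold) {
            if (!in_word) {              /* first voiced frame of a new word */
                in_word = 1;
                word_start = pos;
            }
            voiced_end = pos + frame_len;
            silent_run = 0;
        } else if (in_word && ++silent_run >= min_silence_frames) {
            if (count < max_spans) {     /* enough silence: close the word */
                spans[count].start = word_start;
                spans[count].end = voiced_end;
                count++;
            }
            in_word = 0;
            silent_run = 0;
        }
    }
    if (in_word && count < max_spans) {  /* input ended in the middle of a word */
        spans[count].start = word_start;
        spans[count].end = voiced_end;
        count++;
    }
    return count;
}
```

Each span returned this way could then be passed to the analysis and decision stages as one candidate command. Keyword spotting in continuous speech cannot rely on such pauses and needs a different approach.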

The system is written in ANSI C and runs on Windows and Linux x86. The first application we foresee is driving phone-based information delivery systems (such as spoken e-mail, call centers, etc.) with simple spoken user commands. The system is still under development, but we hope to bring it to a commercial level by the end of the year.

  © 2002 CST GROUP

ISO 9001 Certified

Microsoft Certified Partner, Romania, Eastern Europe