E-talk uses neural network models at several stages of the synthesis
process, such as letter-to-phoneme conversion, stress recognition and
prosody generation. This approach makes it easy to add new languages
or voices. At the lowest level of the synthesis process, E-talk
produces the speech signal by concatenating diphones.
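As an illustration of the diphone stage, the sketch below turns a phoneme sequence into the list of diphone names whose recordings would then be joined end to end. It is a minimal sketch and not E-talk's actual code: the function name, the "_" silence symbol, the "left-right" naming scheme and the name length limit are all assumptions made for the example. It uses C99 features (snprintf, loop-scoped declarations).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SILENCE "_"          /* hypothetical symbol for silence */
#define DIPHONE_NAME_MAX 16  /* hypothetical limit on a diphone name */

/* Build the diphone names for a phoneme sequence.
 * A diphone spans the transition from one phoneme to the next, so
 * n phonemes yield n + 1 diphones once silence is assumed at both
 * ends of the utterance.
 * Returns the number of diphones written, or -1 if `out` is too
 * small or a name does not fit in DIPHONE_NAME_MAX bytes. */
int phonemes_to_diphones(const char *const *phonemes, int n,
                         char out[][DIPHONE_NAME_MAX], int out_cap)
{
    int i;

    assert(phonemes != NULL && out != NULL);
    if (n < 1 || out_cap < n + 1)
        return -1;

    for (i = 0; i <= n; i++) {
        const char *left  = (i == 0) ? SILENCE : phonemes[i - 1];
        const char *right = (i == n) ? SILENCE : phonemes[i];
        int len = snprintf(out[i], DIPHONE_NAME_MAX, "%s-%s", left, right);

        if (len < 0 || len >= DIPHONE_NAME_MAX)
            return -1;  /* name was truncated or could not be formatted */
    }
    return n + 1;
}
```

For the phonemes "h", "e", "l", "o" this yields five names: "_-h", "h-e", "e-l", "l-o" and "o-_". A synthesizer would look up the recorded unit for each name and append its samples to the output signal.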
E-talk is written entirely in ANSI C, which makes it highly portable. The
current implementation runs on the popular Windows (W9x, NT, 2000)
platform. A terminal-oriented port for Linux x86 is also available.
Spoken E-mail System (SES)
SES is a system that lets users listen to their e-mail over the phone
(wired or mobile). SES manages a potentially large number of telephone
line connections and accepts user requests on each of them, either as
DTMF (keypad) codes or as voice commands.
For each incoming connection, SES retrieves the e-mail messages from the
client's specified e-mail server(s) using standard mail access protocols
such as POP3 or IMAP4. It then converts them to their spoken equivalent
(a stream of speech samples) via E-talk TTS technology and delivers them
over the appropriate telephone line(s). Using simple commands, clients
can control the speed or the volume of the speech signal, cancel
playback, or (re)play the desired message. Before hearing a whole
message, clients may browse through all e-mail messages, hearing
only the header (sender name, date and subject).
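The keypad control described above can be pictured as a small per-line state machine. The following is a minimal sketch under stated assumptions, not the SES implementation: the LineState structure, the key assignments ('1'/'3' for speed, '4'/'6' for volume, '5' to play, '0' to cancel, '*'/'#' to browse headers) and the value ranges are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Playback state for one telephone line (hypothetical structure). */
typedef struct {
    int speed;         /* playback speed, percent of normal: 50..200 */
    int volume;        /* volume level: 0..10 */
    int message;       /* index of the current message, 0-based */
    int count;         /* number of messages in the mailbox */
    int playing;       /* 1 while something is being spoken */
    int headers_only;  /* 1 = speak only sender, date and subject */
} LineState;

static int clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Apply one DTMF key to the line state.
 * The key assignments are assumptions made for this example.
 * Returns 1 if the key was recognized, 0 otherwise. */
int handle_dtmf(LineState *s, char key)
{
    assert(s != NULL);

    switch (key) {
    case '1': s->speed  = clamp(s->speed - 25, 50, 200); break;
    case '3': s->speed  = clamp(s->speed + 25, 50, 200); break;
    case '4': s->volume = clamp(s->volume - 1, 0, 10);   break;
    case '6': s->volume = clamp(s->volume + 1, 0, 10);   break;
    case '5':                      /* (re)play the whole current message */
        s->headers_only = 0;
        s->playing = 1;
        break;
    case '0':                      /* cancel playback */
        s->playing = 0;
        break;
    case '#':                      /* browse: header of the next message */
        if (s->message + 1 < s->count)
            s->message++;
        s->headers_only = 1;
        s->playing = 1;
        break;
    case '*':                      /* browse: header of the previous message */
        if (s->message > 0)
            s->message--;
        s->headers_only = 1;
        s->playing = 1;
        break;
    default:
        return 0;                  /* unknown key: state is left unchanged */
    }
    return 1;
}
```

Keeping the state in one structure per line means each telephone connection can be served independently, which fits a design where many lines are handled at once.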
SES employs a scalable hardware and software architecture that we developed
as a framework for any phone-based information delivery system. The
hardware structure is built around a PC core architecture, bundled with
an appropriate number of computer telephony interfaces (CTI). One or
more network adapters are also provided, enabling SES to connect to
e-mail servers anywhere on the Internet. The software structure relies
essentially on the strong multithreading facilities of the core operating
system (Windows NT 4.0, in our pilot system). SES also integrates a
multithreaded version of the E-talk text-to-speech engine.
VOICE RECOGNITION SOFTWARE
Over the last year we developed an Automatic Speech Recognition (ASR)
system aimed at recognizing a small vocabulary of about 64 words
(commands) in a speaker-independent mode. The system uses a wavelet
transform to analyze the speech signal and some original neural
network models in the decision phase. Because it learns from examples,
the system is easily adapted to various recording conditions. Two
distinct operating modes are being considered: the recognition
of a word within a continuous speech flow (keyword spotting), and the
recognition of isolated words (one or more spoken words separated by
significant silence intervals).
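To make the isolated-word mode concrete, the sketch below separates words by looking for sufficiently long runs of low-energy frames. It is only an illustration of the idea of "words separated by significant silence intervals", not the method used in the system described here: the per-frame energy input, the fixed threshold, the minimum silence length and all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* One detected word, as a range of frames; `end` is exclusive. */
typedef struct {
    size_t start;
    size_t end;
} WordSpan;

/* Split a sequence of per-frame energies into isolated words.
 * A frame counts as speech when its energy is >= threshold. A word
 * ends once min_silence consecutive non-speech frames follow it;
 * shorter pauses are kept inside the word.
 * Stores at most `cap` spans in `out` and returns the number of
 * words found (which may exceed `cap`). */
size_t find_isolated_words(const double *energy, size_t n_frames,
                           double threshold, size_t min_silence,
                           WordSpan *out, size_t cap)
{
    size_t count = 0;       /* words found so far */
    size_t silent_run = 0;  /* trailing silent frames inside a word */
    size_t start = 0;       /* first frame of the current word */
    int in_word = 0;
    size_t i;

    assert(energy != NULL && out != NULL);

    for (i = 0; i < n_frames; i++) {
        if (energy[i] >= threshold) {
            if (!in_word) {
                in_word = 1;
                start = i;
            }
            silent_run = 0;
        } else if (in_word) {
            silent_run++;
            if (silent_run >= min_silence) {
                /* The word ended where this silence began. */
                if (count < cap) {
                    out[count].start = start;
                    out[count].end = i + 1 - silent_run;
                }
                count++;
                in_word = 0;
                silent_run = 0;
            }
        }
    }

    if (in_word) {
        /* The signal ended inside a word: close it, dropping trailing silence. */
        if (count < cap) {
            out[count].start = start;
            out[count].end = n_frames - silent_run;
        }
        count++;
    }
    return count;
}
```

Each detected span would then be passed to the analysis and decision stages. Keyword spotting in continuous speech cannot rely on such pauses and needs a different approach.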
The system is written in ANSI C and runs on Windows and Linux x86
platforms. The first application we foresee is to "drive"
phone-based information delivery systems (such as spoken e-mail, call
centers, etc.) by means of simple spoken commands from the user. The
system is still under development, but we hope to bring it to a
commercial level by the end of the year.