UDC 81'374
D.A. Karagoishiyeva, M. Zhanabekova, Sh.A. Ospankulova
Al-Farabi Kazakh National University,
Almaty, Kazakhstan
danel.karagoish@mail.ru
EXPERIENCE AND ISSUES OF SPEECH SYNTHESIZER CREATION
The main aim of this article is to characterize and analyze the difficulties we faced in the
process of creating a speech synthesizer that produces speech in the Kazakh language. Speech
synthesis is a technology that reads text aloud in a natural-sounding human voice. Speech
synthesis is new for the Kazakh language: there is hardly any speech synthesizer for Kazakh, and
one cannot find even an artificial voice in our mother tongue on the Internet.
Key words: speech synthesizer, reduction, sound, diphthongoid, utterance, synchronize
Speech synthesizers are software libraries (text-to-speech engines) that let other programs
produce speech from text. They differ in the quality of the speech produced and in the options
for changing speech settings, and they may include several voices, both female and male. On
Windows, speech synthesizers are typically created on the basis of the Speech Application
Programming Interface (Speech API, SAPI), a software library that allows applications for the
operating system to recognize and synthesize speech. SAPI 4.0 was created in 1998; modern
operating systems have SAPI 5 installed [1].
The main aim of this article is to characterize and analyze the difficulties we faced in the
process of creating a speech synthesizer that produces speech in the Kazakh language.
Speech synthesis is a technology that reads text aloud in a natural-sounding human voice.
Where can speech synthesis be applied?
Telephony: in the contact center and for creating an IVR menu
The most frequent and effective application of speech synthesis is voicing information in the
IVR systems of contact centers. First of all, this means voicing dynamic information that is
individual for each client. Producing audio clips with a synthesized voice is also useful when
information changes quickly, for example stock levels of warehouse items, cinema listings or
outages at an Internet service provider.
Voicing information on a website
You can connect our speech synthesis engine to any Internet site or web portal and have the
necessary information read aloud.
This can be useful when a visitor to the site cannot be in front of the monitor, or wants to
listen to the text while working in another program. Sometimes there is a need to read the text
to a person who is not near the computer. It is also useful for people with visual impairments.
Voicing documents and office information in corporate systems
Our speech synthesis can voice any information entered into a system, for example in educational
exercises, medical systems and warehouse accounting systems.
Voicing books and listening to them on mobile devices
We developed the Reader mobile application, which reads aloud any book loaded onto the device.
Voicing videos and educational courses
You can voice the subtitles of videos, and a narrator is no longer needed [2].
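The telephony case above, voicing dynamic information that is individual for each client, comes
down to filling a text template before the text is passed to a synthesizer. The following Python
sketch is only an illustration: the function name, the wording of the prompt and the default
currency are assumptions and are not taken from any of the systems described here.

```python
def build_balance_prompt(client_name: str, balance: float, currency: str = "tenge") -> str:
    """Return the sentence an IVR system would hand to a speech synthesizer.

    Hypothetical helper: the template text and parameter names are
    illustrative assumptions, not part of any real IVR product.
    """
    if not client_name.strip():
        raise ValueError("client_name must not be empty")
    # Format the amount with a thousands separator and two decimals so the
    # synthesizer receives an unambiguous number.
    return f"Dear {client_name}, your current balance is {balance:,.2f} {currency}."


if __name__ == "__main__":
    print(build_balance_prompt("Aigerim", 1520.5))
```

Because the sentence is assembled at call time, the same template can serve every client without recording a separate audio clip for each one.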
Speech synthesis is new for the Kazakh language. There is hardly any speech synthesizer for
Kazakh, and one cannot find even an artificial voice in our mother tongue on the Internet.
However, on the 13th of March of the current year, staff of the Rehabilitation Center for
children and teenagers with restrictions on mental and physical development, in collaboration
with Russian scientists, produced a device for people
with visual impairments that can also voice texts in the Kazakh language. IT specialists of the
Russian company “Elita Group”, led by a group of scientists including Olga Yakovleva, worked
together with the Kazakhstani initiators on producing a speech synthesizer for the Kazakh
language. The initiators included R.A. Suleimenova, Head of the Rehabilitation Center for
children and teenagers with restrictions on mental and physical development, and
E.D. Suleimenova, D.A. Karagoishiyeva, S.B. Bektemirova and M.A. Zhanabekova, teachers of the
Faculty of Philology, Literary Studies and World Languages of al-Farabi Kazakh National
University. The Russian specialists were responsible for the technical side of the computer
speech synthesis apparatus, and the Kazakhstani linguists for preparing the necessary material
for the device [3], [4].
Speech synthesizers are being used more and more often in everyday life. As the name suggests,
they carry out speech synthesis, that is, they convert written text into spoken form.
Thanks to speech synthesizers it is possible to learn new foreign words with the correct
pronunciation, or to listen to books without being distracted from other tasks, for example on
public transport. Initially, organizations specializing in equipment for people with visual
impairments were engaged in developing such programs. Now any user can download one of these
programs, install it on a computer or a telephone and synthesize speech. A variety of programs,
applications and even whole systems have been developed for this purpose.
Here is a list of speech synthesizers:
Acapela – one of the most widespread speech synthesizers in the world. The program recognizes
and voices texts in more than thirty languages. Russian is supported by two voices: a male voice,
Nikolay, and a female voice, Alyona. The female voice appeared considerably later than the male
one and is more advanced. You can hear how the voices sound on the official site of the program:
it is enough to choose a language and a voice and to type a short text. A separate stress
dictionary, which makes pronunciation clearer, was developed for the male voice.
Vocalizer – the second in our list, but not in popularity. It uses the Milena voice engine from
the developer of the Vocalizer program, the Nuance company. The voice sounds very natural and
the speech is clear. It is possible to install various dictionaries and to adjust the volume,
speed and stress, which is very important. As with Acapela, the program has various versions for
mobile, automobile and computer applications. It is perfectly suitable for reading books.
RHVoice – a speech synthesizer developed by Olga Yakovleva. The program voices Russian texts in
three voices: Elena, Irina and Alexander.
eSpeak – the first version of this free speech synthesizer was launched in 2006. Since then the
developer has regularly released more advanced versions. The latest version was presented at the
end of spring 2013.
Festival – a complete speech synthesis system developed at the University of Edinburgh. The
programs and all modules are distributed completely free of charge as open source [5].
Thus, the list above covers the kinds of speech synthesizers that can be found on the Internet.
An analysis of these synthesizers shows that the number of supported languages is limited, and
the Kazakh language can hardly be found among them.
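To illustrate how a program passes text and speech settings to one of the engines listed above,
the following Python sketch assembles a command line for the eSpeak command-line tool. The
function name and the default values are assumptions made for illustration; the options -v
(voice), -s (speed in words per minute), -a (amplitude) and -w (write to a WAV file) belong to
eSpeak itself. The sketch only builds the argument list; actually running it requires eSpeak to
be installed.

```python
from typing import List, Optional


def build_espeak_command(
    text: str,
    voice: str = "ru",            # assumed default: the Russian voice
    speed_wpm: int = 160,         # assumed default speaking rate
    amplitude: int = 100,         # eSpeak accepts 0..200
    wav_path: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the eSpeak command-line program."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if not 0 <= amplitude <= 200:
        raise ValueError("amplitude must be between 0 and 200")
    cmd = ["espeak", "-v", voice, "-s", str(speed_wpm), "-a", str(amplitude)]
    if wav_path is not None:
        # Write the audio to a file instead of playing it.
        cmd += ["-w", wav_path]
    cmd.append(text)
    return cmd


if __name__ == "__main__":
    print(build_espeak_command("Привет", wav_path="hello.wav"))
```

The resulting list can be passed to subprocess.run on a machine where eSpeak is available. Whether a given language code is supported depends on the installed version.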
Our work involved developing a device that can read text in the Kazakh language, rather than a
speech synthesizer on the Internet. The device itself is produced in the USA, and the synthesis
for a particular language is made by a local dealer. This is the result of collaborative work by
the Rehabilitation Center for children and teenagers with restrictions on mental and physical
development and Russian specialists. The aim of the speech synthesizer is to produce Kazakh
speech on a device that is helpful for people with disabilities. We have achieved our goal. As
developers of the project, we in turn took part in a special demonstration and had the
opportunity to test the device.
As philologists, we went through the following stages in creating the speech synthesizer: