CHAPTER ONE
INTRODUCTION
1.1 BACKGROUND OF THE STUDY
Today there is widespread discussion about improving the human interface to the computer, because people no longer want to sit and read data from a monitor: doing so is a painstaking effort that strains the eyes.
The idea of making PCs talk has always fascinated people. After all, voice is one of the best alternatives to the hours of eyestrain involved in reading through a document. Voice is also a better interface for illiterate users than a graphical user interface presented in English text. Research is therefore being carried out throughout the world on improving the human interface to the computer, and one of the best options found to date is the ability of a computer to speak to humans. This is the role of Text-To-Speech (TTS) engines. The operation of a TTS engine falls under speech synthesis. Speech synthesis is the artificial production of human speech; it is an application of Artificial Intelligence (AI) and is becoming one of the most important steps towards improving the human interface to the computer. Artificial Intelligence, in its simplest form, can be defined as the branch of computer science concerned with making computers behave like humans: the simulation of human intelligence processes by machines, especially computer systems.
The applications of AI fall broadly into three areas, namely expert systems, speech recognition, and machine vision. For the purpose of this project, we consider the application of Artificial Intelligence to text pronunciation, commonly known as a Text-to-Speech (TTS) synthesizer; it belongs to the speech-processing area of AI, as the synthesis counterpart of speech recognition. Voice/speech synthesis is a field of computer science that deals with designing computer systems that synthesize speech from written text. It is a technology that allows a computer to convert written text into speech delivered through a loudspeaker or a telephone. As an emerging technology, not all developers are familiar with speech technology. While the basic functions of both speech synthesis and speech recognition take only minutes to understand, there are subtle and powerful capabilities provided by computerized speech that developers will want to understand and utilize. Automatic speech synthesis is one of the fastest developing fields in speech science and engineering. As a new generation of computing technology, it comes as the next major innovation in man-machine interaction after speech recognition, supporting Interactive Voice Response (IVR) systems.
1.1.1 OVERVIEW OF TEXT-TO-SPEECH SYNTHESIS
Text-To-Speech is a process through which input text is analyzed, processed and “understood”, and then rendered as digital audio and “spoken”. A TTS engine is a piece of software that speaks out the text given to it, as if reading from a newspaper. TTS engines have been developed around the world for various languages such as English, French, German and Hindi.
TTS is intended to read electronic texts, such as books, and to vocalize text with the use of speech synthesis. The TTS system takes text as its input; a computer algorithm called the TTS engine then analyses the text, pre-processes it and synthesizes speech using mathematical models. The TTS engine usually generates sound data in an audio format as the output.
The text-to-speech (TTS) synthesis procedure consists of two main phases. The first is text analysis, where the input text is transcribed into a phonetic or some other linguistic representation; the second is the generation of speech waveforms, where the output is produced from this phonetic and prosodic information. These two phases are usually called high-level and low-level synthesis. The input text might be, for example, data from a word processor, standard ASCII text from e-mail, or a mobile text message. The character string is pre-processed and analyzed into a phonetic representation, which is usually a string of phonemes with additional information for correct intonation, duration and stress.
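To make the two phases concrete, the following minimal Python sketch separates high-level synthesis (looking words up in a small, hypothetical pronunciation dictionary) from low-level synthesis (turning the phoneme string into audio). The two-word lexicon and the tone-per-phoneme generator are assumptions for illustration only, not a real synthesizer.

```python
# A minimal, runnable sketch of the two TTS phases: high-level text analysis
# and low-level waveform generation. The two-word lexicon and the
# tone-per-phoneme generator are toy stand-ins, not a real synthesizer.
import math
import struct
import wave

SAMPLE_RATE = 16000

# Hypothetical mini-lexicon used by the high-level (text analysis) phase.
PRONUNCIATION_DICT = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

def text_to_phonemes(text):
    """High-level synthesis: normalise the text and look up each word."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(PRONUNCIATION_DICT.get(word, []))
    return phonemes

def phonemes_to_waveform(phonemes, duration=0.12):
    """Low-level synthesis stand-in: emit a short tone per phoneme."""
    samples = []
    for i, _ in enumerate(phonemes):
        freq = 220 + 40 * i  # arbitrary pitch so each unit is audible
        for n in range(int(SAMPLE_RATE * duration)):
            samples.append(int(12000 * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)))
    return samples

if __name__ == "__main__":
    phones = text_to_phonemes("Hello world")
    audio = phonemes_to_waveform(phones)
    with wave.open("output.wav", "wb") as out:  # 16-bit mono PCM
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(SAMPLE_RATE)
        out.writeframes(struct.pack("<%dh" % len(audio), *audio))
    print(phones)
```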
The basic idea of text-to-speech (TTS) technology is to convert written input to spoken output by generating synthetic speech. There are several ways of performing speech synthesis; the main approaches are described in Section 1.1.2.
The most important qualities of modern speech synthesis systems are naturalness and intelligibility. Naturalness describes how closely the synthesized speech resembles real human speech, while intelligibility describes the ease with which the speech is understood. Maximizing these two criteria is the main development goal in the TTS field.
The speech sound is finally generated by the low-level synthesizer from the information produced by the high-level one. The artificial production of speech-like sounds has a long history, with documented mechanical attempts dating to the eighteenth century.
The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a home computer.
1.1.2 TYPES OF TTS SYSTEMS
Most Text-To-Speech engines can be categorized by the method they use to translate phonemes into audible sound. The main types of TTS systems are listed below:
1.1.2.1 Prerecorded
In this kind of TTS system a database of prerecorded words is maintained. The main advantage of this method is good voice quality, but the limited vocabulary and the need for large storage space make it less efficient.
1.1.2.2 Formant
Here the voice is generated by simulating the behaviour of the human vocal tract. An unlimited vocabulary, low storage requirements and the ability to produce multiple voice characteristics make it highly efficient, but the voice sounds robotic, which is sometimes not appreciated by users.
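A very rough illustration of the formant idea, under the assumption that a steady /a/-like vowel can be approximated by mixing sine waves at typical formant frequencies (about 700, 1200 and 2600 Hz), is sketched below in Python. A real formant synthesizer instead passes a glottal source through resonant filters, but the sketch shows why this family of methods needs no recorded units and very little storage.

```python
# Crude vowel-like sound built from sine waves at assumed formant frequencies.
# This only illustrates the formant idea; it is not a real formant synthesizer.
import math
import struct
import wave

SAMPLE_RATE = 16000
FORMANTS = [(700, 1.0), (1200, 0.5), (2600, 0.25)]  # (Hz, relative amplitude)

def vowel_samples(duration=0.5):
    samples = []
    for n in range(int(SAMPLE_RATE * duration)):
        t = n / SAMPLE_RATE
        value = sum(amp * math.sin(2 * math.pi * freq * t) for freq, amp in FORMANTS)
        samples.append(int(8000 * value))  # peak stays well below the 16-bit limit
    return samples

data = vowel_samples()
with wave.open("vowel.wav", "wb") as out:  # 16-bit mono PCM
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(SAMPLE_RATE)
    out.writeframes(struct.pack("<%dh" % len(data), *data))
```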
1.1.2.3 Concatenated
In this kind of TTS system, text is phonetically represented as a combination of its syllables. These syllables (or other recorded units) are concatenated at run time to produce the phonetic rendering of the text. The key features of this technique are an unlimited vocabulary and good voice quality, but it cannot produce multiple voice characteristics and it needs large storage space. Methodologies, prospects and challenges of implementing concatenative TTS engines, together with their speech synthesizers and high-level applications, have been reported for other languages such as Kannada. The implementation of this TTS is done using the concatenation method. Integral parts of a Text-To-Speech engine are the phoneme identifier, voice mapping and the speech synthesizer.
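Since the concatenation method is what this TTS implements, a minimal Python sketch of the idea is given below. The folder of prerecorded unit files (e.g. hypothetical files units/HH.wav, units/AH.wav, ...) is an assumption for illustration, and all unit files must share the same sample rate, sample width and channel count; the prerecorded-word approach of 1.1.2.1 is the same mechanism with whole-word recordings as the units.

```python
# Concatenative synthesis sketch: join prerecorded unit recordings at run time.
# The unit files and their names are assumptions for illustration.
import wave

def concatenate_units(unit_names, unit_dir="units", out_path="speech.wav"):
    """Append the audio frames of each unit file into one output file."""
    params = None
    frames = []
    for name in unit_names:
        with wave.open(f"{unit_dir}/{name}.wav", "rb") as unit:
            if params is None:
                params = unit.getparams()  # copy audio format from the first unit
            frames.append(unit.readframes(unit.getnframes()))
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for chunk in frames:
            out.writeframes(chunk)

if __name__ == "__main__":
    # Units chosen by a (hypothetical) phoneme identifier for the word "hello".
    concatenate_units(["HH", "AH", "L", "OW"])
```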
1.2 STATEMENT OF THE PROBLEM
Reading is fundamental to academic success, graduation from higher institutions and a positive transition to employment. Unfortunately, many students find it difficult to read small printed letters, especially students with disabilities; they continue to have significant reading deficits when they reach higher institutions despite the best efforts of special education intervention during their elementary years. These students frequently fall behind and become at risk of dropping out of school, failing to gather enough credits to graduate, or otherwise failing to secure employment or post-secondary education. This is especially true for students with disabilities such as traumatic brain injury, autism, dyslexia and other disabilities that impact reading proficiency; this led to the introduction of TTS.
Numerous tests confirm that we are inefficient listeners. Studies have shown that immediately after listening to a 10-minute oral presentation, the average listener has heard, understood and retained 50 percent of what was said.
Within 48 hours, that drops off by another 50 percent to a final level of 25 percent efficiency. In other words, we often comprehend and retain only one quarter of what we hear. We all want to be more than 25 percent efficient, so there is a need to be able to listen again to what has already been heard.
It has also been observed that some speech synthesizers read text at a pace faster than normal human speech; in some cases the words pronounced cannot be comprehended by the listener, and at the same time the listener cannot pause the reading or repeat what was read.
1.3 AIMS AND OBJECTIVES OF THE STUDY
The aim of this study is to develop software that implements text pronunciation, otherwise called text-to-speech synthesis. The system will meet the following objectives:
(1) Pronounce valid words of the English language by concatenating recorded units of these words to produce the corresponding sound or speech.
(2) Accept over 35,000 characters typed into the text box and read them at the pace of an average reader, at least 300 words per minute (wpm).
(3) Store the read text in a database from which it can be retrieved and read again and again.
(4) Allow the reading to be paused, stopped and resumed, mostly in cases of interruption or when the listener did not fully understand what was heard.
(5) Allow the speech rate (reading pace) and volume to be adjusted to suit the listener (a minimal sketch of these controls is shown after this list).
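As a minimal sketch of objectives (4) and (5), the Python snippet below uses the third-party pyttsx3 package as an assumed offline speech engine (the actual project platform and engine may differ). Rate and volume are set through engine properties, while pause/resume is approximated by reading sentence by sentence so that reading can stop and later restart at a sentence boundary.

```python
# Sketch of listener controls: adjustable rate/volume plus sentence-level
# stop/resume. Assumes the third-party pyttsx3 package (pip install pyttsx3);
# the real project may use a different speech engine.
import pyttsx3

def read_aloud(text, rate=150, volume=0.9, start_at=0):
    """Read `text` sentence by sentence; return the next sentence index."""
    engine = pyttsx3.init()
    engine.setProperty("rate", rate)      # approximate words per minute
    engine.setProperty("volume", volume)  # 0.0 (silent) to 1.0 (full)
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    for index in range(start_at, len(sentences)):
        engine.say(sentences[index])
        engine.runAndWait()  # blocks until the sentence has been spoken
        # A real interface would check a pause flag here and return index + 1
        # so that a later call can resume from the following sentence.
    return len(sentences)

next_index = read_aloud("This is a test. Reading can resume later.", rate=180)
```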
1.4 SCOPE OF THE STUDY
This project is not concerned with discussing the internal structure of the TTS engine, but is mainly concerned with the proper pronunciation of text, drawn from the valid vocabulary of the English language, that is typed into the textbox of the TTS platform.
1.5 SIGNIFICANCE OF THE STUDY
This project has theoretical, practical, and methodological significance. The speech synthesizer will be very useful to any researcher who wishes to investigate the impact of using computer speech programs on brain enhancement and the assimilation process in human beings.
This text-to-speech synthesizing system will enable semi-literate users to access and read through electronic documents, thus bridging the digital divide. Text-to-Speech (TTS) software allows you to have text read aloud to you. This is useful for struggling readers, and for writers when editing and revising their work. You can also convert eBooks to audio books so that you can listen to them on long drives. TTS can also be a very good medium for teaching children spelling, or for teaching illiterate people how to read, because the text is shown on the screen while it is read aloud; the learner therefore learns the spelling of words and how they are pronounced at the same time.
The technology will also find applicability in systems such as banking, telecommunications (automatic voice output), transport, Internet portals, PC access, e-mail, administrative and public services, cultural centres and many others. The system will be very useful to computer manufacturers and software developers, as they will be able to include a speech synthesis engine in their applications.
1.6 DEFINITION OF TERMS
AI: It is the simulation of human intelligence processes by machines, especially computer systems.
ALOUD: In a way that can be clearly heard.
APPLICATION: The act of putting new techniques to use.
DATA: Raw facts which have not yet been processed. In this project, the raw text typed into the text box is data, because it has not yet been processed to give the required output. Data can also be seen as unprocessed information.
INFORMATION: Data that has been processed and put together so that it becomes meaningful and useful to the person receiving it. For example, this project has no value until the purpose for which it was written is fulfilled; once that purpose is achieved, its output becomes useful information.
TEXT: The original words of a piece of writing or a speech, or the words that make up the main part of a book, magazine, newspaper, website, etc.
SPEECH: A spoken expression of ideas, opinions, etc., that is made by someone who is speaking in front of a group of people.
SYNTHESIS: Something that is made by combining different things (such as ideas, styles, etc.)
PHONEME: The smallest unit of speech that can be used to make one word different from another word.
PRONUNCIATION: The way in which a word or name is pronounced, or a particular person’s way of pronouncing a word or the words of a language.