Lena Santamarta*
Telia Promotor AB
Abstract
This paper describes the linguistic preprocessing and transcription component of the Swedish reverse directory assistance service launched in November 1997. After a brief overview of the system, the approaches to different problems in the directory listings are explained.
Introduction
One of the first public applications of text-to-speech to be considered was the so-called "reverse directory assistance service", in which the user has a telephone number and wants the listing information, commonly the name and address of the holder. In this kind of system the input is very simple (given either by speech recognition or by pressing the telephone keypad), while the output is too diverse to be handled with canned speech. In November 1997 Telia, the Swedish Telecom, launched a reverse directory assistance service called "Telia Namnupplysning". Similar services have been in operation in the USA and in Italy for some years, and reports from that work were used as inspiration for the current work.
Spiegel and Winslow (1996) point out that this kind of service requires high intelligibility, as the user is not familiar with what is going to be said and the semantic context is low. As the synthesis to be used in the Telia service was a concatenation synthesis, it was assumed to be intelligible enough. However, the quality of automated services using TTS depends not only on the quality of the synthesizer but also on how the service is built, on how the data is preprocessed and on the transcription quality. This paper describes how some of these problems were solved in the Swedish system, with a focus on the preprocessing of the listing data.
The Swedish Reverse Directory Assistance Service
The system has a client-server structure. The main application takes care of the interaction with the user and calls different resources. These resources are the telephone directory database, the preprocessing and transcription module and the TTS server. The directory database contains information about all of the Telia telephone customers. The TTS is a server using the Infovox synthesis language rules and an MBROLA diphone database. The database includes non-Swedish sounds to cover foreign words, mostly English. The preprocessing and transcription module is described in the next section.
Briefly, the system works as follows:
The application communicates with the user through canned messages. The user is asked to key in the required number. The number is then sent to the Telia directory database. If it is a non-secret number, the listing data is returned as a string. The string is then sent to the preprocessing and transcription module, which returns the transcribed data in delimited fields. The transcriptions are sent to the TTS server, which returns a sound file to be played to the user. After listening to the information the user may have it repeated or choose to have different parts of it spelt out.
Problems to be solved
As said above and pointed out by others, two of the factors that influence the quality of reverse directory assistance services are the pronunciation of names and the preprocessing of the listing data. The next two sections give a general description of the solutions to these problems in the Swedish system, followed by descriptions of how personal and company names are handled by the system.
Name pronunciation
The pronunciation of names by TTS systems is problematic, as names often do not follow common pronunciation rules, have archaic spelling or are of foreign origin. While developing the American system a synthesizer was trained to pronounce names, thus applying a "rules first, exceptions in the lexicon" approach. As we had the results from the Onomastica project (Gustafson 1996), we decided to apply a "lexicon first" approach. The Swedish Onomastica files contain about 180 000 transcribed names, which are classified into first names, surnames, street names and place names. We added about 15 000 new entries covering brand names, common words, acronyms, abbreviations and multiword names. All the material was put into a database with a morphological analyser that manages to find inflected forms and compound words. The database functions as a server that takes a word and returns the transcription and a tag set (part-of-speech tags, tags classifying the different names and a language tag). If there is more than one interpretation of the word, all of them are returned. In this paper, the whole database system will be referred to as "the lexicon".
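The lookup interface described above can be illustrated with a minimal sketch. It assumes a plain in-memory dictionary instead of the database server, leaves out the morphological analyser entirely, and uses invented entries and placeholder transcriptions; the names `Reading`, `TOY_LEXICON` and `lookup` are hypothetical and not taken from the system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One interpretation of a word: a transcription plus its tag set."""
    transcription: str   # placeholder strings, not real Infovox notation
    tags: frozenset      # part-of-speech, name-class and language tags


# Toy lexicon: all entries and transcriptions are invented for illustration.
TOY_LEXICON = {
    "maria": [Reading("ma-ri-a", frozenset({"firstname", "S"}))],
    "export": [
        Reading("eks-port", frozenset({"noun", "S"})),
        Reading("ek-spoot", frozenset({"noun", "E"})),
    ],
}


def lookup(word):
    """Return every reading of `word`; an empty list means the word is unknown."""
    return list(TOY_LEXICON.get(word.lower(), []))
```

A word with two interpretations, such as `lookup("Export")` here, yields two readings, which later stages must choose between.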
Listing processing
Spiegel (1993a) lists a number of properties of telephone directories that make them unsuitable as input to a TTS system. We studied the Swedish directory with those in mind and also looked for new problems. We found that the Swedish listings were case sensitive, had field delimiters and were classified differently for private customers and companies. However, as thousands of people make changes every day, and have done so for at least half a century, there are errors, inconsistencies and peculiarities. From the results of quantitative studies of the Swedish directory we decided what resources we needed to make the listings suitable as input to a general TTS system. The result was the preprocessing and transcription module mentioned above, which we call the Name Transcription Module (NTM).
The NTM consists of two parts: the first has the words as its domain, and the second has the different fields or even the whole entry as its domain. In the first part there are three different devices taking care of the lexicon lookup. The first is specialised in personal names, the second in addresses and the third in company names. These devices are kept separate because they not only call the lexicon but also have different resources to treat the data in a proper way and to take care of words not found in the lexicon. The second part is called the "postprocessing submodule" (PP), as it processes the data with the help of the information about the words gathered in the first part. It contains a set of rules written in a Prolog-based language developed for this purpose. There are different rules for different fields and different subscribers. The tasks of the PP are to disambiguate homographs, to delete unwanted information, to move misfielded names (or parts of names) and to stress or unstress words.
Personal names
When examining the private listings, we found that most problems concerned the coding of multiword surnames, i.e. names consisting of two different surnames with or without a hyphen and surnames consisting of more than one word. There were also many cases of two different subscribers having the same number. The goal was to present the names in the same order as a human operator would. Besides reordering, the PP rules for personal names disambiguate homographs and remove superfluous elements such as punctuation marks and numbers that often appeared in the listings but did not have any significance.
This table shows how the different names are reordered in the PP. The fields are always presented to the user in the same order: firstname name cosub.
| In | Out |
| name: Person firstname: von Brick Maria | name: Person von Brick firstname: Maria |
| name: Person- firstname: Martin David | name: Person-Martin firstname: David |
| name: Person firstname: Per cosub: Maria | name: Person firstname: Per & Maria cosub: |
| name: Person firstname: Per cosub: Maria Pettersson | name: Person firstname: Per cosub: Maria Pettersson |
| name: Person firstname: Per cosub: Maria Pettersson- | name: Person firstname: Per cosub: Maria Pettersson-Person |
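The reorderings in the table can be reproduced with a small Python sketch. This is not the Prolog-based rule set of the PP; it is a simplified stand-in that relies only on string clues. The function name `reorder_person` and the particle list `PARTICLES` are assumptions made for illustration, whereas the real rules use the tags returned by the lexicon.

```python
PARTICLES = {"von", "af", "de"}  # assumed list of surname particles


def reorder_person(name, firstname, cosub=""):
    """Return the three fields after moving misfielded name parts."""
    first = firstname.split()
    if name.endswith("-") and len(first) > 1:
        # Hyphenated surname whose second half was stored in the firstname field.
        name += first.pop(0)
    elif len(first) > 1 and first[0].lower() in PARTICLES:
        # Surname particle in the firstname field: everything except the
        # last token is treated as part of the surname.
        name = " ".join([name] + first[:-1])
        first = first[-1:]
    firstname = " ".join(first)

    co = cosub.split()
    if len(co) == 1:
        # Co-subscriber with the same surname: join the two first names.
        firstname = f"{firstname} & {co[0]}"
        cosub = ""
    elif co and co[-1].endswith("-"):
        # Co-subscriber with a hyphenated surname ending in the main surname.
        cosub = f"{cosub}{name}"
    return {"name": name, "firstname": firstname, "cosub": cosub}
```

For example, `reorder_person("Person", "Per", "Maria Pettersson-")` returns the co-subscriber as `Maria Pettersson-Person`, as in the last row of the table.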
Company names
The company names are more complicated than the personal ones. First, company names are very heterogeneous, ranging from one word (commonly a brand name) to a complex noun phrase. The names can be in different languages and even in more than one language at the same time, commonly Swedish and English. The variation in how names are written into the Telia customer database is also greater for company names than for personal names. While examining the company listings we found problems such as non-standard abbreviations, the same word abbreviated in many different ways, ambiguous abbreviations, acronyms written partly or wholly in lower case, scrambled word order, many numbers with different functions, words containing numbers, listings entirely in upper case, misfielded words and extra comments.
Many of these problems had to be faced before lexicon lookup. Detection algorithms for acronyms, abbreviations and numbers were developed. The acronyms are normalized to upper case. If an acronym is not found in the lexicon, it is rewritten into a transcription with only a space between each letter. The abbreviations are expanded to the full form if they are not ambiguous and to default forms if they can be expanded in different ways. A detected abbreviation that is not in the lexicon is rewritten into a transcription without periods. The number detector normalizes some common names containing digits that are written in many different ways (e.g. Q8). It also classifies each number as being either part of an ordinal expression (e.g. 1:st) or a cardinal. The numbers are also classified as being either Swedish or English. Swedish cardinals are transcribed by the TTS system and English numbers are transcribed by lexicon lookup.
The rest of the problems are solved in the PP. One of the major problems was to find the correct order of scrambled names. This is done using both punctuation clues and grammatical information. Commonly, when the order of a name is scrambled, this is signaled by a comma between the end of the name and the part that has been moved. Thus we had to disambiguate the commas, deciding whether each was a real comma or a comma indicating that part of the name had been moved. If there was more than one comma in the name, only the last one had to be disambiguated. When the name was written in several fields, we had to decide whether the words written in the firstname field should be added at the beginning or at the end of the contents of the name field.
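The rewriting steps applied to detected tokens can be sketched as follows. The paper does not specify how the detection algorithms work, so the sketch covers only what happens after detection, plus a simple number classifier. The function names and the list of ordinal suffixes in `ORDINAL` are assumptions made for illustration.

```python
import re

# Assumed ordinal suffixes: Swedish "1:a", "3:e" and English "1st", "2nd", ...
ORDINAL = re.compile(r"^\d+:?(a|e|st|nd|rd|th)$", re.IGNORECASE)


def spell_out_acronym(token):
    """Upper-case an acronym missing from the lexicon and separate its letters."""
    return " ".join(ch for ch in token.upper() if ch.isalpha())


def strip_periods(token):
    """Rewrite a detected abbreviation missing from the lexicon without periods."""
    return token.replace(".", "")


def classify_number(token):
    """Return "ordinal", "cardinal", or None if the token is not a plain number."""
    if ORDINAL.match(token):
        return "ordinal"
    if token.isdigit():
        return "cardinal"
    return None
```

Names containing digits, such as Q8, are deliberately left unclassified here, since the paper handles them through a separate normalization step.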
Examples of in- and output of the reordering rules:
name: Gräs, Träd och Stenar --> name: Gräs, Träd och Stenar
name: Röda Korset, Svenska --> name: Svenska Röda Korset
name: Röda Korset firstname: Svenska --> name: Svenska Röda Korset firstname:
name: Lisa Nilssons firstname: Butik --> name: Lisa Nilssons Butik firstname:
name: Nilssons firstname: Butik, Lisa --> name: Lisa Nilssons Butik firstname:
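The five examples above can be reproduced with a minimal Python sketch. It replaces the grammatical information from the lexicon with two tiny word lists, which is a strong simplification. The function names and the sets `CONJUNCTIONS` and `PREMODIFIERS` are assumptions made for illustration only.

```python
CONJUNCTIONS = {"och", "&", "and"}  # assumed clue for a genuine list comma
PREMODIFIERS = {"svenska"}          # toy stand-in for adjective tags from the lexicon


def resolve_comma(name):
    """Move the part after the last comma to the front, unless the comma is real."""
    head, sep, tail = name.rpartition(",")
    if not sep:
        return name  # no comma at all
    if CONJUNCTIONS & {w.lower() for w in tail.split()}:
        return name  # a conjunction after the comma suggests a real list
    return f"{tail.strip()} {head.strip()}"


def reorder_company(name, firstname=""):
    """Merge the firstname field into the name field and undo scrambling."""
    words = firstname.split()
    if words:
        if words[0].lower() in PREMODIFIERS:
            name = f"{firstname} {name}"   # modifier goes in front of the name
        else:
            name = f"{name} {firstname}"   # otherwise append it
    return resolve_comma(name)
```

Only the last comma is examined, in line with the observation that earlier commas did not need to be disambiguated.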
Another major task was to disambiguate between English and Swedish homographs. The language disambiguation is done by looking at all the words in the name. If there is a word that exists only in English, all language-ambiguous words are disambiguated to English. If all words (two or more) in the name are language ambiguous, the name is assumed to be English. All other cases are given the default classification, i.e. Swedish.
Examples of in- and output from the language disambiguation rules (for the "[S/E]" tag, see Note 1):
name: Import[S/E] &[S/E] Export[S/E] Inc.[E] --> name: Import[E] &[E] Export[E] Inc.[E]
name: Import[S/E] &[S/E] Export[S/E] AB[S] --> name: Import[S] &[S] Export[S] AB[S]
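The language disambiguation rules can be written compactly as a Python sketch. It assumes that each word arrives with the set of languages found for it in the lexicon; the function name and the data layout are hypothetical.

```python
def disambiguate_language(tagged):
    """Resolve language-ambiguous words in a name.

    `tagged` is a list of (word, languages) pairs, where `languages` is a
    set such as {"S"}, {"E"} or {"S", "E"}.
    """
    ambiguous = [langs == {"S", "E"} for _, langs in tagged]
    has_english_only = any(langs == {"E"} for _, langs in tagged)
    all_ambiguous = len(tagged) >= 2 and all(ambiguous)

    # English if an English-only word is present or if every word of a
    # multiword name is ambiguous; Swedish is the default otherwise.
    choice = "E" if has_english_only or all_ambiguous else "S"

    return [
        (word, {choice} if is_ambiguous else langs)
        for (word, langs), is_ambiguous in zip(tagged, ambiguous)
    ]
```

With the two examples above as input, every word of the first name is resolved to `{"E"}` and every word of the second to `{"S"}`.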
Ambiguous abbreviations are disambiguated using the other words in the name. For example, the abbreviation "förs" in the listings could stand for "församling" (religious community) or "försäkring" (insurance). If another religious word such as God, church, Jehovah or Jesus is part of the name, the community interpretation is chosen. If disambiguation fails, the lexicon entry marked as the default is used.
Example: name: Jesu förs.[försäkring/församling] --> name: Jesu församling
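The context-based choice can be illustrated with a short Python sketch. The table layout, the Swedish trigger words and the choice of "försäkring" as the default are assumptions made for illustration; the paper does not state which expansion is the default.

```python
# Hypothetical abbreviation table; which expansion is the default is an assumption.
ABBREVIATIONS = {
    "förs.": {
        "default": "försäkring",
        "context": {
            # Illustrative trigger words for the religious reading.
            "församling": {"jesu", "jesus", "jehova", "gud", "kyrka"},
        },
    },
}


def expand_abbreviations(words):
    """Expand known abbreviations using the other words of the name as context."""
    lowered = {w.lower() for w in words}
    result = []
    for word in words:
        entry = ABBREVIATIONS.get(word.lower())
        if entry is None:
            result.append(word)  # not a known abbreviation
            continue
        expansion = entry["default"]
        for candidate, triggers in entry["context"].items():
            if lowered & triggers:
                expansion = candidate
                break
        result.append(expansion)
    return result
```

Here `expand_abbreviations(["Jesu", "förs."])` chooses "församling", while a name without any trigger word falls back to the default expansion.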
Marking prosodic structure
Each field of the listing is read out as a unit, i.e. following the intonation of declarative sentences. Further, we marked what we considered important words to be stressed, and marked function words to receive no stress. For private subscribers we marked the surnames, and for companies we marked the brand names and surnames. In some cases we used the synthesizer's default comma prosody marker to differentiate two units in the same field. Stressing the important part of the information gives a more accurate prosodic structure and helps to improve the intelligibility of the synthetic speech.
Summary and conclusions
This paper describes the structure of the linguistic preprocessing and transcription module of the Telia reverse directory assistance service. Laboratory tests showed that without the NTM it was almost impossible to understand what name and address were presented. Today, some thousands of calls per day are handled by the system and the customers find the service useful.
Notes
1. "[S/E]" means that the words are tagged for both Swedish and English. Each interpretation has its own tag set and its own transcription.
References
Gustafson J. 1996. A Swedish name pronunciation system, TMH, KTH, Stockholm.
Spiegel M. 1993a. Coping with telephone directories that were never intended for synthesis applications. Proc. of ESCA workshop on Applications of Speech Technology, 19-22, Bavaria.
Spiegel M. 1993b. Using the ORATOR synthesizer for a public reverse-directory service: Design, lessons, and recommendations. Proc. of Eurospeech '93, 1897-1900, Berlin.
Spiegel M. and Winslow E. 1996. Database preprocessing and human-interface issues in reverse directory assistance (ACNA) services. Proc. of IVTTA-96, 105-110, Basking Ridge.
Yarowsky D. 1996. Homograph disambiguation in text-to-speech synthesis. In van Santen J. et al. (ed): Progress in Speech Synthesis, 157-172, NY : Springer.