next up previous contents index CD CD Algorithms
Next: Exercises Up: Graph Algorithms Previous: War Story: Nothing but

War Story: Dialing for Documents

 

I was part of a group visiting Periphonics, an industry leader in building telephone voice-response systems. These are more advanced versions of the Press 1 for more options, Press 2 if you didn't press 1 telephone systems that have come to blight everyone's lives in recent years. We were being given the standard tour when someone from our group asked, ``Why don't you guys use voice recognition for data entry. It would be a lot less annoying than typing things out on the keypad.''   

The tour guide reacted smoothly. ``Our customers have the option of incorporating speech recognition into our products, but very few of them do. User-independent, connected-speech recognition is not accurate for most applications. Our customers prefer building systems around typing text on the telephone keyboards.''

``Prefer typing, my pupik!'', came a voice from the rear of our group. ``I hate typing on a telephone. Whenever I call my brokerage house to get stock quotes, I end up talking to some machine, which asks me to type in the three letter code. To make it worse, I have to hit two buttons to type in one letter, in order to distinguish between the three letters printed on each key of the telephone. I hit the 2 key and it says Press 1 for `A', Press 2 for `B', Press 3 for `C'. Pain the neck if you ask me.''

``Maybe you don't really have to hit two keys for each letter?'' I chimed in. ``Maybe the system could figure out the correct letter from context?''

``There isn't a whole lot of context when you type in three letters of stock market code.''

``Sure, but there would be plenty of context if we were typing in English sentences. I'll bet that we could reconstruct English sentences correctly if they were typed in a telephone at one keystroke per letter.''

The guy from Periphonics gave me a disinterested look, then continued the tour. But when I got back to the office, I decided to give it a try.

It was clear that not all letters were equally likely to be typed on a telephone. In fact, not all letters can be typed, since `Q' and `Z' are not labeled on the standard American telephone. Therefore, we adopted the convention that `Q', `Z', and space were all on the * key. We could take advantage of the uneven distribution of letter frequencies to help us decode the text. For example, if you hit the 3 key while typing English, you were more likely to have meant to type an `E' than either a `D' or `F'. By taking into account the frequencies of a window of three characters, we could predict the typed text. Indeed, this is what happened when I tried it on the Gettysburg Address:  

enurraore ane reten yeasr ain our ectherr arotght eosti on ugis aootinent a oey oation aoncdivee in licesty ane eedicatee un uhe rrorosition uiat all oen are arectee e ual

ony ye are enichde in a irect aitil yar uestini yhethes uiat oatioo or aoy oation ro aoncdivee ane ro eedicatee aan loni eneure ye are oet on a irect aattlediele oe uiat yar ye iate aone un eedicate a rostion oe uiat eiele ar a einal restini rlace eor uiore yin iere iate uhdis lives uiat uhe oation ogght live it is aluniethes eittini ane rrores uiat ye rioule en ugir

att in a laries reore ye aan oou eedicate ye aan oou aoorearate ye aan oou ialloy ugis iroune the arate oen litini ane eeae yin rustgilee iere iate aoorearatee it ear aante our roor rowes un ade or eeuraat the yople yill little oote oor loni renences yiat ye ray iere att it aan oetes eosiet yiat uhfy eie iere it is eor ur uhe litini rathes un ae eedicatee iere un uhe undiniside yopl yhici uhfy yin entght iere iate uiur ear ro onaky aetancde it is rathes eor ur un ae iere eedicatee un uhe irect uarl rencinini adeore ur uiat eron uhere ioooree eeae ye uale inarearee eeuotion uo tiat aaure eor yhici uhfy iere iate uhe lart eull oearure oe eeuotioo tiat ye iere iggily rerolue uiat uhere eeae riall oou iate eide io

The trigram statistics did a decent job of translating it into Greek, but a terrible job of transcribing English. One reason was clear. This algorithm knew nothing about English words. If we coupled it with a dictionary, we might be on to something. The difficulty was that often two words in the dictionary would be represented by the exact same string of phone codes. For an extreme example, the code string ``22737'' collides with eleven distinct English words, including cases, cares, cards, capes, caper, and bases. As a first attempt, we reported the unambiguous characters of any words that collided in the dictionary, and used trigrams to fill in the rest of the characters. We were rewarded with:  

eourscore and seven yearr ain our eatherr brought forth on this continent azoey nation conceivee in liberty and dedicatee uo uhe proposition that all men are createe equal

ony ye are engagee in azipeat civil yar uestioi whether that nation or aoy nation ro conceivee and ro dedicatee aan long endure ye are oet on azipeat battlefield oe that yar ye iate aone uo dedicate a rostion oe that field ar a final perthni place for those yin here iate their lives that uhe nation oight live it is altogether fittinizane proper that ye should en this

aut in a larges sense ye aan oou dedicate ye aan oou consecrate ye aan oou hallow this ground the arate men litioi and deae yin strugglee here iate consecratee it ear above our roor power uo ade or detract the world will little oote oor long remember what ye ray here aut it aan meter forget what uhfy die here it is for ur uhe litioi rather uo ae dedicatee here uo uhe toeioisgee york which uhfy yin fought here iate thus ear ro mocky advancee it is rather for ur uo ae here dedicatee uo uhe great task renagogoi adfore ur that from there honoree deae ye uale increasee devotion uo that aause for which uhfy here iate uhe last eull measure oe devotion that ye here highky resolve that there deae shall oou iate fide io vain that this nation under ioe shall iate azoey birth oe freedom and that ioternmenu oe uhe people ay uhe people for uhe people shall oou perish from uhe earth

If you were a student of American history, maybe you could recognize it, but you certainly couldn't read it. Somehow we had to distinguish between the different dictionary words that got hashed to the same code. We could factor in the relative popularity of each word, which would help, but this would still make too many mistakes.

At this point I started working with Harald Rau on the project, who proved a great collaborator for two reasons. First, he was a bright and peristent graduate student. Second, as a native German speaker he would believe every lie I told him about English grammar.

  figure3969
Figure: The phases of the telephone code reconstruction process  

Harald built up a phone code reconstruction program on the lines of Figure gif. It worked on the input one sentence at a time, identifying dictionary words that matched each code string. The key problem was how to incorporate the grammatical constraints.

``We can get good word-use frequencies and grammatical information using this big text database called the Brown Corpus. It contains thousands of typical English sentences, each of which is parsed according to parts of speech. But how do we factor it all in?'' Harald asked.

``Let's try to think about it as a graph problem,'' I suggested.

``Graph problem? What graph problem? Where is there even a graph?''

``Think of a sentence as a list of phone tokens, each representing a word in the sentence. For each phone token, we have a list of words from the dictionary that match it. How can we choose which one is right? Each possible sentence interpretation can be thought of as a path in a graph. The vertices of this graph will be the complete set of possible word choices. There will be an edge from a possible choice for the ith word to each possible choice for the (i+1)st word. The cheapest path across this graph is the right interpretation of the sentence.''

  figure4491
Figure: The minimum-cost path through the graph defines the best interpretation for a sentence  

``But all the paths look the same. They have the same number of edges. Wait. Now I see! To make the paths different, we have to weight the edges.''

``Exactly! The cost of an edge will reflect how likely it is that we will want to travel through the given pair of words. Maybe we can count how often that pair of words occurred together in previous texts. Or we can weight by what part of speech each word is. Maybe nouns don't like to be next to nouns as much as they like being next to verbs.''

``It will be hard to keep track of word-pair statistics, since there are so many of them. But we certainly know the frequency of each word. How can we factor that into things?''

``We can pay a cost for walking through a particular vertex that depends upon the frequency of the word. Our best sentence will be given by the shortest path across the graph.''  

``But how do we figure out the relative weights of these factors?''

``Well, try what seems natural to you and then we can experiment with it.''

Harald incorported this shortest-path algorithm. With proper grammatical and statistical constraints, the system performed great, as reported in [RS96]. Look at the Gettysburg Address now, with all the reconstruction errors highlighted:

FOURSCORE AND SEVEN YEARS AGO OUR FATHERS BROUGHT FORTH ON THIS CONTINENT A NEW NATION CONCEIVED IN LIBERTY AND DEDICATED TO THE PROPOSITION THAT ALL MEN ARE CREATED EQUAL. NOW WE ARE ENGAGED IN A GREAT CIVIL WAR TESTING WHETHER THAT NATION OR ANY NATION SO CONCEIVED AND SO DEDICATED CAN LONG ENDURE. WE ARE MET ON A GREAT BATTLEFIELD OF THAT WAS. WE HAVE COME TO DEDICATE A PORTION OF THAT FIELD AS A FINAL SERVING PLACE FOR THOSE WHO HERE HAVE THEIR LIVES THAT THE NATION MIGHT LIVE. IT IS ALTOGETHER FITTING AND PROPER THAT WE SHOULD DO THIS. BUT IN A LARGER SENSE WE CAN NOT DEDICATE WE CAN NOT CONSECRATE WE CAN NOT HALLOW THIS GROUND. THE BRAVE MEN LIVING AND DEAD WHO STRUGGLED HERE HAVE CONSECRATED IT FAR ABOVE OUR POOR POWER TO ADD OR DETRACT. THE WORLD WILL LITTLE NOTE NOR LONG REMEMBER WHAT WE SAY HERE BUT IT CAN NEVER FORGET WHAT THEY DID HERE. IT IS FOR US THE LIVING RATHER TO BE DEDICATED HERE TO THE UNFINISHED WORK WHICH THEY WHO FOUGHT HERE HAVE THUS FAR SO NOBLY ADVANCED. IT IS RATHER FOR US TO BE HERE DEDICATED TO THE GREAT TASK REMAINING BEFORE US THAT FROM THESE HONORED DEAD WE TAKE INCREASED DEVOTION TO THAT CAUSE FOR WHICH THEY HERE HAVE THE LAST FULL MEASURE OF DEVOTION THAT WE HERE HIGHLY RESOLVE THAT THESE DEAD SHALL NOT HAVE DIED IN VAIN THAT THIS NATION UNDER GOD SHALL HAVE A NEW BIRTH OF FREEDOM AND THAT GOVERNMENT OF THE PEOPLE BY THE PEOPLE FOR THE PEOPLE SHALL NOT PERISH FROM THE EARTH.

  figure4256
Figure:   Our telephone-code reconstruction system applied to various text samples

While we still made a few mistakes, the results are clearly good enough for a variety of applications. Periphonics certainly thought so, for they later licensed our program to incorporate into their products. Figure gif shows that we were able to reconstruct over 99% of the characters correctly on a megabyte of President Clinton's speeches, so if Bill had phoned them in, we would certainly still be able to understand it. The reconstruction time is fast enough, indeed faster than you can type it in on the phone keypad.

The constraints associated with many different pattern recognition problems can be formulated as shortest path problems in graphs. In fact, there is a particularly convenient dynamic programming solution for these problems known as the Viterbi algorithm, which is used in speech and handwriting recognition systems. Despite the fancy name, all the Viterbi algorithm is doing is solving a shortest path problem. Hunting for a graph formulation for any given problem is always a good way to proceed.   


next up previous contents index CD CD Algorithms
Next: Exercises Up: Graph Algorithms Previous: War Story: Nothing but

Algorithms
Mon Jun 2 23:33:50 EDT 1997