TY - JOUR
T1 - Probabilistic Character Disambiguation for Reduced Keyboards Using Small Text Samples
AU - Arnott, John L.
AU - Javed, Muhammad Y.
N1 - Funding Information: This research was supported by the Government of Pakistan Ministry of Science and Technology under S&T Scholarship No. HRD-87/C(4)0130-ASA(TRG). This paper is based partially on material presented at the 13th Annual RESNA Conference (RESNA ’90), “Capitalizing on Technology,” Washington DC, USA, June 15 to 20, 1990, and partially on the Master of Science thesis “Character Disambiguation in Text Input Functions for the Handicapped” by M. Y. Javed, University of Dundee, 1988, supervised by J. L. Arnott, Ph.D. The authors gratefully acknowledge the assistance of C. A. Lim in preliminary work for this project, in particular the preparation of the “frequency” keyboard layout used in this research.
PY - 1992
Y1 - 1992
N2 - Reduced keyboards are text typing keyboards which contain fewer than 26 alphabetic keys, and which may therefore be accessed and used by certain physically disabled persons more easily than a conventional QWERTY typing keyboard. Automatic character disambiguation systems enable text to be typed upon a reduced keyboard with a keying efficiency approaching one key/character, despite the fact that each key represents more than one alphabetic character. Existing disambiguation systems typically use probabilistic models of character sequences (n-grams) from representative text samples to predict the next character, and hence disambiguate among the different characters on each key. N-grams for such disambiguation models have been extracted previously from large (1 million word) text corpora. The research reported here shows that a much smaller corpus of limited domain can be used with similar results, thus facilitating development of disambiguation systems by eliminating the need for a large corpus. Four reduced keyboard layouts are compared, three of which were used in earlier research on character disambiguation in the Dutch and English languages, and a fourth based on character frequency, which achieves similar efficiency to the first three. Models of different order are compared (the order of the model being determined by the length of the longest n-grams in it), the principal result being that, while higher-order models (containing long n-grams) give better performance than lower-order models (which contain only short n-grams), lower-order n-grams can contribute significantly to the disambiguation performance of a higher-order model, and should therefore be included in order to maximize disambiguation efficiency.
KW - augmentative and alternative communication (AAC)
KW - character disambiguation
KW - information theory
KW - motoric disability
KW - physical impairment
KW - prediction
KW - reduced keyboards
UR - http://www.scopus.com/inward/record.url?scp=84961422723&partnerID=8YFLogxK
U2 - 10.1080/07434619212331276203
DO - 10.1080/07434619212331276203
M3 - Article
AN - SCOPUS:84961422723
SN - 0743-4618
VL - 8
SP - 215
EP - 223
JO - Augmentative and Alternative Communication
JF - Augmentative and Alternative Communication
IS - 3
ER -