Unification-Based Parsing Applications for Intelligent Foreign Language Tutoring Systems

L. Kirk Hagen
HUM 2305
Fall 1998
In this paper I describe a computer-based intelligent tutoring system which I have developed for students of French as a second language (FSL). This system, called FrenchWriter, is modular in design; its components include 1) a grammar reference driven by hypertext, 2) a bilingual dictionary with a rule-based verb conjugation module, 3) a vocabulary drill with error-specific feedback, and 4) a grammar checker (called GrammarMaster) built on a unification-based natural language processor. The first three of these modules are examples of software already familiar to those who have some experience with computer-assisted language learning (CALL), and my discussion of them will therefore be limited. I intend instead to devote most of my paper to the grammar checker since this is where the tutorial's novelty is to be found. Following an overview of some of the problems and possibilities of parsing and intelligent tutoring, I will give a sketch of FrenchWriter's architecture. Then I discuss the fundamentals of unification-based parsing. Finally, I give some specific examples of how the grammar checker interacts with FSL students. I will show that a number complex and historically problematic aspects of grammatical analysis like conjunctions, reflexive binding, phrase embedding, and dislocated, missing, or superfluous parts of speech can be handled straightforwardly by unification-based parsing algorythms.

The Scope of Natural Language Processing (NLP) and CALL

As a preface let me provide some background on the considerations that motivate the use of parsing technology in the foreign language classroom, as well as on some of the computing and linguistic considerations that ought to restrict its scope. The traditional view in CALL research is that NLP's most valuable potential contribution to foreign language (FL) tutoring would be its support of open-ended writing activities, e.g. short biographies, narratives, essays, mock letters, and the like. Although assignments like these are generally "structured" in that they come with guidelines about what sort of discourse is expected, it is not possible to enumerate all of the conceivable sentences which would fulfill the assignment's requirements. Thus simple drills or other tutorials based on string-matching algorithms alone are of little use, because one obviously cannot enter arbitrarily many sentences into memory for purposes of comparison. Ideally one would prefer an intelligent, rule-based, interactive tutorial capable of accommodating an infinite variety of input and thus offering the possibility of versatile, long-term and intensive individualized tuttoring.

Regrettably, artificial intelligence (AI) has yet to make a successful entrˇe in the world of computer-based language tutoring, and the technology required for the mission I have just described is not commercially available. If it is to become a part of a curriculum it must be developed on-site. This presents no small challenge, given the historical lack of interaction between linguists/pedagogues, programmers, and end-users. Many of the early foreign language publications on parsing were programmatic in that they amounted to little more than wish lists about what a successful NLP-based tutorial could do for FL education if ever one were built. Without a detailed account of a parser's architecture, or of precisely what sort of grammatical constructions prototypes could deal with, there was little for subsequent research little to build on. Computer programmers, for their part, have often worked independently of linguists and language instructors, and for that have on occasion relied on a rather tenuous grasp of the complexities of languages and the special problems second languages pose for adult learners. For instance, one textbook on AI (Charniak & McDermott, 1985) proposed the following phrase structure rule to handle reflexive constructions. While this schema works for the simplest case in which a verb is immediately followed by a direct object, as in ŅJohn washed himself,Ó which the authors cite as an illustration (p. 191), it is patently inadequate for more complex cases in which the reflexive pronoun is at a greater distance from the subject, as in

2) John is probably going to think about washing himself.

Such difficulties in reflexive agrement have in fact been well-known to both linguists and language instructors for some time.
The upshot is that it has never been clear exactly what FL pedagogy should reasonably expect of NLP. Most of the published commentary on the matter has had to do with what not to expect. The following admonition from Marty (1984), one of the pioneers in CALL, is a good example (cf. Leech, 1986 as well):

As far as language teachers are concerned, I do not see any possibility that we will ever have a computer program that, for example, would judge free written expression, would perform a phonemic analysis of a studentÕs oral response, or would understand oral free expression and respond coherentaly. (160)

In fact, the field of applied linguistics is somewhat notorious for overstating the computerÕs prowess, and anyone contemplating on-site development would do well to bear in mind that if AI research has taught us anything over the past few decades, it is that the success of an intelligent tutoring system is usually inversely related to breadth of its knowledge domain.

To have a realistic idea of how ambitious one can afford to be in the development of intelligent CALL software, it will help to divide NLP into three classes, following linguistic tradition. Regardless of what use it is put to, a language parser may be 1) syntactically intelligent, if it can characterize a segment of discourse as well-formed or ill-formed, and assign a grammatical structure to those segments that are well-formed, 2) semantically intelligent, if it can characterize a segment of discourse as true or false, given some set of suppositions about the world, or 3) pragmatically intelligent, if it is able to characterize discourse as appropriate or inappropriate in some specific circumstance, given the intentions of the author, the shared world knowledge of author and audience, their conventions of retoric, and so forth (cf. Hagen 1992).

If by "judging free written expression" Marty means software that can pass judgment on the semantic intelligibility, or on the pragmatic appropriateness, of student input, then one would be hard-pressed to disagree. To illustrate some of the problems CALL developers would have to look forward to, let us consider a wholly plausible segment of written discorse in English:

Madeleine wondered if the pharmacy was open. I told her she was crazy because it was snowing outside.

Syntactically intelligent software would tell us merely that no English grammar rules have been violated in these sentences; a modest but nonetheless worthwhile objective if one is primarily interested in grammar-checking software. A semantically intelligent program, on the other hand, would know what each sentence claims about the world (that Madeleine was curious about some pharmacyÕs hours of business, that white ice crystals were falling from clouds, and so on). This in itself would be a formidable accomplishment, but it would still miss the point. To appreciate fully the implications of these two short statements, a machine would have to understand that the writer was trying to dissuade Madeleine from traveling, that travel by car is dangerous during snowstorms of a certain severity, that the pharmacy is question was not within MadeleineÕs walking distance ("walking distance in general" is not a sufficiently unambiguous concept for a computer), that what she planned to buy at the pharmacy was not important enough to warrant a trip in cold weather, or whatever. Although AI is a contentious topic, it is probably safe to say that most who work under its rubric would admit, if pressed, that this is not simply a matter of encoding the relevant information into a computerÕs memory. The machine would likely require both the cognitive capacities and the life experiences of a human. Clearly these are hopelessly sanguine asparations.

In short, creating intelligent CALL tutorials broadly construed to encompass pragmatic and semantic aspects of language use is too intimidating a venture to contemplate for the time being. There has nonetheless been some recent progress within the more restricted domain of grammatical analysis. For instance Loritz (1992) developed an instructional parsing system for Chinese, Russian, Japanese, and English ("GPARS") using an augmented transition network (ATN), which, along with definite clause grammars (DCGs, cf. Labrie & Singh, 1991; Norvig 1991, pp. 690-714) has historically been the most popular approach to NLP (cf. for example Charniak & McDermott, 1985; Sampson, 1986; and Germain 1992). Loritz is cautious about his results. He notes that ATN parsers are "versatile and fast, and...can be applied to a wide range of natural language tasks," but adds that

[i]n education, sophisticated interactive grammar-checkers built on instructional parsing systems might eventually be used for direct student instruction, but we do not expect such use will soon be widespread. (17)

Loritz's appraisal of instructional parsing systems is fairly typical among CALL specialists; the widely-held belief is that highly complex syntactic structures such as those mentioned in the introduction simply do not allow for satisfactory parsing and error analysis beyond a very elemantary level.

More recently, however, ATN parsing of the sort Loritz describes has fallen somewhat out of favor in AI: the current interest is in unification grammars (Haas, 1989; Norvig, 1993, pp. 655-750; Zajac, 1992) and object-oriented programming (OOP, cf. Mullen, 1989; Norvig, 1992, pp. 434-459; Zajac, 1992). Zajac (1992) summarizes the appeal:

Combining object-oriented approaches to linguistic description with unification-based grammar formalisms...is very attractive. On one hand, we gain the advantages of the object-oriented approach: abstraction and generalization through the use of inheritance. On the other hand, we gain a fully declarative framework, with all the advantages of logical formalisms... (160)

Indeed, unification grammar shows signs of unprecedented cooperation between linguists and programmers. Among other things, there is a family of linguistic theories closely aligned with unification grammar--namely Generalized Phrase Structure Grammar (GPSG, cf. Gazdar et. al., 1985; Fisher, 1989; Daelemans & Gazdar, 1992; ) and Head Driven Phrase Structure Grammar (HPSG, cf. Pollard & Sag, 1987, Pollard & Sag 1994; Carpenter, 1991)--which provides a sound theoretical linguistic framework in which NLP development can take place. At the very least, this more promising state of affairs suggests that machine-based grammar-editing for second languages is still an avenue worth pursuing.

Overview of FrenchWriter and GrammarMaster

Before going into detail about the how GrammarMaster's parser works, a quick overview of the tutorial in its entirety is in order. FrenchWriter is a modular program for developed in HyperCard environment for Macintosh computers with at least 25 MHz. microprocessor. Its components are designed to interact with each other as well as with commercial word processing software like WordPerfect and ClarisWorks. Figure 1 shows its overall structure. The base component is an online grammar reference with 160 separate screen displays for approximately 100 French grammar rules linked by hypertext, which allows quick and easy linking of related grammar topics.

The component called GrammarMaster is the one that uses a natural language parser. To begin a session, a user calls up an exercise by selecting a button on screen. The session begins with a general topic, for example "Tell us about yourself". Students can then select a START button on screen, which prompts them to enter some background information--for instance their names, the names of a few friends and family members, and the name of the city where they live. Students then begin to write their sentences. For suggestions, they have access to a set of questions, e.g. "Where do you live?", "What classes do you have?," "What are your friends like?." and so on, although they are still free to write in whatever they choose, provided they use the vocabulary supplied in a lexicon on the right side of the screen (cf. Figure 2). The lexicon for the exercise illustrated in Figure 2 contains approximately 500 words.

When a user enters a sentence, the parser attempts to assign a grammatical structure to the input and flags any errors it detects. As an illustration, suppose that a user enters a description of his sister which includes an error in subject/predicate agreement, as illustrated in Figure 2. The number following the section symbol (¤) is a hyperlink to the FrenchWriter base module. If a user clicks on that number a grammar explanation of subject/predicate agreement will appear on screen. If no errors are detected the user is prompted to save the sentence in a file field and move on to the next sentence, as Figure 3 shows.

The buttons available to the student throughout a session include i) SPELL, which does spell checking and recommends words in the event one entered by the student is not in the lexicon, ii) CHECK, which provides an account of the various parts of speech of the userÕs response. For instance in the example in Figure 3 CHECK returns the following:

Word 1 is the SUBJECT of the sentence. 2. Word 2 is the MAIN VERB of the sentence. 3. Word 3 is the predicate.

iii) FILE, which stores responses entered during a session, and iv) PRINT, which allows all the responses in the file field to be printed at sessionÕs end. There are also hyperlinks connecting the lexicon to an external bilingual dictionary (cf. Figure 1), so that selecting an unfamiliar word in the lexicon will display an English equivalent. The average speed of a parse on a 33 MHz machine is approximately 0.3 seconds per word in a string.

Object-Oriented Programming and Unification Grammar

Now let us turn our discussion to the details and crucial procedures of the parser. To clarify some of the terminology used in this section, let me first point out that GrammarMaster is a particular FSL exercise that runs on a particular parser. In principle, some other type of parser could be used to drive the exercise, and since parsing takes place in memory without showing up on screen, an end-user might not notice any difference if the two were equally equipped for the task at hand. In any case, GrammarMaster is driven by an object-oriented, unification-based parser called HANOI. Zahedi (1993) provides a good working definition of object-oriented programming: In modeling knowledge or programming systems, OOP abstracts from reality by choosing the components (objects) of the real world that matter to the problem at hand, and disregards the irrelevant aspects. In this process, OOP comes close to the way we see the world, as interacting objects. The process of abstraction in object-oriented programming (OOP) is through the selection of relevant objects, and the identification of their relevant features, attributes, or variables. Objects with similar attributes are classified into a class or objects. (289)

Unification grammar has its underpinnings in the work of Gerald Gazdar and others during the late 1970s and early 1980s (Gazdar, 1982; Gazdar & Pullum, 1981; Pollard, 1984). Their research evolved into a fairly homogeneous group of theories during the mid 1980s with the publication of Generalized Phrase Structure Grammar (Gazdar et. al., 1985) and Information-Based Syntax and Semantics (Pollard & Sag, 1987). The distinguishing characteristics of this type of framework--at least for purposes of this discussion--are as follows:

i. The grammar operates on a sets of linguistic objects. The highest level in the HANOI hierarchy is input, which is simply an ordered set of words which a user enters into a particular field. At the next level of the hierarchy is the phrase--e.g. verb phrase, prepositional phrase, sentence, etc. A phrase is any string of words having a grammatical structure. For instance "my your at go the" is a string but it is not a phrase, and "to go to the park" is both a string and a phrase. The next level on the hierarchy is the word, and at the most elementary level is the feature, which is simply a label for some linguistic attribute of a word (cf. Keller 1993, pp. 14-59 for a survey of feature logic). Conceptually these are unordered sets of features. As a matter of convenience, however, they are ordinal features in HANOI's lexicon. This makes it easier to refer to and modify features during a parse.

These feature specifications are never visible on screen; students merely see the word adorer listed alphabetically in a scrolling field (Figure 2). HANOI is a bottom-up parser, and as such requires a mechanism for passing syntactic information from one level of the hierarchy to the next. Its object-oriented design serves just such a purpose in that it allows objects to undergo successive transformations from words to phrases in the course of a parse by preserving or altering features along the way, according to the exigencies of the grammar.

ii. The grammar operates on partial information structures. For illustrative purposes let us imagine for a moment a primitive grammar which treats the phrase (the table) as a combination two features. A parser based on such a simple description would quickly fail the mission we have sketched so far because in French articles must also carry information about NUMBER and GENDER. In other circumstances, however, the parser should be able to "disregard irrelevant aspects" of input, to use Zahedi's words, and to make decisions based on relevant but partial information of objects. For instance, it will recognize "to eat her soup" as a grammatically correct verb phrase without reference to the number and gender of the direct object because verbs in the infinitive do not show any inflectional changes that reflect those properties. A well-formed object is one that is a permissible instance or extension of a general rule.

iii) The grammar makes heavy use of the operation on sets of features called unification, which is defined in Pollard & Sag (1987) as follows:

...if A, B, and C are feature structures, we call C the unification of A and B, written A v B, provided C is the least informative feature structure which is at least as informative as A and at least as informative as B. (p. 35)

To illustrate, the definite article in French is overtly marked as plural, but is not marked as masculine or feminine. Conversely, the word course ("course/courses") is always masculine, but it is not overtly marked as singular or plural. Instead of burdening lexical searches with, say, one masculine and one feminine form, or one singular and one plural form, HANOI's lexicon includes one entry for each but omits unmarked features.

HPSG, which was created to describe first languages, places an important restriction on unification, namely that two categories A and B fail to unify if they contain mutually inconsistent information. Intuitively this means, for example, that in French as a first language there is no such thing as a noun phrase that is both masculine and feminine. However, there are indeed such phrases in second languages; i.e. "errors." A parser designed to accommodate second languages thus requires a slight qualification on the notion of unification. A very common type of error among FSL students--i.e. article/noun agreement, as in the phrase "the table" --will illustrate. Strictly speaking, this is not a phrase at all, if by phrase we mean "a well-formed grammatical unit," though needless to say, a CALL program that returned a message like "not a well-formed grammatical unit" as error feedback would not be of much help to an FL student. When HANOI detects contradictory features, it preserves the features on the major part of speech (in the case of noun phrases, the noun) and inserts the contradictory feature on a minor part of speech like an article into an error stack along with a corrective message. Once the parse is finished, HANOI accesses its error stack and returns its feedback.

iv. HANOI is a cyclical parser in that it analyzes strings repeatedly until it fails or succeeds in determining the overall grammatical structure of the input. As it analyzes a string like

8) Paul lives in London

it first determines that "in" and "London" meet the structural description of a prepositional phrase and then combines the two into a constituent.

During a subsequent cycle it determines that "live in London" is a verb phrase, and on its final pass it will analyze subject/verb agreement and assign the structure "sentence" to the entire string. The number of cycles required for a parse is determined by the number of words in the string. For instance, let us say that L1 in (9) is the initial "line" of a parse, i.e. each item in brackets represents a word entered by the user. Suppose that in the course of some parse line(L+1) were identical to line(L). Since the same procedures which operate on line(L) also operate on line(L+1), it follows that line(L+2) and indeed all subsequent lines would be identical. In other words, any given line must contain at least one fewer constituent than its predecessor if a cycle is to produce any novel output. This in turn means that if the number of cycles is equal to the number of words, the parser will be able to assign a structure to any string so long as it has adequate rules as part of its code. By the same token, if at any point in the course of a parse some line(L) is identical to line(L-1), then the string either cannot be or already is completely parsed, and the procedure is aborted.

Error Analysis: Conjunctions

Now that we have laid out the fundamental operations of HANOIÕs parser we are in a better position to show some concrete examples of how it interacts with students of French, and of what sort of grammatical contingencies it must be prepared for. Let us begin with the analysis of subject/verb agreement in strings like (8) and (10),

9) Marc and Paul live in London.

and focus on the final cycle of the parse for the for the time being. Suppose the relevant algorithms were more or less as follows. Having determined that "live" is a verb in the third person singular, the parser undertakes a search for a noun to the left of the verb. Finding Paul, which is also third person singular, it determines that subject and verb agree, and its work is therefore finished. This simple strategy would succeed only so long as we excluded the conjunction "and" from the lexicon. Otherwise the parser would not detect errors in cases where a verb in the singular has a compound noun phrase as its subject, as in (10), which includes a string (in italics) identical to the one that is judged correct in the case of (8).

Excluding conjunctions from the lexicon is unacceptable on pedagogical grounds, since even elementary-level FSL students should be expected to make frequent use of them. Simplifying for expository purposes, HANOI's solution to (10) is to generate during initial cycles a single part of speech ("Marc and Paul") which is specified as plural. This is what the computer will use in subsequent cycles to evaluate subject/agreement. The output is the feedback shown in Figure 4, where the section symbol and number (¤2) once again are hyperlinked to a short lesson in the grammar module.

Summary

Let us sum up by trying to establish a few general criteria for evaluating parsers like the one described in this paper. Following Allen (1995), we should stipulate that these must include minimally 1) generality, i.e. "the range of sentences the grammar analyzes correctly," 2) selectivity, "the range of non-sentences it identifies as problematic, and 3) understandability, "the simplicity of the grammar itself" (p. 44).

A parser's performance on the first two of these involve reasonably objective standards: the parser either flags grammatical errors or it does not. Consequently it is comparatively easy to pass judgment on them. Allen's third criterion a bit more subjective since there is still widespread disagreement over what constitutes a "simple" grammar. There is no question that the machinery of formal linguistic theory will continue to play a central role in NLP. Yet a common--and not wholly unjustified--complaint in second language pedagogy is that modern syntactic theory has reached a level of abstraction that has rendered it all but opaque to language instructors. But this is by and large a function of the theoretician's objectives. In the case of the most popular of modern theories--Government-binding--the goal is "to depict exactly what one knows when one knows a language: that is, what has been learned, as supplemented by innate principles" (Chomsky, 1986, p. 24). It is, in Chomsky's words, the study the "internalized" or acquired language, and thus properly falls under the rubric of cognitive science and learning theory. Much of the unwarranted resentment against Chomsky-style linguistics frequently expressed in second language pedagogy may be due the false assumption that Chomsky and his disciples somehow intend to exclude all other forms of linguistic inquiry. There is, however, nothing in Chomsky's writing to support such a view, as Newmeyer (1983, pp. 137-148) pointed out a long time ago. In any event, the grammar one uses to drive a parser is properly understood as an example of what Chomsky (1986) calls an "externalized language" (E-Language), in the sense that the construct is understood independently of the properties of the mind/brain...the linguist is free to select the grammar one way or another as long as it correctly identifies the E-language. (20)

Expressed in these terms, "correctly identifying" a suitable grammar for a parser does not in itself entail any questions of cognition. Only practical matters ought to have a bearing on the development of a linguistic framework for FL parsing. For instance, we might add to Allen's list "the extent to which one can generalize," or the extent to which the grammar is applicable across second languages, so that the code that drives a parser of one second language should adaptable to a parser of other second languages with minimal modifications.

These considerations, rather than any epistemological questions raised by language acquisition, were paramount in the development of the software I described above. For instance, by construing the entire grammar in terms of linguistic objects, HANOI parsing can in principle be implemented in any other environment that accommodates object-oriented programming. And OOP carries with it the added practical advantage of code reusability, which is less taxing on memory and facilitates debugging. There remains a substantial amount of work to be done on GrammarMaster and its parser. As of now, for instance, there is a HANOI parser of French only, since the tutorial was created.

specifically for the users I mentioned earlier in this paper. Its lexicon, moreover, needs to be augmented to handle a greater variety of writing tasks. Nonetheless HANOI can now accommodate many of the thorny structural problems of second languages that have had an inhibitory effect on CALL parsing applications, and its design is general enough--and its expressive capacity powerful enough--that software for other second languages could be developed with reasonable speed and facility.

References

Allen, J. (1995). Natural Language Understanding (2nd ed.). Redwood City, CA: Benjamin/Cummings.

Carpenter, B. (1991). The Generative Power of Categorical Grammars and Head-Driven Phrase Structure Grammars with Lexical Rules. Computational Linguistics, 17, 3, 301-313.

Charniak, E., & McDermott, D. (1985). Introduction to Artificial Intelligence. Reading, MA: Addison-Wesley.

Chomsky, N. (1986). Knowledge of Language. New York: Praeger.

Eckman, F. (1994). Local and long-distance agreement in second-language acquisition. In E. Tarone, S. Gass & A. Cohen (Eds.), Research Methodology in Second-Language Acquisition (pp. 207-225). Hillsdale, NJ: Lawrence Erlbaum Associates.

Fisher, A. (1989). Practical Parsing of Generalized Phrase Structure Grammars. Computational Linguistics, 15, 3. 139-148.

Gazdar, G., Klein, E., Pullum, G. & Sag, I. (1985). Generalized Phrase Structure Grammar. Cambridge, MA: Harvard University Press.

Gazdar, G. (1982). Phrase Structure Grammar. In P. Jacobson & G. Pullum (Eds.), The Nature of Syntactic Representation (pp. 131-186). Dordrecht: Reidel.

Gazdar, G., & Pullum, G. (1981). Subcategorization, Constituent Order, and the Notion Head. In M. H. Moortgat, H. van der Hulst, & T. Hoekstra (Eds.), The Scope of Lexical Rules (pp. 107-123). Dordrecht: Foris.

Germain, E. (1992). Introducing Natural Language Processing. AI Expert, 7,8. 30-35.

Haas, A. (1989). A Parsing Algorithm for Unification Grammar. Computational Linguistics, 15,4. 219-232.

Hagen, L. K. (1994). Constructs and measurement in parameter models of second language acquisition. In E. Tarone, S. Gass & A. Cohen (Eds.), Research Methodology in Second-Language Acquisition (pp. 61-87). Hillsdale, NJ: Lawrence Erlbaum Associates.

Hagen, L. K. (1992). Reasonably intelligent tutoring systems for higher order language skills. In F. Attia, A. Flory, S. Hashemi, G. Gouard¸res, & J. P. Marciano (Eds.), EXPERSYS-92 (pp., 53-58). Gournay sur Marne: IITT International.

Keller, Bill. (1993). Feature Logics, Infinitary Descriptions and Grammar. Stanford: CSLI.

Labrie, G., & Singh, L.P.S. (1991). Parsing, Error Diagnostics and Instruction in a French Tutor. CALICO Journal, Autumn 1991, 9-25.

Leech, G. (1986). Automatic grammatical analysis and its educational applications. In G. Leech & C. Candlin (Eds.), Computers in English Language Teaching and Research (pp. 205-217). New York: Longman.

Loritz, D. (1992). Generalized Transition Network Parsing for Language Study: the GPARS System for English, Russian, Japanese and Chinese. CALICO Journal, 10, 1, 5-22.

Marty, F. (1984). Computer-Aided Instruction: Language Teachers and the Man of the Year. In S. Savignon & M. Berns (Eds.), Initiatives in Communicative Language Teaching: A Book of Readings (pp. 155-165). Reading, MA: Addison-Wesley.

Seneff, S. (1992). TINA: A Natural Language System for Spoken Language Applications. Computational Linguistics, 18, 1. 61-86.

Zahedi, F. (1993). Intelligent Systems for Business: Expert Systems with Neural Networks. Belmont, CA: Wadsworth.

Zajac, R. (1992). Inheritance and Constraint-Based Grammar Formalisms. Computational Linguistics, 18, 2. 159-182.