The research activities on natural language processing of the FGCS Project are presented. Linguistic phenomena are formalized in terms of complex structures and constraints on them. The logic programming paradigm is adopted for implementing natural language processing systems because the basic operation on these complex structures is isomorphic to unification. DUALS (Discourse Understanding Aimed at Logic-based Systems), CIL (Complex Indeterminate Language), and JPSG (Japanese Phrase Structure Grammar) are being developed using the unification-based approach. Large-scale machine readable and understandable dictionaries are also being developed.


Introduction
In our daily lives we communicate with one another mainly by means of speech and writing in natural languages. We get a lot of information from books and papers.
Communication between humans and computers should also be performed in the medium of natural language.
The ability of computers to understand natural language will increase their accessibility and flexibility. The Japanese fifth generation computer project aims to develop such intelligent computer systems.
The special characteristic of the Japanese language, i.e., the use of a great many Chinese characters, has made Japanese text input and processing difficult for a long time.
Recently, however, Japanese language processing technology has advanced considerably, as evidenced by Japanese word processors and commercial machine translation systems. These technologies combined with artificial intelligence are expected to provide new Japanese information processing technology and a new computer culture.
ICOT began research and development of the Fifth Generation Computer Systems (FGCS) in 1982. Natural language processing technology is one of the most important research themes for the FGCS Project because it is a fundamental technology for knowledge information processing and it is used for the research and development of knowledge-base, intelligent-interface and various basic application systems, such as machine translation systems.
The results of research in the initial stage have led us to the conclusion that the logic programming framework is the most suitable for implementing natural language processing systems [6].
The linguistic phenomena in natural language can be formalized in terms of the complex structures of the grammatical features and the constraints on them, as is seen in the feature set of Generalized Phrase Structure Grammar (GPSG) [7], the functional structure of Lexical Functional Grammar (LFG) [11] and the "dags" of PATR-II [20]. The basic mechanism of logic programming is unification in Horn clause logic. Definite Clause Grammar (DCG) [18] is one of the bridges connecting natural language processing and logic programming. Most of our research activities can be regarded as improvements and extensions of DCG. GALOP (BUP) [12] is a bottom-up left corner parser which overcomes the drawbacks of top-down parsers for DCG. A parallel model of DCG is also being developed [13].
CIL (Complex Indeterminate Language) [15,16] was developed to express and operate on complex features. CIL is an extension of Prolog. The newly introduced "partially specified term" of CIL is suitable for representing complex features because an extended unification is defined on two partially specified terms. Declarative constraints can be written using the freeze mechanism of CIL.
Our approach to semantic analysis, which also deals with pragmatics, is based on situation semantics [1]. In this approach, the semantic analysis process corresponds to constructing the relations between situations, and it is implemented as an algebra of events. The merging of events is one of the basic operations, and its structure is isomorphic to that of unification. Therefore, the basic operations in both syntactic and semantic analysis are isomorphic to those of logic programming, which makes logic programming compatible with natural language processing. DUALS (Discourse Understanding Aimed at Logic-based Systems) is an application of this approach to discourse understanding: it reads stories and answers questions about them.
JPSG (Japanese Phrase Structure Grammar) [9] is a GPSG-based grammar theory for the Japanese language whose basic operation is unification. The unification-based parser for JPSG has been developed. We have also developed some application systems in order to verify and evaluate the fundamental technology mentioned above.
Finally, the processing of large-scale language data is another important aspect of natural language processing. Dictionaries include much information about syntax and semantics, which will be utilized for designing the lexicon and the knowledge base in natural language processing systems. The following three types of machine readable and understandable dictionaries will be developed in the subproject which started in April 1986:
(1) Basic Word Dictionaries: Four machine readable master dictionaries with 200,000 entries in each dictionary.
(2) Concept Classification Dictionary: A systematic dictionary for 400,000 concepts including a general thesaurus.
(3) Concept Description Dictionary: A knowledge database containing semantic descriptions of 400,000 concepts.
The systems mentioned above are implemented on the personal sequential inference machine PSI [17]. The programs are written in its programming language ESP [4]. This paper describes the main research activities and plans for natural language processing in the FGCS Project: CIL, DUALS, JPSG, and the Machine Readable and Understandable Dictionaries.

Partially Specified Term
CIL is an extension of Prolog which was designed as the system description language of DUALS. CIL has the freeze predicate, which was originally introduced in Prolog-II [5], as a primitive predicate for realizing various lazy evaluation controls.
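The delaying behaviour that freeze provides can be illustrated outside Prolog. The following is a minimal Python sketch, not CIL itself: the class LogicVar and the functions freeze and bind are hypothetical names introduced here for illustration, and the sketch assumes only that a frozen goal is suspended until its variable is bound and is then run.

```python
class LogicVar:
    """A single-assignment variable with a queue of suspended goals."""

    _UNBOUND = object()  # sentinel, so that None can be a legal value

    def __init__(self, name):
        self.name = name
        self.value = LogicVar._UNBOUND
        self.suspended = []  # goals waiting for this variable to be bound

    def is_bound(self):
        return self.value is not LogicVar._UNBOUND


def freeze(var, goal):
    """Run goal(value) now if var is bound; otherwise delay it."""
    if var.is_bound():
        goal(var.value)
    else:
        var.suspended.append(goal)


def bind(var, value):
    """Bind var once, then wake every goal frozen on it, in order."""
    if var.is_bound():
        raise ValueError(f"{var.name} is already bound")
    var.value = value
    woken, var.suspended = var.suspended, []
    for goal in woken:
        goal(value)


log = []
x = LogicVar("X")
freeze(x, lambda v: log.append(f"X became {v}"))      # delayed: X is unbound
log.append("goal frozen")
bind(x, 3)                                            # wakes the frozen goal
freeze(x, lambda v: log.append(f"X is already {v}"))  # runs immediately
```

In this sketch a constraint is simply a goal parked on a variable, which is one way to read the statement that declarative constraints can be written with freeze.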
CIL introduces a new type of object called a "partially specified term" ("partial term" for short), which is influenced mainly by the notion of assignment developed in the situation theory of [2,3].
We understand a partial term as an abstraction from the following data structures, which are widely seen in programming languages, grammar formalisms, etc.:
- Herbrand term in first order logic.
- Association list and property list in LISP.
- Frame and unit in knowledge representation.
- Record in programming languages.
- Record in relational database theory.
- Category as complex feature in GPSG and functional structure of LFG.
A partial term is written in CIL as {a1/b1, ..., an/bn}, where each ai is a ground term and each bi is any term, possibly a partial term. Ordinary unification is extended to partial terms. CIL can represent a semantic network, even one including cycles, by using partial terms.
For instance, the CIL unifier solves the system of three equations A = B, A = {a/B}, and B = {a/A}, giving A = B = {a/A}, a graph with a single node and a self-loop labelled a. As is easily seen, CIL unification is close to unification over infinite trees in Prolog-II. The domain of CIL can be defined formally to be a set of infinite trees.
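The three-equation example can be reproduced with a small graph unifier. The following Python sketch is an illustration under stated assumptions, not the CIL unifier: it represents a partial term as a node with named slots and uses union-find to identify nodes, a technique chosen here because it terminates on cyclic structures. The names Node, find and unify are hypothetical.

```python
class Node:
    """A variable (no slots, no atom), an atom, or a partial term."""

    def __init__(self, slots=None, atom=None):
        self.parent = self              # union-find link
        self.slots = dict(slots or {})  # slot name -> Node
        self.atom = atom                # constant value, or None


def find(node):
    """Return the representative of the class that node belongs to."""
    while node.parent is not node:
        node.parent = node.parent.parent  # path halving
        node = node.parent
    return node


def unify(x, y):
    """Destructively unify two nodes; no undo is attempted on failure."""
    x, y = find(x), find(y)
    if x is y:
        return True   # already identified; this is what terminates cycles
    if x.atom is not None and y.atom is not None and x.atom != y.atom:
        return False  # two different constants
    if (x.atom is not None and y.slots) or (y.atom is not None and x.slots):
        return False  # a constant cannot carry slots
    y.parent = x      # identify the nodes before descending into slots
    if y.atom is not None:
        x.atom = y.atom
    y_slots, y.slots = y.slots, {}
    for name, value in y_slots.items():
        root = find(x)  # x may have been merged further by a nested call
        if name in root.slots:
            if not unify(root.slots[name], value):
                return False
        elif root.atom is not None:
            return False  # a constant cannot acquire a slot
        else:
            root.slots[name] = value
    return True


a, b = Node(), Node()
unify(a, b)                   # A = B
unify(a, Node({"a": b}))      # A = {a/B}
unify(b, Node({"a": a}))      # B = {a/A}
root = find(a)                # one node whose slot "a" leads back to itself
```

After the three calls, A and B share one representative, and following its slot a returns to that same node, which is the self-loop described in the text.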

Reserved Forms in CIL
The current CIL syntax is an extension of the syntax of DEC-10 Prolog. Several symbols appearing in terms, including ! and :, are reserved for the CIL system as follows:
(1) A term of the form X!a is equivalent as a term to the value of the slot of X whose name is a.
(2) A term of the form X:C with terms X and C is called a description. C should be an executable form. This term is read "X such that C".
(4) CIL includes further convenient forms of term.
Although the current CIL is not yet a full implementation of situation theory, it is already useful because of the introduction of partial terms and extended unification over them. Partially specified terms have general and natural descriptive power to represent various data structures of objects necessary for situation theory. The most difficult and basic problem which remains open for CIL, however, is to develop some ideas for designing a control library for constraint description. We think that the problem corresponds directly to the implementation of the constraints of situation theory.

DUALS (Discourse Understanding Aimed at Logic-based Systems)
DUALS is an experimental discourse understanding system developed to build a computational model for discourse understanding. The semantic framework is situation semantics, in which sentence meanings are represented as relations between situations. DUALS aims at dealing with the following items within this framework. The latest version of DUALS was implemented in CIL. It reads a story written in Japanese and answers various types of questions about it.
The system has the following characteristics:
(1) The semantic structure is constructed with the objects used in situation theory, such as individuals, assignments, relations, locations, conditions, events, parameters, and so on.
(2) Syntax analysis is performed by the parser based on the concurrent process model called SAX (Sequential Analyzer of syntaX and semantics) [13]. [10].
(4) Plan-goal-based discourse structures are obtained by the discourse processing module. The rules to construct the discourse structures are described as constraints between events.
(5) The sentence generation module generates the surface sentences from internal meaning structures using grammar rules.
Our technical approach to implementation is to build a package for extended unification in logic programming. An interesting problem, and a more theoretical challenge, is determining what kinds of unification are needed as primitives for implementing situation semantics.

JPSG (Japanese Phrase Structure Grammar)
Grammar is an important component of a system for natural language understanding. JPSG is a new grammar theory for the Japanese language based on GPSG. GPSG is suitable for implementation in the logic programming paradigm because it is a natural language syntax theory based on context free grammar (CFG) and its basic computational mechanism is unification. Besides, GPSG has the following features:
(a) Syntactic categories are defined as a complex feature set.
(b) Only phrase structure is used to represent grammatical information.
(c) Metarules for phrase structure rules are introduced.
(d) Constraints on features are described in the syntactic principles, which make phrase structure rules general.
(e) Syntax and semantics are closely related.
GPSG cannot readily handle the word order variation of the Japanese language called "scrambling". In order to handle scrambling, the subcategorization feature (SUBCAT), whose value is a set of syntactic categories, is introduced in JPSG. This is an extension of HPSG [19]. Currently, the grammar formalism of JPSG is completed for basic Japanese syntax with the following characteristics:
SUBCAT - This is the set of syntactic categories which a head category demands as its complements.
SLASH - This is the set of the missing categories. This feature is used in the same way as in GPSG.
(2) The following phrase structure rule is sufficient for basic Japanese syntax:
(2.1) M -> D H
Rule (2.1) states that a mother category (M) dominates one daughter category (D) on the left and one head category (H) on the right. This simplification of the rule is achieved by describing the constraints on the features in syntactic principles.
(3) Since a new SUBCAT feature is introduced, SUBCAT
The basic operation used in JPSG is "unification" because, in the syntactic principles mentioned above, the phrase "be identical to" can be replaced by "can be unified with". The parser for JPSG is being developed in CIL. JPSG and CIL are compatible because a syntactic category as a feature set corresponds to a partially specified term, and syntactic principles correspond to Horn clauses.
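How a set-valued SUBCAT lets the single rule M -> D H accept scrambled word orders can be sketched as follows. This Python sketch rests on an assumption that the text above does not spell out: the mother's SUBCAT is taken to be the head's SUBCAT with the matched daughter removed. The names Category, combine and parse, the toy lexicon, and the use of plain equality in place of feature unification are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    pos: str                         # part of speech, e.g. "v" or "p"
    form: str = ""                   # e.g. the case particle "ga" or "o"
    subcat: frozenset = frozenset()  # complements the head still demands


def combine(d, h):
    """Apply M -> D H: D must match one member of the head's SUBCAT."""
    for wanted in h.subcat:
        if wanted == d:  # equality stands in for feature unification
            # Assumed principle: the mother keeps what is still demanded.
            return Category(h.pos, h.form, h.subcat - {wanted})
    return None


def parse(phrases, verb):
    """Attach the phrases to the verb from right to left."""
    cat = verb
    for d in reversed(phrases):
        cat = combine(d, cat)
        if cat is None:
            return None
    return cat


ga = Category("p", "ga")                      # subject phrase, e.g. "Ken-ga"
o = Category("p", "o")                        # object phrase, e.g. "hon-o"
yomu = Category("v", "", frozenset({ga, o}))  # a verb demanding both

canonical = parse([ga, o], yomu)              # "Ken-ga hon-o yomu"
scrambled = parse([o, ga], yomu)              # "hon-o Ken-ga yomu"
```

Because SUBCAT is a set rather than an ordered list, both orders reduce to the same saturated verbal category, while a sentence that repeats one complement and omits the other is rejected.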

Machine Readable and Understandable Dictionaries
We presented the unification-based approach to natural language processing and its applications in the previous sections. Processing of large-scale language data is another important aspect of natural language processing. This research aims at developing a large-scale database for various natural language processing and speech processing application systems. The language database will be composed primarily of three machine-readable dictionaries: a large-scale basic dictionary as the master dictionary; a concept classification dictionary including a thesaurus; and a concept description dictionary containing descriptions of the meanings of concepts. Application systems utilizing these dictionaries will be developed, including machine translation systems and speech recognition systems.

Basic Word Dictionaries
The term "basic word" means words used in everyday speech, general technical terms, proper nouns, and so on.
Machine-readable master dictionaries will be developed containing these basic words.
These are the dictionary types:
(1) Japanese
(2) English
(3) Japanese-English
(4) English-Japanese
Each dictionary will include about 200,000 entry words. These dictionaries will be developed in accordance with the specifications already established [14].

Concept Classification Dictionary
This dictionary will contain specifications of the relations between concepts and indicate exactly how specific concepts are classified in the concept world. Classification bases for the concept world are 'super-sub', 'whole-part', 'composition-element' and other similar relations.
The multiple inheritance mechanism will be used as well. The standard thesaurus will form a part of this dictionary. At least 400,000 concepts will be included.
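As an illustration of classification by a 'super-sub' relation with multiple inheritance, the following Python sketch stores, for each concept, its immediate superconcepts and collects everything reachable upwards. The table SUPER, its toy entries and the function superconcepts are hypothetical and are not taken from the dictionary specification.

```python
# Each concept maps to its immediate superconcepts ('super-sub' relation).
# A concept may have several parents, which models multiple inheritance.
SUPER = {
    "dog": ["mammal", "pet"],
    "mammal": ["animal"],
    "pet": ["animal"],
    "animal": [],
}


def superconcepts(concept, relation=SUPER):
    """Return every concept reachable upwards, each one listed once."""
    seen = []
    stack = list(relation.get(concept, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(relation.get(current, []))
    return seen


dog_supers = superconcepts("dog")
```

The same traversal would apply to other classification bases such as 'whole-part' by supplying a different relation table.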

Concept Description Dictionary
This dictionary will contain the meaning of each individual concept classified in the concept classification dictionary. The combination of the concept classification and the concept description will form the knowledge base for the "general world", and will be utilized in semantic and discourse analysis.

Application systems
Machine translation systems and speech recognition systems will be developed using these dictionaries.

Fruitful results and new ideas have been obtained over the four years of research to date by concentrating on the logic programming framework, as described in this paper. The last four years have convinced us that the logic programming approach is very promising for implementing natural language processing systems. In the intermediate stage, we will continue this approach to build the subsystems that will be integrated to form the total knowledge information processing system in the final stage.