Hyper-G - A Universal Hypermedia System

Frank Kappe, Hermann Maurer
Institute for Foundations of Information Processing and Computer Supported New Media (IICM), Graz University of Technology, Graz, Austria

Nick Sherbakov
Polytechnical University of St. Petersburg, St. Petersburg, Russia

Abstract

Hyper-G is the name of an ambitious hypermedia project currently being developed as a joint effort by a number of institutes of the IIG (Institutes for Information-Processing Graz) of the Technical University of Graz¹ and the Austrian Computer Society. Hyper-G is conceived as a Universal Hypermedia System. Hypertext and hypermedia systems were envisioned by pioneers like Bush (Bush, 1945), Engelbart (Engelbart, 1963) and Nelson (Nelson, 1965; Nelson, 1981) as one way of obtaining easy-to-use universal information systems. Hyper-G will be the basis of a University Information System which combines concepts of information retrieval systems, documentation systems, communication and collaboration, and computer supported teaching and learning. This article focuses on the applications of hypermedia technology in university environments, and describes the ideas and concepts behind Hyper-G as far as they are related to the university applications domain.

__________
¹ Yes, the 'G' in Hyper-G stands for Graz.

1 Introduction

The creation of a modern University Information System (UIS) is an important and ambitious application of hypermedia systems. Such a system should help the scientist and administrator in the process of information retrieval (e.g. finding literature), ease the communication and collaboration of university personnel, support the instructional process, and provide a link between university and the public in general. The motivation for a tool that would e.g. support scientists is not exactly a new one.
In his 1945 article "As we may think" (Bush, 1945) Vannevar Bush writes:

"There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers - conclusions which he cannot find time to grasp, much less to remember, as they appear. [...] Mendel's concept of the laws of genetics was lost to the world for a generation because his publication did not reach the few who were capable of grasping and extending it; and this sort of catastrophe is undoubtedly being repeated all about us, as truly significant attainments become lost in the mass of the inconsequential. [...] Publication has been extended far beyond our present ability to make real use of the record."

Since 1945, the situation has become even worse. We are now not only confronted with an increasing number of books and periodicals, but also with electronic media. On the other hand, the availability of knowledge in electronic form potentially allows us to look for specific information and browse through it more efficiently by use of computers and the right tools. And what has been said does not only apply to researchers, but to administrators as well: the mountain of legal documents, of minutes of meetings, of rules and regulations, of student research, of records of inventory or space allocation is gigantic and hard to follow without electronic help.

We believe that hypermedia systems are a promising technology to alleviate the problems described. In this article, we focus on application areas of hypermedia systems within a university environment. After that, we derive a rather compact description of requirements for a concrete hypermedia system (Hyper-G), as far as they are related to the university context.
We do not elaborate on hypermedia technology in general (for an overview see, e.g., Conklin, 1987; Jonassen & Mandl, 1989; Maurer & Tomek, 1991a; McAleese, 1989; Marchionini & Shneiderman, 1988; Nielsen, 1990b; Tomek et al., 1991), nor do we provide an in-depth description of the Hyper-G project (see (Kappe, 1991) for that).

2 Applications of a University Information System

Support by an information system in a university environment can be roughly divided into four areas:

1. Research: In line with Bush's vision, we would expect a university information system to give the individual scientist immediate access to the results of other researchers. The system should offer easy-to-use access to publications, libraries and databases of interest.

2. Teaching: As teaching is one of the main responsibilities of a university, a dedicated university information system can be expected to support the processes of teaching, training and learning.

3. Administration: In this category, the system should support access to and maintenance of relevant legal documents, minutes of meetings, rules and regulations, student records, the library, room management, the organization of meetings, etc.

4. Communication: Within all of the above areas, communication and collaboration of individuals or groups (e.g. between scientists, teachers and students, students and administration) should be supported in an organized way by the system.

Some individual aspects are discussed using concrete applications in the following sections. We envision a hypermedia system that can be used to implement these applications, but also a myriad of others not explicitly mentioned here. The universal hypermedia system we propose is outlined in section 3.

2.1 Information Retrieval and Maintenance

Providing scientists, teachers and students with information is one of the most important aspects of a university information system.
The kinds of information one would like to see in such a system include:

o General information on the university (e.g., study programs available, courses offered including time-tables, prerequisites, registration and examination procedures, maps of the university and the environment, university services, university staff including pictures, functions, and phone numbers, etc.).
o Specialized information on the university (e.g. statistical data like number of students registered, growth over the years; research at the university sorted both by area and institute, etc.).
o Closed user group information for administrative purposes (such as budgetary data, room allocation, inventory, student records, legal documents, agendas and minutes of meetings).
o Electronic courseware and self-testing material in the sense of (Kearsley, 1982).
o Information on student problems (e.g. housing, consulting, phone numbers of students).
o A number of general-purpose, scientific, and technical encyclopedias in different languages (especially if the local language is not English).
o Dictionaries for a number of languages.
o Electronic versions of a substantial number of periodicals (at least abstracts), e.g. on CD-ROM.
o The Ph.D. and Master's theses of the university (at least abstracts).
o The university's library (possibly including the tables of contents of the books).
o Databases of interest (e.g. chemical compounds, Science Citation Index, astronomical images, etc.).
o Geographic information, including maps and pictures of locations within the university and throughout the world.
o Administrative information, including databases of courses, e-mail addresses, internet news groups, phone books, etc.
o Access to other universities' information systems.
In order to offer the user a convenient way to find information, a combination of rather sophisticated navigation and retrieval facilities will be necessary (e.g., offering only full-text search will not suffice). We believe that a blend of hyper-navigation, hierarchical structure of information, guided tours through the information universe, and various search facilities in conjunction with state-of-the-art user interface metaphors will serve the user better than a single technique alone.

The inverse problem - getting information into the system and keeping it up to date - is one of the major obstacles on the way to a working UIS. In order to alleviate some of the problems, the system has to support the administrator in updating and removing outdated documents and in finding and removing bad or obsolete links, and it has to be able to generate and maintain links automatically. Also, it has to be as easy as possible for users to add information to the database.

Interestingly, the variety of document types (text, raster and vector images, sound, video) of a hypermedia system could make the process of document preparation less painstaking. Users could simply fax, e.g., the contents of a book to a certain phone number, and the system would incorporate the fax into its database as an image document or convert it to text using optical character recognition (OCR). Unfortunately, today's OCR software delivers only about 98% of the text correctly. This problem may be overcome by so-called fuzzy database searches that will find even slightly misspelled phrases. Either way (image or text storage), the user would not have to key the text in. Small messages (e.g. meeting announcements) could even be spoken to a certain phone number and be stored as sound documents. Still, the user has to somehow determine the document's position (i.e. its relation to other documents) within the hypermedia context (link editing).

2.2 Connectivity

No system can be assumed to start from scratch.
Hence existing systems within the university have to be easily accessible, in such a way that users notice the transition as little as possible. Also, one cannot expect to keep all that information in a local university information system, let alone keep it up to date. Consequently, the system must allow access to information kept in other systems distributed throughout the world. However, the system should not adopt the user interfaces of the remote databases, but rather present remote documents to the users in a consistent and integrated fashion.

Today, there are several on-line information services (e.g. DIALOG, BIX, CompuServe) available. These services are highly interactive, charge users on the basis of minutes spent on-line, and each has a unique user interface. Also, connectivity between such services is limited. There are at least three projects other than Hyper-G currently under development that will hopefully reduce these limitations:

o The Wide-Area Information Server (WAIS) project currently being implemented by Thinking Machines, Apple Computer, and Dow Jones News/Retrieval (Stein, 1991) defines a public WAIS protocol which should then be adopted by information servers and end-user interfaces. WAIS servers perform full-text searches (no hypertext features), can be connected to each other and can be used in a consistent way.²
o The Gopher project developed mainly at the University of Minnesota allows Campus-Wide Information Systems (CWIS) that are based on a menu-like user interface metaphor (no hypertext) to be interconnected. The protocol supports the transmission of a number of document types, keyword searches, and connections to on-line information services.³
o The WorldWideWeb (W3) developed at the European Particle Physics Laboratory (known as CERN) is a wide-area hypertext information retrieval initiative aiming to give universal access to a large universe of documents (Berners-Lee et al., 1992).
The documents can be distributed throughout the world (as in Nelson's Xanadu) and are connected by unidirectional links.⁴

All three systems described above are based on the client-server model (as is Hyper-G), and a number of "viewers" (clients) are available for different end-user workstations (PC, Mac, NeXT, Unix, VMS). Also, they are interconnected to some extent (e.g. W3 can integrate Gopher documents and offers gateways to WAIS; Gopher uses WAIS code for full-text searches).

The "remote database" concept of Hyper-G makes it possible to integrate such information servers and to make them accessible using the hypermedia paradigm (i.e. links). It is possible to create links both to and from such remote databases by means of dynamic links, even when the remote database does not adopt the hypermedia paradigm. Moreover, it is possible to create links between documents residing in Hyper-G-based university information systems installed at different locations (universities), if this should ever be the case.

__________
² WAIS source code is in the public domain and available by anonymous ftp from think.com in directory /wais.
³ Gopher code is available free of charge by anonymous ftp from boombox.micro.umn.edu in directory /pub/gopher. A demo session can be started by telnetting to consultant.micro.umn.edu and logging in as gopher.
⁴ W3 code is freely available for non-commercial use in collaborating non-military academic institutes by anonymous ftp from info.cern.ch in directory /pub. A demo session can be started by telnetting to info.cern.ch.

2.3 Computer Aided Instruction

As teaching is one of the main responsibilities of a university, a university information system can be expected to support the activities of teaching, training, learning and testing. Hypermedia systems in general have proven to be useful tools in these domains.
The concrete Hyper-G implementation benefits from previous experience in this field in two ways:

o Computer Aided Instruction is supported by over 500 COSTOC lessons (Makedon et al., 1987) covering topics ranging from medicine, ecology, and natural sciences to computer science. This material will be converted to Hyper-G tours (section 3.5). New lessons can be edited (as tours) with much greater flexibility. Also, students may at any time consult additional material from other tours, encyclopedias, books, etc. using automatically generated links (section 3.3). The COSTOC database was mainly created between 1984 and 1988 using authoring systems as described in (Kearsley, 1982); recent improvements of run-time and authoring facilities using hypermedia techniques open up interesting new possibilities.
o The Electronic Classroom is a more collaborative approach to computer assisted learning. Such a classroom is equipped with a number of Hyper-G workstations. A utility built into Hyper-G allows "WYSIWIS" (What You See Is What I See) distribution of the output of a Hyper-G session to a number of X-terminals (workstations) and is intended to be used in learning/training environments to provide online help from remote tutors. Two scenarios are possible:
   - Teachers show something to the students by distributing their output to the students' terminals. Input is taken from the teacher's terminal. At some point, the teacher may ask questions. Then, students may grab the input and perform some operation (e.g. answer a question, click on some anchor). Output is distributed to the other students' and the teacher's terminals. Thus, the whole class works together to solve a problem.
   - Students work on their own. When they get stuck, they may ask other students or the teacher for help by sending them a copy of their screen's contents.
The teacher or tutor may then grab the input in order to perform the necessary steps while the student watches. Student and teacher may switch control over the input as often as they like to complete the task.

Note that it is not necessary that the terminals of the electronic classroom reside in the same room, or even building. Potentially, the electronic classroom may be distributed over the whole world.

An additional benefit related to computer supported learning is that students may call the system via phone lines from home, thus having access to, e.g., the university's library, exam results, e-mail exchange with professors and other students, etc.

2.4 Communication

Distributed hypermedia systems can naturally support asynchronous communication between scientists, teachers, students and university staff. Users may annotate any document (provided that access rights allow it). In particular, annotations may themselves be annotated, starting an electronic discussion about some document. This is similar to bulletin boards, but requires elaborate support mechanisms for substantial discussions (e.g. certain types of search facilities, graphical browsers, etc.). Users need not be logged in simultaneously.

In addition to this "local communication", users of the university information system may communicate with computer users throughout the world using Electronic Mail (E-mail). E-mail documents are incorporated into the system's database as private text documents, and may be linked (either statically or dynamically) with other documents (including other E-mail, e.g. questions and answers). In order to find E-mail, keywords have to be attached or extracted (potentially automatically), allowing links to be targeted to the piece of mail in question. Of course, E-mail may be annotated.

Today, cost and speed of hardware are no longer limiting the use of Computer Mediated Communication (CMC) systems.
However, mechanisms with which the user can organize information to avoid information overload (Hiltz & Turoff, 1985) have to be provided (Quarterman, 1990, page 155). It is believed that hypertext is a promising mechanism for this purpose, and there are hypertext implementations like NLS/Augment (Engelbart & Lehtman, 1988) and gIBIS (Begeman & Conklin, 1988) addressing the E-mail problem. Information Lens, a research project at MIT's Sloan School of Management, addresses electronic information overload by semistructured message templates and rules that define what to do (automatically) with mail of a certain type and content (Robinson, 1991).

An even larger amount of data (approximately 20 MBytes a day) is distributed by a communication system known as Usenet News (Quarterman, 1990, page 235). There are more than 2000 hierarchically structured newsgroups (discussion topics) available covering computer science, other sciences, social issues, recreational activities and other topics. Unfortunately, the sheer amount of data makes it very time-consuming to follow even a few newsgroups regularly. However, if the individual news articles (which, by the way, conform to certain standards) can be automatically supplied with keywords, stored, and cross-referenced in a hypertext system, the user could later on search for news covering a specific topic, which would be a valuable tool for the scientist. A prototype of a system that automatically constructs hypertext links between news articles based on a similarity rating has already been implemented at the Technical University of Denmark (Andersen et al., 1989). We will try to incorporate a similar mechanism into our university information system, i.e. integrate news articles into the Hyper-G database.

Of course, synchronous (on-line) communication can also be integrated into the hypermedia system, with added value compared to stand-alone versions: dynamic links are created automatically as users type.
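The idea of linking news articles by a similarity rating can be illustrated with a minimal sketch. It is not the algorithm of the prototype cited above, whose details are not given here; it assumes a simple bag-of-words cosine similarity, and the stop-word list, the threshold of 0.3, the article identifiers and all function names are hypothetical choices made for illustration only.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Illustrative stop-word list (an assumption); a real system would use a larger one.
STOPWORDS = {"the", "and", "for", "that", "with", "between", "need", "keep"}


def keyword_vector(text):
    """Count the content words of an article (lower-cased, stop words removed)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two keyword vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def link_articles(articles, threshold=0.3):
    """Return (id, id, score) for every pair of articles rated at least `threshold`."""
    vectors = {article_id: keyword_vector(text) for article_id, text in articles.items()}
    links = []
    for first, second in combinations(sorted(vectors), 2):
        score = cosine_similarity(vectors[first], vectors[second])
        if score >= threshold:
            links.append((first, second, score))
    return links


# Hypothetical sample articles.
ARTICLES = {
    "news1": "Hypertext systems need bidirectional links between documents.",
    "news2": "Bidirectional links keep hypertext documents consistent.",
    "news3": "The cafeteria menu changes on Monday.",
}

LINKS = link_articles(ARTICLES)
for first, second, score in LINKS:
    print(f"{first} <-> {second} (similarity {score:.2f})")
```

In this sketch only the two articles that share vocabulary are linked; raising the threshold trades fewer wrong links against more missed ones.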
2.5 Computer Supported Collaborative Work (CSCW)

Computer Supported Collaborative Work (CSCW), sometimes also referred to as Groupware, is one of the buzzwords of the early 1990s. While some people use Groupware as a rather general term that refers to anything from electronic mail to distributed databases, Ellis et al. (Ellis et al., 1991) define Groupware as:

"computer-based systems that support groups of people engaged in a common task (or goal) and that provide an interface to a shared environment"

and give an overview of concepts and commercial products. Engelbart and Lehtman (Engelbart & Lehtman, 1988) define CSCW and Groupware as:

"Computer Supported Collaborative Work (CSCW) deals with the study and development of systems that encourage organizational collaboration. Most Groupware products fall under this classification."

They distinguish between three groups of CSCW projects:

1. Tools for augmenting collaboration and problem solving within a group geographically co-located in real time.
2. Real-time tools for collaboration among people who are geographically distributed.
3. Tools for asynchronous collaboration among teams distributed geographically.

The spectrum of Groupware products ranges from simple or "intelligent" E-mail systems to "joint editing" of documents (text or picture). "Electronic Meeting Systems" can be used to prepare or even carry out meetings. It is speculated that a semi-identified (section 3.9) user mode should support a (semi-)anonymous discussion during the "brain-storming" phase of a meeting, and that discussion should then continue in identified mode. Therefore, the UIS should be able to support these user modes and also anonymous and identified voting. Collaboration can further be enhanced by using tools that allow joint editing of text or even technical drawings.
2.6 Other Aspects

The user of a university information system can be expected to have some experience with computers and communication systems, and to be familiar with keyboard and mouse. Therefore, the standard desktop user interface metaphor seems feasible. Additional features may include a "Hyper-Calendar" which is similar to the calendar found in most desktop environments. However, the Hyper-Calendar allows links that point to other documents to be tied to certain dates and times. Also, links may originate at other documents and point to the calendar.

3 Hyper-G - A Universal Hypermedia System

While most of today's hypermedia systems are designed for a specific purpose, Hyper-G is designed as a general-purpose hypermedia system that can be used for a number of applications. Only relatively few 'orthogonal' concepts are combined in a clear design to achieve the different features found in different special-purpose hypermedia systems.

This section outlines the basic ideas and concepts behind Hyper-G, the system that we will use to implement the UIS applications described in the previous section. The system is described in terms of requirements and not implementation ideas. For ease of understanding, requirements are developed in a rather informal fashion (see (Kappe et al., 1991) for a formal specification of requirements).

3.1 Hypertext - Hypermedia

Hyper-G is designed as a flexible hypermedia system. Basically, the term 'hyper-' means that pieces of information (let us call them documents) may be cross-referenced by so-called links. The user is able to traverse these links, moving freely through the web in a non-linear fashion. An 'anchor' is the location within a document to which a link is attached. There may be source and destination anchors. The term '-media' means that documents are not restricted to text; vector and raster images, speech, sound, videos etc. are also allowed as documents.
The traditional term 'hypertext' refers to a hypermedia system that deals only with text. However, because this restriction does not seem to make too much sense, the terms hypertext and hypermedia are often used interchangeably.

Almost all current hyper-systems are limited to providing unidirectional links like the ones shown in Figure 1. This has two important implications:

1. The system can only show the user the links that have the current document as their departure point but not the ones that arrive at the current document. In other words, systems usually cannot answer the question: 'What other documents refer to the current document?'. They are also unable to give the user an overview of the structure of the information network around the current document (like in Figure 1), although such an overview might decrease the disorientation of users that is often reported in conjunction with hypertext systems.

2. Whenever a document is deleted or modified, the system is not able to tell what other documents refer to the document in question. This means that in such cases references to nonexistent or invalid documents remain in the database, making it impossible to guarantee database integrity. Especially in large multi-user systems, where the person deleting a document cannot be expected to know of all the links to that document, this is completely unacceptable.

Figure 1: A Hypertext Web

Therefore, Hyper-G supports bidirectional links. This is most easily done by keeping links and anchors in a separate link database, as opposed to storing them directly in the documents like most systems do (an exception is the Intermedia system (Haan et al., 1992; Yankelovich et al., 1988) that also supports bidirectional links and stores links in a link database). This clear separation has another advantage: Modification (creation, deletion) of a link between two documents requires no modification of the documents.
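A minimal sketch may make the separate link database more concrete. It is an illustration under stated assumptions, not Hyper-G's actual design: the class and method names (Link, LinkDatabase, incoming, remove_document) and the document identifiers are hypothetical, and anchors are reduced to whole documents for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str       # id of the document holding the source anchor
    destination: str  # id of the document holding the destination anchor


class LinkDatabase:
    """Stores links outside the documents, so both directions can be queried."""

    def __init__(self):
        self.links = set()

    def add(self, source, destination):
        link = Link(source, destination)
        self.links.add(link)
        return link

    def outgoing(self, doc_id):
        """Documents that `doc_id` refers to."""
        return sorted(l.destination for l in self.links if l.source == doc_id)

    def incoming(self, doc_id):
        """Answers 'What other documents refer to the current document?'."""
        return sorted(l.source for l in self.links if l.destination == doc_id)

    def remove_document(self, doc_id):
        """Drop every link touching a deleted document; return the removed links."""
        dangling = {l for l in self.links if doc_id in (l.source, l.destination)}
        self.links -= dangling
        return dangling


# Hypothetical documents; their contents are never touched by link operations.
DOCUMENTS = {"thesis": "...", "lecture": "...", "paper": "..."}

db = LinkDatabase()
db.add("thesis", "paper")
db.add("lecture", "paper")
db.add("paper", "thesis")

REFERRERS = db.incoming("paper")        # documents pointing at "paper"
REMOVED = db.remove_document("paper")   # deleting "paper" leaves no dangling links
```

Because the links live in their own store, the deletion step can find every affected link without scanning or editing any document.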
For example, a user may attach an annotation to an otherwise read-only document, because the link is stored outside the document. Also, links may easily be filtered (e.g., some links can only be seen by the user that created them) and link maintenance is simplified.

3.2 Distributed Hypermedia

The applications described in section 2 suggest implementing Hyper-G as a distributed system running concurrently on a heterogeneous network of computers. To achieve high performance, Hyper-G adopts the client/server model and is therefore split into concurrent processes that may execute on a number of machines simultaneously.

The Core System of Hyper-G will consist of a high-speed local area network (LAN) of Hyper-G Servers and Workstations (see Figure 2). Hyper-G Servers will typically be supplied with gigabytes of fast mass storage (hard disks), but there may be special-purpose Hyper-G servers equipped with, e.g., CD-ROMs, Video or Audio Discs or Tapes. The Workstations execute the client software (the so-called viewer). The viewer makes use of local intelligence in the workstation as far as possible. Workstations need not be of the same kind. Most of them will be standard graphic workstations, but some of them may have additional audio/video capabilities or special user interfaces.

Figure 2: The Hyper-G Core System

In addition, Hyper-G will be accessible from remote terminals over a wide area network (WAN, see Figure 3). In most cases, this WAN will be slow (Internet, dial-up phone lines or telematic services networks) compared to the LAN, and the facilities of the terminals will vary, so the access to certain types of documents will be limited.

As already discussed, it is not feasible to keep all the data in the core system, nor is it reasonable to re-implement already existing local systems. Therefore, Hyper-G supports access to other and remote databases (e.g.
airline guides, phone books, any kind of data that keeps changing), but the access mechanisms are hidden from the user. In particular, remote data is seamlessly embedded into the user interface metaphor (section 3.6) and is accessible by links. W3, Gopher and WAIS systems (section 2.2) are ideal candidates for such remote databases. On-line sessions to remote information systems are possible, but should be avoided because they destroy the consistent user interface metaphor within the Hyper-G environment.

Figure 3: Hyper-G and the World

3.3 Automatic and Dynamic Links

In a system with a large number of documents (Hyper-G is targeted to mega-quantities of documents and tera-quantities of bytes) the information provider (author) cannot be expected to know about all documents in the system. In order to generate links from the author's documents to other documents that might be useful but that the author is not aware of, a large-scale hypermedia system has to support (at least semi-) automatic generation of links.

Automatic generation of hypertext links has already been demonstrated in the case of encyclopedias (Mülner, 1989) and structured text like technical books (Frisse, 1988). However, the algorithms employed were rather crude (i.e. based on string comparison) and generated a number of wrong links. Use of more sophisticated AI methods for automatic link generation has to be further investigated.

In a large hypermedia system with a number of general-purpose and specialized encyclopedias and dictionaries (such as the Oxford English Dictionary (Bowers, 1988; Raymond & Tompa, 1988) or the Duden for German), it is quite likely that, for any word the user sees on the screen, there is a document explaining at least the meaning of that word. Because of automatic link generation, there will be a link attached to that word.
However, if there is a link from (almost) every word, it does not make too much sense to mark all these words as 'clickable' (the exact meaning depends on the user interface). The users are simply allowed to click on any word on the screen. In most cases, they will get information on that word by doing so.

If we assume fast database access, we can also postpone the generation of these links to runtime, i.e. to the moment of the user's click. The system would respond by performing a database query, looking for entries of the clicked word. If an entry is found, the associated document is presented. This dynamic link generation offers the following advantages:

o It reduces the number of (static) links, reducing not only storage requirements, but also link maintenance. For example, when a document is deleted, the document is just no longer found in the database, so that no dynamic links to the document are possible any more. Vice versa, no words in the document just deleted can be clicked any more, so the 'outgoing' links are also destroyed automatically. Also, when a new document is inserted, it is automatically linked with all the other documents already in the database or to be added in the future.
o The user may define an active list of databases to be searched for links, reducing the number of unwanted links (link filtering).
o Access to remote databases in a way consistent with the hypertext user interface metaphor, i.e. by links, is only possible with dynamic links. Because the information in the remote database may change at any time, no maintenance of static links to or from a remote database is possible. However, dynamic links to and from remote documents are feasible.

Hyper-G supports automatic, dynamic generation and maintenance of links. Of course there will be user-defined links, typically attached to graphical, video and audio documents.
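The click-time lookup described above can be sketched as follows. This is only an illustration under simple assumptions: each database is modelled as an in-memory mapping from headword to document, and the names DynamicLinker, click, set_active and delete, as well as the sample entries, are hypothetical and not part of Hyper-G.

```python
class DynamicLinker:
    """Resolves links at the moment of the user's click instead of storing them."""

    def __init__(self):
        self.databases = {}  # database name -> {lower-cased headword: document}
        self.active = []     # names of the databases searched, in order

    def add_database(self, name, entries):
        self.databases[name] = {word.lower(): doc for word, doc in entries.items()}
        self.active.append(name)

    def set_active(self, names):
        """Link filtering: restrict the databases that are searched for links."""
        self.active = [name for name in names if name in self.databases]

    def click(self, word):
        """Query the active databases for the clicked word; None if nothing is found."""
        key = word.strip(".,;:!?()\"'").lower()
        for name in self.active:
            document = self.databases[name].get(key)
            if document is not None:
                return name, document
        return None

    def delete(self, name, headword):
        """Deleting a document needs no link maintenance: it is simply not found."""
        self.databases[name].pop(headword.lower(), None)


# Hypothetical sample databases.
linker = DynamicLinker()
linker.add_database("glossary", {"anchor": "Location within a document where a link is attached."})
linker.add_database("dictionary", {"anchor": "Heavy object used to moor a ship.",
                                   "graz": "City in Austria."})

FIRST = linker.click("Anchor,")          # found in the first active database
linker.set_active(["dictionary"])
FILTERED = linker.click("anchor")        # the glossary is now filtered out
linker.delete("dictionary", "anchor")
AFTER_DELETE = linker.click("anchor")    # no dangling link remains
```

The sketch stores no link objects at all, which is the point of the approach: inserting or deleting an entry changes the set of reachable documents immediately.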
In addition, Hyper-G supports automatically generated, static links that may be found using algorithms too complex for runtime execution.

As has been stated before, Hyper-G uses bidirectional links. In the case of dynamic links, going in the reverse direction means asking the question "What other documents contain the word I am just clicking on?". In order to support this, appropriate data structures that allow fast full-text search of text documents have to be implemented.

3.4 Navigation through Structures

Users of large hypermedia systems experience navigation problems, including: getting lost in "hyperspace"; having difficulty gaining an overview; not being able to find information that is known to exist; determining how much information on a given topic exists, how much of it has been seen and how much is left (Nielsen, 1990b; Nielsen, 1990a). Typical solutions that work well on small systems fail completely when applied to large systems.

We believe that in order to offer the user a convenient way to find information, the 'hyper-' navigational model has to be extended by more high-level and structured navigation facilities, such as a hierarchical organization of related information (that we call collections), so-called guided tours through the information universe that have been prepared in advance by experts (CAI lessons would be a typical example), and advanced database queries (full-text, inexact match, form queries ...).

The generic concept of Structures allows us to implement various navigational modules, browsing facilities and user interface metaphors by inheritance from a single concept realized within the Hyper-G database:

Every Hyper-G object belongs to at least one structure object (except for top-level structures). During the browsing process, the user "opens" and "closes" structures, navigating from structure to structure.
More specific structures are derived from an abstract structure base class:

o Documents are structures that contain no sub-structures (they may be thought of as leaves of the structure hierarchy). When a document is opened, it is displayed.

Figure 4: Cluster Structure

o Clusters are groups of other structures. When a cluster is opened, all the structures it contains are opened (see Figure 4). As a special case, we will call a cluster containing only documents a document cluster.

Figure 5: Collection Structure

o Collections are unordered sets of other structures. They are used to create a collection hierarchy like the one shown in Figure 7. When users open a collection, they are given an overview of all objects belonging to that collection, and can select (and thus open) any of the objects contained (see Figure 5).

o Paths are ordered sets of structures. When a path is opened, the structures contained have to be visited in sequential order (see Figure 6).

Figure 6: Path Structure

o Other (possibly more specialized) structures may be derived from the abstract structure class or the structure classes defined above. E.g., a certain structure derived from path may execute a script to determine the succession of sub-structures to be opened.

In general, the structure concept is a very flexible and powerful one and can be most efficiently implemented using object-oriented databases. Please observe the following:

o A structure may belong to more than one parent structure. However, the hierarchy of structures has to be free of cycles to allow convenient navigation.

o The structure concept is orthogonal to the Hyper-Web concept (Figure 1). In particular, the user may navigate both by browsing the structure hierarchy and by following the hyperlinks attached to individual documents. The links are in turn targeted to structures.
As a special case, when opening a document that belongs to a document cluster, all the documents of the cluster are displayed, even if they should belong to different structures.

o The regularity of the structure classes (and thus the structure hierarchy) makes it feasible to present a pleasing graphical overview of the current position within the hierarchy (as in Figure 7), while it is not always possible to draw nice graphs of irregular hyperwebs.

Figure 7: A Typical Collection Hierarchy

3.5 Tours

In addition to browsing the collection hierarchy or navigating through "deep hyperspace" by following links, the user has the possibility to go on 'guided tours' through the information network that have been provided by some author (potentially any user). Tours are easily realized as structure hierarchies. Computer aided instruction is an example application of the tour concept (Maurer & Tomek, 1990b; Stubenrauch, 1989; Stubenrauch & Tomek, 1990). Two useful applications of the tour concept would be:

o Tours that might also be called 'lessons' in CAI terms. Tours of this kind will typically be authored by some expert in the field, be logically structured, and contain a number of cross-references to other tours (lessons) and Hyper-G databases. In fact, the more than 500 lessons of the COSTOC project (Makedon & Maurer, 1988), covering topics ranging from medicine, ecology, and natural sciences to computer science, will be transformed (to Hyper-COSTOC (Huber et al., 1989)) and become an integral part of the Hyper-G database. A typical COSTOC tour consists of a table of contents (realized as a collection) and a number of chapters that usually have to be read in sequential order (realized as paths). Sometimes, a COSTOC chapter need not be read in strictly sequential order, but detours or shortcuts are allowed at certain points. This can be simulated by 'sub-chapters' (realized as collections).
o Other tours may be created by any user, linking together document clusters of interest, so the user can easily find them again in future sessions, very similar to the 'associative trail' of Bush (Bush, 1945). These tours may be intended for private use only, but may also be made available to the public or specific user groups. Typical tours of this class might be "my favorite impressionist paintings" or "some interesting features of UNIX".

3.6 Metaphors

Hyper-G will not adopt one specific user interface metaphor. Rather, it will serve as an experimental playground to test a variety of hypermedia presentation metaphors (Davies et al., 1991). It should even be possible to configure the preferred presentation metaphor on a per-user basis, to switch metaphors in the middle of a session, or to look at the same set of documents using different metaphors at the same time. Therefore, Hyper-G employs a strict separation of the user interface and what we call the "hypermedia engine".

The structure model (that is realized by the hypermedia engine, but is in general too complicated for the user) can easily be metaphorized by the user interface part. E.g., in the well-known desktop metaphor a document may be seen by the user as a little icon that expands to a window holding the contents when opened (i.e. clicked on), clusters may be presented as arrays of documents, while paths become piles of documents (i.e. you have to open the top document first). Collections can be metaphorized as folders (similar to piles, but with a directory and random access).

In the book metaphor (Benest, 1989), however, a document can be a single page of the book. A cluster is a page with a number of items on it (e.g. text and image and animation). The chapters of the book correspond to paths (i.e. a chapter has to be read in sequential order), while the table of contents is a collection.
On higher levels of the collection hierarchy, a number of books may be grouped together in shelves, the shelves in libraries, and so on.

3.7 Database Integrity

The issue of ensuring database integrity is also addressed by the structure concept. E.g., consider the case where a node of the web in Figure 1 is to be deleted or replaced by a newer version. What happens is that all links from and to that document are rendered obsolete. Simply deleting these links guarantees referential integrity (i.e. no dangling references) but may result in some nodes no longer being reachable through any link. With the overlaid structure hierarchy, however, it is guaranteed that every object is reachable through the structure hierarchy (because it belongs to at least one structure).

Moreover, the parent structures may decide what to do when one of their sub-structures is going to be deleted, in a way that depends on the class of the structure. E.g., a path like the one shown in Figure 6 may automatically join the predecessor and successor of the deleted node, so that navigational integrity is retained. In an object-oriented environment this can be implemented by sending appropriate messages to the parent structures of an object to be deleted or modified. The corresponding methods of the parent structures can be redefined for derived structure classes, yielding maximum flexibility of the system.

3.8 Queries

Besides hypernavigation, the collection hierarchy and tours, database queries are another strategy for finding information. This strategy is most useful at the beginning of a session in order to quickly find an entry point into the hyperweb, collection hierarchy or tours from which browsing can continue. Two kinds of queries are supported:

o Key queries are rather simple queries that are passed to a user-defined list of active databases.
The user can select between searching for keywords given in the title of objects or elsewhere, or issue a full-text search. In addition, prefix queries and inexact match queries, possibly combined using boolean operators, are possible. Using key queries, the user can quickly find out about existing information on a given topic. Also, this ability is used internally by the system to generate dynamic links (section 3.3).

o Form queries are collection-specific. For example, the collection 'fine art', subcollection 'paintings' may allow the user to fill out forms with fields painter, name of picture, year, style, theme, and museum, so that the user would easily find all paintings of Leonardo da Vinci that are on display in the Louvre. Form queries may be very complex dialogs, including incremental form-filling (flex-forms), i.e. the answer to the query dynamically appears as the form is filled out.

Also, it should be noted that for some remote databases (e.g. WAIS, see section 2.2) queries are the only way to get information.

3.9 User Modes

As already mentioned in conjunction with electronic meeting systems, the system can be used anonymously. This is to avoid the "Big Brother" scenario possibly associated with large information systems (Maurer et al., 1984). On the other hand, certain system features are impossible (e.g. user-specific system configuration) or undesirable (e.g. anonymous authoring) if the user remains truly anonymous. In order to combine advantages of both identified and anonymous user modes, Hyper-G supports four identification modes:

1. Identified: In this mode, the identity (name, address, etc.) of users is known not only to the system, but also to other users. E.g., when you receive mail or read annotations or documents from identified users, you know who they are. Users have to log in supplying user name and password. Obviously, monitoring of users' actions is possible in this mode.

2. Semi-Identified: The user may select a pseudonym (provided that no other user has already chosen the same name) and a password in identified mode. Subsequently, the user may use semi-identified mode by supplying the pseudonym plus password. This means that the user is still known to the system by full name. However, other users see only the pseudonym when reading annotations, mail, etc. A user may also use more than one pseudonym. The fact that the user is known to the system makes user monitoring possible. However, it also discourages users from misuse of the communication facilities of the system (as shown by our experience with a public mailbox accessible through the Austrian videotex system), and requires only minimal software extensions. Also, accounting may be done in the same way as for identified users.

3. Anonymously Identified: This mode is similar to the semi-identified mode, i.e. users are identified by pseudonyms and passwords. However, the identity of anonymously identified users is not known to the system. E.g., when logging in at a terminal, the user may either enter a valid pseudonym and password or select (or let the system generate) a new combination5. The advantage of this mode over anonymous mode is that, although the system is not able to tell who the user really is (no "Big Brother"), it can still recognize the user in subsequent sessions. This allows user preferences and configurations to be stored, and allows private write operations (i.e. modifications of the database visible only to the same user, e.g. private annotations). However, public writes (i.e. changes to the database visible to other users) are disabled in this identification mode. Accounting is still possible in this mode by use of anonymous accounts, i.e. the user pays a certain amount to be added to a certain pseudonym's account.

5 Of course, passwords may be changed by the user at any time, supplying the old password.

4. Anonymous: This mode will be used where user identification is too complicated and time-consuming for the job to be done (e.g. at public terminals, museums . . .). Neither user preferences or configurations nor any database modifications6 are possible in this mode. The user's configuration and rights are determined by the terminal profile. However, accounting is still possible at terminals equipped with special hardware to accept tokens, coins, IC-cards and the like.

6 You can still send mail to the system administration ("suggestion" or "complaint").

Also, system monitoring for system administration and tuning purposes is possible in all identification modes.

Identification Mode    | System Monitoring | Accounting | Private Write | User Monitoring | Public Write
-----------------------+-------------------+------------+---------------+-----------------+-------------
Identified             | YES               | YES        | YES           | YES             | YES
Semi-Identified        | YES               | YES        | YES           | YES             | YES
Anonymously Identified | YES               | YES        | YES           | NO              | NO
Anonymous              | YES               | YES (a)    | NO            | NO              | NO

(a) Special hardware required.

Table 1: Supported Functions of User Identification Modes

Table 1 summarizes the possibilities of the four identification modes. Note that users may access the system in any desired mode. This enables users to use anonymous or anonymously identified mode most of the time, and switch to other modes only when necessary (e.g. for a public write operation).

3.10 Problems in Multilingual Environments

Because Hyper-G aims to become a university information system for (among others) a European country, the language problem arises. Hyper-G addresses it in the following way:

o The documents themselves may be multilingual, i.e. can be document clusters that consist of several documents, storing approximately the same information in different languages. This obviously applies to text and voice documents, but may also be required for pictures, films, animation etc. in special situations.

o The run-time system is able to choose from available languages based on the user's preferences.

o The user interface of the run-time system is available in a number of languages, too.

o The user may specify an ordered list of language preferences (e.g. 'Hungarian' - 'German' - 'English'). This means that if a Hungarian version of a document is available, it is shown; otherwise the German version is shown, and otherwise the English one. If none of the languages on the user's list is available, the system may display an appropriate error message, or display whatever it has.

o The mechanism described above may be temporarily overridden, e.g. to see the document in its original language, or to see more than one version at the same time.
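The ordered-preference rule from the list above can be sketched in Python as follows. The function `choose_version`, its parameters, and the representation of a multilingual document cluster as a dictionary from language name to document are assumptions made for this illustration only; the paper does not specify how the run-time system implements the choice.

```python
from typing import Optional


def choose_version(
    cluster: dict[str, str],
    preferences: list[str],
    override: Optional[str] = None,
    show_anything: bool = True,
) -> Optional[str]:
    """Pick one document from a multilingual cluster (language -> document)."""
    # A temporary override (e.g. "show the original language") wins if available.
    if override is not None and override in cluster:
        return cluster[override]
    # Otherwise walk the user's ordered preference list.
    for language in preferences:
        if language in cluster:
            return cluster[language]
    # Nothing on the list is available: show whatever exists, or report failure.
    if show_anything and cluster:
        return cluster[min(cluster)]  # arbitrary but deterministic choice
    return None


preferences = ["Hungarian", "German", "English"]
cluster = {"German": "Rechner", "English": "computer"}

print(choose_version(cluster, preferences))                      # Rechner
print(choose_version(cluster, preferences, override="English"))  # computer
print(choose_version({"French": "ordinateur"}, preferences))     # ordinateur
```

Returning `None` when `show_anything` is false corresponds to the case where the system displays an error message instead of an arbitrary version.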
Further complications arise with the use of dynamic links. Suppose the user has chosen language preference 'Hungarian' - 'German' - 'English'. Now, for some reason, a German text is on the screen. The user might click on any word, which will most probably be a German word, e.g. 'Rechner' (computer), so a simple dynamic link (as described in section 3.3) will most probably find only German documents that contain the word 'Rechner' as keyword. In order to get existing Hungarian documents on 'komputer' (the Hungarian word for computer) as well, there are two solutions:

o We could assume that all documents on 'Rechner' of all languages belong to the same document cluster that stores equivalent documents, but in different languages. When, in our example, the run-time system gets to the German document on 'Rechner', it would see an equivalent Hungarian document on 'komputer' (if it exists) and display that instead. Although this solution may be applicable to small amounts of data (e.g. a single tour), where the author creates the cluster links 'by hand', this approach does not generalize well, for two reasons:

  - For large amounts of data (e.g. a number of encyclopedias in a number of languages), it is too much work to look for equivalent documents manually. On the other hand, the decision whether two documents in different languages are 'equivalent' is very hard to make for machines.

  - If in our example a German document on 'Rechner' does not exist, the run-time system would never find the cluster with documents on 'komputer' and 'computer'. Still the user would want to see the 'komputer' or else the 'computer' document, if one exists.

o In order to solve this problem, the Hyper-G database will contain some translation mechanisms (translation programs, dictionaries) to translate the word 'Rechner' to Hungarian before searching the database. Of course, this translation process introduces further 'blur' to the database query.
For example, 'Rechner' translates to 'computer', but also 'calculator', 'arithmetician' and 'reckoner'. Without regard to the context and background knowledge, the system is not able to decide which translation is the most 'equivalent' one. This is a general problem of automatic translators.

To make things even more complicated, in order to support n languages and to be able to translate from any language to any other language, we would need n(n - 1) translators or dictionaries. A possibility would be to translate first from the source language to an intermediate language (e.g. English) and then from that language to the target language. This way we would need only 2(n - 1) translators. In our example, we would first translate 'Rechner' to English, and the resulting words ('computer', 'calculator', 'arithmetician' and 'reckoner') to Hungarian. Of course, this translation would be even more inaccurate.

Vice versa, it is possible that two German words (e.g. "Taste" - the kind of key that you press on a keyboard - and "Schlüssel" - the kind of key that you use to lock doors) translate to the same English word (key). As a result, to the surprise of a German-speaking user looking for information on "Taste", information on "Schlüssel" may be returned instead. The usual solution is to have additional attributes in the intermediate language (e.g., key (computer) and key (door)). With Hyper-G we will have the opportunity to experiment with both approaches.

To our knowledge, Hyper-G will be the first hypermedia system supporting multilingual documents. Of course, languages like Chinese, Japanese, Arabic etc. impose even more problems. For example, a large number of fonts has to be supported, writing is not restricted to the usual "from left to right, from top to bottom" way, and more.

4 Status and Open Problems

At the time of writing (March 1992), we have implemented the Hyper-G server and a "terminal client".
The terminal client is implemented on top of the UNIX curses package and allows access to Hyper-G from virtually any computer able to act as a terminal. On the other hand, output is text-only, and the user interface is rather limited7.

7 A demo session using the terminal client can be started by issuing the shell-command "rlogin finfo.tu-graz.ac.at" on any UNIX machine connected to the Internet. In addition, the system is integrated into the gopher and W3 worlds by means of gateways and can therefore be accessed (in a limited way) by gopher and W3 viewers.

The system is already used by staff of the Graz Technical University and contains information about various organizations within the university, research activities of the individual institutes, a calendar of events, phone numbers, on-line software documentation, a general-purpose encyclopedia, etc. Gateways to the local library as well as to other universities' information systems (most of them gopher-based) exist. The system is also currently being used by some 200 students as a discussion forum as part of a course on social impact of informatics.

A number of other modules are ready as such but are not integrated yet (e.g., display software for raster and vector-based images, a package for display of software movies and animation, conversion software for existing encyclopedias, courseware, animation, sound, etc.).

In phase two of our Hyper-G development effort we want to implement a new viewer based on X11 and InterViews (Linton et al., 1989). The new viewer will enable us to display (raster and vector) images and will support better navigation and authoring facilities (featuring a graphical browser that will also serve as link editor). The fax gateway mentioned in section 2.1 will be installed. In order to reduce the effect of OCR failures, fast mechanisms for full-text queries that can handle spelling errors have to be developed.
Problems that remain to be solved in phase two include better (AI) algorithms for automatic link generation, and display algorithms that produce pretty layouts for the graphical browser. Also, the current design requires a single link server that centralizes storage of links and anchors. Distributing the link server would result in an even more flexible system that offers better performance if used in a wide area network (e.g. using a link server per university). Also, the language problem (see section 3.10) has to be experimented with in more detail, and question-answer dialogs more sophisticated than current COSTOC dialogs will be developed (including, e.g., graphical dialogs).

A number of other important aspects will also be extended. Such aspects include: authorization procedures, CSCW and computer conferencing, usage metaphors and user profiles, animation aspects (Kappe & Maurer, 1990), voice, sound, and movie documents including synchronization and linking, undo and redo functions, various hypermedia editors, and a variety of browsing tools.

Funding of part of this project by the Austrian Ministry of Science, grant 613.531/5-26/90, is gratefully acknowledged.

References

Andersen, M. H., Nielsen, J., & Rasmussen, H. (1989). A similarity-based hypertext browser for reading the UNIX network news. Hypermedia, 1(3), 255-265.
Begeman, M. L., & Conklin, J. (1988). The right tool for the job. Byte, 13(10), 255-266.
Benest, I. D. (1989). A hypertext system with controlled hype. In Proc. of the Hypertext II Conference, York.
Berners-Lee, T., Cailliau, R., Groff, J., & Pollermann, B. (1992). World-Wide Web: the information universe. Electronic Networking: Research, Applications and Policy, 1(2).
Bowers, R. A. (1988). The Oxford English Dictionary on compact disc. Electronic and Optical Publishing Review, 8(2), 88-91.
Bush, V. (1945). As we may think. The Atlantic Monthly, 176(1), 101-108.
Conklin, J. (1987). Hypertext: an introduction and survey. IEEE Computer, 20(9), 17-41.
Davies, G., Maurer, H., & Preece, J. (1991). Presentation metaphors for a very large hypermedia system. Journal of Micro Computer Applications, 14(2), 105-116.
Ellis, C. A., Gibbs, S. J., & Rein, G. L. (1991). Groupware: some issues and experiences. Communications of the ACM, 34(1), 39-58.
Engelbart, D. C., & Lehtman, H. (1988). Working together - the human system and the tool system are equally important in computer-supported cooperative work. Byte, 13(13), 245-252.
Engelbart, D. C. (1963). A conceptual framework for the augmentation of man's intellect. In Howerton, P. D., & Weeks, D. C., editors, Vistas in Information Handling, Vol. 1, (pp. 1-29), Spartan Books, Washington D.C.
Frisse, M. (1988). From text to hypertext. Byte, 13(10), 247-253.
Haan, B. J., Kahn, P., Riley, V. A., Coombs, J. H., & Meyrowitz, N. K. (1992). IRIS hypermedia services. Communications of the ACM, 35(1), 36-51.
Hiltz, S. R., & Turoff, M. (1985). Structuring computer-mediated communication systems to avoid information overload. Communications of the ACM, 28(7), 680-689.
Huber, F., Makedon, F., & Maurer, H. (1989). Hyper-COSTOC: a comprehensive computer-based teaching support system. Journal of Micro Computer Applications, 12, 293-317.
Jonassen, D. H., & Mandl, H., editors (1989). Designing Hypermedia for Learning. Vol. 67 of NATO ASI Series F: Computer and Systems Sciences, Springer.
Kappe, F., & Maurer, H. (1990). Animation in Hyper-G - an outline. In Haase, V., & Zinterhof, P., editors, Proc. Future Trends in Information Technology '90, Salzburg, Austria (pp. 235-248), Austrian Computer Society, R. Oldenbourg, Vienna, Munich.
Kappe, F. (1991). Aspects of a Modern Multi-Media Information System. PhD thesis, Technical University Graz, Austria. Also available as IIG Report #308; IIG, Graz University of Technology (Jun 1991).
Kappe, F., Maurer, H., & Tomek, I. (1991). Hyper-G - specification of requirements. In György, A., editor, Proc. Conference on Intelligent Systems (CIS) '91, Veszprem, Hungary (pp. 257-272), John von Neumann Society for Computing Sciences and Austrian Computer Society.
Kearsley, G. (1982). Authoring systems in computer based education. Communications of the ACM, 25(7), 429-437.
Linton, M. A., Vlissides, J. M., & Calder, P. R. (1989). Composing user interfaces with InterViews. IEEE Computer, 22(2), 8-22.
Makedon, F., & Maurer, H. (1988). COSTOC - Computer Supported Teaching of Computer science. In Proc. IFIP Conference on Teleteaching, Budapest (pp. 107-119), North Holland Publ. Co., Amsterdam.
Makedon, F., Maurer, H., & Ottmann, T. (1987). Presentation type CAI in computer science education at university level. Journal of Micro Computer Applications, 10, 283-295.
Marchionini, G., & Shneiderman, B. (1988). Finding facts vs. browsing knowledge. IEEE Computer, 21(1), 70-80.
Maurer, H., & Tomek, I. (1990a). Broadening the scope of hypermedia principles. Hypermedia, 2(3), 201-221.
Maurer, H., & Tomek, I. (1990b). Hypermedia in teleteaching. In Proc. IFIP Congress WCCE '90 (pp. 1009-1015), North Holland Publ. Co., Amsterdam.
Maurer, H., & Tomek, I. (1991a). Hypermedia - from the past to the future. In Proc. New Results and New Trends in Computer Science (pp. 320-336), LNCS 555, Springer Pub. Co.
Maurer, H., & Tomek, I. (1991b). Hypermedia bibliography. Journal of Micro Computer Applications, 14(2), 161-216.
Maurer, H., Roszenich, N., & Sebestyen, I. (1984). Videotex without big brother. Electronic Publishing Review, 4, 201-214.
McAleese, R., editor (1989). Hypertext: theory into practice. Blackwell Scientific Publications Ltd.
Mülner, H. (1989). A system of inter-active encyclopaedias. In Proc. 4th Austrian-Hungarian Informatics Conference, Budapest (pp. 181-190), John von Neumann Society for Computing Sciences, Hungary.
Nelson, T. H. (1965). A file structure for the complex, the changing, and the indeterminate. In Proc. 20th ACM National Conference (pp. 84-100).
Nelson, T. H. (1981). Literary Machines, Ed. 87.1. (1987). The Distributors, South Bend, IN.
Nielsen, J. (1990a). The art of navigating through hypertext. Communications of the ACM, 33(3), 296-310.
Nielsen, J. (1990b). Hypertext & Hypermedia. Academic Press, San Diego, CA.
Quarterman, J. S. (1990). The Matrix - Computer Networks and Conferencing Systems Worldwide. Digital Press, Bedford, MA.
Raymond, D. R., & Tompa, F. W. (1988). Hypertext and the Oxford English Dictionary. Communications of the ACM, 31(7), 871-879.
Robinson, M. (1991). Through a lens smartly - MIT's information lens project has laid the foundation for intelligent assistants for your e-mail system. Byte, 16(5), 177-187.
Stein, R. M. (1991). Browsing through terabytes - wide-area information servers open a new frontier in personal and corporate information services. Byte, 16(5), 157-164.
Stubenrauch, R., & Tomek, I. (1990). Navigation and browsing through lessons in a hyper-CAI system. In Proc. 7th International Conference on Technology and Education, Brussels, Belgium (pp. 125-127), CEP Consultants Ltd.
Stubenrauch, R. (1989). Touring a hyper-CAI system. In Maurer, H., editor, Computer Assisted Learning. Proc. of 2nd Int. Conf., ICCAL '89, Austin, TX, USA (pp. 541-551), Springer, LNCS 360.
Tomek, I., & Maurer, H. (1991). The analyst as a starting point for a hypermedia system. Journal of Micro Computer Applications, 14(2), 139-160.
Tomek, I., Khan, S., Muldner, T., Nassar, M., Novak, G., & Proszynski, P. (1991). Hypermedia - introduction and survey. Journal of Micro Computer Applications, 14(2), 63-100.
Yankelovich, N., Haan, B. J., Meyrowitz, N. K., & Drucker, S. M. (1988). Intermedia: the concept and the construction of a seamless information environment. IEEE Computer, 21(1), 81-96.