1. Software challenge
This note concerns ways of presenting information that will become possible once a particular software problem has been solved. The problem can be illustrated by three examples:
(a) Traffic network mapping: If a database contains entries on 300 subway stations (or airports, or bus stops) and their direct route links to one another, what is required is a software package to construct one or more possible maps of the resulting network. The important point is to be able to optimize the comprehensibility of such maps with minimal manual intervention in the construction process.
(b) HyperCard stack mapping: The widely acclaimed introduction of Apple's HyperCard has made it possible to handle complex networks of relationships between database records, but the problem remains of mapping the pattern of relationships in the resulting HyperCard stack. The individual entries may be said to constitute "data", but it is the pattern of relationships between them which constitutes "knowledge" and "intelligence".
(c) Mind-mapping: This technique is currently being strongly promoted in management training and time-management courses. It consists of manually drawing circles to represent key ideas, objectives or activities and then interlinking them in a network of relationships. There is a clear need for a software package to facilitate this process. This could take the form of a non-hierarchical version of the standard outliner used to manage the chapter headings of a report, with emphasis on the graphic element. There are some resemblances to project scheduling software, except that here the emphasis is on relating concepts.
(d) Comment: Consider a relational database whose records represent subway stations and indicate which stations are directly connected to which others (and possibly on which "line").
The core problem is how to obtain, adapt or develop software that would generate one or more maps of the subway station network. The principal constraint is that the map should be comprehensible. It is neither required nor desirable that the map be bound by some equivalent of "topographic" constraints (namely, the positions of the stations should not be determined by some form of geographic coordinates). Rather, the positions should be determined topologically and mapped, at least for immediate purposes, onto a two-dimensional surface.
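As an illustration of positioning stations topologically rather than geographically, the following is a minimal sketch of a force-directed layout in the Fruchterman-Reingold style, a standard graph-drawing technique not named in this note: every pair of stations repels, directly linked stations attract, and the positions that emerge depend only on the links. The function `force_layout`, its parameters and the five-station example are hypothetical. It uses only the Python standard library and does work proportional to the square of the number of nodes on each iteration, which is workable for a few hundred stations.

```python
import math
import random


def force_layout(nodes, links, width=100.0, height=100.0, iterations=200, seed=1):
    """Force-directed (Fruchterman-Reingold style) layout.

    nodes: list of node ids; links: list of (a, b) pairs.
    Positions depend only on the links, never on geographic coordinates.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / max(len(nodes), 1))  # ideal link length
    temp = width / 10.0  # largest move allowed per iteration ("temperature")
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels with force k^2 / d.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 0.01
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f
        # Directly linked nodes attract with force d^2 / k.
        for a, b in links:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 0.01
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Move each node at most `temp` along its net force, staying in the frame.
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy) or 0.01
            step = min(d, temp)
            pos[n][0] = min(width, max(0.0, pos[n][0] + dx / d * step))
            pos[n][1] = min(height, max(0.0, pos[n][1] + dy / d * step))
        temp *= 0.97  # cool down so the layout settles
    return {n: (x, y) for n, (x, y) in pos.items()}


# Hypothetical example: five stations on a circular line.
stations = ["A", "B", "C", "D", "E"]
links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")]
layout = force_layout(stations, links)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.1f}, {y:.1f})")
```

The seed makes the result repeatable; running with different seeds gives alternative maps of the same network, among which an editor could choose.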
There are additional problems which can be treated at lower levels of priority, if at all. They include:
- A second problem is that the database in fact contains over 10,000 nodes and ways must be found to segment the network (possibly filtering out lower levels of detail) so that maps for individual segments can be interrelated. Such maps, in hardcopy form, will be bound together in a book to form an "atlas".
- A third problem is that some editorial interaction with the map is desirable, to improve its visual quality.
- A fourth problem is that it should ideally be possible to update the database by making changes interactively on the map.
- A fifth problem is to open the way to using the map as a menu via which the database can be queried for information on the nodes.
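The segmentation problem can be approached in many ways. One simple possibility, sketched below on the assumption that the network is available as an adjacency map, is to grow each atlas page breadth-first from an unassigned node until it reaches a size limit, and then to list every link that crosses between pages, so that each map can carry "continued on page N" references. The function `segment_network` and its `max_size` parameter are hypothetical; a real atlas would probably also group pages by subject.

```python
from collections import deque


def segment_network(adjacency, max_size):
    """Split a network into pages of at most max_size nodes.

    adjacency: dict mapping each node id to a set of neighbouring ids.
    Returns (pages, cross_refs) where cross_refs lists (a, b, page_a, page_b)
    for every link whose two ends ended up on different pages.
    """
    page_of = {}
    pages = []
    for seed in sorted(adjacency):
        if seed in page_of:
            continue
        page = []
        queue = deque([seed])
        # Grow the page breadth-first so that neighbouring nodes stay together.
        while queue and len(page) < max_size:
            node = queue.popleft()
            if node in page_of:
                continue
            page_of[node] = len(pages)
            page.append(node)
            for nbr in sorted(adjacency[node]):
                if nbr not in page_of:
                    queue.append(nbr)
        pages.append(page)
    cross_refs = sorted(
        (a, b, page_of[a], page_of[b])
        for a in adjacency
        for b in adjacency[a]
        if a < b and page_of[a] != page_of[b]
    )
    return pages, cross_refs


# Hypothetical example: a line of five stations, two stations per page.
line = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
pages, cross_refs = segment_network(line, max_size=2)
print(pages)       # [[1, 2], [3, 4], [5]]
print(cross_refs)  # [(2, 3, 0, 1), (4, 5, 1, 2)]
```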
2. Constraints and possibilities
(a) Conventional approach: The conventional approach to databases, and to the reference books produced from them, is to focus on individual entries. The user is not assisted in understanding the relationships between entries, other than by fairly crude grouping of entries into categories.
(b) Hypertext approach: With the development of interactive databases, hypertext (including Apple's new HyperCard approach) and CD-ROM, data entries can be organized so that they cross-reference one another extensively and in a non-hierarchical manner.
For example, the current Yearbook of International Organizations (1994/95) has over 30,000 organizations with some 80,000 relationships between them (with the major organizations having an average of 70 each) and with a further 192,552 links to membership countries. This Encyclopedia provides an equivalent challenge with some 10,000 world problems linked by 120,000 relationships between them. In database terms this is a major step towards what is being called hypertext. Both publications are maintained on a computer network from which a CD-ROM version is being produced.
(c) User need for "maps": Because of the overwhelming volume of data, users need "maps" of the pathways between entries, especially in complex subject areas. Such maps provide a sense of context which is lost in many hierarchical presentations of data in linear text form. Only with such maps can users quickly obtain an adequate overview of an unfamiliar area to guide their efficient use of conventional information tools. Such maps are valuable precisely because they are richer than simple hierarchically structured thesauri.
(d) Editorial need for a graphic interface: In preparing such publications, editorial researchers need to be able to represent graphically the networks of relationships they are endeavouring to clarify. This is closely related to mind-mapping. Without such a tool, editors have to produce extensive mind maps by hand before building up or modifying the network of relationships. Ideally it should be possible to communicate such maps to key resource people to obtain insights which are not so easily conveyed in normal text presentations. Interesting examples of such graph displays, prepared manually, do exist.
(e) Existing techniques: Computer hardware and software for constructing and manipulating such networks of relationships have so far been developed only for specific applications such as chemistry, architecture and engineering (CAD), or electronic circuit board design (PCB). It would be possible to develop similar software to display relationships between database entries.
A number of software packages have been developed, especially for Apple machines, which go some of the way towards the product required. These include MORE and INSPIRATION. The disadvantage of these products is that they have primarily been designed to work around a core concept (a "main idea") which is the point of departure for a hierarchical structure. This does not correspond to the essentially non-hierarchical presentation required.
(f) Atlas production: Once such maps can be successfully produced and manipulated, computer tapes can be made to drive photocomposition machines (with vector generators), which produce high-quality maps. Alternatively, such maps could be generated by standard graph plotters in camera-ready form. A series of such maps, with facing explanatory text and/or a mini-index, may then be bound together as an "atlas".
Maps would be designed to cover clusters of organizations and/or problems in a given subject or geographical area. When circulated as proofs, they would have the advantage of prompting the input of new organizations and/or relationships. They also have important didactic uses. Enlargements of the maps could also be distributed as wall-charts.
3. Software "modules"
(a) Relational database: The data is currently held and maintained in an Advanced Revelation database (version 1.16) running on a Novell 3.11 network. The database has been specially developed as a text database with facilities to manage networks of relationships between the records. It is desirable that, when the data is displayed in map form, interactive changes to the map should be carried back as updates to the database. But since the prime requirement is for publishable hardcopy maps, this feature may be sacrificed in the short term.
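Advanced Revelation is not an SQL system, so the following is only an analogy: a minimal sketch, using Python's built-in `sqlite3` module, of how entries and their relationships might be held as two relational tables and then loaded as the adjacency map that a mapping package would consume. The table names, columns and sample stations are hypothetical.

```python
import sqlite3

# Hypothetical schema: one table of entries, one table of typed links.
SCHEMA = """
CREATE TABLE entry (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE link (
    source INTEGER NOT NULL REFERENCES entry(id),
    target INTEGER NOT NULL REFERENCES entry(id),
    kind   TEXT,                 -- e.g. the subway "line"
    PRIMARY KEY (source, target, kind)
);
"""


def load_network(conn):
    """Return entry names and an undirected adjacency map for a mapping package."""
    names = dict(conn.execute("SELECT id, name FROM entry"))
    adjacency = {i: set() for i in names}
    for source, target in conn.execute("SELECT source, target FROM link"):
        adjacency[source].add(target)
        adjacency[target].add(source)
    return names, adjacency


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO entry VALUES (?, ?)",
                 [(1, "Central"), (2, "Market"), (3, "Harbour")])
conn.executemany("INSERT INTO link VALUES (?, ?, ?)",
                 [(1, 2, "Red"), (2, 3, "Red"), (1, 3, "Blue")])
names, adjacency = load_network(conn)
print(adjacency)
```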
(b) Map design: Several approaches may be taken to the problem of map design:
- (i) Network analysis: This uses specialized extensions of sociometry to take data of the type described above and to position the elements in relation to each other on the basis of various measures of distance, with the most connected elements tending to be placed at the centre of the network and the least connected at the periphery. The advantage of this approach is that it endeavours to mirror the network on the basis of its internal characteristics. A number of software packages exist to perform the necessary computations. Such analysis yields various ways of describing a network and identifying its key components.
The disadvantage of such software is that it has been developed only for relatively small networks (100 to 300 nodes). Few of the packages are designed to permit mapping of the resulting network; data is output only in matrix form or as indices relating to key elements. More seriously, such networks, when mapped, yield maps which, although they reflect the data, are not designed to enhance its comprehensibility (other than in a purely scientific sense). Such computations can consume considerable amounts of computer time, even on fast machines.
This approach has been explored using test data from the UIA Revelation database consisting of some 5,000 nodes. The work was done on a Mac II using software developed at Dartmouth College by Joel Levine of the Department of Mathematical Social Sciences. This software has not been adapted to run under MS-DOS.
- (ii) "Crude mapping": A simplistic approach could be taken. This would involve positioning the nodes on a grid determined by the subjects with which they are associated. Such a subject grid (with positions determined by a four-character identifier) is in use to categorize the UIA data into some 3,000 categories. Relationships would then be plotted between the nodes.
In this case comprehensibility is achieved through the link to the matrix and not through determining the shape of the network. Use of a grid could severely undermine the memorability of the network. It would however be relatively easy to develop and quick to run. A key question would be what kind of interaction it would be possible to have with such a map and whether it would be possible to shift from a detailed focus on a specialized cell of the grid to a wider focus and back (a zoom facility).
- (iii) Topological manipulation: In this approach, the network of relationships between nodes would be simplified using topological constraints. For example, a string of interlinked nodes would be represented by a straight line. The positions of the nodes on the line might be equidistant or determined by some logarithmic function of the distance from the centre of the line. The aim would be to introduce symmetry elements into the data so that it acquires a distinct and memorable pattern or shape. Some of the algorithms required presumably correspond to those used in pattern recognition.
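For approach (i), a minimal sketch of a sociometric placement: each node's closeness centrality (the number of other reachable nodes divided by the sum of shortest-path distances to them) sets its distance from the centre, so the best-connected nodes fall near the middle and the least connected towards the periphery. This is a generic textbook measure chosen for illustration, not the Dartmouth software mentioned above; the names `closeness` and `radial_layout` are hypothetical. The breadth-first search repeated from every node is one reason such computations become expensive on large networks.

```python
import math
from collections import deque


def closeness(adjacency, node):
    """Closeness centrality: (reachable nodes - 1) / sum of shortest distances."""
    dist = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search gives shortest link counts
        cur = queue.popleft()
        for nbr in adjacency[cur]:
            if nbr not in dist:
                dist[nbr] = dist[cur] + 1
                queue.append(nbr)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def radial_layout(adjacency, radius=100.0):
    """Central nodes near the middle, weakly connected ones towards the rim."""
    scores = {n: closeness(adjacency, n) for n in adjacency}
    best = max(scores.values()) or 1.0
    order = sorted(adjacency, key=lambda n: (-scores[n], str(n)))
    pos = {}
    for i, n in enumerate(order):
        r = radius * (1.0 - scores[n] / best)  # the most central node gets r = 0
        angle = 2 * math.pi * i / len(order)   # spread nodes around the circle
        pos[n] = (r * math.cos(angle), r * math.sin(angle))
    return pos


# Hypothetical example: one hub organization linked to four others.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
positions = radial_layout(star)
print(positions[0])  # (0.0, 0.0)
```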
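For approach (iii), the simplest topological constraint described, representing a string of interlinked nodes as a straight line, can be sketched as follows: find every maximal chain whose interior nodes have exactly two links, then place those interior nodes at equal intervals on the segment joining the chain's two end nodes. Equidistant spacing is used here; the logarithmic alternative would change only the formula for `t`. Closed loops made entirely of two-link nodes are not handled, and the function names are hypothetical.

```python
def find_chains(adjacency):
    """Maximal chains [end, n1, ..., nk, end2] whose interior nodes have
    exactly two links. Closed loops of two-link nodes are not handled."""
    chains, done = [], set()
    for start in adjacency:
        if len(adjacency[start]) == 2:
            continue  # chains are walked from their end nodes only
        for first in adjacency[start]:
            if (start, first) in done:
                continue  # already walked from the other end
            chain, prev, cur = [start], start, first
            while True:
                chain.append(cur)
                if len(adjacency[cur]) != 2:
                    break
                prev, cur = cur, next(n for n in adjacency[cur] if n != prev)
            done.add((chain[-1], chain[-2]))
            if len(chain) > 2:  # a bare link has no interior nodes to move
                chains.append(chain)
    return chains


def straighten(chains, pos):
    """Place each chain's interior nodes at equal intervals on the straight
    line between its two end nodes; end nodes keep their positions."""
    new = dict(pos)
    for chain in chains:
        (x0, y0), (x1, y1) = pos[chain[0]], pos[chain[-1]]
        steps = len(chain) - 1
        for i, node in enumerate(chain[1:-1], start=1):
            t = i / steps  # equidistant; a logarithmic spacing would change t
            new[node] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return new


# Hypothetical example: a branch line A-B-C-D drawn with a kink.
branch = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
chains = find_chains(branch)
straight = straighten(chains, {"A": (0, 0), "B": (2, 0), "C": (2, 3), "D": (3, 3)})
print(chains)  # [['A', 'B', 'C', 'D']]
```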
(c) Plotting: Once coordinates have been determined, software is required to plot the network, whether onto the screen or onto a graph plotter. Many packages exist for this purpose. A distinction should however be made here between adequate quality plots (for working purposes) and high-quality plots for publication in book form. The latter question is discussed later.
The problem in plotting is to be able to introduce distinguishing elements into the plot. These may include variations in line thickness (corresponding to some measure of importance or proximity), variations in node size (corresponding to the number of connections to the node) and the introduction of identifying labels for the nodes.
A key requirement is that the plot be made from the data as processed by one of the above techniques, rather than from manually input data. A distinction must also be made between a curve-fitting approach and one in which the lines pass through the nodes, as is required here. A distinction also needs to be made between plotting a graph (from left to right) and plotting a network in which there is no privileged direction. The latter form is more characteristic of CAD programs (see below).
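The distinguishing elements listed above can be illustrated with a minimal sketch that writes a laid-out network as SVG (a later vector format, used here purely as a convenient stand-in for plotter output): stroke width follows a per-link weight, node radius grows with the square root of the number of connections, and each node receives a label. The coordinates are assumed to come from one of the layout techniques above rather than from manual input; the function name and the weight values are hypothetical.

```python
import math
from xml.sax.saxutils import escape


def network_svg(pos, links, labels, width=400, height=400):
    """Render a laid-out network as SVG.

    pos: {node: (x, y)}; links: [(a, b, weight)]; labels: {node: text}.
    """
    degree = {n: 0 for n in pos}
    for a, b, _ in links:
        degree[a] += 1
        degree[b] += 1
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for a, b, weight in links:
        (x1, y1), (x2, y2) = pos[a], pos[b]
        # Line thickness encodes the importance or proximity of the link.
        parts.append(f'<line x1="{x1:.1f}" y1="{y1:.1f}" x2="{x2:.1f}" y2="{y2:.1f}" '
                     f'stroke="black" stroke-width="{weight:.1f}"/>')
    for n, (x, y) in pos.items():
        r = 3 + 2 * math.sqrt(degree[n])  # radius grows with sqrt of the degree
        parts.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="{r:.1f}" '
                     f'fill="white" stroke="black"/>')
        parts.append(f'<text x="{x + r + 2:.1f}" y="{y:.1f}" font-size="10">'
                     f'{escape(labels[n])}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


pos = {"A": (50, 50), "B": (150, 50), "C": (100, 150)}
svg = network_svg(pos, [("A", "B", 1.0), ("B", "C", 3.0)],
                  {"A": "Alpha", "B": "Beta & Co", "C": "Gamma"})
print(svg)
```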
(d) Drawing: It is desirable to move towards an interactive approach to the data. In other words, once a plot has been made for a segment of the overall network, editors should be able to modify it. Such modifications might take one of two forms. The first would consist simply of moving portions of the plot to make it more comprehensible, making room for labels and improving the aesthetics. The second would involve the capacity to add or delete features from the network. It would of course be highly desirable for the latter changes to be carried back as changes to the relational database. This can raise severe problems of compatibility between the relational database and the drawing/plotting software, whether in terms of software or of intermediate files. Such editing features are available in many CAD programs. It is however important to recognize that the CAD software is here used to "design" logical or topological constructs rather than buildings or mechanical parts. This is not a limitation; indeed it may permit the use of simpler (and cheaper) CAD software.
The variant of CAD software used for interactive printed circuit board (PCB) design has many features of value to the present application, especially the "auto-router" feature, which positions connections on the circuit board in the most economical manner (avoiding cross-overs, etc.). Unfortunately, its positioning criteria do not make for maximum comprehensibility.
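Carrying the second kind of change back to the database amounts to comparing the links held in the database with the links present in the edited drawing. A minimal sketch, assuming links are undirected pairs of record identifiers (the function name is hypothetical): moving nodes changes only drawing coordinates, so it produces no database update, while added or deleted links yield explicit insertion and deletion lists.

```python
def link_changes(db_links, map_links):
    """Return the link insertions and deletions implied by an edited map.

    Links are undirected pairs of record ids, so ("A", "B") and ("B", "A")
    are treated as the same link. Node moves change only drawing coordinates
    and therefore produce no database update.
    """
    def normalize(links):
        return {tuple(sorted(pair)) for pair in links}

    before, after = normalize(db_links), normalize(map_links)
    return {"add": sorted(after - before), "delete": sorted(before - after)}


changes = link_changes(db_links=[("A", "B"), ("B", "C")],
                       map_links=[("B", "A"), ("C", "D")])
print(changes)  # {'add': [('C', 'D')], 'delete': [('B', 'C')]}
```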
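Since the number of cross-overs is one simple, widely used measure of how legible a network drawing is, it can be useful to compute it when comparing candidate maps. The sketch below counts pairs of links that properly intersect, using the standard orientation test for line segments; links that merely meet at a shared node are not counted. The function names are hypothetical, and the pairwise loop grows with the square of the number of links.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def count_crossings(pos, links):
    """Count pairs of links whose segments properly cross each other."""
    crossings = 0
    for i, (a, b) in enumerate(links):
        for c, d in links[i + 1:]:
            if len({a, b, c, d}) < 4:
                continue  # links meeting at a shared node do not count
            p1, p2, p3, p4 = pos[a], pos[b], pos[c], pos[d]
            # Each segment's end points must lie strictly on opposite sides of the other.
            if (_orient(p1, p2, p3) * _orient(p1, p2, p4) < 0 and
                    _orient(p3, p4, p1) * _orient(p3, p4, p2) < 0):
                crossings += 1
    return crossings


square = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
links = [("A", "C"), ("B", "D"), ("A", "B"), ("C", "D")]
print(count_crossings(square, links))  # 1
```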
(e) Interface software: In the case of Advanced Revelation there exists a software product CAD/Base which offers "complete integration of CAD drawings with a database environment", via industry-standard DXF files. The drawing is viewed as a Revelation file and the drawing elements as Revelation records and fields. The drawing exists as a master file in both the Revelation and CAD environments. Changes in one environment are reflected in the other automatically without any intermediate file conversion required.
Clearly this offers interesting opportunities for using the network map as a menu through which users can select individual nodes on which they can immediately access additional text data.
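To give a sense of what an intermediate DXF file contains, here is a minimal sketch that writes a network as an ASCII DXF ENTITIES section: one LINE per link, one CIRCLE per node and a TEXT entity for each label, with each value preceded by its numeric group code (0 entity type, 8 layer, 10/20/30 first point, 11/21/31 line end point, 40 radius or text height, 1 text string). This is not how CAD/Base itself works; the layer names and function name are hypothetical, and some CAD programs may require a fuller file with HEADER and TABLES sections.

```python
def network_dxf(pos, links, labels, text_height=2.5):
    """Write a minimal ASCII DXF (ENTITIES section only) for a network."""
    out = ["0", "SECTION", "2", "ENTITIES"]

    def add(*pairs):
        # Each DXF item is a group-code line followed by a value line.
        for code, value in pairs:
            out.extend([str(code), str(value)])

    for a, b in links:
        (x1, y1), (x2, y2) = pos[a], pos[b]
        add((0, "LINE"), (8, "LINKS"), (10, x1), (20, y1), (30, 0.0),
            (11, x2), (21, y2), (31, 0.0))
    for n, (x, y) in pos.items():
        add((0, "CIRCLE"), (8, "NODES"), (10, x), (20, y), (30, 0.0), (40, 1.0))
        add((0, "TEXT"), (8, "LABELS"), (10, x + 1.5), (20, y), (30, 0.0),
            (40, text_height), (1, labels[n]))
    out.extend(["0", "ENDSEC", "0", "EOF"])
    return "\n".join(out) + "\n"


dxf = network_dxf({"A": (0.0, 0.0), "B": (10.0, 0.0)}, [("A", "B")],
                  {"A": "Alpha", "B": "Beta"})
print(dxf)
```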
(f) High-quality graphic output: One objective is the production of maps to be printed in book form. One approach might be to produce output in a form which can be handled by PC-TeX, creating files for output on a high-quality laser printer.
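One possible form for such output, sketched below, is a LaTeX picture environment, assuming a LaTeX2e installation on top of the TeX system: each link is drawn with `\qbezier` using the segment midpoint as control point (which yields a straight line at any slope, unlike the slope-restricted `\line` command), each node is a small filled circle and each label is set beside it. The function names and the 1 mm unit are hypothetical, and only the most common TeX special characters are escaped.

```python
TEX_SPECIAL = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
               "_": r"\_", "{": r"\{", "}": r"\}"}


def tex_escape(text):
    """Escape the TeX special characters listed in TEX_SPECIAL."""
    return "".join(TEX_SPECIAL.get(ch, ch) for ch in text)


def network_picture(pos, links, labels, unit_mm=1.0):
    """Emit a LaTeX picture environment for a laid-out network."""
    xs = [x for x, _ in pos.values()]
    ys = [y for _, y in pos.values()]
    x0, y0 = min(xs), min(ys)
    w, h = max(xs) - x0, max(ys) - y0
    lines = [rf"\setlength{{\unitlength}}{{{unit_mm}mm}}",
             rf"\begin{{picture}}({w:.1f},{h:.1f})({x0:.1f},{y0:.1f})"]
    for a, b in links:
        (x1, y1), (x2, y2) = pos[a], pos[b]
        mx, my = (x1 + x2) / 2, (y1 + y2) / 2  # midpoint control: straight segment
        lines.append(rf"\qbezier({x1:.1f},{y1:.1f})({mx:.1f},{my:.1f})({x2:.1f},{y2:.1f})")
    for n, (x, y) in pos.items():
        lines.append(rf"\put({x:.1f},{y:.1f}){{\circle*{{1.5}}}}")
        lines.append(rf"\put({x + 1.5:.1f},{y:.1f}){{\makebox(0,0)[l]{{\small {tex_escape(labels[n])}}}}}")
    lines.append(r"\end{picture}")
    return "\n".join(lines)


picture = network_picture({"A": (0.0, 0.0), "B": (20.0, 10.0)}, [("A", "B")],
                          {"A": "R&D", "B": "Beta_1"})
print(picture)
```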
(g) Integration of features: It is possible that CAD/Base offers an appropriate means of integrating the different features discussed above (except the last). It is also possible that such a product, which is relatively expensive, would be "overkill", and that a more compact approach would be more suitable and easier to make available to others. If the emphasis is on the simpler strategy of generating hardcopy, this would certainly be the case. To the extent that interaction with the data is desirable, more features would be required, even though only a selection of standard CAD features would be necessary.
For the user, there is obviously great merit in ease of use as an adjunct to normal text editing procedures. Ideally such a package would bear some resemblance to the more sophisticated forms of "outliner", such as MORE and INSPIRATION running on Apple machines. In these, an essentially hierarchical outline of topics can be opened up into standard text processing or converted into bullet charts. What is required is an equivalent which is tied into a relational database environment. The different approaches to network "map design" noted above might then be options in the way the data is manipulated for presentation, as is the case in standard business graphics (bar charts, pie charts, etc.).