PROCESSES IN BIOLOGICAL VISION has defined a large and totally new set of performance descriptors applicable to the visual process in all animals. Some of these descriptors provide a new foundation for many of the previously defined empirical descriptors. The context for the descriptors of reading will be developed on this page, but the descriptors themselves will be delineated on the performance page and in the tabulation of the properties of the Standard Human Eye.
Before proceeding, the reader should review the definitions of the above terms as used in this work. They differentiate between how humans perceive, interpret and recognize individual objects. Although these definitions apply to bucolic, symbolic (reading) and abstract images, only the bucolic image is illustrated in the definitions themselves.
This page is subdivided into:
The visual system is normally asked to process three fundamental types of images.
There is a fourth type of image found only in the research laboratory.
All visual systems are designed to accommodate the first two forms of images. Only the human species is known to be able to accommodate the symbolic image, particularly when the information is at a fine level of detail.
The ways in which pictographs and text are perceived and interpreted differ enough to call for separate sections in the discussion following the introductory remarks that appear below.
Before discussing how an image is decomposed for purposes of interpretation, it is useful to consider the converse, how a scene is composed. When an artist contemplates a painting, he first visualizes the span of the work, not so much in absolute size as in relative complexity, the amount of detail per unit size. When he starts work:
With the advent of modern computer-generated text, another important analog of the visual process has appeared. Computer designers quickly discovered that storing a bitmap of each character in a font, for a large variety of font sizes, used up considerable computer memory. A solution was developed wherein the characters were not stored as a large number of bitmap images but as a small set of vectors (equations that drew a character of the desired size on demand). These vector-generated characters were then placed on the printing surface at the prescribed location.
It is also important to note the saliency map maintained within the mind. This vector-based map constitutes the subject's long- and short-term memory concerning the environment around it. Without access to this map, the subject is unable to relate to its historical past. The subject acts child-like in his inquisitiveness and apparent lack of experience.
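The storage trade-off described above can be sketched in a few lines. This is only an illustration under stated assumptions: the stroke list for the letter, the unit-square coordinate convention, and the names `LETTER_L`, `render`, `bitmap_cells` and `vector_numbers` are all hypothetical and are not taken from any real font format.

```python
# Hypothetical illustration: one character stored once as strokes in a unit
# square. Each stroke is a pair of (x, y) end points with coordinates in [0, 1].
LETTER_L = [
    ((0.0, 1.0), (0.0, 0.0)),  # vertical stem
    ((0.0, 0.0), (0.5, 0.0)),  # horizontal foot
]


def render(strokes, size, origin=(0.0, 0.0)):
    """Scale unit-square strokes to `size` and shift them to `origin`."""
    ox, oy = origin
    return [((ox + x0 * size, oy + y0 * size),
             (ox + x1 * size, oy + y1 * size))
            for (x0, y0), (x1, y1) in strokes]


def bitmap_cells(size):
    """Cells needed to store one square bitmap of the given size."""
    return size * size


def vector_numbers(strokes):
    """Numbers needed to store the strokes, whatever the rendered size."""
    return 4 * len(strokes)
```

In this sketch, storing the letter as bitmaps at two sizes (12 and 24 cells on a side) takes 144 + 576 cells, while the vector description takes eight numbers and is rendered at any size on demand with `render(LETTER_L, size)`.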
The visual system essentially reverses the above process of composing, rendering and texturing of a scene. It accomplishes this process by initially examining a complete scene using the broad angular capability of its awareness channel. This channel determines the relative complexity of each region within the overall span of the scene. The analytical channel is then used to analyze the objects in the scene sequentially. This analysis determines both the contour and the texture of the individual objects.
The process of decomposition employs slightly different strategies for natural scenes (and man-made copies of natural scenes) and for communicating through reading. These strategies will be examined sequentially.
When viewing a natural scene, the awareness channel employs a primitive set of learned rules that relate to the total scene. These rules are largely independent of the size and orientation of the scene relative to the observer. The rules aid in the determination of the major objects within the scene and the instruction of the analytical channel to image, perceive and interpret each of these objects in turn. The strategy is roughly as follows.
Following the above series of steps, the mind of the subject has a fully interpreted understanding of the scene presented. It can then take any cognitive actions it desires.
The visual system includes another critically important signaling channel, the alarm channel.
Throughout the above procedure, the vector-based file representing the scene is continually compared to the saliency map of the individual to uncover any conflicts in context (a blue face or only one eye on another human image).
If significant contextual conflicts are detected, the frontal lobe of the cortex is called upon to interpret the significance of the conflict. This can involve significant time delays in the interpretive process and may result in unresolvable situations that must be added to the overall saliency map for future reference.
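The comparison against the saliency map can be pictured as a simple lookup. The sketch below is a deliberately crude stand-in and not a model of the cortex: it assumes the scene file is a list of (object, attribute, value) entries and the saliency map is a dictionary of expected values. The names `SALIENCY_MAP` and `find_conflicts`, and the listed expectations, are hypothetical.

```python
# Hypothetical stand-in for the saliency map: the values that past experience
# leads the observer to expect for each (object, attribute) pair.
SALIENCY_MAP = {
    ("face", "color"): {"pink", "brown", "tan"},
    ("face", "eye_count"): {2},
}


def find_conflicts(scene_file, saliency_map):
    """Return the scene entries that contradict stored experience."""
    conflicts = []
    for obj, attribute, value in scene_file:
        expected = saliency_map.get((obj, attribute))
        if expected is None:
            continue  # no prior experience, so nothing to contradict
        if value not in expected:
            conflicts.append((obj, attribute, value))
    return conflicts


# A blue face with the usual two eyes: only the color is flagged.
scene = [("face", "color", "blue"), ("face", "eye_count", 2)]
flagged = find_conflicts(scene, SALIENCY_MAP)
```

Entries returned by `find_conflicts` correspond to the cases that would be handed on for further interpretation. Entries with no match in the map are simply passed over in this sketch.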
The repetitive process of acquiring cognitive files relative to individual scenes and integrating them into the overall saliency map of the individual is defined as learning in the intellectual process. Through this process, the individual acquires an ever larger context base concerning his environment. He becomes educated. The contextual base of an individual is not limited to inputs from the visual system. The saliency map is in vectorial form and accepts inputs from all of the bodily sensors.
The process of education continues indefinitely. When the individual perceives and interprets an image that is in conflict with his saliency map, he obviously must rationalize the conflict. This rationalization continues the process of learning.
It is important to note the role played by the span of an image. A simple frame of a comic strip in a newspaper usually has a minimum span. It has no color, only one or two simple stylized objects and a balloon with words in it. The interpretation of such a frame is easy and the meaning is available quickly. On the other hand, a large mural of the Last Supper of Christ usually includes a great deal of detail within its span. The observer frequently cannot handle the content associated with such a span and will move closer, or otherwise concentrate on only a portion of the scene at one time. This action helps the researcher size the capacity of the observer to view scenes.
Down through history, man has continually attempted to communicate through the use of pictographs that required less and less effort to create and that carried information more efficiently. After reaching the cartoon-like level of cave paintings, his approach diverged (during the early Egyptian Era) along two separate and distinct paths. One path was based on ever more stylized glyphs designed to incorporate an entire concept. This is the approach widely used in languages of the Asian sphere of the globe. Occasionally, the complexity of the individual glyphs has become too great to represent complex concepts and users have resorted to glyph groups to express very precise concepts.
The other approach was to develop a series of ever simpler abstract shapes that no longer represented an entire concept but could be grouped together in various sequences to represent individual concepts. This is the approach widely used in languages of the non-Asian sphere of the globe. These sequences initially formed words. Later, individual sequences were used to form syllables which were in turn grouped to form more complex words.
There is another form of man-made imagery. This form consists of totally abstract diagrams frequently used in the psychophysics laboratory to evaluate the performance of the visual system. The Landolt visual acuity chart is such a diagram. It uses only a broken C as a symbol. The common Snellen Eye Chart is also one of these diagrams, except that recognizable symbols are frequently used to aid in communications between the subject and the observer.
Little is known as to how the visual system accommodates variations in the size and orientation of text material. It appears the orientation is accommodated much as for any other visual material. There is no requirement to re-learn a character set or a font merely because it is inverted or askew. However, it is clear that the oculomotor muscles have a fixed orientation and the POS may cause the head to rotate rather than attempt to compute new muscle commands based on the rotation of the text.
Within the spatial size constraints associated with the foveola, it appears that the analytical system is able to accommodate size variations in a manner similar to the process used in computerized typesetting. The scanning mechanism of the POS appears to determine the longest stroke length associated with a character and to normalize this length in the vectorization process associated with the perception of the character(s).
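The normalization suggested above can be sketched as follows. This is an illustration of the typesetting analogy only, not of the neural mechanism. It assumes a character is already available as a list of straight strokes with at least one stroke of non-zero length, and it does not address rotation. The function names are hypothetical.

```python
import math


def stroke_length(stroke):
    """Euclidean length of one stroke given as ((x0, y0), (x1, y1))."""
    (x0, y0), (x1, y1) = stroke
    return math.hypot(x1 - x0, y1 - y0)


def normalize(strokes):
    """Rescale strokes so that the longest one has length 1.

    The first point of the first stroke is used as the reference origin,
    so the result does not depend on where the character was placed.
    """
    longest = max(stroke_length(s) for s in strokes)
    ox, oy = strokes[0][0]
    return [(((x0 - ox) / longest, (y0 - oy) / longest),
             ((x1 - ox) / longest, (y1 - oy) / longest))
            for (x0, y0), (x1, y1) in strokes]


# The same letter drawn at two sizes reduces to one normalized description.
small_l = [((0, 10), (0, 0)), ((0, 0), (5, 0))]
large_l = [((0, 40), (0, 0)), ((0, 0), (20, 0))]
same_shape = normalize(small_l) == normalize(large_l)
```

Dividing every coordinate by the longest stroke length removes the absolute size, which is why a single stored description can serve for text of any size that fits the sensor.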
To understand the perceptual and interpretive process related to the reading of character-group based text, the spatial capability of the foveola must be appreciated. It forms the entry element to the signal processing mechanisms of vision. It has a limited span that is about 157 pixels (stationary resolution elements) in diameter. Because of its scanning capability, the eye is capable of a somewhat higher acuity (depending on contrast) than suggested by the pixel size calculated for an imaging sensor.
The span of the foveola is compatible with perceiving only a handful of characters in a single group at one time. Similarly, it is capable of perceiving only one or two pictographs at one time (making somewhat better use of the available area of the foveola). Once such a group is perceived, it can be interpreted and placed into the initial cognition file and be compared with the experience of the individual as represented by his saliency map. Barring any conflict, the visual system then proceeds to the next character group and repeats the perceptual and interpretive process. In this way, the cognitive file grows until the complete concept is understood.
When speaking of language skills, the term syntax is frequently used interchangeably with, or in place of, the term context. Here, syntax will always be considered the lower level term. It refers specifically to the rules associated with the sequence of character-groups forming a concept. These rules are part of a larger hierarchical set that includes:
Here again, education leads to additional complexity. The multi-lingual person must first determine the character set (Roman, Cyrillic, Semitic or Sinhalese), sometimes the direction of procession, the language (Latin, English, French), and finally the font (block or script) of the text before proceeding to perceive and interpret the character groups.
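A software analog of the first two determinations, the character set and the direction of procession, is sketched below. It relies on the Unicode character database in Python's standard `unicodedata` module and assumes that the first word of a character's Unicode name identifies its script, which holds for the common alphabets used here. The function names are hypothetical, and the sketch does not attempt to identify the language or the font.

```python
import unicodedata
from collections import Counter


def character_set(text):
    """Guess the script from the first word of each letter's Unicode name."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


def reads_right_to_left(text):
    """True if any character has a right-to-left bidirectional class."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)
```

For example, `character_set("The boy")` reports `LATIN` (the Unicode name for the Roman alphabet), and a Cyrillic or Hebrew word is reported by its own script name. Only after this step can a rule set for spelling and syntax be selected.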
The strategy employed in reading does not differ significantly from that for general scenes. However, to aid in communications, additional sets of rules have been adopted related to spelling and syntax that are not required for general scene interpretation.
When presented with a scene containing text, the observer initially examines the complete scene as discussed above. It notes areas of the scene (objects) that appear to contain a particularly structured texture. If such an object appears upon further examination to be text, the visual system performs an "entry saccade" that brings the line of sight to the expected entry point of the text (the upper left-hand corner for most western languages).
After locating the first word (and character group of that word) the analytical procedure must initially determine the language of the text and the font of the character set used. It then continues based on the set of style, syntax and spelling rules for that language.
Reading involves the iterative process of creating an initial concept file just as when examining a natural scene. However, there are two significant variations. The stylistic rules dictate the direction in which the text is to be scanned. The anticipated rules concerning the shape of individual objects are replaced by a set of syntax rules unique to the specific language.
The syntactic rules include a stop symbol (the period) that defines the end of the presentation of a particular concept. By this means, a sentence is formed that is similar to an object within a general scene. The concepts associated with multiple sentences can be grouped into a broader concept. This grouping is labeled a paragraph in pedagogy.
The following simple examples illustrate the process of perception and interpretation.
The following example shows how making a determination of the precise spelling in the first character group of the third word, and checking for that form against the saliency map, changes the concept contained in the initial file created during the reading process.
Each of the three-letter groups, and the one two-letter group, in the first line is the target of an individual minisaccade. The character-group as a whole is then scanned by a series of horizontal and vertical microsaccades to determine its meaning. In the second line, one word is too long to be perceived and interpreted completely (at the geometric scale of the text). It is treated as a two-group (syllable) word. The first minisaccade brings the character-group "aro" onto the foveola and the next minisaccade brings the group "und" onto the foveola.
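The division of a line into minisaccade targets can be sketched as below. Two things in this sketch are assumptions rather than statements from this work: the capacity of five characters per group (the text says only "a handful"), and the rule of splitting an over-long word into equal-sized groups instead of true syllables. The function names and the sample sentence are hypothetical.

```python
import math


def character_groups(word, capacity=5):
    """Split a word into the fewest near-equal groups of at most `capacity`."""
    if not word:
        return []
    n_groups = math.ceil(len(word) / capacity)
    base, extra = divmod(len(word), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        size = base + (1 if i < extra else 0)  # earlier groups take the remainder
        groups.append(word[start:start + size])
        start += size
    return groups


def saccade_targets(line, capacity=5):
    """List the character groups of a line, one minisaccade target each."""
    return [group
            for word in line.split()
            for group in character_groups(word, capacity)]


targets = saccade_targets("the dog ran around")
```

With these assumptions, the short words each yield one target, while "around" yields the two targets "aro" and "und", matching the two-group treatment described above.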
As a result of the saccade sequence, and the continual checking of the spelling of each character-group, these two samples are interpreted slightly differently and result in distinctly different concepts. No conflict has arisen in the perception and interpretation of these simple sentences.
This example is obvious. After the second saccade following the entry saccade, the initial file contains a conflict. Boxes do not run.
This example merely illustrates how additional precision can be added to the concept using a suffix or an adverb.
In both of the above examples, the additional character-group within the two longer words would have caused an additional saccade to be added to the initial saccade sequence list. This would slow the reading process. However, the system has adopted a methodology to avoid this problem a high percentage of the time. After interpreting the first character group in boy or in obliquely, it frequently makes an estimate of the impact of the second character group of the word and calls for an immediate saccade to the next word. In the following example, the error is of negligible importance.
However, proceeding in this manner can result in trouble. In the following example a conceptual conflict arises in the first sentence several saccades later.
When such a conflict is encountered, a regression saccade is called for. This saccade takes the image projected on the foveola back to a point where resolution of the conflict is possible. The word dressed must be distinguished from the word dresses by using two saccades. The frequency of multiple case-specific suffixes contributes to the difficulty of a novice trying to read Russian.
This work will not address the perception and interpretation of the glyphs of Asian languages. The procedure appears similar to that used for text in Western languages although the gross scan pattern is frequently up-and-down rather than side-to-side. The span of a single pictograph can obviously not exceed the capacity of the foveola or multiple minor saccades will be required to perceive and interpret it completely.