Image Markup

In our day-to-day lives we typically think very little about seeing. We simply use our visual gift until it is interrupted or diminished, or until we try to devise a machine that can use or provide visual processing meaningful to humans. Then we begin to appreciate the enormous complexity of "making sense of the senses".

Upon consideration it turns out that much of perception is unconscious. What we "think we see" is an interpretive product of individual circumstance, physiology, and psychology.

Visual expression is aided by understanding and clarifying fundamental factors of image form seen in the human panoply of visual artifacts such as photographs, symbols, drawings, charts, diagrams, scenes, renderings....

Training of visual artists attempts to bring into attentional focus the kinds of factors that affect perceptual awareness and interpretive significance. Students explore exercises that challenge them to work for greatest sensual economy, to elaborate visual space most effectively by rendering various aspects of visual media to arrive at enhanced clarity and coherence of diverse detail orchestrated within integral form.

Shape and composition are two very basic factors underlying image significance. The most rudimentary kind of symbol is "one that looks like what it represents", i.e., an icon. Similarity of shape gathers related icons together simply because they appear to be of a kind. LAVA/SILK provides a notational system whereby computers can be used to retrieve objects that are similar in shape and composition to a reference object used as a search key, also called a SILKey, which encodes as text the results of LAVA analysis, thereby permitting rapid comparison and sorting of archival contents.

Locating and accessing relevant media objects within large collections requires coordinating both conceptual and expressive criteria, organizational schemes, and access tools. Historically, text-labeled hierarchical taxonomies have attempted to systematize visual media archives. But such literally grounded keys have difficulty conveying subtle relationships of appearance. To augment literal keys, collections of media objects can be marked up, organized and accessed in terms of their relative similarities in shape and composition, using textual keys to express, compare and sort items in relation to their visual configurations. LAVA/SILK tools used to build, maintain, and access such collections can combine traditional text labels and support perceptually relevant metadata using shape and compositional markup data.

Markup, whether accomplished manually or by machine, is the process of graphically delineating significant parts of an image to more effectively communicate its visual syntax or manipulate its constituent parts.

In Machine Markup, a computer program automatically converts rasterized details of images into their equivalent vector forms for LAVA/SILK processing.

[Figure: Sweetgum leaf contour]

For example, LAVA analysis of the Sweetgum leaf contour shown here generates a textual SILKey that can easily be compared with the SILKeys of other images in an archive or other collection, to cull and gather candidates within a given degree of similarity (similar images will have similar SILKeys):

bd++++abd8+9++5+67a+78db+77e9226a526+74abb97+ec+++7++21++777+188++75+++++++++++++
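The comparison step can be sketched in Python. The text does not say how LAVA/SILK actually measures the similarity of two SILKeys, so this sketch substitutes a generic string-similarity ratio from the standard library's difflib as a stand-in. The function names, the archive entries, and the two non-leaf keys are hypothetical and exist only for illustration; only the Sweetgum key comes from the text.

```python
from difflib import SequenceMatcher

# SILKey of the Sweetgum leaf contour, copied from the text above.
LEAF_KEY = ("bd++++abd8+9++5+67a+78db+77e9226a526+74abb97"
            "+ec+++7++21++777+188++75+++++++++++++")


def silkey_similarity(key_a: str, key_b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the two key strings are identical.

    Assumption: plain character-sequence similarity stands in for whatever
    comparison LAVA/SILK really performs on its keys.
    """
    return SequenceMatcher(None, key_a, key_b, autojunk=False).ratio()


def rank_by_similarity(reference_key, archive, threshold=0.0):
    """Return (name, score) pairs scoring at least `threshold`, best first."""
    scored = [(name, silkey_similarity(reference_key, key))
              for name, key in archive.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical archive: one exact copy, one slightly altered key, one unrelated.
archive = {
    "sweetgum_copy": LEAF_KEY,
    "near_variant": LEAF_KEY.replace("abb97", "acc97"),
    "unrelated": "1234+5678+1234+5678",
}

for name, score in rank_by_similarity(LEAF_KEY, archive, threshold=0.9):
    print(f"{name}: {score:.3f}")
```

Because the keys are plain text, culling candidates reduces to scoring strings and sorting, which is what makes rapid comparison over a large archive practical.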


Manual Markup Example:

Manual Markup may be needed to address complex or subtle relationships in imagery. The world is too complex and richly detailed to take it in all at once. Seeing is a matter of paying attention to some part(s) of the visible world, one chunk at a time, until we assimilate their overall sense, and gather how their individual subparts are related.

1. Consider the photograph of a young girl at the beach, wearing a straw hat, looking out to sea. What are the relevant visual factors of the image? How are they best organized to succinctly delineate their coherent sense?

2. Image markup occurs on the plane of the image format (usually a rectangle); it isolates and relates the various items of interest to which a viewer will likely pay attention.

3. Attentional frames of markup express deployment and relationships among sub-components of the image.



The total space of attention of a rendered image is denoted in markup by a baseframe rectangle. Usually the baseframe coincides with the outside edges of a media object such as a photograph or drawing, but it can be relocated to isolate any subordinate part of what is available to view, thereby cropping the original. The photographer's camera has already defined a baseframe in this version. If we were interested in only the figure, however, we could focus on it by isolating her form within a different rendering by rearranging the baseframe. The sense of the image is founded in the framing of its parts. Change the framing and you alter its sense.

This visual dynamic constantly occurs in ordinary daily life. Our eyes constantly search among features available to view, picking out, "framing for the moment of attention", sub-framing subordinate objects, and then moving on to the next cascade of attentional moments, and then the next, and so on... a flurry of glimpsed focal acts "automagically" pulled into stable and coherent sense. "Paying attention" is a process of framing what is of momentary interest and ignoring everything outside its bounds. We do it so easily and so constantly that usually we are totally unconscious of all that we have taken in! The mercurial ease with which our cognitive process sentiently attends to life belies an unfathomable complexity of processes at work. It requires effort and concentration, such as training in visual arts, to appreciate the immense welter of detail that our adaptive unconscious automatically pumps into our conscious awareness.

LAVA processing ignores any image data outside the markup baseframe.
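As a minimal sketch of what "ignoring data outside the baseframe" amounts to, the Python below crops a raster (a list of pixel rows) to a rectangle. The `Frame` type, its field names, and `crop_to_baseframe` are hypothetical names chosen for illustration. They are not part of LAVA/SILK, whose internal representation is not described here.

```python
from typing import List, NamedTuple


class Frame(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (hypothetical layout)."""
    left: int
    top: int
    width: int
    height: int


def crop_to_baseframe(pixels: List[list], frame: Frame) -> List[list]:
    """Keep only the pixels inside `frame`; everything outside is discarded."""
    rows = pixels[frame.top : frame.top + frame.height]
    return [row[frame.left : frame.left + frame.width] for row in rows]


# A 4x4 test image whose pixel values encode their own position (row * 4 + col).
image = [[row * 4 + col for col in range(4)] for row in range(4)]

# Relocating the baseframe to the central 2x2 region crops the original.
cropped = crop_to_baseframe(image, Frame(left=1, top=1, width=2, height=2))
print(cropped)
```

Any later analysis then runs on `cropped` alone, so moving the baseframe changes what the analysis can see.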



In a photograph the photographer's camera has already "framed the scene", i.e., fixed the baseframe that shows a scene from the camera's viewpoint. Its format rectangle sets the baseframe. But an interesting scene will contain a number of subordinate items or areas of interest. Each of these competes for our attention (which is what makes them "interesting"). Each is a "mini-scene", a scene within the larger scene. We can articulate this by marking up the image to delineate its attentional subframes. Such subframing shows where subordinate parts are important to the total sense of the image.






The eye's focus tends to move across the girl, her hat, and the ocean, noting and detailing features of interest among separate attentional moments, each with its own nested subframe(s).








Seeing a scenario such as this young girl at the seashore comprises nesting awarenesses of many percepts, concepts, and psychic schemas. The scenario itself would be multi-dimensional (hence the notion of sentiarity; to the extent that the 2D image conveys those further dimensions, it would invoke a corresponding order of limnarity). We set analytic frames to account for the limned attentional frames, as well as to gather and orchestrate "sense" from its parts.

Consequently, in attentional frames of the marked-up example:
- baseframe gathers the entire scene,
- subframed girl "anchors" image viewpoint, and
- subframed hat concentrically dominates image center.
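The nesting of attentional frames listed above can be modeled as a small tree of rectangles. The Python sketch below is an illustration under stated assumptions, not the LAVA/SILK data model: the class name, the pixel coordinates, and the choice to nest the hat frame inside the girl frame are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class AttentionalFrame:
    """A named rectangle that may hold nested subframes (hypothetical model)."""
    name: str
    left: float
    top: float
    width: float
    height: float
    subframes: List["AttentionalFrame"] = field(default_factory=list)

    def contains(self, other: "AttentionalFrame") -> bool:
        """True if `other` lies entirely within this frame's bounds."""
        return (self.left <= other.left
                and self.top <= other.top
                and other.left + other.width <= self.left + self.width
                and other.top + other.height <= self.top + self.height)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "AttentionalFrame"]]:
        """Yield (nesting depth, frame) pairs, outermost frame first."""
        yield depth, self
        for sub in self.subframes:
            yield from sub.walk(depth + 1)


def is_well_nested(frame: AttentionalFrame) -> bool:
    """Check that every subframe stays inside its parent, recursively."""
    return all(frame.contains(sub) and is_well_nested(sub)
               for sub in frame.subframes)


# Made-up coordinates for a 640x480 photograph.
hat = AttentionalFrame("hat", 260, 150, 120, 80)
girl = AttentionalFrame("girl", 200, 140, 240, 320, [hat])
baseframe = AttentionalFrame("baseframe", 0, 0, 640, 480, [girl])

for depth, frame in baseframe.walk():
    print("  " * depth + frame.name)
```

Walking the tree from the baseframe inward mirrors the order in which the markup gathers the whole scene first and then its subordinate "mini-scenes".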


Regularities in the SPOT numeric field enable notation for analysis of fundamental factors of visual form such as "positive/negative" shape and rhythms of pictorial composition. An example of the formal application of such notation, using Scalable Vector Graphics (SVG) for image markup and analysis, is described in "Expressing Shape and Composition by LAVA/SILK" (Howard Jones, 2005).