Can Semiotics Survive the Petabyte Era?

By Paul Bouissac

The main stock of semiotics – its epistemological substance, so to speak – consists of a set of models that were constructed to account for small data samples. The first efforts to bring some conceptual order into the diffuse, if not confused, realm of symbolic behavior were most likely driven by survival needs. These efforts were grounded in natural semiotics, a basic pragmatic competence that allows adaptive inferences and anticipations. Semiotic behavior is indeed constrained by the necessity of correctly and quickly interpreting the moods and intentions of prey and predators, as well as of other humans, on the basis of limited information. The same applies to the obscure signs that were assumed to be given by the gods in response to human anxiety about the future. No doubt this latter concern was at first closely related to reading the nature and progress of illnesses.

Historically, various speculative systems were constructed, notably when the data samples were greatly expanded through conquests and travelers’ reports. Methodical observation and experimentation apparently came much later. Nevertheless, semiotics, under whatever name, has always dealt with limited data samples whose comparative examination led to more or less tentative generalizations and classifications. Any universal system of signs or grand narrative of semiosis is by necessity speculative, somewhat like an epistemological gamble that may or may not lead to correct predictions. It has indeed always been constructed in response to limited data samples, some of which are simply made to fit the models, such as those derived from thought experiments. From this point of view, it could be claimed that theory is fundamentally a symptom of ignorance. It is indeed well known that theories are historical phenomena that change when the human inquisitive industry makes new data available.

But what happens when the data available are stored in the form of bytes numbering in the tens of thousands of trillions? In our daily use of computers and memory sticks, we have gone in only a few years from mega- to gigabytes. But this was only the beginning, as first terabytes, then petabytes became the current norm on a still ascending curve. Tera, Peta, Exa, Zetta, and Yotta are units of information or computer storage of increasing magnitude. A petabyte is equal to one quadrillion bytes.
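The scale of these units is easier to grasp through a short calculation. The following toy script (a minimal illustration, not part of the original editorial; note that the decimal SI prefixes used here are powers of 1,000, whereas storage hardware is sometimes counted in binary powers of 1,024) lists each prefix with its magnitude:

```python
# Decimal (SI) prefixes for information storage: each step multiplies by 1,000.
prefixes = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

for i, name in enumerate(prefixes, start=1):
    # 1 kilobyte = 10^3 bytes, 1 megabyte = 10^6 bytes, and so on.
    print(f"1 {name}byte = 10^{3 * i} bytes = {10 ** (3 * i):,} bytes")

# A petabyte is 10^15 bytes: one quadrillion, i.e. a million gigabytes.
petabyte = 10 ** 15
assert petabyte == 1_000_000_000_000_000
```

A petabyte is thus a million times the gigabyte scale of everyday personal storage, which is what makes hypothesis-free pattern searching both possible and necessary.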

The recent assertion by Chris Anderson, editor-in-chief of WIRED, that theories are becoming obsolete in the Petabyte Era may be shocking for some people. Are not theories the best asset of human intelligence? Or are they really? Do not many catastrophic errors stem from belief in theories, be they biological, economic, or politico-philosophical? It usually turns out that not all the available data had been taken into account, or that some had been dismissed as irrelevant. But it is often the case that it was impossible to gather enough data, or that some kinds of data were simply inaccessible. Data may even be creatively imagined, through wishful thinking, when human intelligence is taken over by irresistible theories.

Anderson’s essay appeared in the July issue of WIRED (16:07) and was abundantly discussed in EDGE, an online magazine edited by John Brockman that fosters dialogues among scientists and philosophers confronted with cutting-edge ideas. During the last sixty years, Anderson notes, the computer made information readable. Then, twenty years ago, the Internet made it reachable. Now search-engine crawlers have transformed all this into a single database whose sheer dimensions demand totally new methods of inquiry. “Petabytes allow us to say ‘Correlation is enough.’ We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.” Anderson’s provocative argument is supported by current scientific developments. It triggered both supportive and bitterly critical, albeit mostly defensive, comments. Epistemological niches that, like semiotics, consist of more models than data should feel particularly concerned. In other words, does semiotics need theory? Can semioticians and their nineteenth-century models survive the Petabyte Age and its further advances? It seems that, at the very least, an aggiornamento is highly desirable.
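Anderson’s “correlation is enough” stance can be caricatured in a few lines of code: scan every pair of variables in a dataset for strong correlations, with no prior hypothesis about which pairs should matter. The sketch below uses made-up data (variable names and the hidden dependence are invented for the illustration) and is a toy, not a serious data-mining pipeline:

```python
import itertools
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

random.seed(0)
n = 1000
# Made-up variables: 'b' is secretly driven by 'a'; 'c' is pure noise.
data = {"a": [random.gauss(0, 1) for _ in range(n)]}
data["b"] = [2 * v + random.gauss(0, 0.5) for v in data["a"]]
data["c"] = [random.gauss(0, 1) for _ in range(n)]

# Hypothesis-free scan: rank every variable pair by correlation strength.
pairs = sorted(itertools.combinations(data, 2),
               key=lambda p: -abs(pearson(data[p[0]], data[p[1]])))
for u, v in pairs:
    print(u, v, round(pearson(data[u], data[v]), 3))
```

The scan surfaces the a–b relationship without anyone having hypothesized it, which is precisely Anderson’s point; what the numbers alone cannot say is whether the correlation reflects a mechanism or a coincidence.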

These considerations should be pondered in the context of the special report on “Science in the Petabyte Era” published in Nature (4 September 2008) (1-50). Of particular interest is the essay “Distilling meaning from data” by Felice Frankel and Rosalind Reid (30). Searching for meaning in large datasets is today’s challenge. How can one find unexpected patterns and interpret evidence in ways that frame new questions and suggest further explorations? Old models can be serious hurdles. The authors point to the importance of “the tools of visualization”. They advocate the collaboration of graphic and communication experts and cognitive psychologists with scientists engaged in data mining and new knowledge representations. As a step in this direction, they refer to the workshops on Image and Meaning run by Harvard University. What becomes of the science of signs in the age of digital representation and simulation? Why are most semioticians not engaged by these challenges?

Can semiotics survive the Petabyte Era, and then the Exa, Zetta, and Yotta information universes that are on our doorstep? The obvious answer to the questions raised by this editorial is: yes, as long as semioticians become better informed and can adapt their models and tools of inquiry to this new epistemological environment. They will also have to demonstrate their relevance to the advancement of knowledge in this context.