First Monday

Digitizing More Than Organizational DNA by Jonathan Riehl


 

Contents

A (Somewhat) More Serious Introduction
Life in an Explicit, Decentralized, and Named World
Life in an Implicit, Centralized and Enumerated World
A Bridge Between Words, a Bridge Between Worlds
An Even More Serious Conclusion

 


 

It's 7:30 A.M., aliens have finally made contact, and their intentions are very hostile. Before they reduce the planet to dust, however, they are offering to host the interstellar equivalent of a web site in memory of our species and its accomplishments. The site will be built using as much digital data as we can provide them before noon. Luckily, your organization is built around knowledge. The very guts of it already exist in a digital format that can readily be put in a compressed archive and uploaded in time. However, what kind of life will your organization, its ideas, and its values be able to live beyond this rather inconvenient turn of events? Even though your site and its pages are optimized for the Google PageRank algorithm, will they fare as well with a more alien front end? Do the aliens even understand RDF?

This paper takes a (hopefully not too) irreverent look at some of the features of current knowledge management technology, and looks at where it merely encodes knowledge rather than actually managing it. Knowledge management begins with Vannevar Bush, the foundations beneath the web, and ends with the birth pains of the Semantic Web. Or does it? Maybe it begins with Descartes and ends with computational linguistics and machine translation, with some algorithmic information theory thrown into the middle? Regardless of end points, it would appear information technology is going in the same direction: all over the place (even into outer space). This paper will provide two contrasting walks through information technology, with the second offering a less often heard spin. Once these walks are complete, it proposes a means for bringing these two directions together in an attempt to address organizational needs, and possibly the management of all human knowledge, before our time on this planet is up.

++++++++++

A (Somewhat) More Serious Introduction

Now that the abstract has grabbed your attention, this paper would like to pose a contrasting situation that might sound more plausible, and then compare this to a world aligned more with the abstract's premise. To assist in comparison, the following three subsections fall along axes where the possible worlds are complete opposites.

Distributed Versus Centralized Data
The following hypothetical situation is so plausible as to be indistinguishable from the world you most likely find yourself in as you read this. Let this fictional world be called World A. In World A, one critical threat that drove information technology's advance wasn't space aliens, but nuclear weapons. Since whole cities could be lost in even the most friendly forms of nuclear warfare, data networks in this world are distributed and decentralized. This design is so perfectly decentralized that the population of this world draws its information network as a cloud.

World A's existence implies that there should at least be a World B. This section now asks that one assume the existence of World B, and, as promised, this is a world more worried about the situation posed in the abstract of this paper. Specifically, World B faces an external threat capable of eliminating everything in a single go. While the population of World B may still not want to keep all its data in one place, this paper assumes that its members are willing to trust each other in the face of a common enemy and keep a central data repository, possibly with some redundant copies. The population of World B tends to draw its network as a central can, with everything else connecting directly to that can. This provides an initial axis on which one can compare World A and World B: data centralization.

Named Versus Enumerated Resources
With so many nooks and crannies in which to stash information, World A needed a scheme to keep track of where it was putting its data. This scheme had to play well with the distributed and heterogeneous nature of the network. It therefore had to identify two things: where a data resource lived, and how that resource might be accessed. World A already had two ways to locate a resource container, as well as container-specific naming schemes, so these were thrown into the name stew. The heterogeneity of World A's network also fostered a wide variety of protocols that were layered upon each other. This meant that the naming scheme also had to include a protocol in the name (including the protocol in the name gives an easy out from some of this satire, since the naming scheme can still be switched up by specifying particular protocols). The combination of all these names creates a simple enough super-name that the average World A-ian can easily remember and trade with friends, given a text editor or email client.

A centralized data store gives World B a somewhat more enviable position when its population tries to locate resources. Much like the Library of Congress, the Big Can, as the central database might be called, can simply give a number to each resource. Once a number is issued, a variety of additional resources can be used to map arbitrary names to these numbers. The key at this point is simply remembering the number corresponding to the central lookup table. This table and the key to find its root form the card catalog for the Big Can. Enumeration versus text names form a second axis on which these worlds can differentiate themselves.

Explicit Links Versus Implicit Links
Armed with a large, well-distributed network and the means to easily address resources, World A is finally getting somewhere. The content of World A can be put in context, embedded and cross-referenced. This is done by organizing data in a huge web and making documents explicitly link to other resources on the network. This adds context to content and makes citation a snap. Furthermore, the addition of automated probes into the resulting network makes finding content almost manageable. To help these probes figure out what is going on, the people of World A are slowly starting to add link qualifiers to links.

With everything centrally stored, filed, and referenced, World B is getting somewhere as well. However, the population of World B is madly trying to give voice to its ideas before total annihilation, and doesn't have much time to cut and paste explicit links, much less contextualize them. Luckily, the Big Can was not always big enough at points in its evolution, and the information technologists of World B had to start using compression to eliminate redundancies. When they eliminated redundancy, the World B-ens also noticed that they were creating implicit links. World B-ens could simply enter a concept and find it automatically replaced by a number that was language neutral, and indexed instantly via the Big Can's card catalog. The final axis of differentiation between World A and World B is their linking schemes: explicit versus implicit.

Three Axes, Three Dimensions
Figure 1 summarizes the differences between World A and World B. The horizontal axis indicates the amount of data centralization in these two worlds, with the amorphous cloud on the left and the omnipresent can on the right. The vertical axis indicates how data is addressed, with text names at the top, and numbers at the bottom. Finally, the skewed and flattened depth axis differentiates how data is cross referenced, placing explicit links at the fore and implicit links in the background. In the upper-left-frontal area, one finds World A. World B is in the opposite corner, and thus in the lower-right-back area.

The distance between these fictional worlds is quite a span to bridge. However, this span is being bridged right here on planet Earth, and without the benefit of alien threats. The rest of this paper makes this reality and its bridges more concrete.

Figure 1: Wherein two worlds are conceptually mapped in 3D

 

++++++++++

Life in an Explicit, Decentralized, and Named World

In order to ground the fictive worlds in the introduction to our world, this section and its evil twin, which follows, investigate world views instead of worlds. Further qualifications apply since the flesh and blood thinkers responsible for something like World A have refused to retroactively think inside the boxes of Figure 1 (although no email kindly asking their assistance has been sent). In fact, Vannevar Bush's memex was actually much closer in concept to the Big Can than the World Wide Web. Ted Nelson's Xanadu would have involved finely granular copyright traceability and payments, most likely managed via a central authority. Still, both of these visions fed into the creation of the web.

The closed and centralized systems that came before the web could not fully withstand a free, open and decentralized document framework. Furthermore, the World Wide Web fit easily on top of the nuke-resistant network. By extending and embedding the Internet's naming schemes, processes and openness, the web has become the Internet to the majority of Earth people. The following subsections address some of the best and most fundamental ideas of the web. These ideas are the ones that helped to carve out the mostly decentralized, textually addressed, and explicitly linked world view that currently dominates information technology.

Tim Berners-Lee to the World: Address Everything!
Tim Berners-Lee's contribution to the world started when he was working at CERN in Switzerland. His job at CERN loosely consisted of making physicists happy by providing them with tools for reporting and sharing their findings. However, there was an attractive bonus for making things scale to the Internet: more minds and eyes could play along if he could somehow make the documents and tools he was working on scale to more than just a local network. His solution was the universal resource identifier (URI). The URI specification provided a means for addressing most content on the Internet. This naming scheme worked well with the Internet because it incorporated the Internet's protocols, embedded the domain name system (DNS), and adopted much of the Internet's design philosophy. Specifically, Berners-Lee's naming scheme followed the Internet design principle of keeping data human (or at least techie) readable, constraining most data to strings of 7-bit printable ASCII. While the farce presented above takes a brief poke at how this scheme isn't necessarily friendly to either human or machine eyes, it was certainly superior to remaining in the dark ages of protocol-specific or proprietary naming schemes.
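
To make the pieces of this naming scheme concrete, the following sketch (not from the original specification; the example address is invented) shows how a modern Python library splits a URI into the protocol, the DNS-based location, and the container-specific path described above.

    from urllib.parse import urlparse

    # An invented example address; any web URI would do.
    parts = urlparse("http://example.cern.ch/reports/1990/proposal.html")

    print(parts.scheme)   # 'http' -- the protocol baked into the name
    print(parts.netloc)   # 'example.cern.ch' -- the DNS half of the location
    print(parts.path)     # '/reports/1990/proposal.html' -- the container-specific name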

Addressing everything does not make a web. The second contribution Tim Berners-Lee made was to tie his addressing scheme into a document format. (The third and final piece was the stateless transport protocol, HTTP, which is not important to the points this paper is making; besides, adding a stateless versus stateful axis would require the use of special four-dimensional glasses.) The resulting document format was an instance of the much more complicated SGML format, but constrained to the use of specific tags and attributes. HTML allowed users to embed URLs into their documents, either explicitly embedding resources into a document via the "IMG" tag, or creating hyperlinks to other resources via the "HREF" attribute of the "A" tag. The resulting web of documents was dubbed the World Wide Web, a consortium was formed, and the world could rest easy because its knowledge management issues were fully addressed, so to speak.

With an addressing scheme to distinguish vertices, and a document structure that allows addressed resources to form directed edges to other resources, the web forms a directed graph. The issue of indexing into the web was solved to some degree by the creation of programs that automatically traversed this graph and collected information about its content and structure. However, what is explicit to a human is not necessarily explicit to a machine. While search engines continue to compete, they are bound by certain limitations of the web's design. The highly contextual nature of human languages makes inferring both the meaning of a document and the intension of a hyperlink sketchy at best. For example, there are no explicit cues to differentiate a link as being to the web page of a document's originator, or to the web site of the arch-enemy of the author. While positional and textual cues may assist a machine, solving this problem with any certainty amounts to making machines that can reliably understand human language (currently classed as an AI-hard problem). Luckily, Berners-Lee and the World Wide Web Consortium are trying to solve this problem, as a later subsection will explain.
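
As a rough illustration of the point above, the following sketch (invented for this paper; the page contents and addresses are stand-ins rather than real network fetches) treats pages as vertices and "HREF" links as directed edges, and walks the resulting graph the way an automated probe might.

    import re
    from collections import deque

    # Toy pages standing in for documents fetched over the network.
    pages = {
        "http://a.example/": '<a href="http://b.example/">a friend</a>',
        "http://b.example/": '<a href="http://a.example/">back</a> '
                             '<a href="http://c.example/">an arch-enemy?</a>',
        "http://c.example/": 'no outgoing links here',
    }

    def crawl(start):
        """Breadth-first walk collecting the directed edges of the web graph."""
        seen, queue, edges = {start}, deque([start]), []
        while queue:
            url = queue.popleft()
            # The probe only records that an edge exists; nothing in the markup
            # says whether the link points to a friend or an arch-enemy.
            for target in re.findall(r'href="([^"]+)"', pages.get(url, "")):
                edges.append((url, target))
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        return edges

    print(crawl("http://a.example/"))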

Ward Cunningham to the World: LinkEverything!
Earlier there was mention of thinkers refusing to think inside solely one of the two boxes presented in the introduction. Ward Cunningham is one of those thinkers. Cunningham invented the Wiki Wiki Web as a collaborative forum for the discussion and capture of software patterns. He did this on a single server, and leveraged this centralization to grant two benefits: anyone could edit pages on the site, and the cost of creating content was lowered using shorthand markup.

The web was originally designed to allow edits to be made to arbitrary pages. However, this feature was commonly left disabled for either security reasons, or simply to protect against vandalism. By reducing the scope of content on the original Wiki Wiki Web site, Cunningham's openness fostered the growth of a self-policing culture that allowed the site to remain quickly and easily modifiable by casual and anonymous users.

The site's mutability was coupled with a shorthand markup convention that lowered the amount of text needed to add both style and linkage notation to documents. The most striking of the shorthand conventions was the addressing scheme for internal pages: links could be made by simply joining two or more capitalized words together. Requests for new content or clarification could be made implicitly by simply creating a link to a page that did not previously exist.
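
A minimal sketch of this convention follows (the page names, regular expression, and rendering below are assumptions made here, not Cunningham's actual implementation): runs of two or more capitalized words become links, and a link to a page that does not yet exist renders as an invitation to create it.

    import re

    existing_pages = {"WikiWikiWeb", "SoftwarePattern"}   # hypothetical site content
    WIKI_WORD = re.compile(r'\b(?:[A-Z][a-z]+){2,}\b')

    def render(text):
        def link(match):
            name = match.group(0)
            if name in existing_pages:
                return f'<a href="/wiki/{name}">{name}</a>'
            # A link to a missing page doubles as a request for new content.
            return f'{name}<a href="/wiki/{name}?edit">?</a>'
        return WIKI_WORD.sub(link, text)

    print(render("See WikiWikiWeb for the original SoftwarePattern discussions, "
                 "or start CategoryTheory if you are feeling brave."))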

The result of the Wiki Wiki Web is a centralized, explicitly linked, directed graph of information. The success of the Wiki Wiki Web is reflected in its adoption by numerous collaborations and projects as a means for capturing documentation and ideas. While other implementations of the Wiki Wiki Web seem to favor adding more linkage markup, it is not too much of a stretch to see how Cunningham's design points towards document centralization and implicit linkage, previously posed in this paper as being a world away.

Tim Berners-Lee to the World: Address Everything, but Twice this Time!
Two subsections ago, Tim Berners-Lee had created the World Wide Web, which has become a very useful tool for people to collaborate and share data. However, the web currently leaves so much context implicitly embedded in human language that it remains a difficult place for machines to browse, categorize and index data. One solution to this problem is to add metadata, or data that describes the relationships between data. While the web framework does not inherently prohibit metadata, and HTML does in fact support a "META" tag, the framework does not uniformly explain how to include metadata for the purpose of describing relationships between resources.

The resource description framework (RDF) attempts to provide a uniform means of specifying metadata. RDF does this by adding triples of universal resource identifiers (URIs) to the kinds of content one might host on a web site. For example, to specify authorship, one would provide three URIs in an RDF document: the URI for the web page, a URI representing the authorship relationship, and a URI that references additional content about the author. The overall result is that the directed graph of the web becomes a directed graph with labeled edges, so a computer now has more explicit information about relationships between web resources.
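
The authorship example can be sketched as plain (subject, predicate, object) triples; the page and author URIs below are invented for illustration, and real RDF would normally be serialized as RDF/XML or a similar format rather than as Python tuples.

    page    = "http://example.org/papers/digitizing-dna.html"   # hypothetical page
    creator = "http://purl.org/dc/elements/1.1/creator"         # a common authorship property
    author  = "http://example.org/people/jriehl"                # hypothetical author resource

    graph = {(page, creator, author)}

    # The labeled edge lets a machine ask a precise question about the relationship.
    authors = [o for (s, p, o) in graph if s == page and p == creator]
    print(authors)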

When RDF is coupled with additional program logic and uniform requirements for relationships, the result is the Semantic Web, a web for both humans and machines. One benefit of these additions is that they allow databases to be uniformly exposed on the web. For example, a database table might be encoded in a single RDF document by keeping triples representing three things: the primary key of a row, the table column, and the value stored in that column for that key. By encoding databases and database schemas for the web, more of the "hidden web" is exposed to navigation aids for both humans and machines. Previously, this data has been exposed via name munging in URIs in non-uniform ways that were not easily understood by arbitrary web software.
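
The table encoding can be sketched in a few lines (the table, column names, and keys below are invented): each cell of the table becomes one triple of (primary key, column, value).

    # A hypothetical employee table, keyed by primary key.
    rows = {
        101: {"name": "Alice", "department": "Enumeration"},
        102: {"name": "Bob",   "department": "Compression"},
    }

    triples = [(key, column, value)
               for key, columns in rows.items()
               for column, value in columns.items()]

    for triple in triples:
        print(triple)   # e.g. (101, 'name', 'Alice')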

The ideas behind the Semantic Web are being adopted by the wiki community as well. For example, some Wikipedia documents embed metadata into themselves by providing a table with two columns. The containing page forms the implicit first part of the relationship triple. Each row encodes the relationship in the first column, and the result in the second column. The wiki community is working to make this more uniform, and compatible with the Semantic Web while still lowering the cost of (meta-)data entry.

An Explicit Summary
This section and its constituent subsections have explored two dimensions that could be plotted on yet another Cartesian plane. The first dimension is the weight of the markup and naming schemes. Unlabeled versus labeled edges form the second dimension. The World Wide Web presents an example of a decentralized system with heavyweight markup and unlabeled edges. The Wiki Wiki Web provides an example of a centralized system with lightweight markup and unlabeled edges. The Semantic Web moves into the second dimension by coupling even more markup with labeled edges. Finally the Semantic Wiki reflects the Semantic Web's labeled edges, but is paired with the Wiki Wiki Web's lightweight markup and data centralization.

 

++++++++++

Life in an Implicit, Centralized and Enumerated World

Before one looks too closely at the title of this section and recoils in horror from the fascist nightmare it implies, it is important to note two things: this is currently how most machines view any possible world, and this is also a good way for organizations to work efficiently. Furthermore, the previous section has already illustrated how centralization was leveraged in the Wiki Wiki Web to lower the cost of linking documents. While this section does not have the benefit of widely available software to fully demonstrate its ideas, these ideas have a long history and can be demonstrated in part by coding techniques such as ASCII, compression techniques such as Huffman encoding, relational databases, and parts of the Semantic Web design.

This section uses fiction once again to create a more favorable world for its views to be clearly expressed. The following subsections will trace the operations of a fictive organization called DescartesCo. DescartesCo's mission is to create something similar to the Big Can of earlier fictions. However, DescartesCo also ties in with reality on several fronts, as its primary concern is universal truths that can transcend the ephemeral worlds of both fiction and perception.

René Descartes to the World: Enumerate Everything
This section now asks that you pretend you are a new hire at DescartesCo. Upon inking the terms of your employment, you are promptly assigned a cubicle, a terminal interface to the corporate intranet (also called the Not-As-Big-As-We-Would-Like-It-Can), and a hard copy of the employee manual. As in many employee manuals, the majority of the prose is spent explaining benefits, compensation, and enrollment windows. However, one small set of corporate policies (appearing before the typical workplace safety boilerplate) stands out. First, you are to forget everything you know or think you know about how your job is to be performed. Second, you are to take each task given to you and decompose it into subtasks of such detail that further divisions would seem absurd. Third, you are to enumerate and attempt to order the subtasks based on their complexity. Fourth, and finally, you are required to provide proof that your enumerated tasks are complete with respect to your responsibilities. These policies may remind you somewhat of the second part of René Descartes' Discourse on the Method, but provided in more modern business speak.

While this approach starts your contribution to DescartesCo in a vacuum, you are still provided with tools on the intranet for entering your operational procedures and their constituent ideas. The most useful of these tools is an enumerator, something that is capable of providing and remembering numbers. The enumerator tool does break with the third guiding policy of DescartesCo's namesake, by allowing the following inequality to be broken: the largest number used in the decomposition of a task or idea must be less than or equal to the number assigned to the resulting task or idea. Besides requiring people to guess the approximate complexity of ideas they hadn't yet decomposed, enforcing such an inequality would complicate the decomposition of repetitive tasks. Mutual repetition of two possibly different tasks would require them to be given the same number; otherwise one task would have to be ranked greater than the other, and the lesser task would be unable to reference the repetition.
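
A minimal sketch of such an enumerator follows; the class and method names are invented here rather than taken from any DescartesCo tool, and mutual references would additionally need placeholder numbers, which are omitted for brevity.

    class Enumerator:
        """Hands out numbers and remembers what each number decomposes into."""

        def __init__(self):
            self.next_number = 1          # 0 stays reserved, per the convention adopted later
            self.definitions = {}         # number -> ordered tuple of constituent numbers

        def define(self, *constituents):
            """Assign the next free number to a concept decomposed into
            already numbered constituents (none at all for an atomic idea)."""
            number = self.next_number
            self.next_number += 1
            self.definitions[number] = tuple(constituents)
            return number

    enum = Enumerator()
    fetch_beans = enum.define()                        # atomic subtasks
    grind_beans = enum.define()
    brew_coffee = enum.define(fetch_beans, grind_beans)
    print(brew_coffee, enum.definitions[brew_coffee])  # 3 (1, 2)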

Other champions of enumeration include both a real and a fictional Gottfried Leibniz. The real Leibniz proposed enumeration as a scheme for labeling, organizing and indexing books. The fictional Leibniz, as chronicled by Neal Stephenson, proposed distinguishing a set of atomic ideas, termed monads, which were enumerated with prime numbers. More complicated ideas were then granted numbers by multiplying their constituent monads together. This method adds the first hint of implicit context, since it allows one to look at the underlying ideas by taking the prime factorization of any given number, which is unique. The fictional method just described was fictionally dismissed by DescartesCo for two reasons. First, it required the imposition of a single interpretation of the set of monads. This conflicts with the first policy described above, since each employee starts by throwing out what they thought of as atomic or given, and only arrives at atomicity once they think any further explanation would be unnecessary. Second, it does not allow unambiguous composition of ideas. Factorization only tells how many times a given prime can evenly divide a number, not the grouping and order of multiplication used to arrive at the resulting number. For example, this could result in someone mistaking the number for diluted acid to mean water added to acid, which can cause violent and unfortunate reactions.
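
A toy example (the monad assignments are invented here) makes the second objection concrete: because multiplication is commutative, the order of composition vanishes, and factorization can only recover the multiset of monads.

    ACID, WATER = 2, 3                    # two hypothetical "monads" given prime numbers

    acid_into_water = ACID * WATER        # the safe dilution procedure
    water_into_acid = WATER * ACID        # the violent, unfortunate one
    assert acid_into_water == water_into_acid == 6   # both collapse to the same number

    def monads(n):
        """Brute-force prime factorization: recover the multiset of monads."""
        factors, p = [], 2
        while n > 1:
            while n % p == 0:
                factors.append(p)
                n //= p
            p += 1
        return factors

    print(monads(6))   # [2, 3] -- the ordering information is gone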

Greg Chaitin to the World: Compress Everything
DescartesCo initially had some problems with employees taking its policies to extremes that severely affected productivity and utility. Specifically, some employees were taking great pains to explain each letter of the alphabet each time that letter was used in one of their compositions. These particularly earnest contributions to the corporate intranet were robbing the company of person-years spent performing repetitive data entry. Furthermore, these compositions were hard to ship and even harder to sell, since their size was so large and their attention to detail so great that they were unreadable (oddly enough, some of these articles are still kept in storage, just in case a space alien market does materialize). Luckily, the chief information officer had studied not just the practice of information, but its theory, as originated by Claude Shannon and extended by Greg Chaitin.

The focus of much of information theory and algorithmic information theory is this: how many bits does it take to communicate something? Shannon observed that if all redundancy is removed from a message, the remaining bits are as few as they could possibly be without meaning something different. Information theory uses the ideal compression size of a message as a measure of information content, also called entropy and denoted as H(x) for an arbitrary message x. When messages are combined, the sum of redundancies can only increase or stay the same, meaning that when message y is communicated after message x, the combined message (x,y) has an entropy equal to or lower than the sum of the individual entropies, or H(x,y) ≤ H(x) + H(y). Algorithmic information theory extends the idea of information theory by allowing messages to contain computations instead of data. This has the surprising feature of allowing infinite, non-repeating expansions, such as the fractional part of pi, to be sent as a finite number of bits, encoded as an algorithm (it would still take an infinitely long computation to get all the digits in the "message"). Chaitin therefore adds a constant to many calculations of entropy in order to account for the minimal program that outputs an already minimal set of bits for a message.
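
The inequality can be illustrated empirically with a general-purpose compressor standing in for the (uncomputable) ideal entropy; the messages below are invented, and zlib is of course only a crude approximation of H.

    import zlib

    def compressed_size(data):
        """Compressed byte count as a rough stand-in for entropy."""
        return len(zlib.compress(data, 9))

    x = b"All employees shall enumerate their tasks. " * 50
    y = b"All employees shall enumerate their ideas. " * 50

    separate = compressed_size(x) + compressed_size(y)   # roughly H(x) + H(y)
    combined = compressed_size(x + y)                    # roughly H(x, y)

    # The combined message compresses better, because the phrasing shared
    # between x and y is redundancy that a single pass can exploit.
    print(combined <= separate, combined, separate)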

The principles behind enumeration and the practice of numbering tasks and ideas served DescartesCo well when it came time to apply both information theory and algorithmic information theory. First, coding a whole number, and hence an enumerated task, into binary bits was much easier than doing so with other symbols or representations such as images or audio waveforms. Second, enumerations are associated with ordered compositions of other enumerations, which are intended to be either definitions or proofs. Therefore, where one definition or proof was found as a substring of another, the whole substring could simply be replaced with the enumeration corresponding to that substring. This created implicit links where employees inadvertently repeated themselves, and allowed more astute employees to give lengthy explanations by explicitly using just a few numbers. The inequality above also meant that the more information an employee input, the more redundancies could be exploited to compress things. Third, these enumerations were being assigned not just to ideas, but to tasks. After some point of decomposition, employees were no longer just explaining what they do, but how they do it. They were no longer just writing prose, but programming, and writing proofs that their programs worked.
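
The substring replacement can be sketched as follows (a deliberately naive version invented here, not DescartesCo's or ANOI's actual algorithm): when one concept's definition appears verbatim inside another's, the repeated run of numbers collapses into a single reference.

    def compress_definition(definition, known):
        """definition: list of numbers; known: {concept number: its definition}."""
        result = list(definition)
        for concept, body in known.items():
            n = len(body)
            if n < 2:
                continue
            i = 0
            while i + n <= len(result):
                if result[i:i + n] == body:
                    result[i:i + n] = [concept]   # an implicit link is born
                else:
                    i += 1
        return result

    known = {7: [1, 2, 3]}                                # concept 7 is defined as 1, 2, 3
    print(compress_definition([4, 1, 2, 3, 5], known))    # -> [4, 7, 5]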

Brian McConnell to the World: Conceptualize Everything
At this point, the intranet design of DescartesCo bears a structural resemblance to the web: a set of documents organized as ordered, directed, and unlabeled links to other documents. For representing, storing, and even indexing data, the current design is good enough. Without additional pressure, this story would most likely end. However, back in the fictional world of DescartesCo, the employees wanted to eliminate even more redundancy. While even Descartes did throw out his biases, he did not ignore things he could readily verify and prove were true. Similarly, the employees at DescartesCo wanted to reuse other procedures they found on the corporate intranet, ones they verified to be correct and were possibly more succinctly expressed. In essence, assuming their contributions were globally addressable, they needed metadata.

In his Dr. Dobb's Journal article, "Concept Oriented Programming," Brian McConnell discusses the interaction between global enumeration and both code and data. In his original design, each "concept" was given a number, and could be associated with arbitrary data. Some basic metadata was also associated with concepts, including a name, and the data format (such as Java bytecode, VRML models, or a definition written in Spanish). In "Beyond Contact," McConnell explicitly binds a context to numbers, allowing them to be interpreted as symbols, operators or just plain numbers. However, even these tag values are redundant if one is willing to start using arbitrary triples of enumerated values. For example, if a number is used to represent "file type", and another number is used to represent "PNG picture", then the triple (number, "file type", "PNG picture") would indicate that the given concept number referred to a PNG image.

To elaborate, if one assumes that 0 is a reserved enumeration meaning nothing, one can define a binary operation, "×", on values as follows:

A × B = C if the triple (A, B, C) is in the system.

A × B = 0 otherwise.

In this scheme, the intranet can give everyone their own numeric space that still references globally visible content. This starts with a local number space, defined by the user's identification number, arbitrarily called a UID here (which should not be confused with a universal identifier, which typically uses this acronym). When a user asks for a new number, the global system actually assigns two numbers to the composition the employee is about to enter. The first number is global and absolute, called a global identifier, or GID. The second number is most likely a smaller number called a local identifier, or LID. The user only sees the LID, but the enumerator silently records the triple (UID, LID, GID).

When the user wants to see, reference and cite the work of others, which is still addressed by GID, they may invert the process by using another operator, "/", defined as:

C / A = B if the triple (A, B, C) is in the system.

C / A = 0 otherwise.

If the result is zero, the GID refers to something they have not looked at, and they will have to look at the definition of the GID. If the GIDs in the resulting composition are not in the map either, the user must repeat this process until they arrive at a set of compositions they understand. The "/" operator can also be expressed by simply adding (C, A, B) to the system whenever (A, B, C) is added (however, doing this breaks one possible convention that would keep the first and third items in all global triples as global numbers).
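
The two operators and the UID/LID/GID bookkeeping can be collected into one small sketch (the class and method names are invented; the behavior follows the definitions just given, with 0 standing for nothing).

    class TripleStore:
        def __init__(self):
            self.triples = {}       # (A, B) -> C, i.e. the triple (A, B, C)
            self.next_gid = 1       # 0 is reserved to mean nothing

        def add(self, a, b, c):
            self.triples[(a, b)] = c

        def mul(self, a, b):
            """A × B = C if the triple (A, B, C) is in the system, 0 otherwise."""
            return self.triples.get((a, b), 0)

        def div(self, c, a):
            """C / A = B if the triple (A, B, C) is in the system, 0 otherwise."""
            for (x, b), y in self.triples.items():
                if x == a and y == c:
                    return b
            return 0

        def new_concept(self, uid, lid):
            """Issue a fresh GID and silently record the triple (UID, LID, GID)."""
            gid = self.next_gid
            self.next_gid += 1
            self.add(uid, lid, gid)
            return gid

    store = TripleStore()
    ALICE_UID = 42                               # assume this user already has a number
    gid = store.new_concept(ALICE_UID, lid=1)    # the user only ever sees LID 1
    print(store.mul(ALICE_UID, 1) == gid)        # True: UID × LID resolves to the GID
    print(store.div(gid, ALICE_UID) == 1)        # True: GID / UID recovers the LID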

A Network of Ideas, an Implicit Summary
Much as the implicit linkage performed by compression made DescartesCo's system similar to the web, the addition of triples makes DescartesCo's intranet structurally similar to the Semantic Web. However, the walk just taken added tasks, or software, from the start. Coding tasks allows encoding of not just organizational DNA, or data, but the processes that should be associated with that data. For example, if one's company switches its internal document format, new documents are not only globally visible, but so is the function for reading the documents (encoded as part of the data type), as well as a proof of the decoder's correctness (encoded as part of the code itself).

A system similar to the one described in this section has already been developed as a proof of concept. Titled "A Network of Ideas" (or ANOI, for short), this system provides mechanisms for enumeration, compression, and metadata. Using coding tricks similar to the ones described in the last subsection, the system is also self-descriptive, since no enumeration may be given to a concept that does not at least have a placeholder. While not intended for extra-terrestrial consumption, the property of being self-descriptive could arguably come in useful for making not just the data, but the methods of an organization comprehensible to just about anything with basic computational machinery.

ANOI is not without its problems. If a virtual machine were added to ANOI and exposed to Internet users, denial of service attacks on the system would be almost trivial. The cost of decoding (decompressing) a page and translating it to HTML is bad enough without providing open access for people to write and run infinite loops. Compression is also expensive, and for its benefits to be maximal, each document would have to be re-compressed each time new information was added. Compression is also hindered by a decision to make number assignments immutable. The ideal method provided by information theory would require concepts to be dynamically ranked and enumerated based on their frequency of use, not their complexity, nor the order of the request (which is the current behavior of the ANOI prototype).
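
The frequency-based renumbering hinted at above can be sketched in a few lines (the reference stream is invented, and ANOI itself does not do this): the most frequently used concepts receive the smallest numbers, Huffman-style.

    from collections import Counter

    # A hypothetical stream of concept references gathered from the intranet.
    references = [7, 3, 7, 7, 9, 3, 7, 11, 3, 7]

    by_frequency = [concept for concept, _ in Counter(references).most_common()]
    renumbering = {old: new for new, old in enumerate(by_frequency, start=1)}

    print(renumbering)   # e.g. {7: 1, 3: 2, 9: 3, 11: 4}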

 

++++++++++

A Bridge Between Words, a Bridge Between Worlds

Bridging the intranet of DescartesCo with the Semantic Web is a straightforward process. Resources from the Internet may be read by DescartesCo (where copyright and site policies permit), with each resource being enumerated, and the link and metadata structures re-encoded as numbers rather than names. Conversely, DescartesCo may expose its global number space via a naming scheme (for example, concept 32768 becomes "http://descartesco.com/concept/32768" or something similar), its metadata via RDF triples, and its metadata structure as RDF schemas. If DescartesCo is feeling really ambitious, it might ask the Big Can of the Internet, the Internet Assigned Numbers Authority, to grant DescartesCo its own uniform resource identifier scheme (making "http://descartesco.com/concept/32768" simplify to something similar to "concept:32768"). This simple mapping may leave one asking: if these systems are so similar, why go to all this trouble? Instead of answering "for the benefit of alien intelligences" this time, perhaps one might find "for the benefit of machine intelligences" ultimately more satisfying.
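
The mapping itself is nearly mechanical; the sketch below uses the concept number and domain from the paper's own example, while the helper names and the sample relationship numbers are invented.

    BASE = "http://descartesco.com/concept/"

    def concept_to_uri(number):
        """Expose a DescartesCo concept number as a web-addressable name."""
        return BASE + str(number)

    def uri_to_concept(uri):
        """Re-encode an already enumerated resource back into a plain number."""
        assert uri.startswith(BASE), "foreign resources would need a fresh number"
        return int(uri[len(BASE):])

    # A numeric metadata triple can then be published as an RDF-style triple of URIs.
    numeric_triple = (32768, 17, 4242)     # (subject, relationship, object); 17 and 4242 invented
    print(tuple(concept_to_uri(n) for n in numeric_triple))
    print(uri_to_concept("http://descartesco.com/concept/32768"))   # 32768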

One of the themes of the Semantic Web is exposing more of the underlying machinery of web understanding to the web itself. This theme is currently being handled in an additive fashion which keeps metadata decoupled from the resources being described. Additionally, the proposed solution uses addresses that humans can't readily read and that machines must spend time decoding into some internal representation. Instead, this redundant translation of text into a graph of numbers (or memory locations) could be exposed from the beginning. By dropping the naming charade at the machine level (people would still use names that are embedded in an index as metadata) and adding implicit linking, common web understanding computations (such as constructing models of word frequencies) are simplified.

While a hypothetical extra-terrestrial intelligence might find a fully self-descriptive network of concepts, code and proofs useful in understanding a person, an organization, or a species, this property is also useful in terms of preparing our current information systems for increased durability in the Long Now (defined by the Long Now Foundation as the next 10,000 years). In this context the aliens become future generations of people living in a world that is saturated in information technology that people today can only begin to anticipate. A metadata-rich, but compact, set of self-descriptive data would be more readily portable to these future systems than the highly distributed, and often broken, containers we are currently using to store our ideas, processes and dreams.

 

++++++++++

An Even More Serious Conclusion

There is no widely accepted evidence that an organization would be well served by preparing for the demolition of the planet. Organizations do fail, however, and technologies will continue to evolve. This paper followed two lines of thought, both arriving at a similar underlying structure: a directed graph with labeled edges. One line of evolution does well in a heterogeneous and untrusting world. The other line of reasoning does well when mostly managed by centralized machines. These systems can be bridged easily. Where one offers greater interoperability with the existing distributed architecture of our networks, the other offers programmability, compactness, and self-containment.

This paper presented these ideas in the service of three hopes. The first hope is that the Semantic Web keeps the ideas of programmability, compactness, and self-containment in mind as it continues to evolve new name spaces and standards. The second hope is that forward thinking organizations will adopt these ideas and use them to expose not only their data, but also their processes and the reasoning behind those processes. The third, and final, hope is that organizations will be able to survive and evolve across gaps in both time and technologies by adopting self-descriptive metadata and programming standards.

 

References

Berners-Lee, Tim. “RFC 1630: Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web.” June 1994. 30 Apr. 2006. ftp://ftp.rfc-editor.org/in-notes/rfc1630.txt

Berners-Lee, Tim, and Dan Connolly. “RFC 1866: Hypertext Markup Language – 2.0.” November 1995. 30 Apr. 2006. ftp://ftp.rfc-editor.org/in-notes/rfc1866.txt

Bush, Vannevar. “As We May Think.” Atlantic Monthly July 1945. 30 Apr. 2006. http://www.theatlantic.com/doc/194507/bush

Chaitin, Gregory J. Algorithmic Information Theory. Cambridge: Cambridge University Press, 1987. 30 Apr. 2006. http://www.cs.auckland.ac.nz/CDMTCS/chaitin/cup.pdf

Cunningham, Ward, et al. WikiWikiWeb. 30 Apr. 2006. http://www.c2.com/cgi-bin/wiki/

Descartes, René. Discourse on the Method of Rightly Conducting One's Reason and of Seeking Truth in the Sciences. 1637. 30 Apr. 2006. http://www.gutenberg.org/etext/59

Huffman, David. “A Method for the Construction of Minimum-Redundancy Codes.” Proceedings of the I.R.E. September 1952. 30 Apr. 2006. http://compression.ru/download/articles/huff/huffman_1952_minimum-redundancy-codes.pdf

Leibniz, Gottfried, and Leroy Loemker, ed. and trans. Philosophical Papers and Letters: A Selection (Synthese Historical Library). Berlin: Springer, 1975.

The Long Now Foundation. 02006. 1 May 2006. http://longnow.org/

McConnell, Brian. “Concept Oriented Programming.” Dr. Dobb’s Journal June 1999. 30 Apr. 2006. http://www.ddj.com/184410968

McConnell, Brian. Beyond Contact: A Guide to SETI and Communicating with Alien Civilizations. Cambridge, MA: O'Reilly, 2001.

Nelson, Ted. Transliterature, A Humanist Design. 22 October 2005. 30 Apr. 2006. http://transliterature.org/

Riehl, Jonathan. A Network of Ideas. 30 Apr. 2006. http://www.wildideas.org/anoi/

Shannon, Claude. “A Mathematical Theory of Communication.” Bell System Technical Journal July and October 1948. 30 Apr. 2006. http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf

Stephenson, Neal. The System of the World. New York: HarperCollins, 2004.

 


Editorial history

Paper received 1 May 2006; accepted 17 May 2006.



Copyright ©2006, First Monday.

Copyright ©2006, Jonathan Riehl.

Digitizing More Than Organizational DNA by Jonathan Riehl
First Monday, volume 11, number 6 (June 2006),
URL: http://firstmonday.org/issues/issue11_6/riehl/index.html