To most people, fractals are the beautiful, ever-changing images that result from constant iterations through a fractal algorithm developed in the 1920's. This article describes an important application for fractals - an application that just might revolutionize the way businesses look at their data.

Fractal Databases - New Horizons in Database Management

by Lisa Lewinson, PC AI, March/April 1994.

Introduction

A fractal (See Figure 1) is far more than a pretty face. Fractal-based techniques have compressed and decompressed images since the 1980s. Standard image compression and images that move through phone lines typify fractal geometry at work. Today, Microsoft uses a fractal decompression routine for its CD/ROM products.

Until recently, however, the application of fractal geometry to hardcore business problems has been limited. No one, for example, used a fractal to represent data as shapes with particular characteristics. As we'll see, such a representation allows rapid and flexible data analysis.

Order from Chaos

At its core, fractal geometry is linked to the study of chaos, a search for order hidden in the apparent disorder of systems. The search for order gave rise to another idea: the application of fractal geometry to database management - not merely small databases but decision support systems that work with terabytes of data.

Andre Szykier, Chief Technology Officer of Cross/Z International, recalls that he and his cousin (and frequent business partner), Mark Chroscielewski, were shushing down the back slopes of Vail in 1987. Szykier had recently been involved in a proposal for the Cruise missile guidance system.

The work involved constructing a map of the last stretch of terrain that a Cruise missile would have to fly over. (There was no satellite positioning at the time). The terrain data came from satellites that had mapped the area. The only way to keep the missile on track was to follow computer-generated directions that described what the missile was flying over and to make necessary corrections.

After ample research, Szykier proposed the use of a fractal equation to represent coordinates for longitude, latitude, and the terrain's altitude. (A fractal equation produced the pattem in Figure 1.) He drew his solution from existing image compression algorithms.

Fractals Meet Data Management

Back to the ski slopes. Chroscielewski ("Cross" for short) was then Senior Marketing Executive at Chase Manhattan Bank. He was having problems building a massive customer database (30 million names), with data from five different credit bureaus. The sheer volume of the data choked Chase's mainframes.

Cross told Szykier ("Z") about the problem. Szykier offered to try to compress the data. Many slopes later, back at the ski lodge, Szykier phoned an associate and asked him to review his files on fractal decompression to see if the technology could work with more than the three levels they had researched for the Cruise missile contract. To explore data effectively, the fractal compression algorithm would have to handle up to seven levels of data. Szykier's associate said he thought it was possible.

Cross and Z immediately went to work. "Within two weeks, we built the first prototype of a fractal database for Chase Manhattan Bank," says Szykier. "We took 30 million records and compressed them onto a PC, and we demonstrated that you could do exact counts instantaneously. Chase did a full-query comparison with their mainframe - which took them several days - and saw that our instantaneous counts were correct. These speed trial encouraged us. In 1988, we decided to form Cross/Z International to specialize in fractal databases."

Today, Cross/Z's clients include Federal Express, Allstate Insurance, American Express, and others. The fractal database management system (F-DBMS) can compress hundreds of millions of records and even billion-record databases, allowing users real-time access on their PCs, using accurate counts.

A query may be of any type (e.g.,"how many customers out of 44 million bought silk scarves in Los Angeles at store #X in the late afternoons in April?"). To generate an answer, however, the system must translate the query's fields (in this instance, custom number, product SKU, city, store #, time of day and month) from physical data into a mathematical representation called a data map or a fractal map.

Building a Fractal Map

According to Szykier, the fractal map stores data as coordinates. Each coordinate, represented by an eight-byte number, is the result of recursively solving a many-term polinomial equation. "Think of this process as storing shapes," Sykier says.

"For example, one could have 10,000 hourly temperature observations from Hong Kong. If you plotted these points, you could see how temperature fluctuated over time, and you can graph this fluctuation with a resulting shape. We compute the equation to represent the curve or shape exactly. We ask, how can you discard the original data (10,000 observations) and store one data point - the fractal. The answer to this question plays a large part in the data compression, with 10,000 to 20,000 bytes reduced to eight bytes.

"What is remarkable," he continues, "is that you can reconstruct the exact shape of the curve from the fractal and arrive at any of the original data observations for the temperature. Even more interesting is that, when you calculate 'shapes' for more than three dimensions, the process is still the same, although the calculations are considerably more intensive."

Conversion

Szykier explains that data items such as strings, images, and arrays are converted to mathematical constructs and stored as coefficients of non-linear polynomial equations. An m-dimensional map of the information serves as a database for relational queries. Typical retrieval time for any query, no matter how complex, is less than one second.

"Fractal compression using variants of the affine transformation method is very efficient for data extraction," Szykier states. "Thirty-two and 64-bit arithmetic calculations are repeated until set of coefficients are solved for the polynomial equation. We use proprietary mathematical algorithms employing fractal, non-Euclidean geometry to take advantage of 64-bit computing (on the DEC Alpha) to compress data 10,000 to 1. The 64-bit chip is allowing us to move into the client/server environment."

The Mirror Datafile

Szykier explains that the F-DBMS's mirror datafile stores a set of mathematical equivalents for each data record in the source database in a one-to-one correspondence. "Every data element receives a corresponding numeric value," he explains. "The collection of numeric data elements corresponding to each source data record can be stored using flat, hierarchial, relational, or networked methods."

"Relational queries are answered by extracting the set of data elements or variables, for all records in the mirror file. The set of (m) variables extracted from the database of (n) elements is compressed into all possible m-tuples. This results in a small data map of every tuple found in the dataspace along with its frequency count, which represents the number of records in the dataspace sharing the same values for the data elements defined by the tuple."

What Fractal Maps Do

According to Szykier, "Fractal maps are an intersection of all the attributes that exist in the data. They are like coordinates in space (n-dimensions). This creates a one-to-many map of relationships occurring in the data. Data is further compressed [using the kinds of procedures used in PC data compression tools like PK-ZIP].

"Compressed data maps are stored on file servers and accessed through standard networked personal computers. A typical 1-terabyte data source can be stored on a .5 gigabyte file server. Since data maps are passive and read-only, any number of users can query and retrieve answers to any relational query in real-time. This fulfills the goal of the decision support system. Updates to the F-DBMS are done in batch mode."

The key advantage of the data map concept, adds Szykier, is that the system only touches the data once. SQL (Standard Query Language) requires that every row in the data structure be examined to qualify the request. That is, an SQL system touches the data in response to every request. Cross/Z's approach touches the mirror database just once, while allowing the user to formulate queries as frequently as one per second.

Szykier feels strongly that this approach makes sense when databases become too large to navigate.

The Biggest Challenge - Selling the Technology

Cross/Z decided early on that technical discussions about fractal geometry were not going to sell the F-DBMS to I.S. departments. The company therefore chose to emphasize functionality, not technology, in its marketing strategy.

Says Chroscielewski, "I don't think any of our customers have asked us to explain the intricacies of a fractal map. I'm not sure they even care. We demonstrate to our customers that understanding the intricacies of fractal geometry is not required to appreciate the technology. What speaks for itself is easy access to exact segment counts from complex queries on very large databases without waiting for hours for a query to run. This kind of data analysis is unprecedented, and our customers have saved millions of dollars through this technology."

Greasing the Engine

Since it was first introduced, the fractal engine has been through many changes. Cross/Z has improved the engine's throughput efficiency and increased the amount of data a user can examine.

"The engine can theoretically produce views of up to 50 dimensions of data;' says Szykier, "but 50 by 50 is so large that it's inherently sparse. A lot of cells have zero counts and many cells have just one entry. We discovered early on that people can drill down to seven levels of data without getting lost. So we concentrated on expanding the number of variables that can be grouped together. We can now display segment counts for an array of 15 by 15 variables, and we're about to go to 22 variables, 7 levels deep. Our next step is to display counts for 50 variables, 7 levels deep, since the 64-bit chip supports that."

For example, says Chroscielewski, a major consumer goods company has an F-DBMS with 40 million records from a DB2 environment. "Even with a complex query," he notes, "they can cycle through the whole mirror database in 15 minutes. This gives them a data map which shows all the intersections that occurred. Once that map is built, they can ask similar questions of the data and incur no price penalty at all. As a result, F-DBMS users become intelligent extractors rather than ad hoc queriers."

Working With Fractal Databases

Although a mirror database has no theoretical limit, Chroscielewski points out that most clients select between 25 to 150 variables for their F-DBMSs. They also typically create multiple mirror databases.

According to Chroscielewski, Cross/Z's interactions with clients generally follow these steps.

  • Data extracts are taken from the client's operational data or the data warehouse.

  • Either Cross/Z or the client transforms the data into an F-DBMS containing all the variables to be used in data views (i.e., into the fractal map).

  • The F-DBMS is accessed by the fractal engine that can run on a PC host or client/server environment under Unix.

  • Using Cross/Z's Private Eye Launch tool on the PC or Mac, the user selects particular variables to be included in a data view. The creation of the data view takes approximately one minute (for one million records) to fifteen minutes (for up to a billion records). Once the fractal engine has processed the data, results are accessible in less than a second via the LAN or standalone PC, using Cross/Z's Private Eye View product.

    Chroscielewski points out that data modeling algorithms such as non-linear regression are easier to build in the F-DBMS than in traditional systems, not necessarily because the Cross/Z technique is better, "but because you can build complex models much more quickly with fractalized data. You can build fully non-linear relationships and avoid sampling error. You have greater flexibility and a faster development and implementation cycle.

    The Shape of Things to Come

    "What is very intriguing for the future of database management," adds Szykier, "is that with the F-DBMS, you can create a complete description of a phenomenon without defining any structure for analysis. You can also choose any starting point in the phenomenon and explore a sequence of variable interactions without investing time and effort into data transformation, storage design, and access methodology. The net effect is unprecedented in database design. And because there are no hidden data structures, the information is structurally independent and portable among systems."

    Chroscielewski and Szykier are now mapping the future of the F-DBMS. They are currently discussing strategic business partnerships with major relational database vendors and makers of MPP systems, among others.

    Thinking further ahead, says Szykier, "We are committed to moving fractals into silicon, putting the fractal engine into a DSP circuit that can sit in a consumer-based product and map real-time consumer usage. Interactive TV is one example where a fractal database could store useful information about viewing habits. Placed in an automobile, a fractal database could measure performance and usage data that would be extremely useful for diagnostics and design. Fractal pattem detectors could be used in ovens, telephones, and washing machines, adapting their data maps with each use. They could be built so inexpensively that everyone will stick one of these intelligent chips into their hardware."

    Chroscielewski and Szykier have not had time to go skiing lately. This is probably a good thing. We might see fractal databases embedded in skis and ski boots, gathering data on terrain, technique, and time spent in ski lodges. The new deluge of data could create an overload on systems, burying marketers and analysts in an avalanche of information. On the other hand, the data blizzard might not cause problems at all - if the systems involved deploy those rugged fractals with the pretty faces.

    Lisa Lewinson is president of Northstar Consulting Services, Inc. (708-786-3922), a Chicago-based consulting firm. Northstar develops user- and developer-friendly intelligent applications and offers seminars.


    Recuadro

    Product Information