Patented Technology: Web Intelligence Engine

Recorded Future is bringing a new category of analytic tools to market.

Unlike traditional search engines, which focus on text retrieval and leave the analysis to the user, we strive to provide tools that help identify and understand historical developments, and that can also help formulate hypotheses about, and give clues to, likely future events.

We have decided on the term “temporal analytics” to describe the time-oriented analysis tasks supported by our web intelligence platform.

Understanding Our Web Intelligence Engine

Although the focus of Recorded Future is on temporal analytics, a comparison with traditional search engines is inevitable – since search is one important aspect of analytics.

The history of search goes back to at least 1945, when Vannevar Bush published his seminal article “As We May Think,” in which, among other things, he pointed out:

“The difficulty seems to be, not so much that we publish unduly in view of the extent and variety of present day interests, but rather that publication has been extended far beyond our present ability to make real use of the record. The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.”

In the decades that followed, a great deal of work was done on information management and text retrieval. With the emergence of the World Wide Web, both the need and the ability for almost everyone to use a search engine became obvious.

An explosion of search engines followed, with names such as Excite, Lycos, Infoseek, and AltaVista. All of these first-generation search engines focused on traditional text search: they used various algorithms, but they looked at individual documents in isolation.

Google changed that with its public debut in 1998. Google’s second-generation search engine is based on ideas from an experimental search engine called BackRub. At its heart is the PageRank algorithm, which is the core of Google’s success (together with clever advertising-based revenue models). The main idea of PageRank is to analyze the links between web pages and to rank a page based on the number of links pointing to it and, recursively, on the rank of the pages that link to it. This use of explicit link analysis has proven tremendously useful and surprisingly robust, even though Google continuously has to tweak its algorithms to combat attempts to manipulate the ranking.
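The recursive idea behind PageRank can be illustrated with a small power-iteration sketch. This is a textbook simplification and not Google’s production algorithm; the function name, the toy link graph, and the damping factor of 0.85 are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform rank
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by three other pages.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
best_page = max(ranks, key=ranks.get)
```

In this toy graph the page with the most incoming links ends up with the highest rank, and the page it links to inherits much of that rank, which is the recursive effect described above.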

Recorded Future has developed an analytics engine that goes beyond search and explicit link analysis by adding IMPLICIT link analysis. Our software seeks the “invisible links” between documents that talk about the same, or related, entities and events.

How do we do this?

By separating the documents and their content from what they talk about – the “canonical” entities and events.

(Yes, this model is heavily inspired by Plato and his distinction between the real world and the world of ideas.)

Documents contain references to these canonical entities and events. We use these references to rank canonical entities and events based on the number of references to them, the credibility of the documents (or document sources) containing those references, and several other factors, such as the co-occurrence of different events and entities in the same or in related documents.
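As a rough illustration of these two ideas, the sketch below scores each canonical entity by the summed credibility of the documents that reference it, and derives “invisible links” between documents that mention the same entity. It is a minimal sketch under stated assumptions, not Recorded Future’s actual ranking: the sample documents, the credibility values, and the function names are all hypothetical, and other factors such as co-occurrence are left out.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: (document id, source credibility in [0, 1],
# canonical entities and events the document refers to).
documents = [
    ("doc1", 0.9, {"Acme Corp", "product launch"}),
    ("doc2", 0.4, {"Acme Corp"}),
    ("doc3", 0.7, {"product launch", "Globex"}),
]


def rank_entities(docs):
    """Score each entity by the summed credibility of the documents referencing it."""
    scores = defaultdict(float)
    for _doc_id, credibility, entities in docs:
        for entity in entities:
            scores[entity] += credibility
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def implicit_links(docs):
    """Return pairs of documents that reference at least one common entity."""
    mentions = defaultdict(set)
    for doc_id, _credibility, entities in docs:
        for entity in entities:
            mentions[entity].add(doc_id)
    links = set()
    for doc_ids in mentions.values():
        links.update(combinations(sorted(doc_ids), 2))
    return links


ranking = rank_entities(documents)
links = implicit_links(documents)
```

Here doc1 and doc2 become implicitly linked through “Acme Corp” even though neither contains a hyperlink to the other, and a reference from a more credible source adds more to an entity’s score than one from a less credible source.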

In addition to extracting event and entity references, Recorded Future also analyzes the “time and space dimension” of documents – references to when and where an event has taken place, or even when and where it will take place – since many documents actually refer to events expected to take place in the future. We also compute another set of metrics, called sentiments, which capture the attitude an author has towards his/her topic and how strong that attitude is – the affective state of the author.

The semantic text analysis needed to extract entities, events, time, location, sentiment, etc. can be seen as an example of a larger trend towards creating “the semantic web.”

The time and space analysis described above is the first way in which Recorded Future can make predictions about the future – by aggregating weighted opinions about the likely timing of future events using algorithmic crowdsourcing.
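One simple way to aggregate weighted opinions about timing is a weighted median of the predicted dates. The sketch below is only an illustration of that general idea; the text does not specify which aggregation Recorded Future uses, and the sample predictions, the weights, and the function name are hypothetical.

```python
from datetime import date

# Hypothetical input: (date a source expects the event to occur,
# weight of that source, e.g. its credibility).
predictions = [
    (date(2025, 3, 1), 0.9),
    (date(2025, 3, 15), 0.5),
    (date(2025, 6, 1), 0.2),
]


def weighted_median_date(preds):
    """Return the earliest date at which the cumulative weight reaches half the total."""
    ordered = sorted(preds)
    half = sum(weight for _, weight in ordered) / 2.0
    cumulative = 0.0
    for predicted, weight in ordered:
        cumulative += weight
        if cumulative >= half:
            return predicted
    raise ValueError("no predictions given")


consensus = weighted_median_date(predictions)
```

A weighted median is used here because a single outlying prediction with a low weight barely moves the result, whereas a weighted average of dates would be pulled towards it.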

On top of that, we use statistical models to predict future events based on historical records of similar chains of events.

The combination of automatic event/entity/time/location extraction, implicit link analysis for novel ranking algorithms, and statistical prediction models forms the basis for Recorded Future’s Web Intelligence Engine. Our mission is not to help our customers find documents, but to enable them to understand what is happening in the world.

This is Recorded Future.