Predicting the General Motors Bankruptcy
By Jason Hines on July 10, 2009
On June 1st 2009 General Motors filed for bankruptcy. As June 1st approached, the potential outcomes (government bailout, traditional bankruptcy, winding down the business) and potential dates converged on the final result – a “government sponsored” bankruptcy around June 1st, 2009. Many parties with serious vested interests in the final outcome were carefully monitoring the stream of events in hopes of gaining an advantage in their final position.
Predicting outcomes is a tricky business – and accordingly people view events such as the GM bankruptcy as a significant financial opportunity – the form of which could involve any range of financial instruments – buying or shorting the equity, debt, credit default swaps, etc. of General Motors – or any combination thereof.
According to many rumors, some investors took relatively small debt positions but loaded up on CDS positions, so that they could influence the debt structure in the bankruptcy while standing to make substantial gains on leveraged CDS bets. These, along with many other positions, reflect investors’ predictions of the final outcome for GM and of its timing.
Setting the Stage
Assuming that, at some point during the spring of 2009, most investors believed that a GM bankruptcy of some form was inevitable, predicting the actual date might have been the more relevant and interesting activity to pursue. Many approaches to predicting the timing of the GM bankruptcy might be possible – most commonly, investors would look to “one trusted analyst/source” with the benefit of years (or decades) of institutional knowledge, synthesized into an intuition about what is going to happen. As an alternative, we’d like to examine a more systematic and potentially more valuable approach: using Recorded Future to analyze the “media sphere’s” aggregate view of the potential date of a GM bankruptcy event.
First – an explanation of our approach: Recorded Future crawls thousands of sources in real time for events (bankruptcies and many others) and matches these events using a rich “semantic event model” with time points picked up in text – everything from specifics (“June 1st, 2009”) to relative (“next week”, “on Monday”). Recorded Future was in development during the Spring of 2009 and so the analytics below are somewhat skewed based on the state of our crawl at the time – however I believe that the results (even considering the early state of the system) are dramatic and represent a significant opportunity for potential users of the system with a more comprehensive crawl.
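To make the idea of “time points picked up in text” concrete, here is a toy sketch of resolving a few such expressions against a document’s publication date. It is not Recorded Future’s semantic event model; the function name `resolve_time_point` and the rules for interpreting “next week” and “on Monday” are assumptions made purely for illustration.

```python
import re
from datetime import date, datetime, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def resolve_time_point(phrase, published):
    """Toy resolver: map a time expression to a date, relative to `published`.

    Returns None for anything it does not recognize.
    """
    text = phrase.strip().lower()
    if text == "next week":
        # Assumption: "next week" means the Monday of the following week.
        return published + timedelta(days=7 - published.weekday())
    if text.startswith("on ") and text[3:] in WEEKDAYS:
        # Assumption: the next occurrence of that weekday strictly after publication.
        target = WEEKDAYS.index(text[3:])
        ahead = (target - published.weekday() - 1) % 7 + 1
        return published + timedelta(days=ahead)
    # Specific dates such as "June 1st, 2009": drop the ordinal suffix and parse.
    cleaned = re.sub(r"(\d)(st|nd|rd|th)", r"\1", text)
    try:
        return datetime.strptime(cleaned, "%B %d, %Y").date()
    except ValueError:
        return None

published = date(2009, 5, 28)  # a Thursday
specific = resolve_time_point("June 1st, 2009", published)
relative_week = resolve_time_point("next week", published)
relative_day = resolve_time_point("on Monday", published)
```

With a publication date of Thursday, May 28th, 2009, all three expressions resolve to the same day under these assumptions, which is the kind of agreement that lets separate mentions be matched to one event date.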
The core element of our model is an event, which is defined as some amount of text that includes a reference to one of a series of predefined event types (identified using sophisticated computational linguistics methods). Since any given event (especially a high profile event) is likely to be mentioned many times, we’re likely to find many instances of that event. Tracking and analyzing those instances and their context allows us to see patterns that can be validated over time, and the resulting confidence factors/statistical model enable us to begin to estimate potential future outcomes.
Recorded Future can be used to find highly ranked bankruptcy events and perhaps use the search results to find the most relevant articles to study or to see what date the event “around the corner” is converging on. For example, the visualization below was done at the end of May and obviously shows the June 1st bankruptcy cluster:
Recorded Future Data
While this is interesting, an even more compelling analysis can be achieved using Recorded Future by looking at the entire aggregate data set of bankruptcy events for General Motors out of the Recorded Future database of approximately 6,000 events, ranging in time from November 2008 until July of 2009. Each event in the model contains attributes such as source, document title, publication date, and the actual event date – where the publication date is when the document was published whereas the event date is the inferred date of when an event happened/will happen. Each document (such as a NYTimes article) might contain several GM bankruptcy events, either pure duplicates, or with varying semantics. One key element of Recorded Future’s technology is our “event disambiguation technology” which automatically and efficiently minimizes the duplication within our database.
We started our analysis with some core filtering/transformations of the data:
- Removing all bankruptcy events where the event date was before the publication date – such dates would have little or no predictive value. There are many reasons why they might be in the dataset (e.g. in a document from February 1st: “last week the potential GM bankruptcy was discussed”) – but we’re skipping them for this analysis.
- Removing all events where the event date equals the publication date – in this case because a) there would be little predictive power – as in the above and b) Recorded Future will assign the publication date as event date if no event date is found in the analyzed text, and we consider that to have less information content.
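The two filters above amount to keeping only events whose event date is strictly later than the publication date. A minimal sketch, assuming each event is a small record with a publication date and an event date (the `Event` class, its field names, and the sample rows are illustrative, not Recorded Future’s actual schema or data):

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Event:
    source: str
    published: date   # when the document was published
    event_date: date  # inferred date the event happened or will happen

def keep_forward_looking(events):
    """Keep only events whose event date is strictly after the publication date."""
    return [e for e in events if e.event_date > e.published]

# Made-up rows, one per case discussed above.
events = [
    Event("example-wire", date(2009, 2, 1), date(2009, 1, 25)),  # refers to the past: dropped
    Event("example-blog", date(2009, 4, 1), date(2009, 4, 1)),   # same day (possibly a default): dropped
    Event("example-wire", date(2009, 4, 1), date(2009, 6, 1)),   # forward-looking: kept
]
forward = keep_forward_looking(events)
```

A single strict comparison covers both rules, since “before” and “equal” are exactly the cases where the event date is not after the publication date.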
This leaves us with a data set of some 1,500 GM bankruptcy events with characteristics in the model such as:
- Source – everything from Reuters to SEC filings to blog entries
- Media type – blogs, mainstream media, government filings, etc.
- Publication date
- Event date (which we will also refer to as prediction date given that we have transformed/filtered the data to only include future event dates)
A key point: since we have both the publication date and the event date, we can essentially “backtest” our data here – that is, analyze the crowdsourced “prediction” using only what was known at each point in time.
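One way to picture such a backtest: for any “as of” date, look only at events published on or before that date and summarize the dates they predict. The sketch below uses the median as the summary; the function name and the sample pairs are assumptions for illustration, not our actual data.

```python
from datetime import date
from statistics import median

def median_prediction_as_of(events, as_of):
    """Median predicted date using only events published on or before `as_of`.

    `events` is a list of (publication_date, event_date) pairs.
    Returns None if nothing had been published yet.
    """
    known = [event_date for published, event_date in events if published <= as_of]
    if not known:
        return None
    # Take the median over day ordinals, then convert back to a date.
    return date.fromordinal(int(median(d.toordinal() for d in known)))

# Made-up (publication date, event date) pairs.
events = [
    (date(2009, 4, 1), date(2009, 6, 1)),
    (date(2009, 4, 15), date(2009, 5, 20)),
    (date(2009, 5, 10), date(2009, 6, 1)),
    (date(2009, 5, 25), date(2009, 6, 1)),
]
early_view = median_prediction_as_of(events, date(2009, 4, 20))
late_view = median_prediction_as_of(events, date(2009, 5, 30))
```

Because the cutoff is applied to publication dates only, nothing published after the “as of” date can leak into the estimate.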
Exploring the Future
This then allows us to look at a series of visualizations of the data using a variety of state of the art visualization tools – one of which is a system which we built at our last company – Spotfire (http://www.spotfire.com). First we can review the distribution of publication dates. Clearly the rate of publications covering the event has gone up and down over time – but even more clearly the peak is on the day of the bankruptcy – June 1st.
We may then review the event (prediction) dates – ranging from as far back as 1998 to as far ahead as 2013. Again we can see a peak around June 1st of 2009.
Now, to make that more interesting, we may combine the publication and event/prediction dates and see if we can find any interesting patterns.
The most interesting parts in the above are perhaps:
- The straight line/band correlation between publishing and event date – basically as new publications are coming out they are bound to discuss the near term aspects of the event (remember the GM bankruptcy event is not just the “big bang event” – there are many subtleties along the way).
- The band of events around event/prediction date of June 1st – starting with publications on April 1st. The first source mentioning early June in our database is a Reuters article (however we did not crawl NYTimes at that time, and we do believe that NYTimes was actually the first to cover this).
Do the predictions converge?
Now let’s dive slightly deeper and look at how the event dates (predictions) potentially converge. A reasonable hypothesis is that as we get closer to the bankruptcy date of June 1st, the dates mentioned in the context of the GM bankruptcy will also converge on that date. To test this hypothesis we create a box plot by week of publication across our 2009 data and compare that to the spread of the event/prediction dates – displaying the median date (light blue line) and the 95% confidence intervals. We have transformed the event/prediction date to a day-of-year format (June 1st is day 152), and accordingly we can see in both the visualization and the table beneath it how the prediction date, in both median and confidence interval, converges on June 1st by the publication week of June 1st (week 23).
After the June 1st date (week 23) we still have mentions of the bankruptcy. Some of these of course refer back to June 1st, but others refer to other aspects of the event (e.g. bankruptcy court negotiations). We attribute the larger set of outliers in week 23 to the disproportionate coverage of the event that week – which leads to a textual “spread” of stories and semantics regarding GM.
Below we can also see this plot filtered to publication dates ending a week before June 1st, and here too we can see convergence.
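For readers who want to reproduce the transformation, here is a rough sketch of the day-of-year conversion and a per-week median. It assumes ISO week numbering for the publication week (the plotting tool may number weeks differently) and uses made-up sample pairs; it computes only the median, not the confidence intervals shown in the plot.

```python
from collections import defaultdict
from datetime import date
from statistics import median

def day_of_year(d):
    """1-based day of the year, e.g. January 1st is day 1."""
    return d.timetuple().tm_yday

def weekly_median_day(events):
    """Median predicted day-of-year, grouped by ISO week of publication.

    `events` is a list of (publication_date, event_date) pairs; day-of-year
    is only comparable when the event dates fall in the same calendar year.
    """
    by_week = defaultdict(list)
    for published, event_date in events:
        week = published.isocalendar()[1]  # ISO week number (an assumption)
        by_week[week].append(day_of_year(event_date))
    return {week: median(days) for week, days in sorted(by_week.items())}

# Made-up (publication date, event date) pairs from two publication weeks.
events = [
    (date(2009, 5, 4), date(2009, 5, 15)),
    (date(2009, 5, 5), date(2009, 6, 15)),
    (date(2009, 5, 6), date(2009, 6, 30)),
    (date(2009, 5, 26), date(2009, 6, 1)),
    (date(2009, 5, 27), date(2009, 6, 1)),
    (date(2009, 5, 28), date(2009, 5, 31)),
]
medians = weekly_median_day(events)
june_first_day = day_of_year(date(2009, 6, 1))
june_first_week = date(2009, 6, 1).isocalendar()[1]
```

In this toy sample the later publication week has a median that sits on the day number of June 1st, which is the pattern the box plot is meant to reveal on the real data.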
Conclusion and Applications
Our experiment with the GM bankruptcy demonstrates that analyzing crowdsourced event dates for use in prediction works very well, and that applying the more complete version of Recorded Future could be extremely powerful. The approach described above could be improved in a whole series of ways; our description is relatively superficial and is meant merely to provide a sense of the power of the approach. Recorded Future’s methods for algorithmic linguistics have many opportunities for improvement, but there also exist opportunities for improving the transformations of the data. In a case like this one, where we’re looking to predict an event weeks or months out, we may want not only to filter out same-day publishing/event dates but also to consider filtering across a slightly larger range. We may also want to think about differentiating between events by source quality and entity rank – very doable through Recorded Future’s proprietary rank metric (which will also be available for back testing).
Recorded Future’s approach and the resulting data is clearly valuable for filtering, searching, visualizing, etc. – both domain experts as well as novice investors will realize the true power of the system when they use Recorded Future for qualitative reasoning.
But the news analytics data should also be highly applicable to quantitative methods and algorithmic trading in portfolio management – for example, one has to assume that a predictive function derived from the data (incrementally improved as new data becomes available) could be applied in event-driven strategies. Clearly, the market’s price of calendar-spread options on GM should be related to the “certainty” the market, or in our analysis, the crowd, ascribes to that particular date. Other applications of our crowdsourcing approach include estimating expectations for upcoming data releases and gauging overall market sentiment in a particular industry. The fact that our approach yields a time-varying distribution of outcomes makes it highly applicable to the pricing of time- and volatility-sensitive financial instruments.
May the future be with you!
As always, we welcome your comments below!
P.S. By July 10th GM exited bankruptcy. Will it stay solvent?