Pharma IT on Recorded Future for the Enterprise
By Chris on January 12, 2012
A year ago, Paul Kedrosky wrote a great post titled “Curation is the New Search is the New Curation”. It described a sort of symbiotic relationship between human curation and algorithmic search innovation that has cycled on the Web since its early days. He ends the post with:
“The result will be a subset of curated sites that will re-seed a new generation of algorithmic search sites, and the cycle will continue, over and over. In short, curation is the new search. It’s also the old search. And it’s happening again, and again.”
The post, and Paul’s sticky title, stayed with me throughout all of last year. Where is this all going? Does it relate to the industry I work in? Should we get ready for a Sisyphean future of data work? Or should we more optimistically interpret his quote as marking a new beginning in the cycle of web curation and search tech? Whatever the future holds, I think the next deep cycle of curation and search technologies will anchor itself inside the enterprise, especially in large global companies.
Pushing the Rock Up the Hill: Cleaning Up Enterprise Data
Large companies have troves of (unfortunately) poor-quality data. This creates the need for massive curation and cataloging projects, for which the tools and standard methodologies remain somewhat primitive. Such efforts also require collaboration among domain experts, data specialists, and IT people who rarely work in the same departments, making this kind of work difficult to organize and execute effectively. So why are most large companies in this situation? A partial answer is that data curation and data management in large companies have never before been the strategic focus they are today.
The good news is that those same large companies with the most daunting data needs typically have the resources for staff and software investments to help them improve the quality of their strategic data and decision making. They also have help from outside. In the pharmaceutical industry, where I work, vendors like Thomson Reuters, IMS Health, Informa, and many other specialty information sources enable the industry to essentially outsource a large chunk of data curation and data management. Probably every major pharmaceutical company buys subscription access to this well-structured, regularly updated competitive industry data. However, the frequency of the updates is slow compared to the pace of the web, and since there are no real standard taxonomies across vendor offerings, IT departments end up maintaining a fairly complicated set of commercial systems to enable strategic analyses. While this sounds bleak, I think this is the real sweet spot for the next technical search and curation cycle and an interesting opportunity for Recorded Future.
Taxonomies and Data APIs to the Rescue
After spending some time using Recorded Future, I find it amazing for analysis of single, discrete entities and event types. I was also impressed with what I could discover about something specific just by interacting with the network graph and timeline visualizations. For example, a query combining the event types “Medical Condition” and “Medical Treatment” with the term “malaria,” across all publications over the last 30 days, quickly displayed networks of scientists, policy makers, and other people across the web of publications meeting the search criteria.
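To make the shape of such a query concrete, here is a minimal sketch of how an analyst's tool might construct it as JSON for the Recorded Future API. The field names (`instance`, `type`, `searchterm`, `time`) are illustrative assumptions for this post, not the documented query schema:

```python
import json
from datetime import date, timedelta

# Hypothetical sketch: the field names below are illustrative assumptions,
# not the documented Recorded Future query schema.
def build_query(event_types, term, days_back=30):
    """Build a JSON query for events mentioning a term in a recent time window."""
    start = date.today() - timedelta(days=days_back)
    query = {
        "instance": {
            "type": event_types,                 # e.g. Medical Condition, Medical Treatment
            "searchterm": term,                  # free-text term to match
            "time": {"min": start.isoformat()},  # restrict to the last `days_back` days
        },
        "output": {"fields": ["entities", "time", "source"]},
    }
    return json.dumps(query)

payload = build_query(["Medical Condition", "Medical Treatment"], "malaria")
print(json.loads(payload)["instance"]["searchterm"])  # -> malaria
```

The point is less the exact schema than the workflow: a query like this could be saved, versioned, and re-run on a schedule, rather than typed into a search box each time.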
Where could this be further enriched? From my perspective, funding sources for studies and the organizations where scientists and medical leaders are employed would be interesting additions to the Recorded Future data set. Combining this type of data with the partnership events already in place makes for a robust resource for competitive intelligence in the pharmaceutical industry.
I envision three things coming together that might seed the next search and curation cycle to the benefit of pharmaceutical competitive intelligence analysis.
1) “The” taxonomy. Recorded Future has a real chance to become a standard taxonomy for the web. It has an amazing web-based JSON API with an evolving taxonomy that learns new entities from millions of documents per week. It could provide a consistent standard for data curation projects and development teams in large organizations. Why not use it for building the new applications pharmaceutical analysts need?
2) Blended queries and feeds from new data sources. Commercial vendor subscription content could be blended with in-house tools, and Recorded Future could receive queries either manually from business analysts or automatically through its API. Perhaps vendors of these systems could even partner with Recorded Future to improve its scientific and medical taxonomy.
3) In-house software development and data curation project tools. Enterprise teams will continue to develop custom user interfaces and query tools that expose the data from all of these systems. The Recorded Future API could be an effective way to make sure these systems have the most up-to-date information, pulled in from the “real-time” web.
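As a sketch of that third point, an in-house tool might periodically pull fresh entity records through the API and merge them into a local catalog so analysts always query current data. Everything below (the record shape, the `id`/`updated` fields) is an illustrative assumption, not a real vendor format:

```python
# Illustrative sketch of keeping an in-house catalog current with periodic API pulls.
# The record shape (id, name, updated) is an assumption for demonstration only.
def merge_records(catalog, fresh_records):
    """Merge freshly pulled records into a local catalog; the newest update wins."""
    for rec in fresh_records:
        existing = catalog.get(rec["id"])
        if existing is None or rec["updated"] > existing["updated"]:
            catalog[rec["id"]] = rec
    return catalog

catalog = {"e1": {"id": "e1", "name": "malaria", "updated": "2012-01-01"}}
fresh = [
    {"id": "e1", "name": "malaria", "updated": "2012-01-10"},      # newer -> replaces
    {"id": "e2", "name": "artemisinin", "updated": "2012-01-09"},  # new entity -> added
]
merge_records(catalog, fresh)
print(sorted(catalog))  # -> ['e1', 'e2']
```

A simple last-write-wins merge like this is crude, but it captures the basic contract an in-house curation tool needs: the API is the source of freshness, the catalog is the source of local structure.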
The system's usability, the depth of the APIs, and the “speed to analysis” are impressive on their own. I can imagine how powerful Recorded Future might be for an experienced intelligence analyst or team in a company when combined with new data sources and integrated by the right project team.
Coming back to Paul Kedrosky's point, I agree that we are likely to see a growth of algorithmically curated and searchable content sites, but it's possible that the “consumer web” won't see much of that information. The most interesting innovations will likely come from integrating the curated data and taxonomies of veteran industry content providers with new technologies like Recorded Future, behind the great firewall of the enterprise.