Jeffrey Needham's Blog
The Big Data Job Gap: Where are the Platform Scientists?
Posted on 10:41 pm August 28, 2014 by Jeffrey Needham
Big data continues to live up to its reputation for disruption as it gnaws away at all of the entrenched constituencies – IT silos, vendors, pricing models and now, careers; it’s about to get very personal. On the opportunity side, there is a massive skills gap that needs to be filled. Everybody knows there aren't enough Data Scientists to go around, but few realize that a completely new species – the Platform Scientist – is also needed, although it is less understood, with few examples in captivity.
Platform Scientists should emerge from a DevOps environment, but they need to be much more dev-focused than ops-focused. They are not admins, although a chunk of their job is to be administratively responsible for the computing platform that exists, rather than treating it like the reluctantly cobbled-together aggregation of silos that passes for a platform today: app servers, DB, storage, OS, networks, hardware, even the physical plant. The role requires Platform Scientists to have a working knowledge of what the platform is being asked to do and to weigh that against what it actually can do.
In the past, Data Scientists (and generally speaking, many application developers) have been trained to ignore the platform. Siloed admins were also trained to ignore the platform and focus only on their domain. IT silos are more political than technical because they’re an organizational response to the immutable doctrine of centralized economies of scale. In the legacy world, both Dev and Ops camps could happily ignore the platform, which might be inefficient (or even failing), and nobody would be held responsible for it; that’s the political beauty of silos.
In the new world, volume, velocity and value are king; platform is queen and so are the Platform Scientists – wait, did I just call them all queens? I did intentionally call them scientists, since their software engineering role is closer to that of Data Scientists. But with an emphasis on things like LRU fragmentation, compilers and TLB management, they're not data subject matter experts. Platform Scientists must understand the continuous flow of new workflows (and maybe contribute to their development), how those workflow choices make the data science possible (and maybe efficient over time), and how the underlying physical platform must deliver the data to the right place, at the right time, for the right price.
The few Platform Scientists who exist today are emerging out of data science development, but some candidates can also be drawn from the traditional silos. Some DBAs have been doing a version of this job, but they'll need to learn a new set of technologies and invent some new operational doctrines along the way.
Without Platform Scientists, Data Scientists will be less successful with big data because of poor platform choices: they think HBase is the same as Hive, or that SQL is the only language that matters, or they have difficulty accessing the data they need to do their science. The Platform Scientist is there to juggle the chainsaws of infrastructure constraints so that big data actually delivers on all the (still very over-hyped) possibilities that have been promised to absolutely everyone.