Interview: John Ashley, General Manager, Financial Services and Technology, NVIDIA
What are the top qualities or skills a quant/PM/data scientist should be able to exhibit and what does the optimal data science team look like? How is this changing and what might we be expecting in the future?
We, like everyone else, look for tuxedo-wearing tap-dancing unicorns, although I always say we’ll settle for a clog-dancing zebra with a good attitude and a fun hat. All kidding aside, we look for flexible team players with a good high-level understanding of the business coupled with deep technical expertise in a key area -- AI development, deployment, or AI-focused data engineering. For roles that work with or overlap traditional quant roles, add in all the usual mathematical background and some C++ coding skills along with the Python we expect on the data science side. We’d then look to build a team with the most diverse possible educational and career backgrounds to cover the workflow from data preparation to model training, migration to production, and monitoring.
There seems to be a lot of cynicism surrounding the use of alt data, its role in alpha generation, and whether you can truly find value in these datasets. How is your organisation working around this, and what are your views on the future of alternative data?
The value of alt data for alpha generation is, we think, a function of several factors. The most obvious, of course, is how “alt” is it? If it’s a convenient data feed that you can subscribe to, then it’s really moving closer to “just” data. Nearly as important is timeliness -- can you ingest the data and extract signal to gain some sort of temporal advantage? Depth of extraction is another key differentiator -- if you can dig deeper for signal in the noise, there may be advantage even in a relatively common dataset. One of the last major opportunities, and one that is often not discussed, is “inferred data” -- can you process multiple alt data feeds in a way that lets you capture signal that is smeared across contexts and data feeds? We feel like we can help with all of these areas -- faster networking from our Mellanox unit enables us to bring accelerated data transformation to bear on timeliness, and accelerated AI solutions allow you to dig deeper into the noise with faster models. If you can compare that output to a more standard model, you can more concretely identify your information advantage. Finally, leveraging deeply learned embeddings, it is possible to combine multiple otherwise incompatible structured and unstructured data sources to tease out even the faintest signals.
What do you think are the biggest challenges facing data scientists/AI experts/quantitative investors in 2019/2020? Why are they important?
The biggest challenges are all in dealing with change. Once a firm’s competitors start using AI as a source of competitive advantage, it is no longer a fight for market share; it is a fight for survival. And we see that the biggest challenges to adoption of AI are rooted in changes to existing business and IT processes. For AI, traditionally people have said the three pillars are algorithms, compute, and data. We’d add a fourth to that -- people. Not just data scientists, but IT people who can build and deploy AI systems. AI is changing at an incredible pace, and the tools of the trade -- TensorFlow, PyTorch, containers, Kubernetes, RAPIDS, GPUs, among a host of others -- have many interdependencies and change rapidly. As a simple example, you need to provide your data scientists with the tools to do their jobs effectively. We see, time and again, yesterday’s IT operations and budgeting metrics -- utilization as the driving metric of value of a compute system, or a focus on infrastructure cost rather than value -- slowing down the ability of firms to get started with AI, grow their AI teams, and get AI projects into production. The truly AI-agile firms we work with don’t care about the utilization of servers; they care about the utilization of data and of data scientists. We can help firms make sure they are using their compute as efficiently as possible, but the TCO calculation has to include data scientist costs and the opportunity costs of projects lost because of insufficient peak capacity.
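As a rough illustration of the TCO point in the answer above, here is a minimal Python sketch. It assumes a very simple model in which total cost is infrastructure spend plus data scientist cost plus the value of projects lost to insufficient peak capacity. The `ComputeOption` class, its field names, and every figure below are hypothetical and chosen only to show the shape of the calculation; they do not come from the interview.

```python
from dataclasses import dataclass


@dataclass
class ComputeOption:
    """One way of provisioning compute for a data science team (hypothetical)."""
    name: str
    infrastructure_cost: float   # annual hardware or cloud spend
    data_scientists: int         # team size
    cost_per_scientist: float    # fully loaded annual cost per person
    productive_fraction: float   # share of time NOT spent waiting on compute
    lost_project_value: float    # annual value of projects dropped for lack of peak capacity


def total_cost(option: ComputeOption) -> float:
    """Infrastructure + people + opportunity cost of lost projects."""
    people = option.data_scientists * option.cost_per_scientist
    return option.infrastructure_cost + people + option.lost_project_value


def cost_per_productive_year(option: ComputeOption) -> float:
    """Total cost divided by the scientist-years actually spent working."""
    productive_years = option.data_scientists * option.productive_fraction
    return total_cost(option) / productive_years


# Hypothetical scenarios: a "lean" cluster tuned for high server utilization
# versus an "ample" one sized for peak demand.
lean = ComputeOption("lean", 200_000.0, 10, 250_000.0, 0.6, 1_000_000.0)
ample = ComputeOption("ample", 600_000.0, 10, 250_000.0, 0.9, 100_000.0)

for option in (lean, ample):
    print(f"{option.name}: total {total_cost(option):,.0f}, "
          f"per productive scientist-year {cost_per_productive_year(option):,.0f}")
```

Under these made-up numbers, the option with three times the infrastructure spend has the lower total cost, because the people and lost-project terms dominate. Whether that holds for a real firm depends entirely on its own inputs.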
Can you share an example of how your system has been used by a new customer? Feel free to include any feedback or practical examples.
Modern investment managers are faced with a deluge of information -- more, in fact, than they can possibly consume. But much of what they are sent is duplicative or “fluff”. So the question is, can AI help them effectively consume more of this information while avoiding wasting their time on fluff? It turns out that AI-powered Natural Language Understanding (NLU) can help by summarizing articles and research -- taking 80 pages to 1 page in some cases. The human is still in the loop and can dive deep anywhere that seems interesting -- but they can build awareness across 10, 20, or 80 times as much research. This is enabling them to make better, more informed decisions than their non-AI-assisted competitors.
Cloud computing has been widely adopted in most sectors except financial services. Is this now changing, and if so how will funds decide how and where to include external providers?
It’s definitely changing. Our customers, more and more, are moving to hybrid cloud models, although how each firm decides what can move to the cloud and what stays in house may differ. The key to a rational decision is a good understanding of three things: how you are going to use your data and how much compute that use requires; how much risk moving that data outside your firewall is going to create, and what risks it might remove; and finally, which likely changes to what you do today, driven by data science and AI in particular, might change the economics of what you’re doing. At that point, you can “do the math” for a couple of scenarios and make decisions accordingly. Maybe an all-cloud solution is right for your firm, or maybe a hybrid one, with trigger points around usage to balance on- and off-prem compute, is the right way to go.
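To make “do the math” concrete, here is a minimal Python sketch of a usage trigger point between an all-cloud and a hybrid setup. It assumes a deliberately simplified cost model: on-prem compute has a fixed annual cost and a fixed capacity in GPU-hours, cloud compute is billed per GPU-hour, and any demand above on-prem capacity bursts to the cloud. The function names and all figures are hypothetical, and the sketch ignores the data-risk questions raised in the answer above, which a real decision would have to weigh alongside cost.

```python
def all_cloud_cost(hours: float, cloud_rate: float) -> float:
    """Annual cost if every GPU-hour is bought from a cloud provider."""
    return hours * cloud_rate


def hybrid_cost(hours: float, onprem_fixed: float,
                onprem_capacity: float, cloud_rate: float) -> float:
    """Annual cost if on-prem absorbs demand up to capacity and the rest bursts to cloud."""
    overflow = max(0.0, hours - onprem_capacity)
    return onprem_fixed + overflow * cloud_rate


def breakeven_hours(onprem_fixed: float, cloud_rate: float) -> float:
    """Usage at which the fixed on-prem cost equals the all-cloud bill.

    Only meaningful when the result does not exceed on-prem capacity.
    """
    return onprem_fixed / cloud_rate


# Hypothetical inputs: $3 per cloud GPU-hour, $150k/year of on-prem cost
# buying 70,000 GPU-hours of capacity.
CLOUD_RATE = 3.0
ONPREM_FIXED = 150_000.0
ONPREM_CAPACITY = 70_000.0

trigger = breakeven_hours(ONPREM_FIXED, CLOUD_RATE)
print(f"trigger point: {trigger:,.0f} GPU-hours per year")

for hours in (20_000.0, 100_000.0):
    cloud = all_cloud_cost(hours, CLOUD_RATE)
    hybrid = hybrid_cost(hours, ONPREM_FIXED, ONPREM_CAPACITY, CLOUD_RATE)
    cheaper = "all-cloud" if cloud < hybrid else "hybrid"
    print(f"{hours:,.0f} h: all-cloud {cloud:,.0f}, hybrid {hybrid:,.0f} -> {cheaper}")
```

With these illustrative inputs, usage below the trigger point favors all-cloud and usage above it favors hybrid. Rerunning the same comparison with a firm's expected growth in AI workloads is one way to test how stable the decision is.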
A portion of the industry is adamant that advanced ML techniques such as Reinforcement Learning and Deep Learning cannot be applied to financial data – do you agree? What are the main challenges preventing this from happening?
We definitely don’t believe that. We see a brisk business in deep learning -- indeed, if you look at the leading conferences like NeurIPS, the number of brand name financial firms presenting work on both deep learning and deep reinforcement learning is growing at a rapid clip. It is true that the adoption is faster in business areas with less regulatory burden; there is still no regulatory consensus on AI and the limits haven’t really been tested yet. Another data point -- the big tech firms are moving into payments and banking, and they bring a strong heritage of AI and leveraging data to the game. Will they stop at payments and retail banking licenses or will they keep coming? AI is a fundamental tool for the future and the competition for talent, data, and leadership isn’t going to slow down.
What is your advice to funds hoping to get new systematic strategies into production quickly and more often?
Obviously, you need to look at every step of the process and ask, is it necessary and is it on the critical path? We can help on the compute side, certainly, removing significant time from algorithm design, development, and backtesting via either GPU-accelerated Python or C++. We’ve seen cases where we can turn hours or sometimes days into minutes. If you can adapt the rest of your process to accommodate accelerated computing, there are huge time-to-deployment advantages available.
Data engineering is often a topic that can be ignored when dealing with AI implementation – what is your organisation’s infrastructural data strategy to support advanced analytical investment outcomes?
Data engineering should never be ignored! Especially as we move into accelerated machine learning, where suddenly inefficient systems and software move from an insignificant portion of the runtime to a major portion of it. Deep Learning, which can deliver even more powerful business results, has an even more insatiable appetite for data, and performs best when it can use the entire data set, repeatedly. There is a change in focus, perhaps, in that deep learning in theory doesn’t require the same level of feature engineering as more traditional machine learning. In reality, some existing knowledge, encoded through feature engineering, can sometimes help the process along. But data engineering was always about augmentation, filtering, and moving data -- and as AI & ML systems try to dig deeper into larger piles of data to extract meaning and insights, more and more data engineering will have to focus on systems and performance.