Best Practices Video 5 Min Read

Automated Performance Management for Big Data: TFIR Interview

TFIR interviewed Kunal Agarwal, Unravel’s CEO, at the DataWorks Summit in San Jose. Unravel, the leader in application performance management (APM) software for big data, is seeing AI and machine learning change the way the world works, using open source technology. Unravel makes it possible for customers to run more and more applications in production reliably and cost-effectively, while ensuring that those applications meet business SLAs.

Transcript of video (subtitles provided for silent viewing)

[Swapnil] Hi. This is Swapnil Bhartiya, and we are here at the DataWorks Summit in San Jose. And today, we have with us Kunal from Unravel. Just tell us a bit about what does the company do?

[Kunal] Yeah. So, we help Big Data applications work well. Everybody’s using big data tech, all the companies in different industries. But running big data applications on any system, whether that’s Kafka, Spark, Hadoop, NoSQL, or any massively parallel processing platforms, is very, very hard…let alone run them well. And what we see is all these companies spend 70% of their time trying to firefight these issues, which boil down to: how do I make these applications run, how do I make these applications run fast, or how do I make these applications run efficiently?

Debugging these issues today is a very manual, painstaking exercise. What we’ve done is we’ve automated performance management for big data so that these companies can focus on driving value from big data rather than spending most of their time firefighting these issues. So, we’ve automated problem discovery, root cause analysis, and resolution for all these common kinds of problems that happen on these big data stacks.

S- When I come to this conference, I’m curious that, when I look at the ecosystem, is it because the base technologies are open source, so that allows for an ecosystem, or has there always been an ecosystem around even proprietary technologies?

K- Yeah. A very good question. Enterprise software itself is one of the major transformations that’s happening…the transformation is the switch to open source technologies over the last decade, I would say. Open source technologies are great because they provide you with a big community to help you accelerate innovation. Open source is also great for people who want to experiment with new technologies because it allows for a lot of flexibility and a lot of power.

Now, Hadoop, Spark, Kafka, all have very fanatic communities behind them, and for good reason. The technology is very flexible, very powerful, it can do a lot of different things. However, what happens with great power is great complexity. And when people are running anything using these systems, they fall into these common problems. Our place in the world is to make sure that while they’re adopting big data and open source technologies, in general, that they can actually depend on it.

A lot of proprietary systems cost about twenty times, thirty times more money than these open source technologies if you look at them on a per-unit cost basis. But those things work. And companies can depend on them. Open source technologies are rough around the edges; they may have certain shortcomings in security, integration, or, in our case, performance management, and that inhibits or slows down consumers who want to actually use that technology.

So, we say, “Listen, don’t be scared of open source technology,” right? “We will make sure and guarantee performance and reliability so that you guys can actually depend on this, and reliably run any type of workload that you want on these big data technologies themselves.” But you’re absolutely right, the community’s huge. Spark Summit, this time around, had about 4,000, 5,000 people.

(I’m) very fascinated whenever I go around the country or even the globe. I was in London for Strata London, and London is adopting all these big data technologies also. So, it’s not just a U.S. phenomenon, it’s a worldwide phenomenon now. Then any company that I speak with will have a lot of open source components in there, period. There’s no more discussion of, “do we need open source or not?” There’s a lot of, you know, good technologies out there today which companies are exploiting for a variety of different reasons.

And they just see that power and they can’t be left behind with the competition, so everybody else is now adopting it as well. I’ll give you some use cases just to help you highlight this a little better. If you look at banks and financial institutions, they’ve got a tremendous amount of data, and they need to process this data in a fast, efficient, quick manner. So, you may have some of the biggest banks doing fraud prevention applications, for example.

Now, with open source technology, they get the power to process more data and in a faster fashion, and also use data that’s in different varieties, to do this fraud prevention application, for example. And we help them make sure that these applications run. Now, fraud prevention is good both for the company itself as well as the consumers. So, if you use a credit card, now you know for yourself that if you swipe your credit card, companies are very good at spotting whether it’s a fraudulent transaction or it’s a legitimate transaction.

And they want to save you money, and they want to save fraud from happening to you, which in turn builds your confidence in that bank and you want to bank with that company, you know, more. Healthcare is using it in a lot of fascinating ways. Today, you know, you’re wearing your Apple Watch, right, all of this data about you, about how much are you running, how much are you sleeping, how much water are you drinking…

S- It’s dangerous that they are, that they… just joking.

K- No, it’s not. It’s not, because that way, they get to understand you better in ways that they would never have understood you before. So, if you allow them to get this data and use this data, then it can be incredible in helping understand, from an insurance company, that, “Hey, Swapnil should not be spending $2,000 a month on insurance, he’s a healthy person. He should be spending $200 on insurance every month,” right?

Or helping identify problems and diseases before they happen. So, there’s a lot of fascinating uses of Big Data technology, what we call Big Data, but using machine learning, artificial intelligence, IoT, etc. And I’m very excited personally about it, just seeing all of these different use cases that we get to see, working with different classes of customers.

S- I’m also a science fiction writer. So, there are a lot of new use cases which are emerging, IoT is becoming popular, all those things. So, from your perspective, what are the potential new use cases that you see beyond what you already have?

K- Well, every industry that we work with, every company that I walk into, the first question we ask them is, “What are you doing with Big Data today that you weren’t able to do yesterday?” And it depends on the maturity cycle of that company.

They’ll say, for example, “The first step is we’re doing the exact same thing that we used to do yesterday, but we’re doing it with much more data.” So, if I wanted to do my sales analysis based on regions, how well is my sales team doing in California versus Florida, I would be able to do it on six months of data or one year of data. Now, with big data stacks, I’m able to do this on five years or ten years of data, for example.

So, the first step is always, “Can we enrich some of the activities that we’re already doing with these new systems?” Once companies get comfortable with that system, then they start to open it up and create new products, and what we call true, data-driven products. So, a lot of examples already exist in the world today.

They range from the mundane, like giving you personalized ads, to Spotify understanding what kind of music you like and hooking you on their radio, so you listen to more songs and they get paid more, so it’s a vicious cycle like that. Those are the things we’ve already experienced. Now, there are a lot of other cool things coming out that we’ll start to see in every industry. You spoke about IoT. Being in Silicon Valley, it’s very common for us to see driverless cars now.

Right? But driverless cars for the rest of the world is still “Wow.” Well, it’s actually happening.

S- I call it, you know, IoT on wheels or something, you know?

K- There you go. So, every car manufacturer, if you look at Toyota, GM, Ford, you know, Tesla of course, Mercedes, BMW, everybody is getting into what we now call connected cars. Now, imagine a state in the future, in Silicon Valley it’s probably next year but in the rest of the world it’s probably going to take a decade or so, that all these cars are driverless and they’re talking to each other.

So, you don’t need signals to stop these cars, these cars are actually emitting signals to each other, talking about finding the most efficient path, avoiding accidents, avoiding problems, right? So, in the not very distant future, we’ll have data running all of these different things. And that’s why I’m a firm believer that in the last decade, every business became a technology business. In the next decade, every business will become a data business.

S- Right. So, what opportunity does it create for you or your company?

K- So, the task that we’ve taken on is to make sure that all the stuff works, and it’s a very core, important piece that needs to happen. Somebody’s got to take care of all the unsexy side, right? Somebody has to make these things work. And what we do is we get the focus of the companies and their people.

Data scientists, and business analysts, and data engineers are probably the hardest roles to fill, and there’s a huge shortage of this talent here. So, if you’re a company and if you’ve got 100 people, you might as well have those people focus on business objectives. So, we are providing the company that insurance policy, if you may, that you can have your people focus on business objectives and values, and we’ll take care of any mess that’s created.

So, if your applications are not working, your infrastructure is not working, if, you know, you’re not getting reliable performance out of your entire application set, we will help you with those problems. Now, this fraud prevention program, you know, to come back to that example, if that fraud prevention program does not catch that fraud in seconds, the money is out of the bank.

So, we make sure that that application runs on time every time for that bank, for example. And only when people have reliability, will they actually depend on that application or that product. So, that’s our place in the market, which is, we will make sure that the stuff that you’re promising to your customers works.

S- Right. So, do you have, you know, a stock solution that people can go and, you know, use their credit card and buy, or do you customize it on a case by case basis?

K- No, it’s completely out of the box. You go to UnravelData.com, you sign up for a free trial, or if you want to buy the product, you can buy it online. And then, you can get it installed within your environment or you can spin it up as a SaaS product from the cloud depending on where you run Big Data.

S- But customers may have, you know, a lot of customized workloads maybe, or maybe they’re using a lot of totally different components.

K- Absolutely. So, what we’ve seen is customers choose a particular vendor or a particular system to run these workloads. Now, they may be doing different types of applications but, say, for example, if they’re running their applications using Spark or on Hadoop, then an application is an application for us and we can provide the same benefits to one customer that we provide to another one.

We built Unravel by working very closely with people who are running Big Data at scale. Some of our early advisors in the company were people who were running analytics at Facebook, LinkedIn, Twitter, Riot Games, Zynga. You know, there were only a handful of companies five or six years ago that were actually running big data at scale.

And we worked with them really closely to understand what challenges you face when you’re running big data at that scale: thousands of machines, hundreds of users, hundreds of thousands of applications running every day. And we took those learnings and put them in our product to understand how do you make an agile DataOps, you know, work environment within your company, so that you’re able to experiment and iterate quickly, and you’re able to guarantee their performance, and you’re making sure that stuff actually works?

So, absolutely, it is out of the box, no customizations needed, and should be able to cover the big data stack that you’re running today.

S- All right. Almost all the components that you talked about that your customers are using are open source components?

K- Yes, yes.

S- But I’m pretty sure that, you know, you are not releasing any of your own code, but you work closely with the open source communities and open source projects. So how do you see open source… you know, what is open source to you?

K- Wow, that’s a very loaded question. Open source to me at the fundamental level is a change that is driven by community. Because open source has to have a community at its heart, not just for adoption but for actual development and change.

An open source project usually comes out of a need. In this case, we’ve all created this amazing community around Hadoop and Spark. Hadoop came around, as you all know, famously from works at Google, works at Yahoo. Spark from, you know, the UC Berkeley group, for example. All of this came out of a need for being able to process data at a scale that nobody had ever seen before and in a cheap, fast fashion, right?

That’s what open source to me is, it’s a catalyst for change driven through a community.

S- Right. Because earlier, when you mentioned that, you know, open source does not solve some problem, but, you know, it’s more or less like a development model, where you bring all the people to build tools… it’s not a business model.

But then, it enables, you know, Kunal’s and, you know, Arun’s to build business around it (Arun Murthy, Hortonworks). So, you take a lot of things from open source, but how do you give back?

K- A very good question. So, we have components within our proprietary software that are also open source.

S- Just before you answer, it’s a two-fold question. One is that you give back in terms of code, and the other is that you give back in terms of productizing, you know, building a commercial business model around… Because technology in itself is useless if nobody’s using it.

K- Absolutely.

S- So, you’re also building a business around those open source [crosstalk] two-fold question, you know. Yeah. So, you can touch upon the good part, and then the best part.

K- Yes, that is exactly what I was going to say. So, we do actually use the open source components within Unravel itself. We use Kafka for data ingestion, we have Elasticsearch for one type of data storage in our data tier. Our instrumentation technology is also open source and we’ve actually given back in a lot of these different areas, back to the community around things that we do, especially on the scale and size that we have seen.

Because we get a preview into a lot of companies that are running things on thousands and tens of thousands of nodes, which maybe a lot of people in the open source community do not have access to; that’s one way. The second way, as you rightly pointed out, is we are making these open source technologies work. A lot of the companies are now depending and relying on Spark, on Hadoop, on Kafka, and not just paying the proprietary companies twenty times more money to run that similar kind of workload because they know that there are companies like Unravel, and Databricks, and Cloudera, and Hortonworks making this work for them.

And you’re right, if there’s no usage of it, what’s the point of having that open source technology? And now we’re seeing every Fortune 500 company using these big data technologies, you know, within their line. And very soon, in the next five years or so, we’ll start to see this trickle down to all the SMBs. This will be commonplace.

Now we call it “big data,” tomorrow it’s just “data” because data is going to be this size and every company is going to be using it in some shape or form.

S- Right. I met somebody from Hortonworks yesterday. And because when people think about big data, they also think about being used in developing advanced. And he mentioned that a lot of work is going on in Africa, you know, because insurance companies are working in developing and emerging economies. Also they are using…because, the funny thing is that these countries, they have escaped the whole early age of technology.

Suddenly, they are in this phase where everything is, you know…

K- That’s true. No landlines, right, straight to mobile phones.

S- Exactly. Exactly. And it’s more efficient, cheap, and more accessible. So, what kind of markets do you get into or regions?

K- Yeah. Today we’re focused on the U.S. market because we’re a small company. We needed a focused approach to figure out where people face these problems, and to go after a market that’s huge enough to sustain our growth. We do have customers now in Europe. Like I was pointing out, Europe is ripe. And Europe by itself is, you know, n number of different countries, and in every country we’ve seen big data adoption grow.

Asia is not far behind at all. Some of the biggest banks in Singapore, and the Philippines, and India already use our product, which means that they’ve been using big data for a while. We both are from India, the entire Aadhaar project is actually running on big data, right, and I think that part of the implementation for the entire infrastructure of Hadoop and Spark that’s running on this.

So, as you can see, big data is powering, and systems and applications are powering some of the most, you know, groundbreaking, big changes in the world. Aadhaar, as you know, is our Social Security, you know, information that every Indian, 1.3 billion people, now have an ID number. You need that for opening a bank account, you need that for getting a house, you need that for buying a car, you need that for everything now.

And that’s a big change that Modi and his government and everybody else got together to do. So, for this kind of a big undertaking, it would not be possible if you were still relying on technologies of yesterday. Those databases are not capable of handling this size of data. Right? So, because we have these new databases, because we have these new technologies, adopting and getting this change to happen can be done much more quickly.

Now we’re seeing, like you pointed out, the second phase of big data. Now that the usual suspects, if you may, have started exploiting this technology, it’s now becoming more out of the box, it’s easier to deploy, easier to run. And that’s where we’re seeing people not just in different countries all over the world, in South America, in Africa like you pointed out, but even smaller companies are to use this because, you know, they don’t have an army of engineers to make it work.

But now that it’s simple to work, even these guys are jumping into it, and that provides them a level playing field, right? You don’t need to be the biggest bank in the world, you can be a regional bank and still provide your customers that same benefit that the big banks do because this technology is accessible, it’s open source, anybody can use it, anybody can contribute to it.

And all these people are now exploring these technologies for serving their customers better.

S- So, we talked about technology, we talked about future, we talked about IoT, and all that, let’s just talk about you for a second. When you are not traveling and when you’re not doing all this big data, what do you do in your free time?

K- Wow. I married recently, so I spend some time at home.

S- Okay, congratulations.

K- Thank you. Starting a company is taxing.

S- Which one was more difficult, getting married or starting?

K- That’s why I had to do one thing at a time. I could not do both of them at the same time. [crosstalk]

S- Don’t worry. We will not show this video to your wife.

K- Thank you. Guys, we should hide this video from her. No, but it’s been rewarding, it’s been great. I think starting a company requires 200% of your time, 200% of your energy and your mind, right? And it’s been exciting. I loved growing this company. But in my free time, I like to go and now become a California guy.

I used to live in New York for eight to ten years, so we didn’t know what hiking was, we didn’t know what green things are, we didn’t know that you’re supposed to put broccoli on things, right? Now I started going to the farmers market in California, living healthier, going out for hikes and bike rides, but I still enjoy my cars. So, you’ll see me doing racing on some weekends.

I like to take my sports cars out and, you know, take them around the track and race responsibly. So, whenever I do get a chance, I do that, or the Bay Area presents itself with amazing sailing opportunities. So, I’ve learned to sail over the last couple of months and I would take the boat around, you know, the Bay Area, there’s a lot of little towns over here, Marin, Sausalito, Richmond, of course, San Francisco.

So, it’s good to sail for one or two hours, dock at one place, go and see that little town, and come back on, you know, and sail around. I find that very relaxing as well, just to be in the bay, in the nice water for two, three hours on a Sunday, can’t beat it.

S- So, have you hooked up your sports car with all those machine learning and big data?

K- I should. I should. So, it’s interesting on that point: Porsche is a very driver-centric car. And when this entire, you know, autonomous driving comes around, it was interesting what the take of a company like Porsche is on autonomous driving.

So, they still want their customers to drive cars because that’s what a car is all about. So, I like their take, where they’re using this technology now on one of their models, where using big data technology, and getting sensor data, and GPS data, they find the road ahead of you, and they know what traffic is on that road or there isn’t, what the curves are on that road, elevation changes, all those things.

And so, it’s looking one mile ahead for you. And then, when you’re in the car, it’s actually giving you feedback on how you should be driving your car better. So, how can you go and do that corner faster? Can you brake later? Which gear should you be in? So, they’re using big data technology to actually make you a better race car driver, which I love.

S- What is the latency between the decision you made versus what the car will?

K- So, they’ve got models in that they use machine learning, and they’ve got some of the best…

S- It’s really useless to talk to technologists because we’re talking about hobbies and here we are back to talking about work, right?

K- Exactly.

S- And there’s a problem, and your hobby is your work?

K- I think, I guess that’s true. You’re always thinking about it in that way.

S- My wife always complains, “You’re always working,” and she doesn’t know that, no, I’m never working, I’m just having fun. Somebody is paying me to do this, you know… I’m starting my own company, so now I’m paying, but…

K- When you’re having fun, it doesn’t feel like work.

S- Exactly, yeah. So, you’re talking about, you know… So, are they your customers as well, or not yet?

K- Not yet.

S- You cannot name and tell.

K- We cannot name our customers but those ones are not our customers just yet.

S- So, if they do become your customers, and you do know that car you’re driving is your customer, you know, so can you just like even tweak it more?

K- I wish. I wish. Those Germans, they’re great at engineering, so I’m sure they’ll figure some of these things out and make every car exciting to drive.

S- Yeah. Oh, Germans are good.

K- I love Teslas as well, so don’t get me wrong, they’re super fast.

S- That’s fine.

K- And Porsche, I think, came out with the first all-electric car as well.

S- Yeah. Tesla is changing the market in a different way. We often get to talk about solar, and power, and everything, but it’s the same story, open source, you know?

K- Yeah.

S- So, what they did, they brought a product which is so, you know, kind of tempting, they’re doing both things. They are really efficient as well as, you know… But they’re not talking about that, they’re talking about the next generation of cars.

K- Absolutely, yeah. They’re talking about excitement instead of utility.

S- Yeah, exactly. So, when we talk about climate change and all those things, instead of focusing on the, “We should actually, you know…”

K- That’s how community is driven.

S- Exactly, yes.

K- It is, “What do I get out of it?” as well and “what’s in it for me?” A Tesla is faster than a Ferrari in a 0 to 60 straight line. And Elon Musk, that was his mission statement, which is, “I want to make a car that’s faster than a Porsche 911 that’s electric.” So, it wasn’t, “Let’s make something boring that people, you know, eco-friendly environmental conscious people drive.” He made a car that people who like driving will drive and the entire community will follow.

S- So, Arun drives a Tesla.

K- Yeah.

S- So, you guys should have a race, you know, and every DataWorks conference should start with a race.

K- He’d beat me! The Tesla is fast…

S- You know, whichever company wins, gets a free keynote…

K- I agree. I think it’s a great idea.

S- It was fun talking to you today.

K- Likewise, Swapnil.

S- And hopefully we’ll see you again in the next conference.

K- Thank you so much for having me.

S- Thank you.