Articles
- Amazon EMR
Whether you are migrating your data workloads to AWS or building a cloud-native application, Unravel’s AI-enabled end-to-end DataOps observability for Amazon EMR simplifies the challenges of data operations, boosting performance and resource efficiency, optimizing costs, and improving data quality while saving critical engineering time.
Cloud Cost Management & FinOps
Understand, optimize and actively govern your costs
AI-enabled cost governance identifies where you’re spending more than you have to (and how to fix it), with guardrails to proactively manage costs and prevent budget overruns.
Operations & Troubleshooting
Less effort, more problem-solving—faster, easier
No more spending hours (or days) doing time-consuming manual detective work. Unravel’s automated root cause analysis pinpoints why jobs fail or where pipelines are bottlenecked.
Pipeline & App Optimization
Optimize for performance and cost before you deploy
Automated AI recommendations eliminate trial-and-error tuning. Unravel cuts to the chase to tell you exactly how to change code or configurations for better performance and cost.
Data Quality & Reliability
Automatically correlate external data quality checks with AI-driven insights
Unravel integrates data quality check results from other tools, correlates all data details into a workload-aware context, and applies AI analysis for automated insights.
Cloud Migration
Avoid landmines and setbacks before, during, and after migration
Avoid migration setbacks and cost overruns. Unravel’s deep intelligence and automation enable confident, data-driven decisions before, during, and after your move to the cloud.
EXPLORE KEY FEATURES
See how Unravel’s top features and capabilities work
2-minute demo videos and self-paced guided tours walk you through the “best of” Unravel.
Amazon Elastic MapReduce (EMR) is a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning (ML) applications using open-source analytics frameworks such as Apache Spark, Apache Hive, Presto, HBase, Flink, and more. You can run applications built on these open-source frameworks on Amazon EC2, Amazon Elastic Kubernetes Service (EKS), on premises with AWS Outposts, or completely serverless on AWS.
Cost 360 for EMR provides trends and chargeback by app, user, department, project, business unit, queue, cluster, or instance. The EMR Cost Chargeback page displays EMR total cost, Elastic Block Store (EBS) cost, EC2 cost, EMR cost, and cluster count. On the EMR Cluster Chargeback details tab, you can see a real-time cost breakdown for EMR clusters, including related services such as EMR, EC2, and EBS volumes, for each configured AWS account. In addition, you get a holistic view of each cluster, including resource utilization, chargeback, and instance health, with automated AI-based cost-saving recommendations.
Efficiency recommendations and tuning suggestions appear on the EMR insights page. These suggestions call attention to potential underlying causes, such as inefficient storage or problems with a query. You will also see suggestions to update a property or configuration parameter, each showing the current and the recommended value.
AWS Cost Explorer refreshes your cost data about every 24 hours, and Cost and Usage Reports are updated once a day in comma-separated value (CSV) format. Because Cost Explorer includes usage and costs of other services, you should tag your AWS resources, and you may need to build custom applications to get granular reporting on your EMR cluster resource usage. Unravel simplifies this process with Cost 360 for EMR, which provides full cost observability, budgeting, forecasting, and optimization in near real time. Cost 360 includes granular details about the user, team, data workload, usage type, data job, data application, compute, and resources consumed to execute each data application. In addition, Cost 360 provides insights and recommendations to optimize clusters and jobs, along with estimated cost improvements to help you prioritize workload optimization.
Tagging AWS resources is a best practice that helps you categorize resources by application, owner, department, or other criteria. You can add AWS tags using the AWS Tag Editor, the AWS Resource Groups Tagging API, and the Amazon EMR Serverless API. You can use Unravel tags to generate chargeback reports based on specific criteria, such as project, department, team, and other attributes.
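To make the idea of tag-based chargeback concrete, here is a minimal Python sketch that rolls up cost by a tag key. It is an illustration only, not Unravel's implementation: the record layout, the cluster IDs, the dollar amounts, and the `chargeback_by_tag` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical cost records: a dollar cost plus the tags found on each resource.
records = [
    {"resource": "j-CLUSTER1", "cost": 120.50,
     "tags": {"department": "finance", "project": "risk"}},
    {"resource": "j-CLUSTER2", "cost": 80.25, "tags": {"department": "marketing"}},
    {"resource": "j-CLUSTER3", "cost": 40.00, "tags": {"department": "finance"}},
    {"resource": "j-CLUSTER4", "cost": 15.75, "tags": {}},  # untagged resource
]


def chargeback_by_tag(records, tag_key, untagged_label="untagged"):
    """Sum cost per value of `tag_key`; resources missing the tag are grouped together."""
    totals = defaultdict(float)
    for record in records:
        group = record.get("tags", {}).get(tag_key, untagged_label)
        totals[group] += record["cost"]
    return dict(totals)


if __name__ == "__main__":
    for group, total in sorted(chargeback_by_tag(records, "department").items()):
        print(f"{group}: ${total:.2f}")
```

Grouping untagged resources under their own label keeps unattributed spend visible, which is one reason consistent tagging matters for chargeback.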
Data teams spend most of their time preparing data—data aggregation, cleansing, deduplication, synchronizing and standardizing data, ensuring data quality, timeliness, and accuracy, etc.—rather than actually delivering insights from analytics. Everybody needs to be working off a “single source of truth” to break down silos, enable collaboration, eliminate finger-pointing, and empower more self-service. Although the goal is to prevent data quality issues, assessing and improving data quality typically begins with monitoring and observability, detecting anomalies, and analyzing root causes of those anomalies.
Amazon CloudWatch collects monitoring and operational data in the form of logs, metrics, and events for your EMR job flows. CloudWatch metrics can be used to detect basic conditions such as idle clusters and nodes, or clusters that run out of storage. AWS Cost Explorer rightsizing recommendations are designed to help customers choose the optimal Amazon EC2 instance type for a given workload. However, these recommendations do not take into account the distributed resources used by EMR, so they are best suited to simpler use cases such as identifying idle instances. Troubleshooting slow or failed clusters involves a number of steps, such as gathering data and digging into log files. Data application performance tuning, root cause analysis, usage forecasting, and data quality checks require additional tools and data sources. Unravel accelerates the troubleshooting process by creating a data model from the metadata of your applications, clusters, resources, users, and configuration settings, then applying predictive analytics and machine learning to provide recommendations and automatically tune your EMR clusters.
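As an illustration of the basic idle-cluster check mentioned above, the following Python sketch builds a query for the `IsIdle` metric that EMR publishes in the `AWS/ElasticMapReduce` CloudWatch namespace and evaluates the returned datapoints. The helper names, the six-hour window, the three-datapoint minimum, and the cluster ID are assumptions chosen for the example; the CloudWatch call itself is shown only in a comment because it needs AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def idle_metric_query(cluster_id, hours=6, period_seconds=300):
    """Build keyword arguments for CloudWatch get_metric_statistics (window is an assumption)."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": "IsIdle",  # 1 when no tasks or jobs are running, 0 otherwise
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Minimum"],
    }


def is_cluster_idle(datapoints, min_points=3):
    """True only if there are enough datapoints and every one reports idle."""
    if len(datapoints) < min_points:
        return False  # too little data to decide
    return all(dp["Minimum"] >= 1.0 for dp in datapoints)


# With credentials configured, the query could be run like this
# ("j-EXAMPLE" is a placeholder cluster ID):
#   import boto3
#   cw = boto3.client("cloudwatch")
#   points = cw.get_metric_statistics(**idle_metric_query("j-EXAMPLE"))["Datapoints"]
#   print(is_cluster_idle(points))
```

Using the `Minimum` statistic means a single busy period inside the window marks the cluster as not idle, which errs on the side of keeping a working cluster alive.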
Virtual Private Cloud (VPC) peering lets you create a network connection between two VPCs, even across regions, so that you can route traffic between them using private IP addresses. For example, if you run both an Unravel EC2 instance and an EMR cluster in the us-east-1 region but in different VPCs and subnets, there is no network access between the Unravel EC2 instance and the EMR cluster by default. To enable network access, you can set up VPC peering between the VPC that hosts your EMR master node and the VPC that hosts your Unravel EC2 instance.
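The peering setup described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an official Unravel procedure: it assumes both VPCs are in the same account and region, that `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`), and that each VPC is described by a hypothetical dictionary with `id`, `cidr`, and `route_table_id` keys. Security group rules, which must also allow the traffic, are not covered.

```python
import ipaddress


def cidrs_overlap(cidr_a, cidr_b):
    """VPCs with overlapping CIDR blocks cannot be peered, so check first."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def peer_vpcs(ec2, unravel_vpc, emr_vpc):
    """Peer two VPCs and add a route in each direction.

    `ec2` is assumed to be a boto3 EC2 client; the VPC dictionaries
    (id, cidr, route_table_id) are a hypothetical shape for this example.
    """
    if cidrs_overlap(unravel_vpc["cidr"], emr_vpc["cidr"]):
        raise ValueError("VPC peering requires non-overlapping CIDR blocks")

    # Request the peering connection, then accept it (same account assumed).
    response = ec2.create_vpc_peering_connection(
        VpcId=unravel_vpc["id"], PeerVpcId=emr_vpc["id"]
    )
    pcx_id = response["VpcPeeringConnection"]["VpcPeeringConnectionId"]
    ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

    # Each side needs a route to the other side's CIDR via the peering connection.
    ec2.create_route(
        RouteTableId=unravel_vpc["route_table_id"],
        DestinationCidrBlock=emr_vpc["cidr"],
        VpcPeeringConnectionId=pcx_id,
    )
    ec2.create_route(
        RouteTableId=emr_vpc["route_table_id"],
        DestinationCidrBlock=unravel_vpc["cidr"],
        VpcPeeringConnectionId=pcx_id,
    )
    return pcx_id
```

Passing the client in as an argument keeps the function easy to exercise with a stub before it is pointed at a real account.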
Unravel provides granular insights, recommendations, and automation before, during, and after you migrate your Spark, Hadoop, and data workloads to AWS.
Get granular chargeback and cost optimization for your Amazon EMR workloads. Unravel for Amazon EMR is a complete application performance monitoring, tuning, and troubleshooting tool for big data apps running on Amazon EMR. Unravel provides AI-powered recommendations and automated actions to enable intelligent optimization of big data pipelines and applications.