DWBI.org
Apache Iceberg is a high-performance open table format for analytic datasets. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables at the same time. Iceberg provides ACID compliance, schema evolution, and time travel for data lakes.
Are you looking for a seamless way to integrate your Apache Kafka cluster on Amazon Managed Streaming for Apache Kafka (MSK) with other data sources and sinks? Look no further! In this article, we'll guide you through the process of setting up a Docker Kafka Connect container on macOS to work with your AWS MSK cluster.
Using GitHub’s default runners may not always be ideal, particularly if you need custom configurations, enhanced security, or cost-efficiency. Hosting self-managed GitHub runners on AWS offers flexibility and control over your CI/CD processes. In this guide, we'll walk through the process of setting up GitHub self-hosted private runners on AWS.
Managing AWS access keys for GitHub Actions can be a challenge, especially when ensuring security and ease of access. Traditionally, AWS IAM user access keys have been used to grant GitHub Actions the permissions needed to interact with AWS resources. However, there is a more secure and manageable way: using OpenID Connect (OIDC) identity providers to obtain temporary AWS credentials.
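To make the OIDC approach concrete, here is a minimal sketch of the IAM role trust policy that lets a GitHub Actions workflow assume a role without stored access keys. The account ID, repository name, and the helper `github_oidc_trust_policy` are hypothetical placeholders for illustration; the provider host, audience, and `sts:AssumeRoleWithWebIdentity` action are the standard values for GitHub's OIDC provider on AWS.

```python
import json


def github_oidc_trust_policy(account_id: str, repo: str, branch: str = "main") -> dict:
    """Build a trust policy for a role assumed via GitHub's OIDC provider.

    account_id and repo are illustrative inputs, not values from the article.
    """
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The OIDC identity provider must already exist in the account.
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Audience used when requesting AWS credentials.
                        f"{provider}:aud": "sts.amazonaws.com",
                        # Restrict the role to one repository and branch.
                        f"{provider}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                    }
                },
            }
        ],
    }


# Hypothetical account ID and repository, for illustration only.
policy = github_oidc_trust_policy("123456789012", "example-org/example-repo")
print(json.dumps(policy, indent=2))
```

Scoping the `sub` condition to a specific repository and branch is what keeps other repositories from assuming the same role.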
Automating the provisioning of AWS infrastructure is essential for ensuring consistency and minimizing human errors during deployments. With Terraform and GitHub Actions, you can implement a Continuous Delivery (CD) pipeline that deploys to multiple environments (like staging and production) across different AWS accounts.
In this article, we will explore how to deploy a Confluent Kafka cluster using Docker Compose. We will create a comprehensive configuration file that includes the necessary services and dependencies to provision a fully functional Kafka cluster on an AWS EC2 instance for a demo or PoC use case.
Are you looking for a comprehensive guide on how to install the MLflow tracking server on an AWS EC2 instance? Look no further! In this article, we will walk you through the process of setting up an MLflow tracking server on an EC2 instance, including creating an S3 bucket and assigning an IAM role.
Are you looking for a hassle-free way to set up Apache Airflow on your Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instance? Look no further! In this article, we'll walk you through the process of installing Airflow using a simple script that can be added as part of the user data while launching an EC2 instance.
As a developer, having a reliable database management system is crucial for storing and managing data. PostgreSQL is one such popular open-source relational database management system that offers robust features and scalability. In this article, we will walk you through the process of installing and configuring PostgreSQL on Amazon Linux.
As the demand for serverless computing continues to grow, AWS Lambda has become a popular choice for developers looking to build scalable and efficient applications. One of the key features of AWS Lambda is its support for layers, which allow you to package and reuse code across multiple functions. In this article, we'll walk through the process of creating an AWS Lambda layer using Python on macOS.
In today’s fast-paced development environment, continuous integration and continuous deployment (CI/CD) are no longer optional—they’re essential. Automating these processes not only speeds up your workflow but also minimizes human error, allowing you to focus on what truly matters: writing quality code.
Helm, a powerful package manager for Kubernetes, simplifies application deployment and management. GitHub Pages provides an easy and free hosting solution for Helm charts. This guide will walk you through setting up a Helm chart repository using GitHub Pages and uploading your charts.
One of the key challenges in working with Kubernetes is managing sensitive data like passwords, API tokens, and database credentials in a secure manner. These sensitive details, often referred to as "secrets," need to be protected to ensure application security.
Managing secrets in a cloud-native environment like Kubernetes is a crucial aspect of maintaining the security and integrity of your applications. Secrets, in the context of Kubernetes, are sensitive pieces of data such as passwords, API keys, OAuth tokens, and TLS certificates. These secrets need to be securely managed, accessed, and used by your Kubernetes workloads.
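As a small illustration of what a Kubernetes Secret looks like on the wire, the sketch below builds an `Opaque` Secret manifest as a plain Python dictionary. The helper `build_secret_manifest` and the sample names and values are hypothetical. Note that the `data` field holds base64-encoded values, which is an encoding and not encryption.

```python
import base64
import json


def build_secret_manifest(name: str, namespace: str, values: dict) -> dict:
    """Return a Secret manifest with each value base64-encoded under `data`."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": encoded,
    }


# Hypothetical secret name and credentials, for illustration only.
manifest = build_secret_manifest(
    "db-credentials", "default", {"username": "app", "password": "s3cr3t"}
)
print(json.dumps(manifest, indent=2))

# Anyone who can read the manifest can decode the values again.
decoded = base64.b64decode(manifest["data"]["password"]).decode("utf-8")
```

Because the values are trivially reversible, a manifest like this should not be committed to Git in the clear; that gap is what dedicated secret-management tooling addresses.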
MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. Integrating it with Kubernetes allows scalable deployment, artifact storage, and tracking capabilities, essential for managing production-level ML models. This guide walks through the setup of MLflow on Kubernetes, configuring PostgreSQL as the backend and MinIO as the artifact store.
MinIO is a popular open-source object storage solution, ideal for handling unstructured data at high performance. Integrating MinIO into Kubernetes allows you to deploy scalable storage in your cloud-native applications. In this article, we'll walk you through setting up MinIO in a Kubernetes environment.
A private local container registry enables you to securely store and manage your Docker images, improving efficiency, control, and security. This guide will walk you through the setup of Docker Distribution Registry in a Kubernetes cluster.
Kubernetes provides a powerful platform for managing containerized applications. In Kubernetes, Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) enable dynamic and scalable storage management, ensuring that stateful applications can retain their data even if containers are deleted and recreated.
Argo CD is a powerful continuous delivery tool for managing Kubernetes resources through a GitOps approach. With Argo CD, your Git repository is the single source of truth for your application’s desired state, ensuring consistency and reliability across your deployments.
Kubernetes has become the go-to orchestration platform for containerized applications, providing robust capabilities to manage large-scale deployments. One essential tool within the Kubernetes ecosystem is the Kubernetes Dashboard, a web-based interface that allows users to manage and monitor their Kubernetes clusters. In this article, we will explore how to set up and use the Kubernetes Dashboard with Docker Desktop.
Are you interested in exploring the world of big data and machine learning? Look no further! In this article, we'll take you through a quick and easy guide to installing and configuring Apache Spark with Jupyter Notebook on your macOS device.