DWBI.org

Iceberg Data Lake on Amazon S3 with AWS Glue Catalog

Apache Iceberg is a high-performance open table format for analytic datasets. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables at the same time. Iceberg provides ACID transactions, schema evolution, and time travel for data lakes.
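The snapshot mechanism behind Iceberg's time travel can be sketched with a toy in-memory model. This is not Iceberg's API, only an illustration of the idea: every commit produces a new immutable snapshot, and time travel means reading an older snapshot by id.

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    # Each entry is (snapshot_id, rows); snapshots are never mutated.
    snapshots: list = field(default_factory=list)

    def commit(self, rows):
        # A commit appends a new snapshot of the full table state.
        snapshot_id = len(self.snapshots) + 1
        self.snapshots.append((snapshot_id, list(rows)))
        return snapshot_id

    def scan(self, snapshot_id=None):
        # Default read: latest snapshot; with snapshot_id: time travel.
        if not self.snapshots:
            return []
        if snapshot_id is None:
            return self.snapshots[-1][1]
        return dict(self.snapshots)[snapshot_id]

t = Table()
v1 = t.commit([{"id": 1}])
v2 = t.commit([{"id": 1}, {"id": 2}])
assert t.scan() == [{"id": 1}, {"id": 2}]       # current state
assert t.scan(snapshot_id=v1) == [{"id": 1}]    # time travel to v1
```

In real Iceberg the equivalent read is a SQL query with a snapshot or timestamp clause against the catalog-managed table metadata.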


Docker Kafka Connect Container for AWS MSK cluster

Are you looking for a seamless way to integrate your Apache Kafka cluster on Amazon Managed Streaming for Apache Kafka (MSK) with other data sources and sinks? Look no further! In this article, we'll guide you through the process of setting up a Docker Kafka Connect container on macOS to work with your AWS MSK cluster.
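Once the Connect container is running, connectors are registered over its REST API. A minimal sketch of such a request payload, with the connector class, topic, and bucket as illustrative placeholders:

```python
import json

# Example payload for the Kafka Connect REST API (POST /connectors).
# Connector class, topic, and bucket names are placeholders.
connector = {
    "name": "s3-sink-example",
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "topics": "orders",
        "s3.bucket.name": "example-bucket",
        "s3.region": "us-east-1",
        "flush.size": "1000",
    },
}
payload = json.dumps(connector)
# e.g. POST payload to http://localhost:8083/connectors with
# header Content-Type: application/json
```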

GitHub Self-Hosted Private Runners on AWS

Using GitHub’s default runners may not always be ideal, particularly if you need custom configurations, enhanced security, or cost-efficiency. Hosting self-managed GitHub runners on AWS offers flexibility and control over your CI/CD processes. In this guide, we'll walk through the process of setting up GitHub self-hosted private runners on AWS.

Simplifying AWS Access in GitHub Actions with OIDC Provider

Managing AWS access keys for GitHub Actions can be a challenge, especially when ensuring security and ease of access. Traditionally, AWS IAM user access keys have been used to grant GitHub Actions the permissions needed to interact with AWS resources. However, there is a more secure and manageable way: using OpenID Connect (OIDC) identity providers to obtain temporary AWS credentials.
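The core of the OIDC approach is an IAM role whose trust policy federates with GitHub's OIDC provider. A sketch of such a trust policy follows; the account id, organization, and repository are placeholders:

```python
import json

# IAM role trust policy allowing GitHub Actions to call
# sts:AssumeRoleWithWebIdentity via the GitHub OIDC provider.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                # Audience pinned to STS; subject restricted to one repo.
                "StringEquals": {
                    "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
                },
                "StringLike": {
                    "token.actions.githubusercontent.com:sub": "repo:my-org/my-repo:*"
                },
            },
        }
    ],
}
print(json.dumps(trust_policy, indent=2))
```

The `sub` condition is what scopes the role to a specific repository (and, if tightened further, to a branch or environment).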

Automating AWS Infrastructure Provisioning with Terraform and GitHub Actions

Automating the provisioning of AWS infrastructure is essential for ensuring consistency and minimizing human errors during deployments. With Terraform and GitHub Actions, you can implement a Continuous Delivery (CD) pipeline that deploys to multiple environments (like staging and production) across different AWS accounts.

Kafka installation on AWS EC2

In this article, we will explore how to deploy a Confluent Kafka cluster using Docker Compose. We will create a comprehensive configuration file that includes the necessary services and dependencies to provision a fully functional Kafka cluster on an AWS EC2 instance for a demo or PoC use case.

MLflow Installation on AWS EC2

Are you looking for a comprehensive guide on how to install the MLflow tracking server in an AWS EC2 instance? Look no further! In this article, we will walk you through the process of setting up an MLflow tracking server in an EC2 instance, including creating an S3 bucket and assigning an IAM role.

Airflow Installation on AWS EC2

Are you looking for a hassle-free way to set up Apache Airflow on your Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instance? Look no further! In this article, we'll walk you through the process of installing Airflow using a simple script that can be added as part of the user data while launching an EC2 instance.

Install & Configure PostgreSQL on Amazon Linux

As a developer, having a reliable database management system is crucial for storing and managing data. PostgreSQL is one such popular open-source relational database management system that offers robust features and scalability. In this article, we will walk you through the process of installing and configuring PostgreSQL on Amazon Linux.

How to create AWS Lambda Layer
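For Python runtimes, a layer zip must place packages under a top-level `python/` directory so Lambda adds it to `sys.path`. A minimal sketch of building such an archive in memory (the package name is a placeholder; in practice the files come from `pip install -t python/ <package>`):

```python
import io
import zipfile

# Build a layer archive whose contents sit under "python/".
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("python/mylib/__init__.py", "VERSION = '0.1'\n")

# Verify the layout Lambda expects.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
assert names == ["python/mylib/__init__.py"]
```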

As the demand for serverless computing continues to grow, AWS Lambda has become a popular choice for developers looking to build scalable and efficient applications. One of the key features of AWS Lambda is its support for layers, which allow you to package and reuse code across multiple functions. In this article, we'll walk through the process of creating an AWS Lambda layer using Python on macOS.

Automate Docker CI/CD Pipelines with GitHub Actions

In today’s fast-paced development environment, continuous integration and continuous deployment (CI/CD) are no longer optional—they’re essential. Automating these processes not only speeds up your workflow but also minimizes human error, allowing you to focus on what truly matters: writing quality code.

GitHub Pages as Helm Chart Repository

Helm, a powerful package manager for Kubernetes, simplifies application deployment and management. GitHub Pages provides an easy and free hosting solution for Helm charts. This guide will walk you through setting up a Helm chart repository using GitHub Pages and uploading your charts.

External Secrets in Kubernetes

One of the key challenges in working with Kubernetes is managing sensitive data like passwords, API tokens, and database credentials in a secure manner. These sensitive details, often referred to as "secrets," need to be protected to ensure application security.

Secret Management in Kubernetes

Managing secrets in a cloud-native environment like Kubernetes is a crucial aspect of maintaining the security and integrity of your applications. Secrets, in the context of Kubernetes, are sensitive pieces of data such as passwords, API keys, OAuth tokens, and TLS certificates. These secrets need to be securely managed, accessed, and used by your Kubernetes workloads.
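A Kubernetes Secret manifest carries its values base64-encoded in the `data` field (encoding, not encryption, which is why access control and encryption at rest still matter). A small sketch with placeholder names:

```python
import base64

# Secret values live base64-encoded under "data" in the manifest.
password = "s3cr3t"  # placeholder value
secret_manifest = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "db-credentials"},
    "type": "Opaque",
    "data": {"password": base64.b64encode(password.encode()).decode()},
}

# Decoding recovers the original value, illustrating that base64
# offers no confidentiality on its own.
assert base64.b64decode(secret_manifest["data"]["password"]).decode() == "s3cr3t"
```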

Setup MLflow on Kubernetes

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. Integrating it with Kubernetes allows scalable deployment, artifact storage, and tracking capabilities, essential for managing production-level ML models. This guide walks through the setup of MLflow on Kubernetes, configuring PostgreSQL as the backend and MinIO as the artifact store.

Setup MinIO Object Storage on Kubernetes

MinIO is a popular open-source object storage solution, ideal for handling unstructured data at high performance. Integrating MinIO into Kubernetes allows you to deploy scalable storage in your cloud-native applications. In this article, we'll walk you through setting up MinIO in a Kubernetes environment.

Setup a Private Container Registry

A private local container registry enables you to securely store and manage your Docker images, improving efficiency, control, and security. This guide will walk you through the setup of the Docker Distribution registry in a Kubernetes cluster.

Setup Kubernetes Persistent Volumes on Docker Desktop

Kubernetes provides a powerful platform for managing containerized applications. In Kubernetes, Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) enable dynamic and scalable storage management, ensuring that stateful applications can retain their data even if containers are deleted and recreated.
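A PVC is a small declarative manifest requesting storage from a class. A minimal sketch as a Python dict (which `kubectl apply` also accepts as JSON); the claim name, size, and the `hostpath` storage class shipped with Docker Desktop are illustrative:

```python
import json

# Minimal PersistentVolumeClaim: one-writer access, 1Gi from the
# "hostpath" class that Docker Desktop provides.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "data-pvc"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "hostpath",
        "resources": {"requests": {"storage": "1Gi"}},
    },
}
print(json.dumps(pvc, indent=2))
```

A pod then mounts this claim by name under `spec.volumes`, keeping its data across container restarts.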

Install Argo CD in Kubernetes

Argo CD is a powerful continuous delivery tool for managing Kubernetes resources through a GitOps approach. With Argo CD, your Git repository is the single source of truth for your application’s desired state, ensuring consistency and reliability across your deployments.

Kubernetes Dashboard with Docker Desktop

Kubernetes has become the go-to orchestration platform for containerized applications, providing robust capabilities to manage large-scale deployments. One essential tool within the Kubernetes ecosystem is the Kubernetes Dashboard, a web-based interface that allows users to manage and monitor their Kubernetes clusters. In this article, we will explore how to set up and use the Kubernetes Dashboard with Docker Desktop.

Setup Apache Spark with Jupyter Notebook on macOS

Are you interested in exploring the world of big data and machine learning? Look no further! In this article, we'll take you through a quick and easy guide to installing and configuring Apache Spark with Jupyter Notebook on your macOS device.