DataHub: An Open-Source Metadata Platform for the Modern Data and AI Stack

DataHub is an open-source metadata platform for the modern data stack. It empowers data teams to discover, govern, and understand their data assets effectively.

Added on June 10, 2025
Stars: 10,697
Forks: 3,131
Language: Java

Project Introduction

Summary

DataHub is a powerful, open-source metadata platform built for the data-driven enterprise. It provides a comprehensive view of your data landscape, enabling better data discoverability, governance, and collaboration.

Problem Solved

In today's complex data ecosystems, it's challenging for users to find the right data, understand its meaning, assess its trustworthiness, and track its usage. DataHub solves this by providing a central hub for all technical, business, and operational metadata.

Core Features

Unified Metadata Search & Discovery

Easily find datasets, dashboards, models, and other data assets using a unified search interface.

Automated Metadata Ingestion

Connect to various data sources (databases, data lakes, BI tools, etc.) to automatically ingest metadata.

Automated Data Lineage

Visualize how data flows through your system, from source to consumption.

Data Governance Capabilities

Define ownership, tags, terms, and policies to improve data governance and compliance.

Tech Stack

Kafka
Elasticsearch
MySQL
PostgreSQL
Spark
React
TypeScript

Use Cases

DataHub is used across industries and organizational functions to improve data operations and support a data-driven culture.

Unified Data Asset Catalog

Details

Organizations use DataHub to create a central catalog of all data assets, allowing users to easily search, find, and understand the data they need for analysis or application development.

User Value

Significantly reduces time spent searching for data, improves data access efficiency, and prevents redundant data creation.
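
To make the catalog idea concrete, here is a minimal sketch of a unified search over several asset types. It is not DataHub's implementation or API: the `Asset` class, the `urn:example:...` identifiers, and the simple token index are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """A hypothetical catalog entry; not DataHub's actual entity model."""
    urn: str
    kind: str          # e.g. "dataset", "dashboard", "model"
    description: str


def build_index(assets):
    """Map each lowercase token in an asset's description to the URNs containing it."""
    index = defaultdict(set)
    for asset in assets:
        for token in asset.description.lower().split():
            index[token].add(asset.urn)
    return index


def search(index, query):
    """Return the sorted URNs whose descriptions contain every query token."""
    tokens = query.lower().split()
    if not tokens:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(matches)


# Illustrative assets of different kinds, all searchable through one index.
assets = [
    Asset("urn:example:dataset:orders", "dataset", "daily customer orders table"),
    Asset("urn:example:dashboard:revenue", "dashboard", "revenue dashboard built on customer orders"),
    Asset("urn:example:model:churn", "model", "customer churn prediction model"),
]
index = build_index(assets)
results = search(index, "customer orders")
```

Here `results` holds the dataset and the dashboard, since both descriptions mention "customer" and "orders". A single index over all asset kinds is what lets one query surface tables, dashboards, and models together.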

End-to-End Data Lineage Tracking

Details

Teams leverage DataHub's lineage capabilities to understand the flow of data from its origin to its final consumption, which is critical for impact analysis, root cause analysis, and compliance.

User Value

Increases trust in data, simplifies debugging of data pipelines, and streamlines compliance audits.
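
The impact analysis mentioned above amounts to a downstream walk over a lineage graph. The sketch below assumes lineage is available as a plain adjacency map; the asset names and the `impacted_assets` helper are hypothetical and do not reflect how DataHub stores or queries lineage.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed in its value.
DOWNSTREAM = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["mart.revenue", "mart.customers"],
    "mart.revenue": ["dashboard.revenue"],
    "mart.customers": [],
    "dashboard.revenue": [],
}


def impacted_assets(edges, start):
    """Breadth-first walk returning every asset downstream of `start`, in visit order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Which assets could break if staging.orders changes?
impacted = impacted_assets(DOWNSTREAM, "staging.orders")
```

Root cause analysis is the same walk in the opposite direction: invert the edge map and start from the asset that looks wrong.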

Enhanced Data Governance and Compliance

Details

Data stewards use DataHub to define business terms, assign ownership, tag sensitive data, and implement data access policies.

User Value

Ensures data quality, enhances data security, and helps meet regulatory requirements like GDPR or CCPA.
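
As a rough illustration of how ownership, tags, and access policies fit together, the following sketch models them with plain data classes. The class names, the `pii` tag, and the role names are assumptions for this example only, not DataHub's policy model.

```python
from dataclasses import dataclass, field


@dataclass
class GovernedAsset:
    """A hypothetical governed asset; field names are illustrative only."""
    name: str
    owner: str
    tags: set = field(default_factory=set)


@dataclass
class AccessPolicy:
    """Grants access to assets carrying `tag` only to the listed roles."""
    tag: str
    allowed_roles: frozenset


def can_access(asset, role, policies):
    """Allow access unless a policy restricts one of the asset's tags and excludes this role."""
    for policy in policies:
        if policy.tag in asset.tags and role not in policy.allowed_roles:
            return False
    return True


# A steward tags the customers table as sensitive and restricts it to stewards.
customers = GovernedAsset("mart.customers", owner="data-platform", tags={"pii"})
revenue = GovernedAsset("mart.revenue", owner="finance")
policies = [AccessPolicy(tag="pii", allowed_roles=frozenset({"data-steward"}))]
```

With these definitions, `can_access(customers, "analyst", policies)` is `False`, while the untagged `revenue` asset stays open to any role. Attaching the policy to a tag rather than to individual assets means newly tagged data is covered without further changes.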
