FunASR: A Powerful End-to-End Speech Recognition Toolkit with SOTA Models

A fundamental end-to-end speech recognition toolkit with open-source SOTA pretrained models, supporting speech recognition, voice activity detection, text post-processing, and related tasks.

Added on June 19, 2025

Stars: 11,101

Forks: 1,125

Language: Python

Project Introduction

Summary

FunASR is a comprehensive, end-to-end open-source toolkit for Automatic Speech Recognition (ASR). It provides fundamental ASR capabilities, includes SOTA pretrained models, and supports related tasks such as Voice Activity Detection (VAD) and text post-processing, aiming to simplify the development and deployment of speech applications.

Problem Solved

FunASR addresses the complexity and high cost of building and deploying accurate speech recognition systems by providing an open-source toolkit with ready-to-use SOTA models.

Core Features

SOTA Pretrained Models

Provides state-of-the-art (SOTA) open-source models for high-accuracy speech recognition.

Voice Activity Detection (VAD)

Includes robust Voice Activity Detection (VAD) capabilities to accurately identify speech segments.
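To make the idea of identifying speech segments concrete, here is a minimal energy-threshold sketch. It is not FunASR's VAD, which is model-based; the function name `detect_speech`, the frame length, and the threshold are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def detect_speech(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 20,
    threshold: float = 0.01,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of frames whose mean energy
    reaches the threshold. Illustrative only, not FunASR's algorithm."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    segments: List[Tuple[float, float]] = []
    start = None  # start time of the segment currently open, if any
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        t = i / sample_rate
        if energy >= threshold and start is None:
            start = t  # speech begins at this frame
        elif energy < threshold and start is not None:
            segments.append((start, t))  # speech ended before this frame
            start = None
    if start is not None:
        # Audio ended while speech was still active.
        segments.append((start, len(samples) / sample_rate))
    return segments
```

A fixed energy threshold breaks down in noisy recordings, which is one reason toolkits ship trained VAD models instead.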

Text Post-processing

Offers text post-processing functionalities for refining transcription outputs.

End-to-End Toolkit

Designed as an end-to-end (E2E) toolkit for streamlined development and deployment.
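As a rough sketch of what end-to-end use can look like, the snippet below follows the `AutoModel` usage pattern from the FunASR README. The model names (`paraformer-zh`, `fsmn-vad`, `ct-punc`) and the helper names `build_model` and `transcribe` are assumptions for illustration and do not come from this page; check the project's documentation for current model names and options.

```python
from typing import Any, List


def build_model() -> Any:
    """Load an ASR pipeline with VAD and punctuation models (assumed names)."""
    # Imported lazily so this module can be loaded without funasr installed.
    from funasr import AutoModel

    return AutoModel(
        model="paraformer-zh",  # ASR model (assumption)
        vad_model="fsmn-vad",   # splits long audio into speech segments
        punc_model="ct-punc",   # restores punctuation in the transcript
    )


def transcribe(model: Any, audio_path: str) -> List[str]:
    """Return the recognized text of each result the model produces."""
    results = model.generate(input=audio_path)
    return [item["text"] for item in results]
```

Calling `transcribe(build_model(), "meeting.wav")` with a hypothetical local file would download the models on first use and return the recognized text.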

Tech Stack

Python

PyTorch

Kaldi (potentially for some components)

FastAPI (potentially for serving)

Docker

Use Cases

FunASR can be applied in various scenarios requiring speech-to-text capabilities or audio analysis:

Meeting Transcription and Content Retrieval

Details

Transcribing audio from meetings, lectures, or interviews for documentation and searchability.

User Value

Significantly reduces manual transcription effort and enables quick keyword search within audio/video archives.
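Keyword search over a transcript can be sketched as follows, assuming the ASR step has already produced timestamped segments. The `Segment` type and `search` function are hypothetical helpers, not part of FunASR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float  # seconds from the beginning of the recording
    end: float
    text: str


def search(segments: List[Segment], keyword: str) -> List[Segment]:
    """Return segments whose text contains the keyword, ignoring case."""
    needle = keyword.casefold()
    return [s for s in segments if needle in s.text.casefold()]
```

Each hit carries its start time, so a player can jump straight to the matching moment in the audio or video.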

Voice Assistants and Intelligent Interaction Systems

Details

Building backend services for voice assistants, command & control systems, or voice search.

User Value

Provides the core ASR engine required for understanding spoken user input in interactive applications.

Batch Audio Data Analysis

Details

Processing large volumes of audio data for analytics, such as call center interactions or media content.

User Value

Automates the conversion of audio to text, facilitating large-scale sentiment analysis, topic modeling, or compliance monitoring.
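A batch pipeline of this kind might look like the sketch below. The `transcribe` callable stands in for whatever ASR backend is used; `transcribe_batch` and `term_counts` are hypothetical helpers, and the word-count step is only a placeholder for real topic or sentiment analysis.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def transcribe_batch(
    paths: List[str],
    transcribe: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run the given transcribe function over many files concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input paths.
        texts = list(pool.map(transcribe, paths))
    return dict(zip(paths, texts))


def term_counts(transcripts: Dict[str, str]) -> Counter:
    """Count lowercase word occurrences across all transcripts."""
    counts: Counter = Counter()
    for text in transcripts.values():
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts
```

Passing the transcriber in as a parameter keeps the batch logic independent of any particular model, so it can be exercised with a stub before real audio is involved.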
