MediaCrawler: Multi-Platform Social Media Content & Comment Scraper

A versatile open-source media crawler designed to extract content and comments from popular Chinese social media platforms like Xiaohongshu, Douyin, Kuaishou, Bilibili, Weibo, Baidu Tieba, and Zhihu. Ideal for data analysis, market research, and sentiment analysis.

Added on July 6, 2025
Stars: 27,232
Forks: 6,999
Language: Python

Project Introduction

Summary

MediaCrawler is an open-source project offering robust crawling capabilities for various Chinese social media platforms. It allows users to collect publicly available content and comments, providing valuable datasets for analysis and research.

Problem Solved

Collecting data from diverse and frequently changing social media platforms manually or using platform-specific tools is inefficient and prone to errors. This project provides a unified solution to automate the data extraction process across multiple key platforms.

Core Features

Xiaohongshu (Little Red Book) Crawler

Crawl posts and comments from Xiaohongshu.

Douyin Video/Comment Crawler

Extract video details and comments from Douyin, the mainland China counterpart of TikTok.

Kuaishou Video/Comment Crawler

Gather video data and comments from Kuaishou.

Bilibili (B Station) Video/Comment Crawler

Scrape Bilibili video information and comments.

Weibo Post/Comment Crawler

Fetch Weibo posts and their corresponding comments.

Baidu Tieba Post/Comment/Reply Crawler

Crawl Baidu Tieba posts, comments, and replies.

Zhihu Q&A/Article/Comment Crawler

Extract Zhihu Q&A, articles, and comments.

Tech Stack

Python
Scrapy
Requests
Selenium
Data Persistence (e.g., JSON, CSV, Database)
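
As a rough illustration of the data persistence options listed above, the following minimal sketch serializes a few comment records to JSON and CSV using only the Python standard library. The record fields (platform, post_id, comment_id, text) and the helper names to_json and to_csv are hypothetical and chosen for illustration; they are not MediaCrawler's actual schema or API.

```python
import csv
import io
import json

# Hypothetical comment records; the field names are illustrative only
# and do not reflect MediaCrawler's real output schema.
comments = [
    {"platform": "weibo", "post_id": "p1", "comment_id": "c1", "text": "Great product"},
    {"platform": "bilibili", "post_id": "p2", "comment_id": "c2", "text": "Too expensive, 太贵了"},
]


def to_json(records):
    """Serialize records to a JSON string, keeping non-ASCII text readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv(records):
    """Serialize records to CSV text, with a header row taken from the first record."""
    if not records:
        return ""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


json_text = to_json(comments)
csv_text = to_csv(comments)
```

JSON keeps nested or irregular fields intact, while CSV is easier to open in spreadsheet tools. A database backend would typically suit larger or incremental crawls.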

Use Cases

MediaCrawler suits scenarios that depend on collecting and analyzing social media data:

Social Media Sentiment & Brand Analysis

Details

Businesses can crawl comments on posts related to their brand or industry across platforms to gauge public sentiment and identify feedback.

User Value

Gain insights into public perception and identify actionable feedback for product or service improvement.
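
The sentiment use case above could be sketched, very roughly, as a per-platform tally of labeled comments. This is a toy example under stated assumptions: the keyword sets, the label_comment and tally_by_platform helpers, and the record fields are all hypothetical, and whitespace splitting does not segment Chinese text, so real comment data would need a proper tokenizer and a trained sentiment model.

```python
from collections import Counter

# Hypothetical keyword lists, far too small for real analysis.
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"bad", "expensive", "broken"}


def label_comment(text):
    """Label a comment by comparing counts of distinct positive and negative keywords."""
    words = {word.strip(".,!?").lower() for word in text.split()}
    positive_hits = len(words & POSITIVE)
    negative_hits = len(words & NEGATIVE)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"


def tally_by_platform(comments):
    """Count sentiment labels separately for each platform."""
    tallies = {}
    for comment in comments:
        counter = tallies.setdefault(comment["platform"], Counter())
        counter[label_comment(comment["text"])] += 1
    return tallies


# Illustrative records; the fields are assumptions, not crawler output.
sample = [
    {"platform": "weibo", "text": "Love it, great quality!"},
    {"platform": "weibo", "text": "Too expensive."},
    {"platform": "zhihu", "text": "Arrived on Tuesday."},
]
result = tally_by_platform(sample)
```

Grouping by platform keeps the counts comparable across sources, since audiences and comment styles differ from one platform to another.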

Academic and Social Science Research

Details

Researchers can collect datasets of discussions on specific topics or events across platforms for qualitative or quantitative analysis.

User Value

Obtain broad, varied datasets for studying online communication patterns, public opinion, or cultural trends.

Competitor Monitoring and Market Trend Analysis

Details

Marketing teams can monitor competitor activity, content strategies, and audience engagement across different platforms.

User Value

Stay informed about competitive landscapes and identify emerging trends in content or user interaction.

Recommended Projects

You might be interested in these projects

trinodb/trino

Trino is a distributed SQL query engine designed to query large data sets distributed over one or more heterogeneous data sources. It allows organizations to analyze data where it lives without migrating it.

Language: Java
Stars: 11,527
Forks: 3,248

elastic/logstash

Logstash is a powerful, open-source data processing pipeline that can ingest data from a multitude of sources simultaneously, transform it, and then send it to your favorite "stash", like Elasticsearch.

Language: Java
Stars: 14,475
Forks: 3,519

fla-org/flash-linear-attention

Efficient implementations of state-of-the-art linear attention models in Torch and Triton. This project provides high-performance, memory-efficient alternatives to traditional quadratic attention mechanisms, specifically optimized for long sequences and large-scale deep learning models.

Language: Python
Stars: 2,569
Forks: 189