Git-Research

Datasets

Access and share high-quality datasets for machine learning research

Browse Datasets

common-crawl/common-crawl-2024 (Trending, 220 TB)

Large-scale web corpus containing petabytes of data collected over years

Language Modeling, Text Generation
450K | 3,200 | Formats: parquet, json

stanford-vision/imagenet-21k (Trending, 1.3 TB)

Large-scale image dataset with 21,000 classes for computer vision tasks

Image Classification, Object Detection
280K | 5,600 | Formats: tar, zip

stanford-nlp/squad-v2 (45 MB)

Reading comprehension dataset with 150K+ questions on Wikipedia articles

Question Answering, NLP
820K | 8,900 | Formats: json

openslr/librispeech (60 GB)

Corpus of 1,000 hours of English speech derived from audiobooks

Speech Recognition, Audio
390K | 4,200 | Formats: flac, txt

Share Your Dataset

Contribute to the research community by uploading your datasets