DataHive AI

Arabic Multidialect Emotional Speech Dataset

Tags: Audio, Speech, Arabic, Multi-dialect, Emotion, TTS


Description

Arabic Multidialect Emotional Speech is a demonstration subset of a multi-dialect Arabic emotional speech corpus. Its recordings come with dialect labels, emotion labels, and comprehensive quality annotations. It is designed for training and evaluating speech systems that recognize, generate, or analyze affective Arabic speech across realistic everyday domains.

Each record pairs an audio file with a transcribed Arabic script, dialect classification, recording style (emotion), domain, speaker demographics, and dual-rater quality assurance metadata. The demo subset focuses on the Hejazi (Saudi Arabian / Gulf) dialect with Angry and Happy emotional styles spoken across ten conversational domains.

Scope

The demo split contains 496 recordings in Parquet format, which places it in the under-1K-rows size category. Each entry is a single audio clip of roughly 5–10 seconds. It is paired with its full transcription, speaker metadata, and independent annotations from two raters. The full production dataset covers additional Arabic dialects and emotional styles and is available on request.
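
The figures above are enough for a rough size estimate. The demo split has 496 clips, each roughly 5 to 10 seconds long, so it holds somewhere between about 40 and 85 minutes of audio. The per-clip range is only approximate, so take the exact total from the duration_seconds field rather than from this bound.

```latex
\begin{aligned}
496 \times 5\ \text{s}  &= 2480\ \text{s} \approx 41\ \text{min} \\
496 \times 10\ \text{s} &= 4960\ \text{s} \approx 83\ \text{min}
\end{aligned}
```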

Intended Usage

Automatic Speech Recognition (ASR) fine-tuning for Arabic dialects
Emotion-aware Text-to-Speech (TTS) and expressive voice synthesis
Audio classification — emotion, dialect, speaker traits
Multimodal sentiment and paralinguistic analysis
Benchmarking dialect robustness and rater-agreement studies

Modalities & Languages

Modalities: Audio, Text
Language: Arabic
Dialect featured in demo: Hejazi (Gulf / Saudi Arabian)
Emotions featured in demo: Angry, Happy
Domains: Finance, Family, Healthcare, Technology, Travel, Daily Life, Shopping, Customer Service, Education, Emergency
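
The emotion and domain classification uses listed above need these label sets turned into integer class indices. The following is a minimal plain-Python sketch of that step. The helper name build_label_maps is hypothetical. The sketch also assumes that the label strings match the values stored in the recording_style and domain fields exactly.

```python
from typing import Dict, List, Tuple

# Label sets copied from the "Modalities & Languages" section of this card.
EMOTIONS: List[str] = ["Angry", "Happy"]
DOMAINS: List[str] = [
    "Finance", "Family", "Healthcare", "Technology", "Travel",
    "Daily Life", "Shopping", "Customer Service", "Education", "Emergency",
]


def build_label_maps(labels: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Return (label2id, id2label) using a sorted, deterministic ordering."""
    ordered = sorted(set(labels))
    label2id = {label: idx for idx, label in enumerate(ordered)}
    id2label = {idx: label for label, idx in label2id.items()}
    return label2id, id2label


emotion2id, id2emotion = build_label_maps(EMOTIONS)  # {"Angry": 0, "Happy": 1}
domain2id, id2domain = build_label_maps(DOMAINS)
```

Sorting keeps each index stable across runs, so a saved classifier always maps the same output index back to the same label.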

Data Format and Access

The dataset is delivered as Parquet with embedded audio, auto-converted by the Hugging Face datasets server. It can be loaded directly with the Hugging Face datasets library, pandas, or Polars. Each row represents one recording with the following fields:

audio: Audio recording file
text: Transcribed speech text in Arabic
dialect: Dialect classification (e.g. Hejazi)
recording_style: Emotional style (e.g. Angry, Happy)
domain: Conversational context (Finance, Healthcare, Travel, …)
duration_seconds: Length of audio in seconds
speaker_id, speaker_gender, speaker_age: Speaker metadata
qa_score, qa_tier: Aggregate quality score and tier (gold / silver)
emotion_match: Whether the intended emotion was correctly identified by raters
qa_min_dimension, qa_agreement: Minimum dimension and inter-rater agreement
r1_* / r2_*: Independent annotations from two raters covering script accuracy, dialect match, speech quality, background noise, technical quality, emotion, and expression intensity
id: Unique record identifier
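
Once rows are loaded, you can filter them on these fields. The sketch below assumes that each row is a Python dict keyed by the field names above, which is how the datasets library yields rows when you iterate over a split. The select helper is hypothetical. Its example rows reuse only the metadata values from the Sample Records section. The exact string formats of fields such as speaker_gender and qa_tier are assumptions.

```python
from typing import Any, Dict, Iterable, List, Optional

Row = Dict[str, Any]


def select(
    rows: Iterable[Row],
    style: Optional[str] = None,
    tier: Optional[str] = None,
    max_seconds: Optional[float] = None,
) -> List[Row]:
    """Keep rows matching an emotion style, a QA tier, and/or a duration cap.

    A criterion left as None is not applied. Rows without duration_seconds
    are treated as 0.0 seconds long and therefore pass the duration cap.
    """
    kept = []
    for row in rows:
        if style is not None and row.get("recording_style") != style:
            continue
        if tier is not None and row.get("qa_tier") != tier:
            continue
        if max_seconds is not None and row.get("duration_seconds", 0.0) > max_seconds:
            continue
        kept.append(row)
    return kept


# The two sample records from this card, reduced to their metadata fields.
samples = [
    {"recording_style": "Angry", "domain": "Finance", "dialect": "Hejazi",
     "speaker_gender": "Male", "speaker_age": 27, "duration_seconds": 8.0},
    {"recording_style": "Happy", "domain": "Healthcare", "dialect": "Hejazi",
     "speaker_gender": "Female", "speaker_age": 36, "duration_seconds": 6.5},
]

happy = select(samples, style="Happy")    # only the Healthcare record
short = select(samples, max_seconds=7.0)  # only the 6.5 s record
```

A plain loop keeps the sketch runnable without downloading any audio. With the datasets library, you could adapt the same predicate logic for Dataset.filter. For example, select(rows, tier="gold") would keep only gold-tier recordings.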

Sample Records

Emotion	Domain	Dialect	Speaker (gender, age)	Duration	Expression intensity
Angry	Finance	Hejazi	Male, 27	8.0s	medium-high
Happy	Healthcare	Hejazi	Female, 36	6.5s	high

Quality Assurance

Dual-rater annotations: every record is independently assessed by two raters across multiple dimensions
Quality tiers: gold (highest) and silver, derived from the underlying QA score
Multi-dimensional QA: script accuracy, dialect match, speech quality, noise level, technical quality, emotion label, and expression intensity (high / medium / low)
Licensing: released under CC-BY-NC-4.0 for research and demonstration use
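
Every record carries two independent annotations (r1_* and r2_*). Agreement on any categorical dimension, such as the raters' emotion labels or expression intensity, can therefore be summarized with Cohen's kappa. The following sketch computes kappa from two equal-length label lists, and the choice of which r1_* / r2_* columns to pair is left open. It is a standard agreement statistic offered for rater-agreement studies. It is not a description of how the card's own qa_agreement field is computed, and the example labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("rater label lists must be non-empty and equal in length")
    n = len(r1)
    # Observed agreement: share of items where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum((c1[label] / n) * (c2[label] / n) for label in c1.keys() | c2.keys())
    if p_e == 1.0:
        # Both raters used one identical label for every item; agreement is perfect.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical emotion labels from rater 1 and rater 2 for four clips.
rater1 = ["Angry", "Angry", "Happy", "Happy"]
rater2 = ["Angry", "Happy", "Happy", "Happy"]
print(cohens_kappa(rater1, rater2))  # → 0.5
```

Kappa discounts the agreement two raters would reach by chance. A value of 0.5 here reflects 75% raw agreement against 50% expected by chance.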

Customization and Modification

We offer dataset customization to meet specific commercial, academic, or technical needs. Available extensions include:

Additional Arabic dialects (Egyptian, Levantine, Maghrebi, Iraqi, and more)
Expanded emotional styles beyond Angry / Happy (sad, neutral, fearful, surprised, …)
Domain-specific scripted scenarios for verticals such as banking, healthcare, automotive, or customer support
Tailored speaker pools by gender, age range, or accent profile

To request a custom dataset or tailored solution, please contact us at contact@datahive.ai
