Arabic Multidialect Emotional Speech Dataset
Description
Arabic Multidialect Emotional Speech is a demonstration dataset of Arabic emotional speech recordings featuring multiple dialects, comprehensive quality annotations, and emotion labels. It is designed for training and evaluating speech systems that need to recognize, generate, or analyze affective Arabic speech across realistic everyday domains.
Each record pairs an audio file with its Arabic transcription, dialect classification, recording style (emotion), domain, speaker demographics, and dual-rater quality assurance metadata. The demo subset focuses on the Hejazi (Saudi Arabian / Gulf) dialect with Angry and Happy emotional styles spoken across ten conversational domains.
Scope
The demo split contains 496 recordings (fewer than 1,000 rows) in Parquet format. Each entry is a single audio clip of roughly 5–10 seconds, paired with its full transcription, speaker metadata, and independent annotations from two raters. The full production dataset, which covers additional Arabic dialects and emotional styles, is available upon request.
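For a sense of scale, the figures above imply a rough bound on total audio. The short calculation below is only a back-of-the-envelope estimate: it assumes every clip falls within the stated 5–10 second range, which the card gives only approximately.

```python
# Rough bounds on total audio, using only the figures stated in this section.
NUM_RECORDINGS = 496
MIN_CLIP_SECONDS = 5   # assumed lower bound per clip ("roughly 5-10 seconds")
MAX_CLIP_SECONDS = 10  # assumed upper bound per clip

low_minutes = NUM_RECORDINGS * MIN_CLIP_SECONDS / 60
high_minutes = NUM_RECORDINGS * MAX_CLIP_SECONDS / 60

print(f"Estimated total audio: {low_minutes:.0f} to {high_minutes:.0f} minutes")
# prints "Estimated total audio: 41 to 83 minutes"
```

The exact total can be obtained by summing the duration_seconds field once the data is loaded.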
Intended Usage
- Automatic Speech Recognition (ASR) fine-tuning for Arabic dialects
- Emotion-aware Text-to-Speech (TTS) and expressive voice synthesis
- Audio classification — emotion, dialect, speaker traits
- Multimodal sentiment and paralinguistic analysis
- Benchmarking dialect robustness and rater-agreement studies
Modalities & Languages
- Modalities: Audio, Text
- Language: Arabic
- Dialect featured in demo: Hejazi (Gulf / Saudi Arabian)
- Emotions featured in demo: Angry, Happy
- Domains: Finance, Family, Healthcare, Technology, Travel, Daily Life, Shopping, Customer Service, Education, Emergency
Data Format and Access
The dataset is delivered as Parquet with embedded audio, auto-converted by the Hugging Face datasets server. It is directly loadable with the datasets, pandas, or polars libraries. Each row represents one recording with the following fields:
- audio: Audio recording file
- text: Transcribed speech text in Arabic
- dialect: Dialect classification (e.g. Hejazi)
- recording_style: Emotional style (e.g. Angry, Happy)
- domain: Conversational context (Finance, Healthcare, Travel, …)
- duration_seconds: Length of audio in seconds
- speaker_id, speaker_gender, speaker_age: Speaker metadata
- qa_score, qa_tier: Aggregate quality score and tier (gold / silver)
- emotion_match: Whether the intended emotion was correctly identified by raters
- qa_min_dimension, qa_agreement: Minimum dimension and inter-rater agreement
- r1_* / r2_*: Independent annotations from two raters covering script accuracy, dialect match, speech quality, background noise, technical quality, emotion, and expression intensity
- id: Unique record identifier
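As a rough illustration of how these fields might be used, the sketch below selects records by emotional style and quality tier. It runs on a few hand-written placeholder rows, not on real data. Only the column names come from the field list; the row values and the helper name `select_records` are hypothetical, and the sketch assumes that emotion_match is stored as a boolean and qa_tier as the lowercase strings "gold" and "silver".

```python
# Placeholder rows for illustration only; the values are invented,
# not taken from the dataset. Column names follow the field list.
ROWS = [
    {"id": "demo-1", "recording_style": "Angry", "domain": "Finance",
     "duration_seconds": 8.0, "qa_tier": "gold", "emotion_match": True},
    {"id": "demo-2", "recording_style": "Happy", "domain": "Healthcare",
     "duration_seconds": 6.5, "qa_tier": "silver", "emotion_match": True},
    {"id": "demo-3", "recording_style": "Happy", "domain": "Travel",
     "duration_seconds": 7.2, "qa_tier": "gold", "emotion_match": False},
]


def select_records(rows, style, tier="gold", require_emotion_match=True):
    """Return the rows with the given recording_style and qa_tier.

    Hypothetical helper: when require_emotion_match is True, rows whose
    emotion_match flag is falsy are dropped as well.
    """
    selected = []
    for row in rows:
        if row["recording_style"] != style:
            continue
        if row["qa_tier"] != tier:
            continue
        if require_emotion_match and not row["emotion_match"]:
            continue
        selected.append(row)
    return selected


angry_gold = select_records(ROWS, style="Angry")
happy_gold_any = select_records(ROWS, style="Happy", require_emotion_match=False)
print([row["id"] for row in angry_gold])      # ['demo-1']
print([row["id"] for row in happy_gold_any])  # ['demo-3']
```

The same conditions should carry over to a loaded table as ordinary row filters, provided the actual column types match the assumptions stated here.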
Sample Records
| Emotion | Domain | Dialect | Speaker | Duration | Expression |
|---|---|---|---|---|---|
| Angry | Finance | Hejazi | Male, 27 | 8.0s | medium-high |
| Happy | Healthcare | Hejazi | Female, 36 | 6.5s | high |
Quality Assurance
- Dual-rater annotations: every record is independently assessed by two raters across multiple dimensions
- Quality tiers: gold (highest) and silver, derived from the underlying QA score
- Multi-dimensional QA: script accuracy, dialect match, speech quality, noise level, technical quality, emotion label, and expression intensity (high / medium / low)
- Licensing: released under CC-BY-NC-4.0 for research and demonstration use
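For rater-agreement studies, one simple starting point is the fraction of annotation dimensions on which the two raters gave the same value. The sketch below shows that calculation for a single record. It is not necessarily how the dataset's own qa_agreement field is derived, since this card does not specify that. The column suffixes in the sample record (such as `r1_emotion`) are hypothetical stand-ins for the r1_* / r2_* fields, and the helper name `rater_agreement` is likewise an assumption.

```python
def rater_agreement(record):
    """Fraction of paired r1_*/r2_* fields where both raters gave the same value.

    Hypothetical helper: pairs are found by matching each "r1_<name>" key
    with the corresponding "r2_<name>" key in the record.
    """
    matches = []
    for key, r1_value in record.items():
        if key.startswith("r1_"):
            r2_key = "r2_" + key[len("r1_"):]
            if r2_key in record:
                matches.append(r1_value == record[r2_key])
    if not matches:
        raise ValueError("record has no paired r1_*/r2_* fields")
    return sum(matches) / len(matches)


# Invented sample record; the suffixes after r1_/r2_ are placeholders,
# not confirmed column names.
sample = {
    "id": "demo-1",
    "r1_emotion": "Angry", "r2_emotion": "Angry",
    "r1_dialect_match": True, "r2_dialect_match": True,
    "r1_expression": "high", "r2_expression": "medium",
    "r1_background_noise": "low", "r2_background_noise": "low",
}

print(rater_agreement(sample))  # 0.75 (raters agree on 3 of 4 dimensions)
```

Raw percent agreement does not correct for agreement expected by chance, so a chance-corrected statistic such as Cohen's kappa, computed per dimension across many records, may be a better fit for a formal study.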
Customization and Modification
We offer dataset customization to meet specific commercial, academic, or technical needs. Available extensions include:
- Additional Arabic dialects (Egyptian, Levantine, Maghrebi, Iraqi, and more)
- Expanded emotional styles beyond Angry / Happy (sad, neutral, fearful, surprised, …)
- Domain-specific scripted scenarios for verticals such as banking, healthcare, automotive, or customer support
- Tailored speaker pools by gender, age range, or accent profile
To request a custom dataset or tailored solution, please contact us at contact@datahive.ai