Linguistic data 2022

Hello Campania

Linguistic data & analysis

Collection and analysis of linguistic data for the "Hello Campania" project (University of Naples Federico II & L'Orientale). The work turned raw audio recordings into structured datasets for linguistic research and documentation.


Turning real-world audio into structured, analysable linguistic data with high fidelity and consistent metadata.

Hybrid pipeline: WhisperX for automatic transcription, alignment, and diarisation, followed by ELAN for multi-level manual annotation and segmentation.
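As a rough illustration of the hand-off between the two stages, the sketch below turns WhisperX-style segments into a tab-delimited table that could be brought into ELAN through its delimited-text import. It is not the project's actual code. It assumes each segment is a dict with `start` and `end` in seconds, a `text` string, and an optional `speaker` label from diarisation. The function names, the column layout, and the sample segments are hypothetical.

```python
import csv
import io


def segments_to_elan_rows(segments, default_tier="UNKNOWN"):
    """Convert WhisperX-style segments to (tier, begin_ms, end_ms, text) rows.

    Assumption: each segment has 'start'/'end' in seconds and 'text';
    'speaker' is present only if diarisation labels were assigned.
    """
    rows = []
    for seg in segments:
        text = seg.get("text", "").strip()
        if not text:
            continue  # drop empty segments so they do not become blank annotations
        tier = seg.get("speaker", default_tier)  # one tier per speaker
        begin_ms = int(round(seg["start"] * 1000))
        end_ms = int(round(seg["end"] * 1000))
        rows.append((tier, begin_ms, end_ms, text))
    # Sort by start time, then tier, for a stable and readable table
    rows.sort(key=lambda r: (r[1], r[0]))
    return rows


def rows_to_tsv(rows):
    """Serialise rows as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["tier", "begin_ms", "end_ms", "text"])
    writer.writerows(rows)
    return buf.getvalue()


# Made-up segments for illustration only, not project data
example_segments = [
    {"start": 1.6, "end": 3.045, "text": " Second turn.", "speaker": "SPEAKER_01"},
    {"start": 0.0, "end": 1.52, "text": " First turn.", "speaker": "SPEAKER_00"},
    {"start": 3.1, "end": 3.4, "text": "  "},
]

example_rows = segments_to_elan_rows(example_segments)
example_tsv = rows_to_tsv(example_rows)
```

Keeping one tier per speaker and millisecond time stamps means the automatic output arrives in ELAN already segmented, so manual work can focus on correcting and adding annotation levels.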

A reliable, structured dataset ready for linguistic analysis and documentation, with a rigorous and replicable workflow for complex spoken data.

#Linguistics #Data collection #University research #WhisperX #ELAN