Hello Campania
Linguistic data & analysis
Collection and analysis of linguistic data for the "Hello Campania" project (University of Naples Federico II & L'Orientale). The work turned raw audio recordings into structured datasets for linguistic research and documentation.
Turning real-world audio into structured, analysable linguistic data while staying faithful to the recordings and keeping metadata consistent.
Hybrid pipeline: WhisperX for automatic transcription, alignment, and diarisation, followed by ELAN for multi-level manual annotation and segmentation.
A reliable, structured dataset ready for linguistic analysis and documentation, with a rigorous and replicable workflow for complex spoken data.
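As an illustration of the hand-off between the automatic and manual stages of the pipeline, the sketch below converts diarised segments into tab-delimited text of the kind ELAN can read through its CSV / tab-delimited import. It is a minimal example under stated assumptions, not the project's actual code: the function name `segments_to_elan_tsv`, the `UNKNOWN` fallback label, the column order, and the sample segments are all hypothetical. The input is assumed to be a list of dictionaries with `start`, `end`, `text`, and an optional `speaker` key, which is the general shape of WhisperX segment output.

```python
import csv
import io


def segments_to_elan_tsv(segments, default_speaker="UNKNOWN"):
    """Render segments as tab-delimited rows: tier, start (s), end (s), text.

    Hypothetical helper: each speaker label becomes a tier name, so that
    a tab-delimited import in ELAN can place every speaker on its own tier.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Sort by start time so annotations appear in chronological order.
    for seg in sorted(segments, key=lambda s: s["start"]):
        # Collapse internal whitespace and skip segments with no text.
        text = " ".join(seg["text"].split())
        if not text:
            continue
        writer.writerow([
            seg.get("speaker", default_speaker),  # assumed fallback label
            f'{seg["start"]:.3f}',
            f'{seg["end"]:.3f}',
            text,
        ])
    return buffer.getvalue()


# Invented sample data, for illustration only.
example_segments = [
    {"start": 1.5, "end": 2.0, "text": " ciao  a tutti "},
    {"start": 0.0, "end": 1.2, "text": "buongiorno", "speaker": "SPEAKER_00"},
]

if __name__ == "__main__":
    print(segments_to_elan_tsv(example_segments), end="")
```

Mapping each speaker to a tier keeps the diarisation result visible in ELAN, where the manual annotation levels can then be added on top of the automatic segmentation.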