Publication: MIDI-Conditional Text-to-Audio Synthesis Using ControlNet on AudioLDM
Date
2023-09
Authors
Ibáñez Martínez, Laura
Access rights
info:eu-repo/semantics/openAccess
Abstract
Text-to-audio systems have gained attention in recent months, achieving impressive results in general audio synthesis. However, they often lack fine-grained control over the musical output, as note-level adjustments cannot be specified through text. In this work, we present MIDI-AudioLDM, which implements MIDI conditioning in AudioLDM by means of ControlNet. This enables MIDI-conditional text-to-audio synthesis, which complements AudioLDM’s existing capabilities, including direct text-to-audio synthesis as well as audio style transfer and inpainting. Like AudioLDM, the model uses contrastive language-audio pretraining (CLAP) latents: it is trained on audio embeddings and uses text embeddings for inference. In contrast to unconditional audio synthesis, MIDI-AudioLDM offers detailed control over various musical aspects such as notes, genre, mood, and timbre, which makes it a more valuable tool for the music production process. A demo is available at https://huggingface.co/spaces/lauraibnz/midi-audioldm.
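The ControlNet mechanism the abstract refers to can be illustrated with a minimal numerical sketch (this is an illustration, not the thesis code, and all names in it are hypothetical): a frozen base network stands in for AudioLDM's U-Net, a trainable copy receives the MIDI-derived conditioning, and zero-initialized projections ("zero convolutions") add the control features back, so at the start of training the conditioned model reproduces the frozen base model exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(w, x):
    # One toy layer: linear map followed by tanh.
    return np.tanh(w @ x)

# Frozen "base" network (stands in for AudioLDM's pretrained U-Net blocks).
W1 = rng.normal(size=(8, 8))
W2 = rng.normal(size=(8, 8))

def base_forward(x):
    h1 = dense(W1, x)
    h2 = dense(W2, h1)
    return h1, h2

# Trainable copy of the base weights for the ControlNet branch.
C1, C2 = W1.copy(), W2.copy()

# Zero-initialized projections: the control branch contributes nothing
# until training moves these away from zero.
Z_in = np.zeros((8, 8))
Z1 = np.zeros((8, 8))
Z2 = np.zeros((8, 8))

def controlled_forward(x, cond):
    c = x + Z_in @ cond          # inject MIDI-derived conditioning
    g1 = dense(C1, c)
    g2 = dense(C2, g1)
    h1, h2 = base_forward(x)
    # Control features are added back through the zero projections.
    return h1 + Z1 @ g1, h2 + Z2 @ g2

x = rng.normal(size=8)           # stands in for a noisy latent
midi_cond = rng.normal(size=8)   # stands in for a piano-roll slice
_, base_out = base_forward(x)
_, ctrl_out = controlled_forward(x, midi_cond)

# Zero initialization => identical outputs before any training step.
assert np.allclose(base_out, ctrl_out)
```

The zero initialization is the design choice that makes this form of fine-tuning safe: the pretrained model's behavior is preserved at step zero, and the MIDI conditioning is learned gradually on top of it.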
Keywords
audio synthesis, MIDI conditioning, text-to-audio systems, AudioLDM, ControlNet
Citation
Ibáñez Martínez, Laura (2023) MIDI-Conditional Text-to-Audio Synthesis Using ControlNet on AudioLDM. Master's thesis. Universidad Nacional de Educación a Distancia (UNED)
Center
Faculties and schools::E.T.S. de Ingeniería Informática
Department
Artificial Intelligence