Three network-wide activities will support the ESR research projects. Unlike the ESR projects, which are designed as doctoral research programmes for early-stage researchers, the support activities are shorter-term (12-24 months) and will be carried out by experienced researchers (ERs). These support projects are critical to the success and anchoring of the INSPIRE network: by developing common data and common processing tools, the ER projects will provide links and synergy between the ESR projects.
S-1. Collection of a large-scale, individual confusions corpus. The development and evaluation of models that predict speech intelligibility at a microscopic level require a different kind of listener response characterisation from that used for macroscopic models, for which averaged intelligibility scores for each environmental degradation have typically sufficed. Microscopic models demand listener response distributions at the level of individual tokens. Of particular use are low-entropy distributions, which reflect a high level of inter-listener agreement. Consistent responses of this kind are of immense value, especially when they are incorrect, since any model that replicates these responses is likely to reflect normal human perceptual processes accurately. Further, responses that are both correct and show a high level of listener agreement are valuable in providing a goal for robust algorithms. Preliminary studies have indicated that such a high-value corpus can be collected by means of very large-scale web-based speech perception experiments, which we intend to pursue in the first 12 months of the INSPIRE project.
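As an illustration of how inter-listener agreement on a single token might be quantified, the following minimal sketch computes the Shannon entropy of a response distribution and the share of listeners giving the most common response. The function names and the response lists are hypothetical and serve only as an example; they are not data or tools from the project. Strong agreement yields entropy close to zero, whereas scattered responses yield high entropy.

```python
import math
from collections import Counter


def response_entropy(responses):
    """Shannon entropy (in bits) of the listener responses to one token."""
    counts = Counter(responses)
    total = len(responses)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def modal_agreement(responses):
    """Most common response and the fraction of listeners who gave it."""
    label, count = Counter(responses).most_common(1)[0]
    return label, count / len(responses)


# Hypothetical responses from eight listeners to the same noisy token.
# Assume the word actually presented was "pat".
consistent_confusion = ["bat"] * 7 + ["pat"]
scattered = ["pat", "bat", "cat", "fat", "pat", "mat", "hat", "vat"]

for name, responses in [("consistent", consistent_confusion),
                        ("scattered", scattered)]:
    label, share = modal_agreement(responses)
    print(f"{name}: entropy={response_entropy(responses):.2f} bits, "
          f"modal response={label!r} ({share:.0%} of listeners)")
```

In this sketch the first token would count as a consistent confusion: most listeners agree with one another, yet their shared response differs from the word presented.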
S-2. Support for sophisticated, synthesis-based stimulus design. Much of what we know about speech perception has come from experiments that used manipulations of speech or speech-like signals to create stimuli, which often lacked naturalness. More recently, auditory vocoders such as STRAIGHT have increased the fidelity of manipulations of pitch and apparent vocal tract size. In INSPIRE, we will take the next steps towards the ideal of controlled-yet-realistic speech, created by means of the articulatory-controllable statistical speech synthesis being developed by UEDIN. These techniques, which will be developed during the first 24 months of the project, will be particularly valuable in the accommodation theme, since they allow the generation of stimuli reflecting, for example, the clear-speech modifications made by speakers in response to adverse conditions. They will also be employed for perturbation analysis of consistent confusions, in order to explore the processes that give rise to confusions based on misallocation of signal components. Once the human responses to these stimuli are known, these data, like the data collected in S-1, will form valuable material for evaluating the robustness algorithms developed in the ESR projects.
S-3. INSPIRE Challenge. Research fields such as automatic speech recognition, speech separation and speech synthesis technology have benefitted greatly from global, community-led evaluation campaigns held annually. Such challenges are less common in behavioural research and modelling. As a consequence, the relative merits of the growing number of microscopic intelligibility models are difficult to assess. The INSPIRE Challenge will introduce comparative and competitive evaluation of models of perceptual behaviour by providing a common set of richly annotated behavioural data to all contestants. Using data from project S-1, the ER will make the preparations that allow the senior staff members of INSPIRE to organise two annual rounds of the Challenge during the lifetime of the project. In doing so, the infrastructure will be put in place for future challenges that the community can organise without dedicated funding. This will ensure that the goals which motivate INSPIRE continue to be pursued beyond the lifetime of the project.