
Audio Directing of Attention




    1. Audio Directing of Attention By Jeff Budau

    2. Articles Reviewed Brock, D., Stroup, J., & Ballas, J.A., 2002, Effects of 3D auditory display on dual task performance in a simulated multiscreen workstation environment, Proceedings of the Human Factors and Ergonomics Society 46th Annual Meeting, 2002. Bronkhorst, A.W., Veltman, J.A., & van Breda, L., 1996, Application of a three-dimensional auditory display in a flight task, Human Factors, 38(1), 23-33. Nelson, W.T., & Bolia, R.S., unknown date, Evaluating the effectiveness of spatial audio displays in a simulated airborne command and control task, unknown source. Bolia, R.S., D'Angelo, W.R., & McKinley, R.L., 1999, Aurally aided visual search in three-dimensional space, Human Factors, 41(4), 664-669.

    3. Similarities in the papers All 4 papers: dealt with using audio signals with directional characteristics to focus attention on a task; showed a benefit from the use of directional audio; and were relatively short.

    4. What differed in these papers? The nature of the audio stimulus; the task being used to test the effect; and the presence and nature of distractor stimuli.

    5. Nelson & Bolia (ND) The most distinct of the 4 papers. It was specifically aimed at spatially orienting streams of speech to assist the operator with discrimination, in an AWACS environment with a CAS simulation. Weak results showed an improved reaction time for the spatial audio conditions, though operators believed spatial audio was helpful. AWACS stands for Airborne Warning and Control System; CAS stands for close air support. The results in this paper were generally weak, with the reaction time just missing the 0.05 level (p = 0.06). All the high-chatter-volume conditions showed a reaction time improvement with spatial audio, but interestingly enough, the non-spatial condition proved faster in the low-chatter-volume condition for 2 of the 3 phases of the mission. There were a few weak points in the research that allowed confounds. The first was that the subjects never rated any of the conditions as particularly mentally tough (using the NASA TLX), so the presumption is that there could have been ample monitoring capacity left to perform the task in all conditions. The second was that the chatter didn't necessarily occur at the same time as the target signal. The third was that there was no requirement to actually attend to the chatter; if the subject could block it out, they were free to simply wait for the target signal.

    6. Brock et al. (2002) The unfortunate part about this paper is that while it is short and sweet, it is actually too short, in that it is less than complete. The authors referenced a previous paper (from a 1999 symposium) as their only descriptor of the audio conditions. Reading about their previous research in the background section of this paper leads me to believe that the sound was set up to be statically located left, right, or center. The task was a mix of tracking and a tactical display, and there was an interaction between the sound/no-sound condition and the close or far-apart positioning of the information. The spatial sound improved performance to a greater degree when the visual tasks were split farther apart. The sound had only positive effects and countered the loss of performance incurred when the displays were moved farther apart.

    7. Bolia et al. (1999) Subjects were within a 4.6 meter diameter geodesic sphere! Small speakers and visual signal arrays were placed at each of the 272 vertices (this meant a 15-degree separation between each). Subjects had to search for the visual signal amongst a set number of distractors. Reaction times were compared with no sound, free-field audio, and 3D virtual audio. This is one of the neatest conditions (being placed inside a sphere) and one of the most elaborate setups of the 4 experiments. It also ended up being the least applicable to real-world scenarios. Six of the vertices (lights and speakers) were not used because they were not visible to the seated user. There were 5 conditions of distractors (1, 5, 10, 25 and 50), but no information was given as to the proximity of the distractors to the signal.

    8. Bolia et al. (1999) Strong results showed that the sound conditions were significantly faster than no sound. Results also showed that free-field audio was superior to virtual 3D audio. The no-audio and virtual 3D conditions showed serial search qualities; free-field audio showed parallel search qualities. The parallel search quality is an interesting one and implies essentially that the free-field search is always the same speed regardless of the number of distractors. It would of course have been interesting to see if there was a break point where the density of distractors was such that some degree of serial search had to occur. I would theorize that this would happen when the distractor level approaches nearly complete saturation of the available signal field. I suggest this because the limitation with the virtual audio may reasonably have been the resolution of the equipment being used to generate the signal. Presuming that the human hearing system doesn't have perfect resolution, there is likely a point where the ability to narrow the field is limited by the resolution of the human ear (or at the very least the ability to determine exact direction from sound).
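The serial-versus-parallel distinction above is usually quantified by the slope of reaction time against set size: a near-zero slope (ms per item) indicates parallel, "pop-out" search, while a clearly positive slope indicates serial search. A minimal sketch, using the paper's distractor counts but entirely hypothetical reaction times:

```python
# Fit RT = intercept + slope * set_size; a slope near 0 ms/item suggests
# parallel search, a clearly positive slope suggests serial search.
# Set sizes mirror the distractor conditions; the RT values are made up.

def search_slope(set_sizes, rts_ms):
    """Least-squares slope (ms per item) of reaction time against set size."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(rts_ms) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, rts_ms))
    den = sum((x - mean_x) ** 2 for x in set_sizes)
    return num / den

set_sizes = [1, 5, 10, 25, 50]            # distractor conditions from the paper
serial_rts = [450, 530, 640, 930, 1440]   # hypothetical: RT grows with set size
parallel_rts = [430, 435, 428, 440, 433]  # hypothetical: RT is flat

print(round(search_slope(set_sizes, serial_rts), 1))    # steep: serial-like
print(round(search_slope(set_sizes, parallel_rts), 1))  # near zero: parallel-like
```

A break point of the kind speculated about above would show up as a flat slope at small set sizes that turns positive once the field saturates.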

    9. Bolia et al. (1999) No significant change in accuracy. There was a significant interaction between audio condition and number of distractors. The researchers believe that the imperfect reproduction of sound in the virtual audio was the reason for the differences. The accuracy finding isn't a real surprise. The seeming effect of the audio in all these papers is to get the eyes pointed in the right direction; it would be very curious if the eyes were MORE accurate because sound was there. The speed effect had much more face validity, as the sound would direct the eyes to the right neighbourhood and then the eyes get to search. The virtual audio may not get the eyes to as small a neighbourhood, so the eyes then have to go through a serial search of the broader area. The non-personalized head-related transfer functions were where they were putting some of the blame. These HRTFs are functions that adjust the sound being transferred based on the sound-absorbing and reflecting characteristics of the head, shoulders, and ears (the pinna being the soft outer portion of the ear). They suggested that the software also didn't allow smooth transitions in direction cues as the operator shifted their head to locate the sound source. (The subject's head position was tracked and the headphone audio output was altered to mimic the sound shift that should have happened.)
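Mechanically, a virtual 3D audio renderer of the kind described above takes a mono source and convolves it with a left-ear and a right-ear head-related impulse response (the time-domain form of the HRTF) measured for the desired direction. A toy sketch with invented two-tap impulse responses; real HRIRs are hundreds of taps long and measured per direction (and, ideally, per listener):

```python
# Spatialize a mono signal by convolving it with per-ear impulse responses.
# The HRIRs here are invented two-tap filters standing in for measured data:
# a source on the listener's left reaches the left ear sooner and louder.

def convolve(signal, ir):
    """Direct-form FIR convolution (what an HRTF renderer does per ear)."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out

mono = [1.0, 0.5, 0.25]   # a short mono source signal
hrir_left = [0.9, 0.1]    # near ear: strong and immediate
hrir_right = [0.0, 0.4]   # far ear: attenuated, delayed by one sample

left_channel = convolve(mono, hrir_left)
right_channel = convolve(mono, hrir_right)
```

The interaural delay and level difference baked into the two filters are exactly the cues the head and pinna impose on real sound, which is why non-individualized filters, as the paper notes, can blur the perceived direction.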

    10. Bronkhorst et al. (1996) A great introduction with good background on previous research in the area and on the technique of establishing personalized Head-Related Transfer Functions (HRTFs). Head movements are used to better localize sounds; the lack of head movement creates cones of confusion in up-down and front-back locations. This paper was the longest and accordingly had the most thorough introduction. The authors point out the 3 main advantages of 3D audio display as: relevant directional info can be conveyed using the natural sound localization ability of humans; spatial separation of sound signals lowers the threshold of detectability; and assigning spatial positions to sound sources improves identification of multiple sounds. Good description of HRTFs, and the note that they stick a mike in your ear to determine yours. I am guessing, though, that the mike needs to be precisely and repeatably placed or things can go wrong. They use manikins to mimic some of the HRTFs. Cones of confusion occur when you can't move your head and there are a couple of possible signal locations that can be attributed to the audio input. Typically the errors are reflections up-down or front-back. Previously, the confusions have been higher with virtual sound using NON-individualized HRTFs.

    11. Bronkhorst et al. (1996) This study: permitted head movements; compared performance with a 3D audio display to a visual display containing similar information; knocked itself out producing individual HRTFs; and used an odd jet fighter simulation as a test, in which the target jet disappeared mid-flight only to pop up somewhere else, and the subject had to go find it. It's questionable whether you could get similar information on a two-dimensional visual display as you can get from three-dimensional sound; I believe they deal with this in their discussion. The authors showed in a previous experiment that head movement is critical at reducing confusions, so they allowed it in this experiment. The production of the HRTFs is impressive: 967 different angles and 2.5 hours of testing is a pretty big investment and helps explain why the N was only 8. All in all, the paper notes 2 days per subject from start to finish. I found the test kind of weird. I guess it is a good test to search for a randomly placed object, but why go through the trouble of making it a bit realistic only to have the jet disappear? It stands to reason that a flight with an empty scope and an enemy entering the theatre of operations from a random location would be a bit more realistic and serve the same purpose. Who knows.

    12. Bronkhorst et al. (1996) The 3D audio produced a faster search time than the visual display, but not significantly so. 3D audio and visual together was significantly faster than visual alone. Workload scores didn't differ significantly. Where the plane appeared relative to the subject's plane had a significant effect on search time. It should be noted here that the scale on the graphs helps give the illusion that these results were bigger than they really were. At first it caught me off guard that the 3D audio wasn't significantly better than the visual, but the scale helps tell the story. Maybe with a higher N we would see a significant difference. Keep in mind that the visual display gives information like the speed of the target, but can't give the same kind of precise location of the target's position as the audio can. I would question whether the workload for a single target would show a difference. I think if you were looking for perceived workload effects you would want multiple targets and have the 3D audio help ID new targets or something. Workload used the Rating Scale for Mental Effort (RSME) by Zijlstra (1993). Keep in mind that the plane the subject is flying isn't always necessarily in the same orientation relative to the target plane when the target reappears. Clearly, figure 4 on page 31 shows that targets behind the subject's plane are much tougher to get to quickly. Notice in this figure that every single graph shows 3D audio better than visual, and the combination better than either alone. I think with more power this would have shown significant results.

    13. Bronkhorst et al. (1996) The authors found that subjects developed strategies to sort out the zones of confusion; often this meant the subject rolled their aircraft to alter their audio input in a manner that helped narrow the search field. The interaction effect, with 3D audio and visual together being stronger than either individually, may be because the two displays confer slightly different information that can be used together beneficially. The combination of the audio and visual inputs to a greater benefit reminds me of redundant signal coding. Kind of like the stop sign having the red, the octagon, and the word "Stop" to all help transmit information, presumably in a stronger way than any alone and in such a way that will conquer most distractors; maybe the presentation of audio and visual functions the same way. There is an additive effect in a way that confers similar information across different senses. The noise reinforces the blip on the radar and vice versa.

    14. Additional Related Papers McClimens, B., Nevitt, J., Zhao, C., Brock, D., & Ballas, J., 2005, The effect of pitch shifts on the identification of environmental sounds: design considerations for the modification of sounds in auditory displays, Proceedings of ICAD 05, Eleventh Meeting of the International Conference on Auditory Display, Limerick, Ireland, July 6-9, 2005. The experimenters used short environmental sounds and pitch-shifted them by 6 and 12 semitones. The results: any pitch shift makes the sound tougher to identify, with downward pitch shifts showing this effect to a lesser degree. Implications: carefully select how to adjust signals in a crowded audio display. I picked up this paper while looking for more information on the Multi-Modal Watch Station (MMW) in the Brock et al. paper. I thought it had some interesting implications, as all the assigned papers except Nelson & Bolia (ND) dealt with a relatively free audio environment, so the audio information was easy to distinguish. Nelson and Bolia, however, designed their experiment in such a way that all the clutter sound could be effectively ignored, because no measure caught how much subjects were paying attention to it. Presumably, in a crowded, active battlefield, multiple audio signals may be introduced to the operator, and how these will be separated in their audio field has significant implications.
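The 6- and 12-semitone shifts used above correspond to fixed frequency ratios, since each semitone multiplies frequency by 2^(1/12). A quick sketch of the ratios involved:

```python
# Frequency ratio for a shift of n semitones: 2 ** (n / 12).
# 12 semitones is exactly one octave (ratio 2.0); 6 semitones is a
# tritone (ratio ~1.414); negative n shifts the pitch downward.

def semitone_ratio(n):
    return 2 ** (n / 12)

for n in (6, 12, -6, -12):
    print(n, round(semitone_ratio(n), 3))

# A full-octave upward shift doubles the frequency:
print(440 * semitone_ratio(12))  # a 440 Hz tone becomes 880.0 Hz
```

So the paper's 12-semitone condition doubles (or halves) every frequency component of the sound, which helps explain why identification suffers: the shifted spectrum no longer matches the listener's memory of the source.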

    15. Additional Related Papers Brock, D., & McClimens, B., ND, Cognitive models of the effect of audio cueing on attentional shifts in a complex multimodal, dual display task, unknown source (online). The experimenters compared eye movements in a dual task (tracking and tactical) in no-sound and 3D sound environments. The results: the 3D audio cueing reduced the number of eye movements the operator makes. Implications: there are clear implications for cognitive workload in what is a reasonably taxing task. Again, this paper was a little different than I suspected (I came across it on a Google Scholar search), but I didn't keep the source and it is not on the paper. The reference section has a 2004 paper, so this is reasonably recent. It should be noted that the majority of the paper is dedicated to describing the translation of the results of the study into a computer model to help model the search behaviour of people. While it was quite interesting, and of obvious use when predicting the effect of such displays on performance, it was a little beyond the scope of what I was looking for, which was a little more information on the effects of the 3D audio on performance in more complex environments.

    16. Additional Related Papers Brock, D., Ballas, J.A., Stroup, J.L., & McClimens, B., 2004, The design of mixed use virtual auditory displays: recent findings with a dual task paradigm, Proceedings of ICAD 04, Tenth Meeting of the International Conference on Auditory Display, Sydney, Australia, July 6-9, 2004. The experimenters used the dual task (tactical and tracking) identical to the last paper, and compared head movements and reaction times during the task under various single and dual audio signal conditions. The results replicated the previous finding that the 3D audio cues decreased reaction time; the audio tracking alert (indicating they were off target on the tracking task) increased head movements and reaction times. Implications: the audio environment is one that is sensitive to overloading, and you need to be careful how your audio displays will pull attention. I tracked this paper down after finding it in the references of the previous paper (it looked on topic). The summary of this paper might be: beware how you use audio displays. The use of the audio display to discretely indicate that the tactical display had been handled was consistently useful. The audio display on the tracking task seemed to slow things down and confound the tactical display, particularly when the tactical display had no audio signals of its own. Again, the use of audio signals in a complex environment is where this research needs to go.

    17. Take-Aways Sounds can help draw attention to a visual display, reducing the time needed to perform visual tasks. The more the task is 3D in nature, the more high-fidelity virtual sounds help performance. 3D audio can improve visual tracking performance when the two are used together. Continuous audio signals that are used to confer tracking information not related to direction (i.e., degree of deviation from ideal) may decrease performance in other tasks. All the papers clearly showed that sound with directional characteristics can help draw attention in that spatial direction. It should be noted that the ability to move our heads helps in determining direction, and that, possibly due to imperfect sound reproduction, real audio tends to be more effective than virtual audio. It seemed that in the tasks where tracking in a 3D environment was necessary, 3D sounds were used (as opposed to the attention-to-screens and the conversational papers, which used only broad directional cues). This may just be an implication of the design of the papers, but it has some face validity. 3D audio combined nicely with the visual tracking tasks, likely due to each display offering some unique information. The audio feedback on the continuous tracking experiment didn't seem to be helpful in a dual task environment, possibly due to it drawing too many attentional resources from the other task. It was acting like the squeaky wheel.
