A microphone array consists of a group of microphones arranged in specific geometric patterns, typically linear or circular. These arrays perform space-time processing on sound signals collected from various spatial directions, enabling advanced functions such as noise suppression, reverberation removal, interference reduction, sound source localization, sound source tracking, and array gain. These capabilities significantly enhance the quality of speech signal processing and improve speech recognition accuracy in real-world environments.
Microphone arrays can be classified into different shapes, including linear, circular, and spherical, though they can also take on more complex forms like cross, planar, spiral, or irregular configurations. The number of microphones in an array can range from just two to several thousand, making them versatile but complex. While intricate arrays are primarily used in industrial and defense applications, simpler configurations are more common in consumer electronics due to cost considerations.
The growing popularity of microphone arrays in consumer devices is largely driven by the booming voice interaction market. These arrays are crucial for improving long-distance voice recognition, ensuring accuracy in real-world scenarios. As voice interaction technology moves from mobile phones to devices like Echo smart speakers and robots, the challenges faced by microphones change dramatically: the shift is comparable to the difference between whispering and shouting.
Smartphones, like those equipped with Siri, typically use a single microphone system. This setup works well in low-noise environments with no reverberation and when the sound source is very close. However, when the sound source is farther away and there's significant noise, multipath reflection, or reverberation, the quality of the captured signal deteriorates, severely impacting voice recognition accuracy. Single microphones struggle to achieve sound source localization and separation under these conditions. This is where microphone arrays come into play, offering a solution for these limitations.
That said, a microphone array alone isn't enough to guarantee high voice recognition rates. While the array serves as the physical gateway, handling sound signal processing in the real world, the ultimate recognition rate depends on cloud-based processing. For optimal results, the physical microphone array and the cloud-based recognition system must work in harmony.
Moreover, the quality of the signal processed by the microphone array is critical. Modern speech recognition systems rely heavily on deep learning, which is constrained by the quality of its training data. If the processed sound doesn't closely match the characteristics of the training samples, recognition accuracy can suffer. Interestingly, the goal isn't to produce the purest signal possible, but rather one that closely mirrors the characteristics of the training data, even if that data is less than ideal.
Microphone arrays can be categorized based on the distance between the sound source and the array itself, leading to two distinct sound field models: the near-field model and the far-field model.
In the near-field model, sound waves are treated as spherical waves. Sound spreads outward in all directions from a vibrating source, so close to the source the wavefront is still noticeably curved. The near-field model therefore accounts for the amplitude differences in the signals received by each microphone in the array.
On the other hand, the far-field model simplifies the situation by treating sound waves as plane waves, ignoring amplitude differences across the microphones. Instead, it assumes that the relationship between the signals received by each microphone is purely a matter of time delay. This simplification makes the far-field model easier to process and is the basis for most general speech enhancement techniques.
There isn't an absolute rule to distinguish between near-field and far-field models. However, it is generally accepted that when the distance between the sound source and the central reference point of the microphone array is significantly greater than the signal wavelength, the far-field model applies. Conversely, if this distance is shorter, the near-field model is more appropriate.
For example, if the aperture of a uniform linear array (the distance spanned by its microphones) is denoted by d, and the wavelength of the highest-frequency sound from the source (the minimum wavelength) is λ_min, then a source whose distance from the center of the array is greater than 2d²/λ_min falls under the far-field model; otherwise, it falls under the near-field model, as illustrated in Figure 1.
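To make the threshold concrete, here is a minimal Python sketch of this near-field/far-field check. It assumes a speed of sound of 343 m/s; the function name and example values are illustrative, not from the original text:

```python
# Near-field vs. far-field classification for a microphone array.
# Assumption: speed of sound c = 343 m/s (air at roughly 20 degrees C).

SPEED_OF_SOUND = 343.0  # m/s

def field_model(source_distance_m: float, aperture_m: float, f_max_hz: float) -> str:
    """Classify the sound field using the 2*d^2 / lambda_min criterion."""
    wavelength_min = SPEED_OF_SOUND / f_max_hz          # shortest wavelength of interest
    threshold = 2.0 * aperture_m ** 2 / wavelength_min  # far-field boundary distance
    return "far-field" if source_distance_m > threshold else "near-field"

# Hypothetical example: 10 cm aperture, speech band up to 8 kHz, speaker 3 m away.
print(field_model(3.0, 0.10, 8000.0))  # -> "far-field" (threshold is about 0.47 m)
```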
Consumer-grade microphone arrays face a variety of challenges, including environmental noise, room reverberation, overlapping human voices, model noise, and array structure limitations. When used in speech recognition applications, additional optimization and alignment for speech recognition accuracy must be considered. To address these challenges, especially in specialized consumer applications, certain key technologies play a crucial role:
Noise Suppression

In speech recognition, it isn't necessary to completely eliminate noise, unlike in call systems where full noise removal is often required. The noise in question here typically refers to environmental sounds like air conditioning noise, which lack spatial directionality and have low energy levels. While such noise doesn't overwhelm normal speech, it can reduce clarity and intelligibility. Though not suited for environments with high noise levels, this approach is adequate for managing everyday voice interactions.
Dereverberation

Reverberation is a particularly troublesome factor in speech recognition, significantly impacting the system's performance. After a sound source stops producing sound, the sound waves continue to reflect off and be absorbed by the room's surfaces, creating a lingering mix of sound waves for a brief period; this is reverberation. Reverberation can severely affect speech signal processing, reducing the accuracy of direction finding and impairing functions like cross-correlation and beamforming.
Echo Cancellation

More accurately termed self-noise cancellation than echo cancellation, this addresses the situation where a voice interaction device picks up its own sound output. An echo is an extended form of reverberation with a longer delay; a delay of more than 100 milliseconds, for instance, can make a sound seem to repeat itself, creating a distinct echo. In this context, echo cancellation means eliminating the sounds emitted by the device itself, such as music or the voice of Alexa from an Echo speaker, so that only the user's voice is recognized.
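To illustrate the underlying idea, below is a minimal sketch of one common approach to echo cancellation: an adaptive FIR filter with a normalized LMS (NLMS) update that learns the echo path from the device's own playback signal. This is a toy illustration, not any particular product's implementation; the filter length and step size are arbitrary:

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, filter_len=256, mu=0.5, eps=1e-8):
    """Remove an estimate of the device's own playback (far_end) from the
    microphone signal, leaving the near-end talker. Basic NLMS adaptation."""
    w = np.zeros(filter_len)                 # adaptive FIR estimate of the echo path
    out = np.zeros(len(mic))
    for n in range(filter_len - 1, len(mic)):
        x = far_end[n - filter_len + 1 : n + 1][::-1]  # most recent playback samples
        echo_est = w @ x                     # predicted echo at the microphone
        e = mic[n] - echo_est                # residual: near-end speech plus noise
        w += (mu / (x @ x + eps)) * e * x    # normalized LMS weight update
        out[n] = e
    return out
```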
Sound Source Localization

Unlike full sound source positioning, which is more complex, consumer-grade microphone arrays are designed primarily for direction finding: detecting the direction of the person speaking, which is essential for subsequent beamforming. Direction finding can be achieved using energy methods, spectrum estimation, or Time Difference of Arrival (TDOA) techniques, and is typically performed during the voice wake-up phase.
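As a sketch of the TDOA idea for a single microphone pair, the following snippet estimates the direction of arrival from the lag that maximizes the cross-correlation of the two signals. It assumes far-field, free-field conditions and a known microphone spacing; the names and geometry convention are illustrative:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def tdoa_direction(mic1, mic2, fs, spacing_m):
    """Estimate the direction of arrival (degrees from broadside) for a
    two-microphone pair from the cross-correlation peak."""
    corr = np.correlate(mic1, mic2, mode="full")
    lag = np.argmax(corr) - (len(mic2) - 1)   # best-matching lag, in samples
    tau = lag / fs                            # time difference of arrival, seconds
    # Far-field geometry: tau = spacing * sin(theta) / c
    sin_theta = np.clip(SPEED_OF_SOUND * tau / spacing_m, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))
```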
Beamforming

Beamforming is a common signal processing technique that manipulates the output signals of each microphone in the array (through weighting, delay, summation, and other methods) to create spatial directivity. This directivity suppresses sound interference arriving outside the main lobe, including other human voices. For instance, when multiple people are speaking around an Echo device, beamforming allows it to focus on and recognize the voice of a single person.
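A minimal delay-and-sum beamformer sketch for a uniform linear array follows, assuming far-field plane waves. Delays are rounded to whole samples and channel edges wrap around, simplifications a real implementation would avoid:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def delay_and_sum(signals, fs, spacing_m, steer_deg):
    """Steer a uniform linear array toward steer_deg (from broadside) by
    delaying each channel so the target direction adds coherently.
    signals: array of shape (num_mics, num_samples)."""
    num_mics, num_samples = signals.shape
    theta = np.radians(steer_deg)
    out = np.zeros(num_samples)
    for m in range(num_mics):
        # Plane-wave delay of microphone m relative to microphone 0, in samples.
        delay = int(round(m * spacing_m * np.sin(theta) / SPEED_OF_SOUND * fs))
        out += np.roll(signals[m], -delay)    # advance channel m to align with mic 0
    return out / num_mics                     # average: target adds, interference cancels
```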
Array Gain

This addresses the issue of pickup distance: if the captured signal is too weak, speech recognition accuracy suffers. Array gain means enhancing the energy of the speech signal through array processing so that it is strong enough for reliable recognition.
Model Matching

This involves aligning the microphone array with the speech recognition and semantic understanding models. Voice interaction is a complete signal chain that starts at the microphone array, and the models must be matched throughout that chain. Effective microphone arrays designed for voice interaction typically use two sets of algorithms: one embedded in the hardware for real-time processing and another for cloud-based voice processing.
The trend toward smaller microphone arrays is gaining momentum. While many products currently use two microphones, this choice is often driven by industrial design considerations rather than cost. Arrays can be made considerably more compact, and this miniaturization has already been proven effective; it's only a matter of time before it becomes widely adopted in consumer electronics.
The high cost of microphone arrays, whether they have 2, 4, or 6 microphones, remains a barrier to widespread adoption. Reducing these costs isn't simply about substituting cheaper components; it requires a complete redesign of the entire system, including the devices, chips, algorithms, and cloud infrastructure.
It's important to note that even a 2-microphone array is not particularly cheap. In fact, the cost difference between 2- and 4-microphone arrays is minimal, although this comparison doesn't account for the additional hardware required for echo cancellation. When considering the overall system, the cost differences between these configurations are not as significant as one might expect.
The cocktail party effect refers to the human ability to focus on a single conversation in a noisy environment, even when multiple people are speaking simultaneously. Current microphone array and speech recognition technologies are still primarily designed for single-speaker scenarios. Achieving reliable multi-speaker recognition is a challenging goal that remains on the horizon, but it represents a significant area for future development in voice technology.
Choosing the right microphone array for your product involves understanding the balance between hardware solutions, algorithmic optimization, and cloud recognition capabilities. While the hardware for microphone arrays is fairly advanced, the front-end algorithms and cloud recognition are still evolving. The specific algorithmic approaches vary by company, with some solutions allowing users to select the central microphone independently, which is beneficial for design flexibility.
Microphone arrays with more than two microphones are generally organized in linear or ring structures, while 2-microphone arrays typically come in Broadside or Endfire configurations. With these options available, how should manufacturers decide which solution is best? The answer lies in product positioning and the intended user scenarios.
If your product aims to be budget-friendly, there's often no need for a complex microphone array. A single microphone, coupled with the right algorithms, can still achieve noise suppression and echo cancellation, ensuring adequate voice recognition in near-field environments at a much lower cost.
If your product requires better noise reduction, a 2-microphone solution might be more suitable. This configuration simplifies design and can effectively reduce noise within a certain range during calls. However, it doesn't offer a significant improvement in voice recognition compared to a single microphone, and the cost is relatively high. Additionally, when factoring in the necessary echo cancellation features for voice interaction, costs can escalate further.
One major drawback of the 2-microphone solution is its limited ability to locate sound sources, making it better suited to mobile phones and headphones, where the focus is on call noise reduction. Its behavior can also be approximated with a single directional microphone, akin to the Endfire configuration of a 2-microphone array, where one microphone is designed to mimic the functionality of two. However, this approach requires dual openings in the enclosure, which can complicate the industrial design process.
If your product needs to handle more diverse user scenarios, a microphone array with four or more microphones is recommended. For example, Amazon Echo uses a configuration with more than six microphones to enhance voice recognition and noise handling. Robots generally perform well with four microphones, while speakers may benefit from six or more. In automotive applications, distributed arrays or other specialized structures may be the best choice.
While the tech world buzzes with advancements in multi-microphone arrays, the dual-microphone solution has quietly become the workhorse of smart home appliance control. Based on extensive experience at Dusun IoT, it's clear that dual microphones are not just a compromise; they're the optimal choice for many smart home applications.
For years, the home appliance industry has sought to seamlessly integrate voice interaction technology into everyday products. The key requirements are straightforward yet challenging: enable direct voice control unaffected by the appliance's own noise, achieve reliable far-field voice interaction, and ensure the solution is both mature and cost-effective. Far-field voice interaction, in particular, stands out as the critical factor.
Although many might think more microphones equal better performance, reality paints a different picture. While an eight-microphone array might offer higher voice recognition accuracy, it also introduces a host of challenges: higher costs, more complex structures, and greater difficulties in production and installation. Moreover, for appliances like air conditioners and TVs, which are typically placed against walls, the extra microphones add little practical value.
In contrast, the dual-microphone array shines in these scenarios. With its straightforward design, lower cost, easier implementation, and lower power consumption, it's no surprise that dual microphones are poised to become the standard in smart home products. As we look to the future, it's clear that simplicity and efficiency will continue to drive innovation in the smart home space.
Linear Microphone Array
A linear microphone array has its elements aligned along a single straight line. There are two main types:
Uniform Linear Array (ULA): In a ULA, the spacing between adjacent microphones is consistent. This uniformity results in equal phase and sensitivity across the array, making it the simplest and most common array topology.
Nested Linear Array: This type is essentially a combination of multiple ULAs, stacked or nested together. It's a specialized form of a non-uniform array, providing flexibility while retaining some of the simplicity of the ULA. However, linear arrays are limited to capturing only the horizontal azimuth information of the sound signal.
Planar Microphone Array
A planar microphone array has its elements arranged across a flat surface, rather than a straight line. Depending on the geometric pattern, planar arrays can be classified into several subtypes, including:
Equilateral Triangle Array
T-Array
Uniform Circular Array
Uniform Square Array
Coaxial Circular Array
Circular or Rectangular Array
Planar arrays are advantageous because they can capture both the horizontal and vertical azimuth information of a sound signal, providing more comprehensive spatial information compared to linear arrays.
Stereo Microphone Array
Stereo microphone arrays expand into three-dimensional space, with their elements arranged in various 3D geometric shapes. Common configurations include:
Tetrahedron Array
Cube Array
Cuboid Array
Spherical Array
Stereo arrays offer the most complete spatial information. They can detect the horizontal and vertical azimuth, as well as the distance between the sound source and the reference point within the array, making them ideal for applications requiring precise 3D sound localization.
Beamforming is a technique used in microphone arrays to focus on sound from a specific direction. It works by:
Delaying and Phase Compensating: Adjusting the timing and phase of each microphone's signal so that signals arriving from a chosen direction align.
Amplitude Weighting: Assigning different weights to each microphone's signal to enhance the desired direction while suppressing noise from other directions.
Key Beam Pattern Parameters:
3 dB Beamwidth: The angular range over which the array's gain stays within 3 decibels of its maximum.
Distance to the First Zero Point: The angular distance from the main-lobe peak to the first point where the beam's gain drops to zero.
First Sidelobe Height: The height of the first secondary peak outside the main beam.
Sidelobe Attenuation Rate: How quickly the gain decreases from the main beam to the sidelobes.
The power pattern, the square of the amplitude pattern, is used to measure overall performance. Beamforming microphones are ideal for applications needing precise sound directionality and noise suppression.
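To make these beam pattern parameters concrete, the sketch below computes the power pattern of a uniformly weighted linear array steered to broadside and reads off the 3 dB beamwidth. The array geometry and frequency are arbitrary illustrative values:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def ula_power_pattern(num_mics, spacing_m, freq_hz, angles_deg):
    """Power pattern (squared magnitude of the array factor) of a uniformly
    weighted linear array steered to broadside, in dB relative to the peak."""
    theta = np.radians(np.asarray(angles_deg))
    k = 2 * np.pi * freq_hz / SPEED_OF_SOUND            # wavenumber
    mics = np.arange(num_mics)[:, None]                 # microphone indices
    steering = np.exp(1j * k * mics * spacing_m * np.sin(theta))
    pattern = np.abs(steering.mean(axis=0)) ** 2        # power pattern, peak = 1
    return 10 * np.log10(np.maximum(pattern, 1e-12))    # in dB

# Hypothetical 4-microphone array with 5 cm spacing, evaluated at 2 kHz.
angles = np.linspace(-90, 90, 721)
p_db = ula_power_pattern(4, 0.05, 2000.0, angles)
edges = angles[p_db >= -3.0]
print("3 dB beamwidth: about", edges[-1] - edges[0], "degrees")
```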
Different Costs
The cost of a dual-microphone solution is much lower than that of a multi-microphone one. Beyond the obvious difference in the number of microphones, the hardware circuitry needed to support additional microphone channels and the extra computing power needed to process more signal data both widen the cost gap.
Technical Differences
Although dual-microphone and multi-microphone arrays use similar technologies, their algorithm systems differ significantly. The more microphones there are, the easier it is to achieve strong noise reduction and voice enhancement; to achieve the same or similar effects, dual-microphone array technology is therefore more technically challenging. Due to cost, however, dual-microphone arrays remain the more popular choice.
Voice Positioning and Recognition
With sufficient algorithmic optimization, a dual-microphone array can achieve nearly the same noise reduction and voice enhancement as a multi-microphone array in a home environment at 3 to 5 meters. However, a dual-microphone array can only resolve sound sources within a 180° range, while a circular microphone array covers the full 360°. This difference does not matter for devices placed against a wall, such as air conditioners and TVs. For products like robots placed in the center of a room, however, locating the speaker requires a multi-microphone solution.
Implementation
Finally, from the perspective of the final product form, the dual-microphone solution is simpler and easier to implement. The biggest problem with multi-microphone arrays, whether linear or circular, is that they place extremely strict requirements on the appearance and structural design of the product, because the microphones must be evenly distributed in space. Dual-microphone designs are largely free of these constraints.
For robots or AIoT products, the choice of microphone array depends on the application requirements:
Robots: Require precise sound source localization. Therefore, a circular multi-microphone array is typically used. This type of array provides 360° sound source positioning, essential for accurate localization and interaction.
AIoT Products: The choice can be more flexible. Dual microphones offer faster implementation and are simpler to integrate, making them advantageous for quick deployment and varied form factors. Multi-microphone arrays can also be used where high-accuracy sound source localization is needed, but they are generally more complex and costly.
Overall, while multi-microphone arrays are ideal for precise localization, dual microphones are often preferred for their ease of implementation and versatility in building AIoT ecosystems.
There are many types of microphones used in presentation situations: tabletop microphones, handheld microphones, ceiling microphones, and inconspicuous clip-on and headset microphones. The podium microphone on the conference table can certainly represent an elegant design feature - and incidentally also emphasizes the special importance of the person sitting behind it.
A microphone array is a device that functions like a regular microphone, but instead of using a single microphone to record sound, it uses several. A common example is a 2-microphone array, with one microphone placed on the left side of the device and the other on the right. With one microphone on each side, sound can be recorded from the left and right sides of the room, producing a dynamic stereo recording that simulates surround sound.
The most important requirement for a microphone array device is microphone matching. All the microphones in the array must be similar and closely matched, and in some respects identical, for the array to produce a good recording. Otherwise, you may run into the following problems:
One of the microphones in the array has a much higher gain than the others.
The microphones are out of phase, so one microphone records before the other.
One microphone picks up sound from all directions, while another picks up sound from only a single direction.
Any of these situations can seriously degrade the recording. The three aspects of a microphone array that must therefore be matched are directionality, sensitivity, and phase.
Directionality refers to the directions from which a microphone picks up sound. Some microphones pick up sound from only one direction (unidirectional microphones), while others are built to pick up sound from all directions (omnidirectional microphones). When building an array microphone, all microphones must have the same directionality.
Having one microphone pick up sound from only one direction while another picks up sound from all directions would result in a disastrously unbalanced recording, which is undesirable in all but a few special cases.
Sensitivity is another aspect of a microphone array that must be matched. Sensitivity is the output level a microphone produces for a given input signal. The sensitivities of the microphones in an array must be closely matched; otherwise one microphone will be louder than the others, resulting in an unbalanced recording.
Usually the maximum sensitivity difference allowed between the microphones in an array is ±1.5 dB, whereas the sensitivity tolerance for an individual microphone is no more than 3 dB.
Phase is the last important aspect of a microphone array that must be matched. Phase describes the relative timing of each microphone's signal: it determines when the microphones in the array start and stop recording. If the microphones differ significantly in phase, they will record the signal at different times, causing the recording to fall out of sync. Again, this is undesirable; the microphones should record the signals simultaneously so that there is no delay between them.
Just like sensitivity, the phase difference between microphones has a maximum allowable tolerance, typically ±1.5 degrees, ensuring that the signals are recorded simultaneously and combine coherently.
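As a simple illustration of verifying these tolerances, the sketch below compares two channels recorded from the same test tone against the ±1.5 dB and ±1.5 degree limits quoted above. The measurement approach (RMS level ratio and cross-spectrum phase at the tone frequency) is one plausible method, not a prescribed standard:

```python
import numpy as np

def check_matching(ch1, ch2, fs, tone_hz, max_gain_db=1.5, max_phase_deg=1.5):
    """Compare two microphone channels recorded from the same test tone.
    Returns (gain_diff_db, phase_diff_deg, within_tolerance)."""
    # Sensitivity mismatch: ratio of RMS levels, expressed in dB.
    rms1 = np.sqrt(np.mean(np.asarray(ch1, dtype=float) ** 2))
    rms2 = np.sqrt(np.mean(np.asarray(ch2, dtype=float) ** 2))
    gain_diff_db = 20 * np.log10(rms1 / rms2)

    # Phase mismatch at the tone frequency, via the cross-spectrum.
    spec1 = np.fft.rfft(ch1)
    spec2 = np.fft.rfft(ch2)
    k = int(round(tone_hz * len(ch1) / fs))             # FFT bin of the test tone
    phase_diff_deg = np.degrees(np.angle(spec1[k] * np.conj(spec2[k])))

    ok = abs(gain_diff_db) <= max_gain_db and abs(phase_diff_deg) <= max_phase_deg
    return gain_diff_db, phase_diff_deg, ok
```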
Microphone arrays are becoming increasingly popular in the audio industry because they allow dynamic surround sound recording. The VISSONIC SONICON Array Microphone Conference System, for example, combines advanced voice processing technology, a unique microphone head design, and high-quality built-in speakers, picking up and reproducing the human voice accurately so that every word is well understood.