VoiceKey

Unique Randomness Properties of the Human Voice


Table of Contents

  1. Introduction
  2. Physiological Basis of Voice Production
  3. Sources of Randomness in the Human Voice
  4. Statistical Characteristics of Human Voice
  5. Contrast with AI-Generated Voices
  6. Implications for Voice Authentication
  7. Code Examples
  8. Conclusion
  9. References
  10. Contact Information
  11. Acknowledgments

Introduction

The human voice is a complex acoustic signal generated by intricate physiological processes. It exhibits unique randomness properties arising from a combination of physiological, biochemical, and environmental factors. These properties contribute to the individuality of each person’s voice and present significant challenges for replication by artificial intelligence (AI) systems.

This document explores the sources of randomness in the human voice, contrasts them with AI-generated voices, and discusses their implications for secure voice authentication systems like VoiceKey.


Physiological Basis of Voice Production

Anatomy of the Vocal Apparatus

Phonation Process


Sources of Randomness in the Human Voice

Micro-Level Physiological Variations

Vocal Micro-Tremors

Quantum Noise in Biological Systems

True Randomness

Non-Linear Dynamics and Chaos Theory

Chaos in Voice Production


Statistical Characteristics of Human Voice

Entropy and Complexity Measures

Calculating Entropy

Fractal Analysis

Applications


Contrast with AI-Generated Voices

Deterministic Nature of AI Voices

Challenges in Replicating Human Randomness


Implications for Voice Authentication


Code Examples

Analyzing Micro-Tremors

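The following Python code demonstrates how to analyze micro-tremors in a voice signal using the Hilbert-Huang Transform (HHT). The HHT first splits the signal into Intrinsic Mode Functions (IMFs) with Empirical Mode Decomposition (EMD), then applies the Hilbert transform to an IMF to obtain its amplitude envelope and instantaneous frequency: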

import numpy as np
import matplotlib.pyplot as plt
import librosa
from scipy.signal import hilbert
from PyEMD import EMD

# Load the voice signal
voice_signal, sampling_rate = librosa.load('voice_sample.wav', sr=None)

# Apply Empirical Mode Decomposition (EMD)
emd = EMD()
IMFs = emd(voice_signal)

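# Select the first Intrinsic Mode Function (IMF). EMD returns IMFs ordered from
# fastest to slowest oscillation, so IMFs[0] is the highest-frequency component;
# inspect the later IMFs as well if the tremor of interest is slow.
micro_tremor_imf = IMFs[0]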

# Compute the analytic signal
analytic_signal = hilbert(micro_tremor_imf)
amplitude_envelope = np.abs(analytic_signal)
instantaneous_phase = np.unwrap(np.angle(analytic_signal))
instantaneous_frequency = np.diff(instantaneous_phase) * (sampling_rate / (2.0 * np.pi))

# Plot the amplitude envelope and instantaneous frequency
plt.figure(figsize=(12, 6))

plt.subplot(2, 1, 1)
plt.plot(amplitude_envelope)
plt.title('Amplitude Envelope of Micro-Tremors')
plt.xlabel('Samples')
plt.ylabel('Amplitude')

plt.subplot(2, 1, 2)
plt.plot(instantaneous_frequency)
plt.title('Instantaneous Frequency of Micro-Tremors')
plt.xlabel('Samples')
plt.ylabel('Frequency (Hz)')

plt.tight_layout()
plt.show()
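
Before trusting the Hilbert step on real recordings, it can help to check it on a signal whose answer is known in advance. The sketch below is not part of the VoiceKey code; the helper names `instantaneous_frequency` and `dominant_modulation_hz` and the synthetic signal parameters (a 120 Hz tone whose frequency wobbles by 2 Hz at a rate of 8 Hz) are assumptions chosen purely for illustration.

```python
import numpy as np
from scipy.signal import hilbert


def instantaneous_frequency(signal, sampling_rate):
    """Instantaneous frequency (Hz) from the analytic signal's unwrapped phase."""
    phase = np.unwrap(np.angle(hilbert(signal)))
    return np.diff(phase) * sampling_rate / (2.0 * np.pi)


def dominant_modulation_hz(inst_freq, sampling_rate):
    """Frequency (Hz) of the strongest periodic fluctuation in inst_freq."""
    centered = inst_freq - np.mean(inst_freq)
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(len(centered), d=1.0 / sampling_rate)
    return freqs[np.argmax(spectrum)]


# Hypothetical test signal: 120 Hz tone, frequency modulated by +/-2 Hz at 8 Hz.
sampling_rate = 1000
t = np.arange(2 * sampling_rate) / sampling_rate  # 2 seconds of samples
f0, depth, tremor = 120.0, 2.0, 8.0
phase = 2 * np.pi * f0 * t + (depth / tremor) * np.sin(2 * np.pi * tremor * t)
synthetic = np.cos(phase)

inst_freq = instantaneous_frequency(synthetic, sampling_rate)
mean_hz = float(np.mean(inst_freq))
tremor_hz = float(dominant_modulation_hz(inst_freq, sampling_rate))
print(f"mean frequency: {mean_hz:.1f} Hz, modulation rate: {tremor_hz:.1f} Hz")
```

The synthetic signal contains a whole number of cycles, so edge effects from the FFT-based Hilbert transform are small here. Real recordings are not periodic over the analysis window, and the first and last samples of the instantaneous frequency are usually less reliable.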

Explanation

Measuring Entropy in Voice Signals

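This example calculates the sample entropy of a voice signal to assess its complexity. Sample entropy measures how unpredictable a time series is: lower values indicate a more regular, repetitive signal, and higher values indicate a more irregular one: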

import numpy as np
import nolds
import librosa

# Load the voice signal
voice_signal, sampling_rate = librosa.load('voice_sample.wav', sr=None)

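# librosa.load returns a mono, one-dimensional array by default;
# flatten() is kept as a safeguard for other loaders
voice_signal = voice_signal.flatten()

# Sample entropy compares every pair of templates, so its cost grows
# quadratically with signal length. Analyze at most one second here;
# the segment length is an illustrative choice, not a recommendation.
voice_signal = voice_signal[:sampling_rate]

# Calculate Sample Entropy (nolds uses an embedding dimension of 2 by default)
sampen = nolds.sampen(voice_signal)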

print(f'Sample Entropy of the voice signal: {sampen:.4f}')
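
To make the measure concrete, the sketch below computes sample entropy directly with NumPy and compares a regular signal with an irregular one. It is a minimal illustration and not the nolds implementation: the function name `sample_entropy`, the template length `m=2`, and the tolerance of 0.2 times the standard deviation are assumptions (the latter two are common conventions), and the all-pairs distance matrix makes it suitable only for short signals.

```python
import numpy as np


def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy of a short 1-D signal (illustrative, O(N^2) memory)."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)  # tolerance scaled to the signal's spread
    n = len(x)

    def count_matches(length):
        # Use the same n - m template start positions for both lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        # Chebyshev (maximum) distance between every pair of templates.
        diff = np.abs(templates[:, None, :] - templates[None, :, :])
        dist = np.max(diff, axis=2)
        # Count ordered pairs within tolerance, excluding self-matches.
        return np.sum(dist <= r) - len(templates)

    b = count_matches(m)      # matches of length m
    a = count_matches(m + 1)  # matches that persist at length m + 1
    return -np.log(a / b)


# Hypothetical test signals: a pure sine wave and Gaussian white noise.
rng = np.random.default_rng(0)
t = np.arange(500) / 500.0
sine = np.sin(2 * np.pi * 5 * t)
noise = rng.standard_normal(500)

sine_entropy = float(sample_entropy(sine))
noise_entropy = float(sample_entropy(noise))
print(f"sine: {sine_entropy:.3f}, white noise: {noise_entropy:.3f}")
```

The sine wave should score well below the white noise, because templates that match at length `m` almost always still match at length `m + 1` when the signal is regular.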

Explanation


Conclusion

The human voice’s unique randomness properties stem from complex physiological and quantum-level processes that introduce true randomness and chaotic dynamics. These characteristics are challenging for AI systems to replicate authentically. By understanding and leveraging these properties, voice authentication systems like VoiceKey can effectively distinguish between human and AI-generated voices, enhancing security and trustworthiness.


References

  1. Titze, I. R. (1994). Principles of Voice Production. Prentice Hall.
  2. Herzel, H., Berry, D., Titze, I. R., & Saleh, M. (1994). Analysis of vocal disorders with methods from nonlinear dynamics. Journal of Speech, Language, and Hearing Research, 37(5), 1008-1019.
  3. Kantz, H., & Schreiber, T. (2004). Nonlinear Time Series Analysis. Cambridge University Press.
  4. Burnett, T. A., & Krishnamurthy, A. K. (1991). Production of subharmonics and chaos in the vocal folds. IEEE Transactions on Biomedical Engineering, 38(4), 357-365.
  5. Strogatz, S. H. (2015). Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. Westview Press.
  6. Hanson, H. M. (1997). Glottal characteristics of female speakers: Acoustic correlates. The Journal of the Acoustical Society of America, 101(1), 466-481.
  7. Goldberger, A. L., Amaral, L. A., Hausdorff, J. M., Ivanov, P. C., Peng, C. K., & Stanley, H. E. (2002). Fractal dynamics in physiology: Alterations with disease and aging. Proceedings of the National Academy of Sciences, 99(suppl 1), 2466-2472.
  8. Feng, Y., & Narayanan, S. (2013). Analysis of vocal disorders using nonlinear dynamic features. IEEE Transactions on Biomedical Engineering, 60(1), 186-192.
  9. Ishima, T., & Shinohara, K. (2012). Voice analysis and detection of mental fatigue. Journal of Voice, 26(4), 454-461.
  10. Kobayashi, M., & Musha, T. (1982). 1/f fluctuation of heartbeat period. IEEE Transactions on Biomedical Engineering, BME-29(6), 456-457.

Contact Information

AI Integrity Alliance


Acknowledgments

We extend our gratitude to the researchers and contributors whose work has laid the foundation for understanding the unique randomness properties of the human voice. Their pioneering studies have been instrumental in advancing secure voice authentication technologies.


Note: This document is part of the VoiceKey project by the AI Integrity Alliance. It provides a detailed examination of the unique randomness properties inherent in the human voice, contributing to the development of advanced voice authentication systems.