# On the Cocktail Party Problem and a Program Implementation

The "cocktail party problem" is a classic problem in computer speech recognition. Current speech-recognition technology can already transcribe a single speaker with fairly high accuracy, but when two or more people speak at the same time, recognition accuracy drops sharply; this difficulty is known as the cocktail party problem.

# 1 Blind Signal Separation

## 1.3 Computing the unmixing matrix W by the natural gradient method

1) Initialize W(0) to the identity matrix;

2) Loop over the following steps until the difference between W(n+1) and W(n) is below a prescribed threshold (the measure of matrix difference can be chosen freely); sometimes a fixed number of iterations is prescribed instead;

3) Compute u = W(n)x and g(u) = 1/(1 + e^(−u)), where g is the logistic sigmoid applied elementwise;

4) Update the matrix with the natural gradient rule W(n+1) = W(n) + η[I + (1 − 2g(u))uᵀ]W(n), where η is the learning rate, then return to step 2.
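The iteration above can be sketched in a few lines. The following is a minimal Python/NumPy sketch, not the MATLAB program below; the learning rate `eta`, tolerance `tol`, and iteration cap are illustrative choices, and the update here averages the nonlinear term over all P samples per iteration (the MATLAB code instead sums over blocks of B points):

```python
import numpy as np

def natural_gradient_ica(x, eta=0.01, tol=1e-6, max_iter=1000):
    """Iterate W(n+1) = W(n) + eta*[I + (1-2g(u))u^T]W(n), with u = W(n)x."""
    N, P = x.shape
    W = np.eye(N)                        # step 1: W(0) = identity
    for _ in range(max_iter):            # step 2: loop with an iteration cap
        u = W @ x                        # current estimate of the sources
        g = 1.0 / (1.0 + np.exp(-u))     # step 3: logistic sigmoid, elementwise
        dW = eta * (np.eye(N) + (1.0 - 2.0 * g) @ u.T / P) @ W
        W = W + dW                       # step 4: natural gradient update
        if np.linalg.norm(dW) < tol:     # step 2: convergence test
            break
    return W
```

A typical call would pass a whitened N-by-P matrix of mixed signals and read off the unmixed sources as `W @ x`.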

# 2 Program Implementation

## 1. main.m

``````matlab
% fftproject.m
% By: Johan Bylund (original author)
% Comments revised by: Harvey
close all;
clear, clf, format compact;
% Makes a matrix out of wav files in the project directory.
% The arrays will all be the same length as the shortest one.
% Files longer than the shortest one will be truncated.
disp('Reading .wav files from project directory');
% The file names below are placeholders; substitute the .wav files from
% your own project directory (newer MATLAB versions use audioread):
mic_1 = wavread('mic_1.wav');
mic_2 = wavread('mic_2.wav');
size(mic_1)
size(mic_2)
% The operation below reshapes them into 1xN row vectors:
mic_1=mic_1';
mic_2=mic_2';
% Equalise lengths: truncate both signals to the shorter one.
if length(mic_1)>length(mic_2)
    mic_1=mic_1(1:length(mic_2));
else
    mic_2=mic_2(1:length(mic_1));
end
size(mic_1)
size(mic_2)
% Play the two original received (mixed) signals in turn.
disp('Playing the recording of the right microphone (closer to music)');
soundsc(mic_1)
disp('Playing the recording of the left microphone (closer to me)');
soundsc(mic_2)
subplot(2,1,1)
plot(mic_1), axis([0 16000 -0.12 0.12]);
title('Right microphone (closer to music)')
xlabel('Sampled points');
subplot(2,1,2)
plot(mic_2), axis([0 16000 -0.12 0.12]);
title('Left microphone (closer to me)')
xlabel('Sampled points');
% I also choose to look at the frequency spectra of the signal:
% Show the signal spectra via the fast Fourier transform (FFT).
Fs=8000; % Sampling frequency
% Matrix with the frequency spectra from the two microphones:
fftsounds=[real(fft(mic_1,Fs));real(fft(mic_2,Fs))];
f=[1:Fs/2];
figure(2)
subplot(2,1,1)
plot(f,fftsounds(1,f)), axis([0 4000 -15 15]);
title('Frequency spectra of the right microphone')
xlabel('Frequency (Hz)');
subplot(2,1,2)
plot(f,fftsounds(2,f)), axis([0 4000 -15 15]);
title('Frequency spectra of the left microphone')
xlabel('Frequency (Hz)');
% At first I tried the algorithm in the time domain, and it didn't
% manage to separate the two sources very well.
% After that I used the frequency spectra of the two microphone signals
% in the algorithm and it worked out much better.
% The algorithm starts here.
% As the original author notes above, running it on the frequency-domain
% data separates the sources much better than on the time-domain data.
sounds=[real(fft(mic_1));real(fft(mic_2))];
% N="number of microphones"
% P="number of points"
[N,P]=size(sounds) % P=?, N=2, in this case.
permute=randperm(P); % Generate a permutation vector.
s=sounds(:,permute); % Time-scrambled inputs for stationarity.
x=s;
mixes=sounds;
% Spheres the data (normalisation).
mx=mean(mixes');
c=cov(mixes');
x=x-mx'*ones(1,P); % Subtract means from mixes.
wz=2*inv(sqrtm(c)); % Get decorrelating matrix.
x=wz*x; % Decorrelate mixes so cov(x')=4*eye(N);
w=pi^2*rand(N); % Initialise unmixing matrix.
M=size(w,2); % M=N usually
sweep=0; oldw=w; olddelta=ones(1,N*N);
Id=eye(M);
% L="learning rate, B="points per block"
% Both are used in sep.m, which goes through the mixed signals
% in batch blocks of size B, adjusting weights, w, at the end
% of each block.
%L=0.01; B=30; sep
%L=0.001; B=30; sep % Annealing will improve solution.
%L=0.0001; B=30; sep % ...and so on
%for multiple sweeps:
L=0.0001; B=30;
for I=1:100
    sep; % For details see sep.m
end;
uu=w*wz*mixes; % make unmixed sources
% Plot the two separated vectors in the frequency domain.
figure(3)
subplot(2,1,1)
plot(f,uu(1,f)), axis([0 4000 -22 22]);
title('Frequency spectra of one of the separated signals')
xlabel('Frequency (Hz)');
subplot(2,1,2)
plot(f,uu(2,f)), axis([0 4000 -22 22]);
title('Frequency spectra of the other separated signal')
xlabel('Frequency (Hz)');
% Transform signals back to time domain.
uu(2,:)=real(ifft(uu(2,:)));
uu(1,:)=real(ifft(uu(1,:)));
disp('Playing the first of the separated vectors');
soundsc(uu(1,:))
% Plot the vector that is played above.
figure(4);
subplot(2,1,1)
plot(uu(1,:)), axis([0 16000 -0.12 0.12]);
title('Plot of one of the separated vectors (time domain)')
disp('Playing the second of the separated vectors');
soundsc(uu(2,:))
% Plot the vector that is played above.
subplot(2,1,2)
plot(uu(2,:)), axis([0 16000 -0.12 0.12]);
title('Plot of the other separated vector (time domain)')
``````
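The sphering step in main.m subtracts the mean and multiplies by wz = 2*inv(sqrtm(c)), so the whitened covariance comes out as 4*eye(N), exactly as the comment in the code says. Here is a small Python/NumPy check of that identity; the toy mixing matrix and data are made up, and the symmetric matrix square root is computed by eigendecomposition rather than MATLAB's sqrtm:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 0.6], [0.4, 1.0]])            # toy mixing matrix
mixes = A @ rng.standard_normal((2, 5000)) + np.array([[0.5], [-0.2]])

mx = mixes.mean(axis=1, keepdims=True)            # mx = mean(mixes')
x = mixes - mx                                    # subtract means from mixes
c = np.cov(x)                                     # c = cov(mixes')
evals, evecs = np.linalg.eigh(c)                  # eigendecomposition gives the
c_sqrt = (evecs * np.sqrt(evals)) @ evecs.T       # symmetric square root of c
wz = 2.0 * np.linalg.inv(c_sqrt)                  # wz = 2*inv(sqrtm(c))
x = wz @ x                                        # decorrelate: cov(x') = 4*eye(N)

print(np.round(np.cov(x), 6))                     # ~ 4 * identity matrix
```

The factor of 2 in wz is why the resulting covariance is 4I rather than I; the separation rule is insensitive to this overall scale.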

## 2. sep.m

``````matlab
% SEP goes once through the scrambled mixed speech signals, x
% (which is of length P), in batch blocks of size B, adjusting weights,
% w, at the end of each block.
%
% I suggest a learning rate L, of 0.01 at least for 2->2 separation.
% But this will be unstable for higher dimensional data. Test it.
% Use smaller values. After convergence at a value for L, lower
% L and it will fine tune the solution.
%
% NOTE: this rule is the rule in our NC paper, but multiplied by w^T*w,
% as proposed by Amari, Cichocki & Yang at NIPS '95. This `natural
% gradient' method speeds convergence and avoids the matrix inverse in the
% learning rule.
sweep=sweep+1; t=1;
noblocks=fix(P/B);
BI=B*Id;
for t=t:B:t-1+noblocks*B,
    u=w*x(:,t:t+B-1);
    w=w+L*(BI+(1-2*(1./(1+exp(-u))))*u')*w;
end;
sepout
``````
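For comparison, sep.m's block loop translates to Python/NumPy almost line for line. This is a hedged sketch: the random "whitened" data, the w initialisation, and the L and B values below are illustrative stand-ins for the values main.m supplies:

```python
import numpy as np

def sep(w, x, L=0.01, B=30):
    """One sweep through x in blocks of B points, mirroring sep.m's update."""
    N, P = x.shape
    Id = np.eye(N)
    noblocks = P // B                     # noblocks = fix(P/B)
    for t in range(0, noblocks * B, B):
        u = w @ x[:, t:t + B]             # u = w*x(:,t:t+B-1)
        g = 1.0 / (1.0 + np.exp(-u))      # logistic nonlinearity
        w = w + L * (B * Id + (1.0 - 2.0 * g) @ u.T) @ w
    return w

# Illustrative call on toy whitened data, with several sweeps as in main.m:
rng = np.random.default_rng(2)
x = 2.0 * rng.standard_normal((2, 900))   # cov ~ 4*eye(2), as after sphering
w = rng.random((2, 2))                    # random initial unmixing matrix
for _ in range(10):
    w = sep(w, x, L=0.001, B=30)
```

Note that the block update sums the nonlinear term over B points and adds B*Id, which is why the standalone rule in section 1.3 carries a plain identity term.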

## 3. sepout.m

``````matlab
% SEPOUT - put whatever textual output report you want here.
%  Called after each pass through the data.
%  If your data is real, not artificially mixed, you will need
%  to comment out line 4, since you have no idea what the matrix 'a' is.
%
[change,olddelta,angle]=wchange(oldw,w,olddelta);
oldw=w;
fprintf('****sweep=%d, change=%.4f angle=%.1f deg., [N%d,M%d,P%d,B%d,L%.5f] \n',...
sweep,change,180*angle/pi,N,M,P,B,L);
% w*wz*a     %should be a permutation matrix for artif. mixed data
``````
