FastICA

FastICA is a popular algorithm for independent component analysis developed by Aapo Hyvärinen at the Helsinki University of Technology. The algorithm is based on a fixed-point iteration scheme that maximizes non-Gaussianity as a measure of statistical independence. It can also be derived as an approximate Newton iteration.

Algorithm

FastICA for one component

The iterative algorithm finds the direction of the weight vector $\mathbf{w}$ that maximizes the non-Gaussianity of the projection $\mathbf{w}^{T}\mathbf{x}$ of the data $\mathbf{x}$. The function $g(\cdot)$ is the derivative of a non-quadratic function $G(\cdot)$.

1. Choose an initial weight vector $\mathbf{w}$.
2. Let $\mathbf{w}^{+}\leftarrow E\left\{\mathbf{x}g(\mathbf{w}^{T}\mathbf{x})\right\}-E\left\{g'(\mathbf{w}^{T}\mathbf{x})\right\}\mathbf{w}$.
3. Let $\mathbf{w}\leftarrow \mathbf{w}^{+}/\|\mathbf{w}^{+}\|$.
4. If the iteration has not converged, go back to step 2.

Here, convergence means that the values of $\mathbf{w}$ at two successive iterations point in the same direction, that is, the absolute value of their dot product is (almost) equal to 1.
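The four steps above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: it assumes the data have already been centered and whitened, replaces the expectations with sample means, and uses $g(u)=\tanh(au)$. The names `whiten` and `fastica_one_unit`, the parameters `max_iter`, `tol` and `seed`, and the two test sources with their mixing matrix are assumptions made for this example.

```python
import numpy as np

def whiten(X):
    """Center X (n_features, n_samples) and give it identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    return (E / np.sqrt(d)) @ E.T @ Xc

def fastica_one_unit(Z, a=1.0, max_iter=200, tol=1e-8, seed=0):
    """One-unit fixed-point iteration on whitened data Z (n_features, n_samples)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[0])       # step 1: initial weight vector
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        u = w @ Z                             # projections w^T x, one per sample
        g = np.tanh(a * u)                    # g(u)
        g_prime = a * (1.0 - g ** 2)          # g'(u)
        # step 2: w+ = E{x g(w^T x)} - E{g'(w^T x)} w
        w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)        # step 3: normalize
        # step 4: converged when old and new w are parallel (sign may flip)
        converged = abs(w_new @ w) > 1.0 - tol
        w = w_new
        if converged:
            break
    return w

# Hypothetical test data: two independent non-Gaussian sources, linearly mixed.
rng = np.random.default_rng(42)
n = 20000
S = np.vstack([rng.uniform(-1.0, 1.0, n), rng.laplace(size=n)])
A = np.array([[1.0, 0.5], [0.3, 1.0]])        # assumed mixing matrix
Z = whiten(A @ S)
w = fastica_one_unit(Z)
y = w @ Z                                     # estimate of one source (up to sign and scale)
```

The convergence test compares the absolute value of the dot product with 1 because $\mathbf{w}$ and $-\mathbf{w}$ define the same direction.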

Some examples of functions $g(\cdot)$ are:

$g(u)=\tanh(au)$
$g(u)=u\exp\left(-\frac{u^{2}}{2}\right)$
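The update rule needs both $g$ and its derivative $g'$. The sketch below writes out the two example functions together with derivatives obtained by ordinary differentiation. The function names are chosen for this example only, and the default $a=1$ is an assumption.

```python
import numpy as np

def g_tanh(u, a=1.0):
    """g(u) = tanh(a u)"""
    return np.tanh(a * u)

def g_tanh_prime(u, a=1.0):
    """g'(u) = a (1 - tanh^2(a u))"""
    return a * (1.0 - np.tanh(a * u) ** 2)

def g_gauss(u):
    """g(u) = u exp(-u^2 / 2)"""
    return u * np.exp(-u ** 2 / 2.0)

def g_gauss_prime(u):
    """g'(u) = (1 - u^2) exp(-u^2 / 2)"""
    return (1.0 - u ** 2) * np.exp(-u ** 2 / 2.0)
```

Both functions are applied element-wise, so they can be evaluated on the whole vector of projections $\mathbf{w}^{T}\mathbf{x}$ at once.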

The local maxima of the negentropy approximation of $\mathbf{w}^{T}\mathbf{x}$ are obtained at certain optima of the function $E\left\{G(\mathbf{w}^{T}\mathbf{x})\right\}$. According to the Karush-Kuhn-Tucker conditions, the optima of $E\left\{G(\mathbf{w}^{T}\mathbf{x})\right\}$ under the constraint $E\left\{(\mathbf{w}^{T}\mathbf{x})^{2}\right\}=\|\mathbf{w}\|^{2}=1$ (the first equality holds for whitened data) are obtained at the points where: $E\left\{\mathbf{x}g(\mathbf{w}^{T}\mathbf{x})\right\}-\beta \mathbf{w}=0$

Solving this equation with Newton's method, and denoting its left-hand side by $F$, we obtain the Jacobian matrix $JF(\mathbf{w})$ as: $JF(\mathbf{w})=E\left\{\mathbf{x}\mathbf{x}^{T}g'(\mathbf{w}^{T}\mathbf{x})\right\}-\beta \mathbf{I}$. To simplify the inversion of this matrix, it is useful to approximate the first term. If the data are centered (zero mean) and whitened, it can be approximated as follows: $E\left\{\mathbf{x}\mathbf{x}^{T}g'(\mathbf{w}^{T}\mathbf{x})\right\}\approx E\left\{\mathbf{x}\mathbf{x}^{T}\right\}E\left\{g'(\mathbf{w}^{T}\mathbf{x})\right\}=E\left\{g'(\mathbf{w}^{T}\mathbf{x})\right\}\mathbf{I}$

With this approximation, the Jacobian becomes a diagonal matrix and can therefore be inverted easily. This gives an approximate Newton iteration:

$\mathbf{w}^{+}=\mathbf{w}-\frac{E\left\{\mathbf{x}g(\mathbf{w}^{T}\mathbf{x})\right\}-\beta \mathbf{w}}{E\left\{g'(\mathbf{w}^{T}\mathbf{x})\right\}-\beta}$

The algorithm can be simplified further by multiplying both sides by $\beta -E\left\{g'(\mathbf{w}^{T}\mathbf{x})\right\}$, which gives the actual FastICA algorithm.
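Written out, the multiplication proceeds as follows. This is a worked version of the step described above, using only the symbols already defined in this section.

```latex
\begin{align*}
\left(\beta - E\{g'(\mathbf{w}^{T}\mathbf{x})\}\right)\mathbf{w}^{+}
  &= \left(\beta - E\{g'(\mathbf{w}^{T}\mathbf{x})\}\right)\mathbf{w}
     + E\{\mathbf{x}\,g(\mathbf{w}^{T}\mathbf{x})\} - \beta\mathbf{w} \\
  &= E\{\mathbf{x}\,g(\mathbf{w}^{T}\mathbf{x})\}
     - E\{g'(\mathbf{w}^{T}\mathbf{x})\}\,\mathbf{w}
\end{align*}
```

The terms in $\beta$ cancel, so $\beta$ never has to be computed. The scalar factor on the left-hand side only rescales $\mathbf{w}^{+}$, and the normalization $\mathbf{w}\leftarrow \mathbf{w}^{+}/\|\mathbf{w}^{+}\|$ removes it, leaving the update used in the one-component algorithm.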

(The algorithm relies on an approximation of negentropy; kurtosis is a related measure of non-Gaussianity.)

FastICA for multiple components

The one-component algorithm yields only one of the independent components. To estimate more, the algorithm must be applied to a set of $n$ units, each with its own weight vector, $\mathbf{w}_{1},\ldots ,\mathbf{w}_{n}$.

The algorithm is applied in the same way, but several units must be prevented from converging to the same maximum. To this end, the outputs $\mathbf{w}_{1}^{T}\mathbf{x},\ldots ,\mathbf{w}_{n}^{T}\mathbf{x}$ must be decorrelated at the end of every iteration. At least three methods for doing this exist in the literature.
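The text above does not name the decorrelation methods. As an illustration only, the sketch below uses one commonly described option, symmetric decorrelation $\mathbf{W}\leftarrow (\mathbf{W}\mathbf{W}^{T})^{-1/2}\mathbf{W}$, where the rows of $\mathbf{W}$ are the vectors $\mathbf{w}_{i}^{T}$. It assumes whitened data, sample means in place of expectations, and $g(u)=\tanh(au)$. All function names, parameters and test data are assumptions made for this example.

```python
import numpy as np

def whiten(X):
    """Center X (n_features, n_samples) and give it identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    return (E / np.sqrt(d)) @ E.T @ Xc

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    d, E = np.linalg.eigh(W @ W.T)
    return (E / np.sqrt(d)) @ E.T @ W

def fastica_symmetric(Z, n_components, a=1.0, max_iter=200, tol=1e-8, seed=0):
    """Estimate several components at once on whitened data Z."""
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((n_components, Z.shape[0])))
    for _ in range(max_iter):
        U = W @ Z                              # row i holds w_i^T x per sample
        G = np.tanh(a * U)                     # g applied element-wise
        G_prime = a * (1.0 - G ** 2)           # g' applied element-wise
        # one-unit update for every row: E{x g(w_i^T x)} - E{g'(w_i^T x)} w_i
        W_new = (G @ Z.T) / Z.shape[1] - G_prime.mean(axis=1)[:, None] * W
        W_new = sym_decorrelate(W_new)         # decorrelate after each iteration
        # converged when every row is parallel to its previous value
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if done:
            break
    return W

# Hypothetical test data: two independent non-Gaussian sources, linearly mixed.
rng = np.random.default_rng(42)
n = 20000
S = np.vstack([rng.uniform(-1.0, 1.0, n), rng.laplace(size=n)])
A = np.array([[1.0, 0.5], [0.3, 1.0]])         # assumed mixing matrix
Z = whiten(A @ S)
W = fastica_symmetric(Z, n_components=2)
Y = W @ Z                                      # estimated sources (order and sign arbitrary)
```

Because the data are whitened, orthonormal rows of $\mathbf{W}$ give uncorrelated outputs, so no two units can settle on the same component. The estimated sources are recovered only up to order, sign and scale.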

Properties of the algorithm

Convergence is cubic under the assumption of the ICA model, which makes the algorithm faster than classical methods based on gradient descent, whose convergence is linear.
The algorithm is easy to use, in part because there are few parameters to set.
FastICA can find the independent components of almost any non-Gaussian distribution using any nonlinear function $g$, unlike other techniques that require a priori information about the distributions.
The independent components can be estimated one by one, which makes the algorithm a useful tool for exploratory data analysis and reduces the computational load.
The algorithm shares desirable properties with neural approaches: it is parallel, distributed, computationally simple, and modest in its memory requirements.

Related topics

Independent component analysis

External links

FastICA package for Matlab, at cis.hut.fi.
fastICA package for the R programming language.