Restricted Boltzmann Machines

Sigmoid / binary

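$$P(h_j = 1 \mid v) = \operatorname{sigmoid}\Big(b^h_j + \sum_i v_i w_{i,j}\Big)$$
$$P(v_i = 1 \mid h) = \operatorname{sigmoid}\Big(b^v_i + \sum_j h_j w_{i,j}\Big)$$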
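The two conditionals are what a Gibbs sampler alternates between. Below is a minimal pure-Python sketch of them. The function names and the nested-list layout `W[i][j]` (visible unit i, hidden unit j) are assumptions made for illustration. The sketch favours clarity over numerical robustness.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_h_given_v(v, W, b_h):
    # P(h_j = 1 | v) = sigmoid(b_j^h + sum_i v_i w_{i,j})
    return [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b_h))]

def p_v_given_h(h, W, b_v):
    # P(v_i = 1 | h) = sigmoid(b_i^v + sum_j h_j w_{i,j})
    return [sigmoid(b_v[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b_v))]

def sample_bernoulli(probs, rng):
    # Each unit is drawn independently: 1 with probability p, else 0
    return [1 if rng.random() < p else 0 for p in probs]
```

Within one layer the units are conditionally independent, so each layer is sampled in a single pass once the other layer is fixed.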

Energy

$$E(v, h) = -\sum_i b^v_i v_i - \sum_j b^h_j h_j - \sum_i \sum_j v_i h_j w_{i,j}$$
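Here is the energy term by term, as a small sketch. The function name and the nested-list layout `W[i][j]` are illustrative assumptions.

```python
def energy(v, h, W, b_v, b_h):
    # E(v, h) = -sum_i b_i^v v_i - sum_j b_j^h h_j - sum_{i,j} v_i h_j w_{i,j}
    visible_term = sum(b_v[i] * v[i] for i in range(len(v)))
    hidden_term = sum(b_h[j] * h[j] for j in range(len(h)))
    interaction = sum(v[i] * h[j] * W[i][j]
                      for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - interaction
```

For example, take `b_v = [1.0, 2.0]`, `b_h = [0.5, -1.0]`, `W = [[2.0, -1.0], [0.5, 3.0]]`, `v = [1, 1]` and `h = [1, 0]`. The energy is -3.0 - 0.5 - 2.5 = -6.0.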

Probability of data v

$$P(v, h) = \frac{1}{Z} e^{-E(v, h)}$$
$$Z = \sum_v \sum_h e^{-E(v, h)}$$
$$P(v) = \frac{1}{Z} \sum_h e^{-E(v, h)} = \frac{1}{Z} e^{-F(v)}$$
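For a tiny RBM, Z and P(v) can be computed exactly by enumerating every binary state. This is a useful sanity check, but the cost grows as 2^(n_v + n_h), so it only works for toy sizes. The sketch below assumes binary units and uses illustrative function names.

```python
import itertools
import math

def energy(v, h, W, b_v, b_h):
    return (-sum(b * x for b, x in zip(b_v, v))
            - sum(b * x for b, x in zip(b_h, h))
            - sum(v[i] * h[j] * W[i][j]
                  for i in range(len(v)) for j in range(len(h))))

def partition_function(W, b_v, b_h):
    # Z = sum over all binary v and h of exp(-E(v, h)); exponential cost
    states_v = list(itertools.product([0, 1], repeat=len(b_v)))
    states_h = list(itertools.product([0, 1], repeat=len(b_h)))
    return sum(math.exp(-energy(v, h, W, b_v, b_h))
               for v in states_v for h in states_h)

def marginal_p_v(v, W, b_v, b_h):
    # P(v) = (1/Z) sum_h exp(-E(v, h))
    states_h = itertools.product([0, 1], repeat=len(b_h))
    unnormalized = sum(math.exp(-energy(v, h, W, b_v, b_h)) for h in states_h)
    return unnormalized / partition_function(W, b_v, b_h)
```

When all parameters are zero, every joint state has energy 0. With 2 visible and 3 hidden units this gives Z = 2^5 = 32 and P(v) = 1/4 for every v.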

Free energy given data v

$$F(v) = -\ln\Big(\sum_h e^{-E(v, h)}\Big) = -\sum_i b^v_i v_i - \sum_j \operatorname{softplus}\Big(b^h_j + \sum_i v_i w_{i,j}\Big)$$
$$\frac{\partial \log P(v)}{\partial w_{i,j}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}$$
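The softplus form of F(v) can be checked against the brute-force sum over h. Because d softplus(a)/da = sigmoid(a), the data term has a closed form per example: -∂F(v)/∂w_{i,j} = v_i P(h_j = 1 | v). The sketch below computes both and uses illustrative function names. Its softplus is written as max(a, 0) + log1p(e^{-|a|}) to avoid overflow.

```python
import itertools
import math

def softplus(a):
    # log(1 + e^a), rearranged so large positive a does not overflow
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))

def free_energy(v, W, b_v, b_h):
    # F(v) = -sum_i b_i^v v_i - sum_j softplus(b_j^h + sum_i v_i w_{i,j})
    vis = sum(b * x for b, x in zip(b_v, v))
    hid = sum(softplus(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
              for j in range(len(b_h)))
    return -vis - hid

def free_energy_brute_force(v, W, b_v, b_h):
    # F(v) = -ln sum_h exp(-E(v, h)), enumerating every binary h
    total = 0.0
    for h in itertools.product([0, 1], repeat=len(b_h)):
        e = (-sum(b * x for b, x in zip(b_v, v))
             - sum(b * x for b, x in zip(b_h, h))
             - sum(v[i] * h[j] * W[i][j]
                   for i in range(len(v)) for j in range(len(h))))
        total += math.exp(-e)
    return -math.log(total)

def data_term(v, W, b_h):
    # -dF(v)/dw_{i,j} = v_i * P(h_j = 1 | v), one example's share of <v_i h_j>_data
    n_v, n_h = len(v), len(b_h)
    act = [b_h[j] + sum(v[k] * W[k][j] for k in range(n_v)) for j in range(n_h)]
    return [[v[i] / (1.0 + math.exp(-act[j])) for j in range(n_h)]
            for i in range(n_v)]
```

A central finite difference of -F with respect to one weight should match the corresponding entry of `data_term`.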

Updates for weights, visible biases and hidden biases

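$$\Delta w_{i,j} = \epsilon\big(\langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}\big)$$
$$\Delta b^v_i = \epsilon\big(\langle v_i \rangle_{\text{data}} - \langle v_i \rangle_{\text{model}}\big)$$
$$\Delta b^h_j = \epsilon\big(\langle h_j \rangle_{\text{data}} - \langle h_j \rangle_{\text{model}}\big)$$

where ϵ is the learning rate.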
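The update rules do not say how ⟨·⟩_model is estimated. A common choice is one-step contrastive divergence (CD-1), which replaces the model expectation with statistics from a single Gibbs step v0 → h0 → v1 → h1. The sketch below applies CD-1 to one binary example. CD-1 itself, the function name and the nested-list layout are all assumptions here, not something these notes prescribe.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cd1_update(v0, W, b_v, b_h, eps, rng):
    """One CD-1 step on a single binary example; W, b_v, b_h are updated in place.

    Assumption: <.>_model is approximated by one Gibbs step (contrastive divergence).
    """
    n_v, n_h = len(b_v), len(b_h)
    ph0 = [sigmoid(b_h[j] + sum(v0[i] * W[i][j] for i in range(n_v))) for j in range(n_h)]
    h0 = [1 if rng.random() < p else 0 for p in ph0]
    pv1 = [sigmoid(b_v[i] + sum(h0[j] * W[i][j] for j in range(n_h))) for i in range(n_v)]
    v1 = [1 if rng.random() < p else 0 for p in pv1]
    ph1 = [sigmoid(b_h[j] + sum(v1[i] * W[i][j] for i in range(n_v))) for j in range(n_h)]
    # Hidden statistics use probabilities instead of samples, which lowers variance
    for i in range(n_v):
        for j in range(n_h):
            W[i][j] += eps * (v0[i] * ph0[j] - v1[i] * ph1[j])
    for i in range(n_v):
        b_v[i] += eps * (v0[i] - v1[i])
    for j in range(n_h):
        b_h[j] += eps * (ph0[j] - ph1[j])
```

Suppose training repeats the single pattern `[1, 0, 1, 0]`. Each update pushes b_v up on the units that are on and down on the units that are off.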

Softmax / categorical

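$$P(v_{i_0,i_1} = 1 \mid h) = \frac{e^{\,b^v_{i_0,i_1} + \sum_j h_j w_{i_0,i_1,j}}}{\sum_{i_1'} e^{\,b^v_{i_0,i_1'} + \sum_j h_j w_{i_0,i_1',j}}}$$

Here i_0 indexes a categorical visible variable and i_1 its category; v_{i_0,·} is one-hot.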
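Below is a sketch of the categorical conditional: one softmax per visible variable i_0, taken over its categories i_1. The nested-list layout `b_v[i0][i1]` and `W[i0][i1][j]` and the function names are assumptions. The maximum logit is subtracted before exponentiating, which leaves the result unchanged but avoids overflow.

```python
import math

def softmax(logits):
    # Shifting by the max does not change the result but keeps exp() in range
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def p_v_categorical_given_h(h, W, b_v):
    """P(v_{i0,i1} = 1 | h) for every categorical visible variable i0.

    Assumed layout: b_v[i0][i1] ~ b^v_{i0,i1}, W[i0][i1][j] ~ w_{i0,i1,j}.
    """
    probs = []
    for i0 in range(len(b_v)):
        logits = [b_v[i0][i1] + sum(h[j] * W[i0][i1][j] for j in range(len(h)))
                  for i1 in range(len(b_v[i0]))]
        probs.append(softmax(logits))
    return probs
```

With biases 0 and ln 3 and zero weights, for example, the two categories get probabilities 1/4 and 3/4.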

Energy

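$$E(v, h) = -\sum_{i_0,i_1} b^v_{i_0,i_1} v_{i_0,i_1} - \sum_j b^h_j h_j - \sum_{i_0,i_1} \sum_j v_{i_0,i_1} h_j w_{i_0,i_1,j}$$

With categorical hidden units as well:

$$E(v, h) = -\sum_{i_0,i_1} b^v_{i_0,i_1} v_{i_0,i_1} - \sum_{j_0,j_1} b^h_{j_0,j_1} h_{j_0,j_1} - \sum_{i_0,i_1} \sum_{j_0,j_1} v_{i_0,i_1} h_{j_0,j_1} w_{i_0,i_1,j_0,j_1}$$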
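The categorical energy with binary hidden units (the first of the two energies) should reproduce the softmax conditional. Fix h, compute exp(-E) for each one-hot setting of one categorical visible variable, and normalize. The hidden-bias term is the same for every setting and cancels. The sketch below makes this comparison. Its function names and nested-list layout are assumptions.

```python
import math

def energy_categorical(v, h, W, b_v, b_h):
    # E(v, h) = -sum b^v_{i0,i1} v_{i0,i1} - sum_j b^h_j h_j - sum v_{i0,i1} h_j w_{i0,i1,j}
    e = -sum(b_h[j] * h[j] for j in range(len(h)))
    for i0 in range(len(v)):
        for i1 in range(len(v[i0])):
            e -= b_v[i0][i1] * v[i0][i1]
            e -= sum(v[i0][i1] * h[j] * W[i0][i1][j] for j in range(len(h)))
    return e

def one_hot(k, n):
    # Category k of n as a one-hot list
    return [1 if i == k else 0 for i in range(n)]
```

Normalizing exp(-E) over the three one-hot settings of a single variable gives the same probabilities as the softmax formula.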
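Gaussian / real value

$$P(v_i \mid h) = \mathcal{N}\Big(v_i \;\Big|\; b^v_i + \sum_j h_j w_{i,j},\ \sigma_i^2\Big)$$

Energy

$$E(v, h) = \sum_i \frac{(v_i - b^v_i)^2}{2\sigma_i^2} - \sum_j b^h_j h_j - \sum_i \sum_j \frac{v_i}{\sigma_i^2} h_j w_{i,j}$$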
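Below is a sketch of the Gaussian-visible conditionals. The hidden conditional is not stated explicitly above. It follows from collecting the h_j terms of the Gaussian energy: P(h_j = 1 | v) = sigmoid(b^h_j + Σ_i v_i w_{i,j} / σ_i²). Function names and the nested-list layout are assumptions. Note that `random.Random.gauss` takes a standard deviation, not a variance.

```python
import math
import random

def gaussian_visible_mean(h, W, b_v):
    # Mean of P(v_i | h) = N(v_i | b_i^v + sum_j h_j w_{i,j}, sigma_i^2)
    return [b_v[i] + sum(h[j] * W[i][j] for j in range(len(h)))
            for i in range(len(b_v))]

def sample_gaussian_visible(h, W, b_v, sigma, rng):
    # sigma[i] is the standard deviation sigma_i (gauss expects std, not variance)
    return [rng.gauss(mu, s) for mu, s in zip(gaussian_visible_mean(h, W, b_v), sigma)]

def p_h_given_gaussian_v(v, W, b_h, sigma):
    # Derived from the Gaussian energy: sigmoid(b_j^h + sum_i v_i w_{i,j} / sigma_i^2)
    return [1.0 / (1.0 + math.exp(-(b_h[j] + sum(v[i] * W[i][j] / sigma[i] ** 2
                                                  for i in range(len(v))))))
            for j in range(len(b_h))]
```

If the data are standardized to unit variance, σ_i can be fixed at 1, and both conditionals reduce to the binary-RBM activations, except that v is real-valued.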

Hybrid

$$P(h_j = 1 \mid x, y) = \operatorname{sigmoid}\Big(b^h_j + \sum_{i_x} x_{i_x} w^{x,h}_{i_x,j} + \sum_{i^y_0,i^y_1} y_{i^y_0,i^y_1} w^{y,h}_{i^y_0,i^y_1,j}\Big)$$
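In the hybrid model, each hidden unit sums input from both visible groups. Below is a sketch of that conditional. The names `Wx` (for w^{x,h}) and `Wy` (for w^{y,h}), the function name, and storing y as a list of one-hot rows are all assumptions made for illustration.

```python
import math

def p_h_given_xy(x, y, Wx, Wy, b_h):
    """P(h_j = 1 | x, y) for the hybrid RBM.

    Assumed layout: Wx[i][j] ~ w^{x,h}_{i,j}, Wy[i0][i1][j] ~ w^{y,h}_{i0,i1,j},
    y[i0] is a one-hot row over categories i1.
    """
    probs = []
    for j in range(len(b_h)):
        a = b_h[j]
        a += sum(x[i] * Wx[i][j] for i in range(len(x)))
        a += sum(y[i0][i1] * Wy[i0][i1][j]
                 for i0 in range(len(y)) for i1 in range(len(y[i0])))
        probs.append(1.0 / (1.0 + math.exp(-a)))
    return probs
```

Because y is one-hot, the y sum simply selects the weight row of the active class.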

Energy

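$$E(x, y, h) = -\sum_j b^h_j h_j - \sum_{i_x} b^x_{i_x} x_{i_x} - \sum_{i^y_0,i^y_1} b^y_{i^y_0,i^y_1} y_{i^y_0,i^y_1} - \sum_{i_x,j} x_{i_x} h_j w^{x,h}_{i_x,j} - \sum_{i^y_0,i^y_1,j} y_{i^y_0,i^y_1} h_j w^{y,h}_{i^y_0,i^y_1,j}$$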
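Below is a sketch of the hybrid energy, using the same assumed layout (`Wx`, `Wy`, one-hot rows for y). For a single hidden unit, E(h=0) - E(h=1) should equal that unit's sigmoid activation in P(h_j = 1 | x, y). This gives a quick consistency check between the energy and the conditional.

```python
def energy_hybrid(x, y, h, Wx, Wy, b_x, b_y, b_h):
    # E(x, y, h) = -sum_j b^h_j h_j - sum_i b^x_i x_i - sum b^y_{i0,i1} y_{i0,i1}
    #              - sum_{i,j} x_i h_j w^{x,h}_{i,j} - sum_{i0,i1,j} y_{i0,i1} h_j w^{y,h}_{i0,i1,j}
    e = -sum(b * hj for b, hj in zip(b_h, h))
    e -= sum(b * xi for b, xi in zip(b_x, x))
    for i0 in range(len(y)):
        for i1 in range(len(y[i0])):
            e -= b_y[i0][i1] * y[i0][i1]
            e -= sum(y[i0][i1] * h[j] * Wy[i0][i1][j] for j in range(len(h)))
    for i in range(len(x)):
        e -= sum(x[i] * h[j] * Wx[i][j] for j in range(len(h)))
    return e
```

For example, take x = [1, 2], y = [[0, 1]], b_h = [0.5] and the weights used in the test. The activation is 0.5 + 0 + 1.0 = 1.5, and E(h=0) - E(h=1) = -5.0 - (-6.5) = 1.5.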

Probability and free energy of the generative part

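$$P(x, y, h) = \frac{1}{Z} e^{-E(x, y, h)}$$
$$P(x, y) = \frac{1}{Z} \sum_h e^{-E(x, y, h)} = \frac{1}{Z} e^{-F(x, y)}$$
$$F(x, y) = -\sum_{i_x} b^x_{i_x} x_{i_x} - \sum_{i^y_0,i^y_1} b^y_{i^y_0,i^y_1} y_{i^y_0,i^y_1} - \sum_j \operatorname{softplus}\Big(b^h_j + \sum_{i_x} x_{i_x} w^{x,h}_{i_x,j} + \sum_{i^y_0,i^y_1} y_{i^y_0,i^y_1} w^{y,h}_{i^y_0,i^y_1,j}\Big)$$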
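The hybrid free energy has the same shape as F(v): the visible-bias terms minus one softplus per hidden unit. The sketch below computes it in closed form, together with a brute-force -ln Σ_h e^{-E} for checking on small models. The function names and the `Wx`/`Wy` layout are assumptions.

```python
import itertools
import math

def softplus(a):
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))

def free_energy_hybrid(x, y, Wx, Wy, b_x, b_y, b_h):
    # F(x, y) = -sum b^x x - sum b^y y - sum_j softplus(b^h_j + sum x w^{x,h} + sum y w^{y,h})
    f = -sum(b * xi for b, xi in zip(b_x, x))
    f -= sum(b_y[i0][i1] * y[i0][i1] for i0 in range(len(y)) for i1 in range(len(y[i0])))
    for j in range(len(b_h)):
        a = b_h[j] + sum(x[i] * Wx[i][j] for i in range(len(x)))
        a += sum(y[i0][i1] * Wy[i0][i1][j]
                 for i0 in range(len(y)) for i1 in range(len(y[i0])))
        f -= softplus(a)
    return f

def free_energy_hybrid_brute_force(x, y, Wx, Wy, b_x, b_y, b_h):
    # -ln sum_h exp(-E(x, y, h)) over every binary h; exponential cost, for checking only
    total = 0.0
    for h in itertools.product([0, 1], repeat=len(b_h)):
        e = -sum(b * hj for b, hj in zip(b_h, h)) - sum(b * xi for b, xi in zip(b_x, x))
        for i0 in range(len(y)):
            for i1 in range(len(y[i0])):
                e -= y[i0][i1] * (b_y[i0][i1]
                                  + sum(h[j] * Wy[i0][i1][j] for j in range(len(h))))
        for i in range(len(x)):
            e -= x[i] * sum(h[j] * Wx[i][j] for j in range(len(h)))
        total += math.exp(-e)
    return -math.log(total)
```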

Probability and free energy of the categorical target

$$P(y_{i_0,i_1} = 1 \mid h) = \frac{e^{\,b^y_{i_0,i_1} + \sum_j h_j w^{y,h}_{i_0,i_1,j}}}{\sum_{i_1'} e^{\,b^y_{i_0,i_1'} + \sum_j h_j w^{y,h}_{i_0,i_1',j}}}$$
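$$P(y_{i_0,i_1} = 1 \mid x) = \frac{\sum_h e^{-E(x, y, h)}}{\sum_y \sum_h e^{-E(x, y, h)}} = \frac{e^{\,b^y_{i_0,i_1} + \sum_j \operatorname{softplus}\big(b^h_j + \sum_{i_x} x_{i_x} w^{x,h}_{i_x,j} + w^{y,h}_{i_0,i_1,j}\big)}}{\sum_{i_1'} e^{\,b^y_{i_0,i_1'} + \sum_j \operatorname{softplus}\big(b^h_j + \sum_{i_x} x_{i_x} w^{x,h}_{i_x,j} + w^{y,h}_{i_0,i_1',j}\big)}} = \frac{e^{-F(y_{i_0,i_1} \mid x)}}{\sum_{i_1'} e^{-F(y_{i_0,i_1'} \mid x)}}$$
$$F(y_{i_0,i_1} \mid x) = -b^y_{i_0,i_1} - \sum_j \operatorname{softplus}\Big(b^h_j + \sum_{i_x} x_{i_x} w^{x,h}_{i_x,j} + w^{y,h}_{i_0,i_1,j}\Big)$$

Here y is the one-hot target with y_{i_0,i_1} = 1, and the denominator runs over the classes i_1' of that single target i_0. The x-bias terms are shared by numerator and denominator and cancel.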
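Classification then reduces to a softmax over -F(y_{i_0,i_1} | x). The cost is linear in the number of classes and hidden units, with no sampling. The sketch below assumes a single categorical target (i0 = 0 by default). Its function names and the `Wx`/`Wy` layout are assumptions. The bias b^x is omitted because it cancels.

```python
import math

def softplus(a):
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))

def class_free_energy(i1, x, Wx, Wy, b_y, b_h, i0=0):
    # F(y_{i0,i1} | x) = -b^y_{i0,i1} - sum_j softplus(b^h_j + sum_i x_i w^{x,h}_{i,j} + w^{y,h}_{i0,i1,j})
    return -b_y[i0][i1] - sum(
        softplus(b_h[j] + sum(x[i] * Wx[i][j] for i in range(len(x))) + Wy[i0][i1][j])
        for j in range(len(b_h)))

def p_y_given_x(x, Wx, Wy, b_y, b_h, i0=0):
    # Softmax over i1 of -F(y_{i0,i1} | x); assumes one categorical target group i0
    neg_f = [-class_free_energy(i1, x, Wx, Wy, b_y, b_h, i0) for i1 in range(len(b_y[i0]))]
    m = max(neg_f)
    exps = [math.exp(z - m) for z in neg_f]
    s = sum(exps)
    return [e / s for e in exps]
```

On a small model, the result should agree with normalizing Σ_h e^{-E(x, y, h)} over the classes by brute force.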