Multiplicative weight update method: Machine learning




1 Machine learning

1.1 Winnow algorithm
1.2 Hedge algorithm

1.2.1 Analysis

1.3 AdaBoost algorithm





Machine learning

In machine learning, Littlestone and Warmuth generalized the winnow algorithm to the weighted majority algorithm. Later, Freund and Schapire generalized it in the form of the hedge algorithm. The AdaBoost algorithm, formulated by Yoav Freund and Robert Schapire, also employed the multiplicative weight update method.


Winnow algorithm

Based on current knowledge in algorithms, the multiplicative weight update method was first used in Littlestone's winnow algorithm. It is used in machine learning to solve a linear program.


Given $m$ labeled examples $(a_{1},l_{1}),\ldots ,(a_{m},l_{m})$, where $a_{j}\in \mathbb {R} ^{n}$ are feature vectors and $l_{j}\in \{-1,1\}$ are their labels.


The aim is to find non-negative weights such that, for all examples, the sign of the weighted combination of the features matches its label. That is, we require that $l_{j}a_{j}x\geq 0$ for every $j$. Without loss of generality, assume the total weight is 1, so that the weights form a distribution. Thus, for notational convenience, redefine $a_{j}$ to be $l_{j}a_{j}$; the problem then reduces to finding a solution to the following LP:

$\forall j=1,2,\ldots ,m:\quad a_{j}x\geq 0$,
$1\cdot x=1$,
$\forall i:\quad x_{i}\geq 0$.

This is the general form of an LP.
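To make the reduction concrete, the following is a minimal Python sketch of a Winnow-style multiplicative update for this feasibility problem, assuming the entries of each $a_{j}$ (labels already folded in) lie in $[-1,1]$; the function name mw_feasibility, the step size eta, and the iteration cap are illustrative choices, not part of the original description.

    import numpy as np

    def mw_feasibility(A, eta=0.1, max_iters=1000):
        """Search for x >= 0 with 1.x = 1 and A x >= 0 via multiplicative updates.

        Each row of A is l_j * a_j (label folded in), with entries in [-1, 1].
        """
        m, n = A.shape
        w = np.ones(n)                       # one weight per feature
        for _ in range(max_iters):
            x = w / w.sum()                  # current candidate distribution
            margins = A @ x
            j = np.argmin(margins)           # most violated constraint, if any
            if margins[j] >= 0:
                return x                     # every a_j x >= 0: feasible point found
            w = w * np.exp(eta * A[j])       # promote features that vote correctly
        return w / w.sum()                   # best effort after max_iters rounds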


Hedge algorithm

The hedge algorithm is similar to the weighted majority algorithm; however, their exponential update rules are different. It is generally used to solve the problem of binary allocation, in which we need to allocate a different portion of resources into $N$ different options. The loss of every option is available at the end of every iteration. The goal is to reduce the total loss suffered for a particular allocation. The allocation for the following iteration is then revised, based on the total loss suffered in the current iteration, using a multiplicative update.


Analysis

Assume $\epsilon \leq 1$ and that for $t\in [T]$, $p^{t}$ is picked by Hedge. Then for all experts $i$,

$\sum _{t\leq T}p^{t}\cdot m^{t}\leq \sum _{t\leq T}m_{i}^{t}+{\frac {\ln(N)}{\epsilon }}+\epsilon T$.
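The two additive terms in this bound move in opposite directions as $\epsilon$ varies; balancing them is a standard step not spelled out above, valid when $T\geq \ln N$ so that $\epsilon \leq 1$:

$\epsilon ={\sqrt {\tfrac {\ln N}{T}}}\quad \Longrightarrow \quad {\frac {\ln N}{\epsilon }}+\epsilon T={\sqrt {T\ln N}}+{\sqrt {T\ln N}}=2{\sqrt {T\ln N}},$

so the extra loss per round relative to the best expert shrinks like $2{\sqrt {\ln N/T}}$.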



Initialization: Fix an $\epsilon \leq 1/2$. For each expert, associate the weight $w_{i}^{1}:=1$. For $t=1,2,\ldots ,T$:

1. Pick the distribution $p_{i}^{t}={\frac {w_{i}^{t}}{\Phi _{t}}}$, where $\Phi _{t}=\sum _{i}w_{i}^{t}$.
2. Observe the cost of the decision $m^{t}$.
3. Set $w_{i}^{t+1}=w_{i}^{t}\exp(-\epsilon \,m_{i}^{t})$.
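A minimal sketch of this loop in Python is given below; the cost matrix costs, the function name hedge, and the random usage example are illustrative assumptions, not part of the original description.

    import numpy as np

    def hedge(costs, epsilon):
        """Run the Hedge update on a T x N array of per-expert costs in [0, 1].

        Returns the played distributions p^t and the total expected cost.
        """
        T, N = costs.shape
        w = np.ones(N)                        # w_i^1 := 1 for every expert
        total_cost = 0.0
        distributions = []
        for t in range(T):
            p = w / w.sum()                   # p_i^t = w_i^t / Phi_t
            distributions.append(p)
            m = costs[t]                      # observe the cost vector m^t
            total_cost += p @ m               # expected cost p^t . m^t
            w = w * np.exp(-epsilon * m)      # w_i^{t+1} = w_i^t exp(-eps m_i^t)
        return distributions, total_cost

    # Illustrative usage with random costs and the balanced epsilon from above.
    rng = np.random.default_rng(0)
    T, N = 1000, 10
    costs = rng.random((T, N))
    _, total = hedge(costs, np.sqrt(np.log(N) / T))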

AdaBoost algorithm

This algorithm maintains a set of weights $w^{t}$ on the training examples. On every iteration $t$, a distribution $p^{t}$ is computed by normalizing these weights. This distribution is fed to the weak learner WeakLearn, which generates a hypothesis $h_{t}$ that (hopefully) has small error with respect to the distribution. Using the new hypothesis $h_{t}$, AdaBoost generates the next weight vector $w^{t+1}$. The process repeats. After $T$ such iterations, the final hypothesis $h_{f}$ is the output. The hypothesis $h_{f}$ combines the outputs of the $T$ weak hypotheses using a weighted majority vote.



Input:
Sequence of $N$ labeled examples $(x_{1},y_{1}),\ldots ,(x_{N},y_{N})$
Distribution $D$ over the $N$ examples
Weak learning algorithm WeakLearn
Integer $T$ specifying the number of iterations

Initialize the weight vector: $w_{i}^{1}=D(i)$ for $i=1,\ldots ,N$.
Do for $t=1,2,\ldots ,T$:

1. Set $p^{t}={\frac {w^{t}}{\sum _{i=1}^{N}w_{i}^{t}}}$.
2. Call WeakLearn, providing it with the distribution $p^{t}$; get back a hypothesis $h_{t}:X\rightarrow [0,1]$.
3. Calculate the error of $h_{t}$: $\epsilon _{t}=\sum _{i=1}^{N}p_{i}^{t}\left|h_{t}(x_{i})-y_{i}\right|$.
4. Set $\beta _{t}={\frac {\epsilon _{t}}{1-\epsilon _{t}}}$.
5. Set the new weight vector to be $w_{i}^{t+1}=w_{i}^{t}\beta _{t}^{1-|h_{t}(x_{i})-y_{i}|}$.

Output the hypothesis:

$h_{f}(x)={\begin{cases}1&{\text{if }}\sum _{t=1}^{T}\log(1/\beta _{t})\,h_{t}(x)\geq {\frac {1}{2}}\sum _{t=1}^{T}\log(1/\beta _{t})\\0&{\text{otherwise.}}\end{cases}}$
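The loop and the final vote above can be summarized in a short Python sketch; weak_learn is a placeholder for whichever weak learner is used (not specified in the text), labels are assumed to lie in {0, 1} so that $|h_{t}(x_{i})-y_{i}|$ is well defined, and no guard is included for the degenerate cases $\epsilon _{t}=0$ or $\epsilon _{t}=1$.

    import numpy as np

    def adaboost(X, y, D, weak_learn, T):
        """AdaBoost as described above, with hypotheses h_t : X -> [0, 1].

        X, y       : training examples and their labels in {0, 1}
        D          : initial distribution over the N examples
        weak_learn : callable (X, y, p) -> hypothesis h, itself a callable on X
        T          : number of boosting iterations
        """
        w = np.asarray(D, dtype=float).copy()         # w_i^1 = D(i)
        hypotheses, betas = [], []
        for t in range(T):
            p = w / w.sum()                           # step 1: normalize weights
            h = weak_learn(X, y, p)                   # step 2: call WeakLearn with p^t
            err = np.sum(p * np.abs(h(X) - y))        # step 3: weighted error eps_t
            beta = err / (1.0 - err)                  # step 4: beta_t = eps_t / (1 - eps_t)
            w = w * beta ** (1.0 - np.abs(h(X) - y))  # step 5: reweight the examples
            hypotheses.append(h)
            betas.append(beta)

        def h_f(X_new):
            # Weighted majority vote: output 1 where the log(1/beta)-weighted sum
            # of weak hypotheses reaches half of the total weight.
            log_w = np.log(1.0 / np.array(betas))
            votes = sum(lw * h(X_new) for lw, h in zip(log_w, hypotheses))
            return (votes >= 0.5 * log_w.sum()).astype(int)

        return h_f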







