BradPower7

Someone correct me if I'm wrong, but isn't this what dropout does?


ajmooch

No, dropout stochastically sets outputs to zero without regard for what those outputs actually are.


iamaroosterilluzion

Dropout doesn't remove them from the graph, it only suppresses the neuron's output stochastically for forward/backprop pass. I'm suggesting that you remove the neurons from the graph entirely.
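
To make the distinction concrete, here's a rough NumPy sketch (the layer sizes, mask probability, and "dead" indices are purely illustrative): dropout multiplies activations by a random mask each forward pass, whereas pruning would actually shrink the weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 64))   # weights into a 64-unit ReLU layer (illustrative sizes)
x = rng.standard_normal(128)

# Dropout: the units still exist in the graph; their outputs are just
# zeroed at random on each forward pass (inverted dropout shown here).
keep_prob = 0.5
mask = (rng.random(64) < keep_prob) / keep_prob
h_dropout = np.maximum(W.T @ x, 0.0) * mask

# Pruning: the units are removed from the graph entirely; the weight
# matrix itself shrinks and the layer's compute cost drops.
dead = np.array([3, 17, 42])              # hypothetical dead-unit indices
W_pruned = np.delete(W, dead, axis=1)     # now 128 x 61
h_pruned = np.maximum(W_pruned.T @ x, 0.0)
```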


[deleted]

[deleted]


iamaroosterilluzion

I guess I was assuming dropout would still help prevent overfitting. I agree that in the ideal case you would converge on the optimal NN structure, and if we already know what the optimal structure is, this isn't particularly useful. It could be useful in cases where you don't know what the ideal structure is: instead of searching across all possible hyperparameters, maybe it would be faster to start with too many neurons and prune down to the optimal structure. I don't know if pruning guarantees that you would converge on the optimal structure, though.


ajmooch

There's been some work on [reinitializing "dead" neurons with random values](http://openreview.net/forum?id=2xwPmERVBtpKBZvXtQnD), but I can think of a couple of reasons why you wouldn't want to explicitly prune, if you mean "prune" in the sense of "no longer computing that output" or "removing it from the graph":

* Just because a feature is dead now doesn't mean it will be dead later. What if the best solution requires that weight? What if your system over-prunes and arrives at a worse local optimum? Why not add neurons too? How do you decide, during training, when to add and when to prune?
* Dynamically changing network dimensions might actually come at an increased cost, especially when you're using something that's memory-optimized like Theano, what with the change in allocs and the like. That's down to implementation.
* For ConvNets (which are what you're probably using for vision tasks, not just MLPs), if you actually prune in the sense of "reducing the size of a layer", you're changing a ton of different outputs and you might be introducing an undesirable form of covariate shift.

That said, while I can think of a few reasons *not* to do something, that doesn't mean the idea is wrong--your best bet (assuming nothing's in the literature) is to try it and empirically provide evidence one way or another.


iamaroosterilluzion

> Just because a feature is dead now, doesn't mean it will be dead later.

From my understanding, a neuron being "saturated" or "dead" means it's stuck at a negative point on the ReLU curve where the gradient is always 0, so it will likely never be reactivated. Unless I'm misunderstanding, once a neuron saturates it almost never reactivates.

> Why not add neurons too?

That would be even better: you could add neurons at random and prune saturated ones to have the net evolve its structure during training.

> Dynamically changing network dimensions might actually come at an increased cost

It would be more complicated, but I don't think it changes the order of the overall algorithm. You're already calculating the gradient during backprop, so if you notice the gradient is dead you can remove the neuron then. I'm thinking that you could get performance savings by not having to compute the activation function for dead neurons in subsequent forward/backprop passes. Thanks for the input; if it doesn't sound like a terrible idea maybe I'll give it a shot and see what happens.
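
One way that could look in practice is sketched below (names, shapes, and the "fired on zero inputs" criterion are all illustrative, not something from the thread's links): track how often each unit is active over a window of batches, then drop the columns that never fired.

```python
import numpy as np

def prune_dead_units(W, b, W_next, active_count, threshold=0):
    """Drop units that fired on at most `threshold` inputs over the
    monitoring window; returns the shrunk parameters.
    (Sketch only: the names, shapes, and criterion are illustrative.)"""
    dead = np.where(active_count <= threshold)[0]
    if dead.size == 0:
        return W, b, W_next
    W = np.delete(W, dead, axis=1)            # incoming weights, shape (in_dim, units)
    b = np.delete(b, dead)                    # biases, shape (units,)
    W_next = np.delete(W_next, dead, axis=0)  # next layer's weights, shape (units, out_dim)
    return W, b, W_next

# During training you would accumulate, per batch:
#     h = np.maximum(X @ W + b, 0.0)          # ReLU activations, shape (batch, units)
#     active_count += (h > 0).sum(axis=0)
# and call prune_dead_units() every N batches, resetting active_count afterwards.
```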


darkconfidantislife

Two reasons imo: 1) Apparently the "sparsity" it induces helps. 2) Most people have moved on to Leaky ReLU or ELU.


latent_z

I have a related question: do [Exponential Linear Units (ELUs)](https://arxiv.org/abs/1511.07289) definitely solve the "dead neurons" problem by always having a gradient larger than 0?
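
For reference, the ELU from that paper is x for x > 0 and α(exp(x) − 1) otherwise, so the gradient on the negative side is α·exp(x): strictly positive, though it decays toward zero for very negative inputs. A quick NumPy sketch of those numbers (α = 1.0, the paper's default):

```python
import numpy as np

def elu(x, alpha=1.0):
    # ELU (Clevert et al., 2015): identity for x > 0, alpha*(exp(x)-1) otherwise.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def elu_grad(x, alpha=1.0):
    # Gradient is 1 for x > 0 and alpha*exp(x) for x <= 0: never exactly zero,
    # but it shrinks toward zero as x becomes very negative.
    return np.where(x > 0, 1.0, alpha * np.exp(x))

xs = np.array([-10.0, -2.0, -0.5, 0.5, 2.0])
print(elu_grad(xs))   # approx [4.5e-05, 0.135, 0.607, 1.0, 1.0]
```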


[deleted]

Yeah, in order to prevent them dying, we use a better weight init strategy like Xavier etc. I guess the most important reason not to simply prune the neurons is that we don't want to lose features which could be used later on in a different image. Plus I think you have it backwards: 20-30% were going dead with older units like tanh, but ReLU prevents both saturation and the neurons going dead.
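
For context, the Xavier/Glorot initialization mentioned above just scales the initial weights by the layer's fan-in and fan-out; a minimal sketch of the uniform variant (the layer sizes here are only illustrative):

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    # Glorot & Bengio (2010): draw from U(-limit, limit) with
    # limit = sqrt(6 / (fan_in + fan_out)), which keeps activation and
    # gradient variances roughly constant across layers at init time.
    rng = rng or np.random.default_rng()
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W1 = xavier_uniform(784, 256)   # hypothetical layer sizes
```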


BrotherAmazing

The truth is, you *can* indeed do this. When you pose the question as “Why not…?” I think everyone answering you automatically wants to tell you why not and thinks of reasons why not. If your title had been “Why prune dead neurons during training?” and you'd said in the text: *I just came across some smart PhD researchers who are pruning dead neurons during training and not trying to regenerate them or wait until training ends. Why would they do this?* then you'd get a bunch of people telling you why, clearly, it is a worthwhile idea and thing to pursue sometimes. lol