# What Size Net Gives Valid Generalization?

```bibtex
@article{Baum1989WhatSN,
  title   = {What Size Net Gives Valid Generalization?},
  author  = {Eric B. Baum and David Haussler},
  journal = {Neural Computation},
  year    = {1989},
  volume  = {1},
  pages   = {151-160}
}
```

We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size. Assume 0 < ε ≤ 1/8. We show that if m ≥ O((W/ε) log(N/ε)) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a…
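The abstract's bound says that on the order of (W/ε) log(N/ε) examples suffice for a threshold network with N nodes and W weights to generalize with error at most ε. A minimal sketch of that rule of thumb, with the caveat that the O(·) hides a constant factor the paper does not pin down (the constant of 1 used here is illustrative only):

```python
import math

def sample_size_bound(num_weights: int, num_nodes: int, eps: float) -> int:
    """Illustrative sample-size estimate m ~ (W/eps) * log(N/eps).

    This drops the unspecified constant inside the O(.) of the
    Baum-Haussler bound, so it shows scaling behavior only.
    """
    # The bound in the paper is stated under the assumption 0 < eps <= 1/8.
    assert 0 < eps <= 1.0 / 8.0, "the bound assumes 0 < eps <= 1/8"
    return math.ceil((num_weights / eps) * math.log(num_nodes / eps))

# Example: a net with 100 nodes and 1,000 weights, target error eps = 0.1
m = sample_size_bound(num_weights=1000, num_nodes=100, eps=0.1)
```

Note how the estimate scales roughly linearly in the number of weights W and in 1/ε, which is the practical reading of the result: bigger nets need proportionally more training data for the same generalization guarantee.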


#### 1,706 Citations

When Are k-Nearest Neighbor and Back Propagation Accurate for Feasible Sized Sets of Examples?

- Computer Science
- EURASIP Workshop
- 1990

Experimental tests of generalization versus number of examples are presented for random target networks and examples drawn from a uniform distribution, and seem to indicate that networks with two hidden layers have Vapnik-Chervonenkis dimension roughly equal to their total number of weights.

What size network is good for generalization of a specific task of interest?

- Mathematics, Computer Science
- Neural Networks
- 1994

This work proposes a way to calculate the exact value of the probability that a network trained on a randomly chosen set of examples corresponding to a specific task of interest performs it with no more than a given number of errors, and shows that for some tasks increasing network size leads to worse generalization.

Generalization and Selection of Examples in Feedforward Neural Networks

- Mathematics, Computer Science
- Neural Computation
- 2000

It is shown that for the parity problem, one of the most used problems for testing learning algorithms, only the use of the whole set of examples ensures global learning in a depth two architecture and this difficulty can be overcome by considering a tree-structured network.

Generalization and PAC learning: some new results for the class of generalized single-layer networks

- Mathematics, Medicine
- IEEE Trans. Neural Networks
- 1995

It is shown that the use of self-structuring techniques for GSLNs may reduce the number of training examples sufficient to guarantee good generalization performance, and an explanation for the fact that GSLNs can require a relatively large number of weights is provided.

Limiting Network Size within Finite Bounds for Optimization

- Mathematics, Computer Science
- ArXiv
- 2019

A theoretical justification is given for the required attribute size and the corresponding hidden-layer dimension that, for a given sample set, yield optimal binary classification results with minimum training complexity in a single-layer feedforward network framework.

Polynomial time algorithms for learning neural nets

- Mathematics, Computer Science
- COLT '90
- 1990

A more powerful query learning algorithm is described which is able to very rapidly learn depth 2 nets having n inputs connected to 4 threshold units connected to one or more outputs, or to learn the intersection of k half spaces in n unknowns.

Improving Generalization with Active Learning

- 2002

Active learning differs from passive learning from examples in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some…

Agnostic PAC Learning of Functions on Analog Neural Nets

- Computer Science, Mathematics
- Neural Computation
- 1995

The computation time of LEARN is exponential in the number of weights of the considered network architecture, and therefore only of interest for neural nets of small size.

What Size Neural Network Gives Optimal Generalization? Convergence Properties of Backpropagation

- Mathematics
- 1998

One of the most important aspects of any machine learning paradigm is how it scales according to problem size and complexity. Using a task with known optimal training error, and a pre-specified…

On the Generalization Problem

- Mathematics
- 2001

The generalization problem considered in this paper assumes that a limited amount of input and output data from a system is available, and that from this information an estimate of the output…

#### References

Showing 1–10 of 53 references

On the capabilities of multilayer perceptrons

- Computer Science, Mathematics
- J. Complex.
- 1988

A construction is presented here for implementing an arbitrary dichotomy on any set of N points in general position in d dimensions with one hidden layer containing ⌈N/d⌉ units; this is in fact the smallest such net, since there exist dichotomies that cannot be implemented by any net with fewer units.

Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm

- Mathematics
- 28th Annual Symposium on Foundations of Computer Science (sfcs 1987)
- 1987

Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in…

A statistical approach to learning and generalization in layered neural networks

- Computer Science
- COLT '89
- 1989

The proposed formalism is applied to the problems of selecting an optimal architecture and predicting learning curves, and the Gibbs distribution on the ensemble of networks with a fixed architecture is derived.

A Proposal for More Powerful Learning Algorithms

- Computer Science
- Neural Computation
- 1989

Neural nets are universal learners, capable of learning any learnable class of concepts, and standard circuit complexity arguments show that learning algorithms with such freedom can solve in polynomial time any learning problem that can be solved in polynomial time by any algorithm whatever.

Large Automatic Learning, Rule Extraction, and Generalization

- Computer Science
- Complex Syst.
- 1987

Since antiquity, man has dreamed of building a device that would "learn from examples", "form generalizations", and "discover the rules" behind patterns in the data. Recent work has shown that…

A general lower bound on the number of examples needed for learning

- Computer Science, Mathematics
- COLT '88
- 1988

This paper proves a lower bound on the number of random examples required for distribution-free learning of a concept class C and shows that for many interesting concept classes, including k CNF and k DNF, the bound is actually tight to within a constant factor.

The Vapnik-Chervonenkis Dimension: Information versus Complexity in Learning

- Computer Science
- Neural Computation
- 1989

The work of Vapnik and Chervonenkis (1971) provides the key tools for dealing with the information issue; the main ideas are developed, and how complexity fits in is discussed.

Connectionist Learning Procedures

- Computer Science, Mathematics
- Artif. Intell.
- 1989

These relatively simple, gradient-descent learning procedures work well for small tasks and the new challenge is to find ways of improving their convergence rate and their generalization abilities so that they can be applied to larger, more realistic tasks.

Quantifying Inductive Bias: AI Learning Algorithms and Valiant's Learning Framework

- Mathematics, Computer Science
- Artif. Intell.
- 1988

We show that the notion of inductive bias in concept learning can be quantified in a way that directly relates to learning performance in the framework recently introduced by Valiant. Our…

A 'Neural' Network that Learns to Play Backgammon

- Computer Science
- NIPS
- 1987

This study touches on some of the most important issues in network learning theory, including the development of efficient coding schemes and training procedures, scaling, generalization, the use of real-valued inputs and outputs, and techniques for escaping from local minima.