The law of large numbers. Central limit theorem


Although many of us deal with numbers and figures every day at work or in our studies, few suspect that there is a very interesting law of large numbers, used, for example, in statistics, economics, and even in psychological and pedagogical research. It belongs to probability theory and states that the arithmetic mean of any large sample from a fixed distribution is close to the mathematical expectation of that distribution.

You have probably noticed that the essence of this law is not easy to grasp, especially for those who are not on friendly terms with mathematics. For this reason, we would like to explain it in simple language (as far as possible, of course), so that everyone can gain at least an approximate understanding of what it is. This knowledge will help you better understand certain mathematical patterns and become more erudite.

Concepts of the law of large numbers and its interpretation

In addition to the above definition of the law of large numbers in probability theory, we can give its economic interpretation. In this case, it represents the principle that the frequency of a particular type of financial loss can be predicted with a high degree of certainty when losses of this type are observed in large numbers overall.

In addition, depending on the mode of convergence, we can distinguish the weak and the strong law of large numbers. We speak of the weak law when the convergence is in probability, and of the strong law when the convergence is almost sure (with probability one).

If we interpret it a little differently, we should say this: it is always possible to find a finite number of trials such that, with any prescribed probability less than one, the relative frequency of occurrence of some event will differ arbitrarily little from its probability.

Thus, the general essence of the law of large numbers can be expressed as follows: the combined action of a large number of identical and independent random factors yields a result that almost does not depend on chance. Put even more simply, the quantitative regularities of mass phenomena manifest themselves clearly only when the number of those phenomena is large (which is why the law is called the law of large numbers).

From this we can conclude that the essence of the law is that, in the numbers obtained by mass observation, certain regularities appear that cannot be detected in a small number of facts.

The essence of the law of large numbers and its examples

The law of large numbers expresses the most general patterns of the accidental and the necessary. When random deviations cancel each other out, the averages computed over a mass of cases of the same kind become typical. They reflect the action of essential and permanent factors under the specific conditions of time and place.

Regularities governed by the law of large numbers hold only as mass tendencies; they cannot serve as laws for individual cases. Thus the principle of mathematical statistics comes into force: the combined action of a number of random factors can produce a non-random result. The most striking example of this principle is the convergence of the frequency of a random event to its probability as the number of trials increases.

Let us recall the usual coin toss. Theoretically, heads and tails come up with the same probability. This means that if a coin is tossed, say, 10 times, about 5 of them should come up heads and 5 tails. But everyone knows this almost never happens exactly: the ratio of heads to tails may turn out to be 4 to 6, 9 to 1, 2 to 8, and so on. However, as the number of coin tosses increases, say to 100, the observed frequency of heads (or tails) approaches 50%. If, theoretically, an infinite number of such experiments were carried out, the frequency of each side would always tend to 50%.

How exactly the coin falls is influenced by a huge number of random factors: the position of the coin in the palm, the force of the throw, the height and speed of the fall, and so on. But when there are many experiments, regardless of how these factors act, the observed frequency turns out to be close to the theoretical probability.

And here is another example that will help to grasp the essence of the law of large numbers: suppose we need to estimate the level of earnings of people in a certain region. If we take 10 observations in which 9 people earn 20 thousand rubles and 1 person earns 500 thousand rubles, the arithmetic mean is 68 thousand rubles, which, of course, is hardly representative. But if we take 100 observations, in which 99 people earn 20 thousand rubles and 1 person earns 500 thousand rubles, the arithmetic mean comes out to 24.8 thousand rubles, which is much closer to the real state of affairs. By increasing the number of observations, we force the average to tend to the true value.
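For readers who like to experiment, here is a minimal simulation sketch of both examples. The sample sizes and income figures follow the text above; the random seed and the use of Python are our own choices, made purely for illustration.

```python
import random

random.seed(1)  # arbitrary seed, for reproducibility only

# Coin tossing: the frequency of heads approaches 0.5 as the number of tosses grows.
for n in (10, 100, 10_000, 1_000_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:>9} tosses: frequency of heads = {heads / n:.4f}")

# Income example: 99% of people earn 20 thousand rubles, 1% earn 500 thousand.
population = [20] * 99 + [500]          # thousands of rubles
for n in (10, 100, 10_000):
    sample = [random.choice(population) for _ in range(n)]
    print(f"{n:>9} observations: mean income = {sum(sample) / n:.1f} thousand")
```

With small samples the printed averages jump around noticeably; with large ones they settle near 0.5 and near the true mean income of 24.8 thousand, exactly as the law of large numbers predicts.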

It is for this reason that, in order to apply the law of large numbers, one must first collect sufficient statistical material: reliable results are obtained only by studying a large number of observations. This is why the law is so convenient to use in statistics and in socio-economic research.

Summing up

The importance of the law of large numbers is difficult to overestimate for any field of scientific knowledge, and especially for the theory of statistics and the methods of statistical cognition. The operation of the law also matters for the objects of study themselves, with their mass regularities. Almost all methods of statistical observation rest on the law of large numbers and the principles of mathematical statistics.

But, even without taking into account science and statistics as such, we can safely conclude that the law of large numbers is not just a phenomenon from the field of probability theory, but a phenomenon that we encounter almost every day in our lives.

We hope that the essence of the law of large numbers has now become clearer to you, and that you can easily and simply explain it to someone else.

Law of Large Numbers

The practice of studying random phenomena shows that although the results of individual observations, even those carried out under the same conditions, can differ greatly, at the same time, the average results for a sufficiently large number of observations are stable and weakly depend on the results of individual observations. The theoretical justification for this remarkable property of random phenomena is the law of large numbers. The general meaning of the law of large numbers is that the joint action of a large number of random factors leads to a result that is almost independent of chance.

Central limit theorem

Lyapunov's theorem explains the wide prevalence of the normal distribution law and the mechanism of its formation. The theorem allows us to assert that whenever a random variable is formed as the sum of a large number of independent random variables whose variances are small compared with the variance of the sum, the distribution law of this random variable turns out to be practically the normal law. And since random variables are always generated by a very large number of causes, and most often none of them has a variance comparable with the variance of the random variable itself, most random variables encountered in practice obey the normal distribution law.

Let us dwell in more detail on the content of the theorems of each of these groups.

In practical research, it is very important to know in what cases it is possible to guarantee that the probability of an event will be either sufficiently small or arbitrarily close to unity.

The law of large numbers in the broad sense is understood as the set of propositions stating that, with probability arbitrarily close to one (or to zero), an event will occur that depends on a very large, indefinitely increasing number of random events, each of which has only a slight influence on it.

More precisely, the law of large numbers in the narrow sense is understood as the set of propositions stating that, with probability arbitrarily close to one, the deviation of the arithmetic mean of a sufficiently large number of random variables from a constant value (the arithmetic mean of their mathematical expectations) will not exceed a given arbitrarily small number.

Separate, single phenomena that we observe in nature and in social life often appear to be random (for example, a registered death, the sex of a newborn child, the air temperature, etc.) because they are affected by many factors unrelated to the essence of the phenomenon's emergence or development. Their total effect on the observed phenomenon cannot be predicted, and they manifest themselves differently in individual phenomena. From the result of a single phenomenon, nothing can be said about the regularities inherent in a large number of such phenomena.

However, it has long been noted that the arithmetic mean of the numerical characteristics of certain features (the relative frequency of the occurrence of an event, the results of measurements, etc.) is subject to only very slight fluctuations when the number of repetitions of the experiment is large. In the average, as it were, the regularity inherent in the essence of the phenomena manifests itself; in it, the influence of the individual factors that made the results of individual observations random cancels out. Theoretically, this behavior of the average can be explained by the law of large numbers: if certain very general conditions on the random variables are satisfied, then the stability of the arithmetic mean becomes a practically certain event. These conditions constitute the most important content of the law of large numbers.

The first example of the operation of this principle is the convergence of the frequency of a random event to its probability as the number of trials increases, a fact established in Bernoulli's theorem (the Swiss mathematician Jacob Bernoulli, 1654-1705). Bernoulli's theorem is one of the simplest forms of the law of large numbers and is often used in practice: for example, the frequency with which some attribute of respondents occurs in a sample is taken as an estimate of the corresponding probability.

The outstanding French mathematician Siméon Denis Poisson (1781-1840) generalized this theorem and extended it to the case when the probability of the event varies from trial to trial independently of the results of previous trials. He was also the first to use the term "law of large numbers".

The great Russian mathematician Pafnuty Lvovich Chebyshev (1821-1894) proved that the law of large numbers operates in phenomena with any variation and also extends to the regularity of averages.

Further generalizations of the theorems of the law of large numbers are connected with the names of A. A. Markov, S. N. Bernstein, A. Ya. Khinchin and A. N. Kolmogorov.

The general modern formulation of the problem, the formulation of the law of large numbers, the development of ideas and methods for proving theorems related to this law belong to Russian scientists P. L. Chebyshev, A. A. Markov and A. M. Lyapunov.

CHEBYSHEV'S INEQUALITY

Let us first consider auxiliary theorems: the lemma and Chebyshev's inequality, which can be used to easily prove the law of large numbers in the Chebyshev form.

Lemma (Chebyshev).

If a random variable X takes no negative values, then the probability that it takes a value exceeding a positive number A is no greater than the fraction whose numerator is the mathematical expectation of the random variable and whose denominator is the number A:

P(X > A) <= M(X) / A.

Proof. Let the distribution law of the random variable X be known:

P(X = xi) = pi (i = 1, 2, ..., n),

and let the values of the random variable be arranged in ascending order.

In relation to the number A, the values of the random variable split into two groups: some do not exceed A, while the others are greater than A. Suppose that the first group consists of the first k values x1, x2, ..., xk (so that xi <= A for i <= k and xi > A for i > k).

Since all xi >= 0, every term of the sum

M(X) = x1 p1 + x2 p2 + ... + xn pn

is non-negative. Therefore, discarding the first k terms of this expression, we obtain the inequality

M(X) >= x(k+1) p(k+1) + ... + xn pn.

Because xi > A for i > k,

x(k+1) p(k+1) + ... + xn pn >= A (p(k+1) + ... + pn) = A P(X > A),

then

P(X > A) <= M(X) / A.

Q.E.D.

Random variables can have different distributions with the same mathematical expectations. However, for them, Chebyshev's lemma will give the same estimate of the probability of one or another test result. This shortcoming of the lemma is related to its generality: it is impossible to achieve a better estimate for all random variables at once.

Chebyshev's inequality.

The probability that the deviation of a random variable X from its mathematical expectation exceeds a positive number ε in absolute value is no greater than D(X) / ε²:

P( |X - M(X)| > ε ) <= D(X) / ε².

Proof. Since (X - M(X))² is a random variable that takes no negative values, we apply the inequality from Chebyshev's lemma to it with A = ε²:

P( |X - M(X)| > ε ) = P( (X - M(X))² > ε² ) <= M( (X - M(X))² ) / ε² = D(X) / ε².

Q.E.D.

Consequence. Since the events |X - M(X)| > ε and |X - M(X)| <= ε are opposite,

P( |X - M(X)| <= ε ) >= 1 - D(X) / ε²,

which is another form of Chebyshev's inequality.

We accept without proof the fact that the lemma and Chebyshev's inequality are also true for continuous random variables.

Chebyshev's inequality underlies the qualitative and quantitative statements of the law of large numbers. It gives an upper bound on the probability that the deviation of the value of a random variable from its mathematical expectation exceeds some given number. It is remarkable that Chebyshev's inequality provides an estimate of the probability of this event for a random variable whose distribution is unknown: only its mathematical expectation and variance need to be known.
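As a rough illustration, here is a small sketch that checks Chebyshev's inequality empirically. The exponential distribution used below is an arbitrary assumption, chosen only because its mean and variance are easy to estimate from the sample.

```python
import random
import statistics

random.seed(0)  # arbitrary seed

# Draw a large sample from an exponential distribution (mean 1, variance 1)
# and compare the observed tail probabilities with the Chebyshev bound.
n = 200_000
xs = [random.expovariate(1.0) for _ in range(n)]
m = statistics.fmean(xs)
d = statistics.pvariance(xs)

for eps in (1.0, 2.0, 3.0):
    observed = sum(abs(x - m) > eps for x in xs) / n
    bound = d / eps**2
    print(f"eps = {eps}: P(|X - M(X)| > eps) ~ {observed:.4f} <= bound {bound:.4f}")
```

The observed probabilities are far below the bound, which illustrates the remark made later in the text: the inequality is universal but rough.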

Theorem. (The law of large numbers in Chebyshev's form.)

If the variances of the independent random variables X1, X2, ..., Xn are bounded by one and the same constant C, and their number n is large enough, then the probability that the deviation of the arithmetic mean of these random variables from the arithmetic mean of their mathematical expectations does not exceed a given positive number ε in absolute value is arbitrarily close to unity, no matter how small ε is:

P( | (X1 + X2 + ... + Xn)/n - (M(X1) + M(X2) + ... + M(Xn))/n | < ε ) → 1 as n → ∞.

We accept the theorem without proof.

Consequence 1. If the independent random variables have the same mathematical expectation a, their variances are bounded by the same constant C, and their number is large enough, then, no matter how small the given positive number ε, the probability that the deviation of the arithmetic mean of these random variables from a does not exceed ε in absolute value is arbitrarily close to unity.

This theorem justifies taking the arithmetic mean of the results of a sufficiently large number of measurements made under the same conditions as an approximate value of an unknown quantity. Indeed, the measurement results are random, since they are affected by very many random factors. The absence of systematic errors means that the mathematical expectations of the individual measurement results are the same and equal to the true value of the quantity being measured. Consequently, by the law of large numbers, the arithmetic mean of a sufficiently large number of measurements will, with practical certainty, differ arbitrarily little from the true value of the sought quantity.

(Recall that errors are called systematic if they distort the measurement result in the same direction according to a more or less definite law. They include errors arising from the imperfection of the instruments (instrumental errors), from the personal characteristics of the observer (personal errors), etc.)

Consequence 2. (Bernoulli's theorem.)

If the probability p of the occurrence of event A in each of n independent trials is constant and the number of trials is sufficiently large, then the probability that the frequency of occurrence of the event differs arbitrarily little from the probability of its occurrence is arbitrarily close to unity:

P( | m/n - p | < ε ) → 1 as n → ∞,

where m is the number of occurrences of the event A in the n trials.

Bernoulli's theorem states that if the probability of an event is the same in all trials, then with an increase in the number of trials, the frequency of the event tends to the probability of the event and ceases to be random.

In practice, experiments in which the probability of the occurrence of an event remains the same in every trial are relatively rare; more often this probability differs from trial to trial. Poisson's theorem refers to a scheme of trials of this type:

Corollary 3. (Poisson's theorem.)

If the probability of the occurrence of an event in the i-th trial does not change when the outcomes of the previous trials become known, and the number of trials n is sufficiently large, then the probability that the frequency of occurrence of the event differs arbitrarily little from the arithmetic mean of the probabilities p1, ..., pn is arbitrarily close to unity:

P( | m/n - (p1 + p2 + ... + pn)/n | < ε ) → 1 as n → ∞.

Poisson's theorem states that the frequency of an event in a series of independent trials tends to the arithmetic mean of its probabilities and ceases to be random.

In conclusion, we note that none of the considered theorems gives either an exact or even an approximate value of the desired probability, but only its lower or upper bound is indicated. Therefore, if it is required to establish the exact or at least approximate value of the probabilities of the corresponding events, the possibilities of these theorems are very limited.

Approximate values of the desired probabilities for a large number of trials can only be obtained using limit theorems. In them, either additional restrictions are imposed on the random variables (as is the case, for example, in Lyapunov's theorem), or random variables of a particular type are considered (for example, in the de Moivre-Laplace integral theorem).

The theoretical significance of Chebyshev's theorem, which is a very general formulation of the law of large numbers, is great. However, if we apply it to the question of whether the law of large numbers holds for a particular sequence of independent random variables, then, even when the answer is yes, the theorem often requires far more random variables than are actually necessary for the law of large numbers to take effect. This shortcoming of Chebyshev's theorem is explained by its generality. It is therefore desirable to have theorems that indicate the lower (or upper) bound on the desired probability more precisely. They can be obtained by imposing on the random variables some additional restrictions, which are usually satisfied by the random variables encountered in practice.

REMARKS ON THE CONTENT OF THE LAW OF LARGE NUMBERS

If the number of random variables is large enough and they satisfy certain very general conditions, then, no matter how they are distributed, it is practically certain that their arithmetic mean deviates arbitrarily little from a constant value (the arithmetic mean of their mathematical expectations), i.e., it is practically constant. Such is the content of the theorems relating to the law of large numbers. Consequently, the law of large numbers is one of the expressions of the dialectical connection between chance and necessity.

One can give many examples of the emergence of new qualitative states as manifestations of the law of large numbers, primarily among physical phenomena. Let's consider one of them.

According to modern concepts, gases consist of individual particles (molecules) in chaotic motion, and it is impossible to say exactly where a given molecule will be at a given moment or at what speed it will move. However, observations show that the total effect of the molecules, such as the pressure of a gas on the wall of a vessel, manifests itself with astonishing constancy. It is determined by the number of impacts and the force of each of them. Although both are a matter of chance, instruments do not detect fluctuations in the pressure of a gas under normal conditions. This is explained by the fact that, owing to the enormous number of molecules present even in the smallest volumes, a noticeable change in pressure is practically impossible. Therefore, the physical law stating the constancy of gas pressure is a manifestation of the law of large numbers.

The constancy of pressure and of some other characteristics of a gas at one time served as a weighty argument against the molecular theory of the structure of matter. Later, scientists learned to isolate a relatively small number of molecules, so that the influence of individual molecules remained noticeable and the law of large numbers could not manifest itself sufficiently. It then became possible to observe fluctuations in gas pressure, confirming the hypothesis of the molecular structure of matter.

The law of large numbers underlies various types of insurance (human life insurance for various periods, property, livestock, crops, etc.).

When planning the range of consumer goods, the demand for them from the population is taken into account. In this demand, the operation of the law of large numbers is manifested.

The sampling method widely used in statistics finds its scientific justification in the law of large numbers. For example, the quality of wheat brought from a collective farm to a procurement point is judged by the quality of the grains caught at random in a small measure. There are few grains in the measure compared with the whole batch, but in any case the measure is chosen so that it contains quite enough grains for the law of large numbers to manifest itself with an accuracy that meets the need. We are then entitled to take the corresponding indicators in the sample as indicators of the weediness, moisture content and average weight of the grains of the entire incoming batch.

Further efforts of scientists to deepen the content of the law of large numbers were aimed at obtaining the most general conditions for the applicability of this law to a sequence of random variables. For a long time there were no fundamental successes in this direction. After P. L. Chebyshev and A. A. Markov, only in 1926 did the Soviet academician A. N. Kolmogorov manage to obtain the conditions necessary and sufficient for the law of large numbers to be applicable to a sequence of independent random variables. In 1928, the Soviet scientist A. Ya. Khinchin showed that a sufficient condition for the applicability of the law of large numbers to a sequence of independent identically distributed random variables is the existence of their mathematical expectation.

For practice it is extremely important to clarify fully the question of the applicability of the law of large numbers to dependent random variables, since phenomena in nature and society are mutually dependent and mutually determine one another. Much work has been devoted to elucidating the restrictions that must be imposed on dependent random variables so that the law of large numbers can be applied to them; the most important results belong to the outstanding Russian scientist A. A. Markov and the eminent Soviet scientists S. N. Bernstein and A. Ya. Khinchin.

The main result of this work is that the law of large numbers is applicable to dependent random variables provided that strong dependence exists only between random variables with close indices, while the dependence between random variables with distant indices is sufficiently weak. Examples of random variables of this type are the numerical characteristics of the climate. The weather of each day is noticeably influenced by the weather of the previous days, and this influence weakens noticeably as the days grow farther apart. Consequently, the long-term average temperature, pressure and other characteristics of the climate of a given locality should, in accordance with the law of large numbers, be practically close to their mathematical expectations. The latter are objective characteristics of the local climate.

In order to experimentally verify the law of large numbers, the following experiments were carried out at different times.

1. Buffon's experiment. A coin was tossed 4040 times; heads came up 2048 times. The frequency of heads was 2048/4040 ≈ 0.5069.

2. Pearson's experiment. A coin was tossed 12,000 and then 24,000 times. The frequency of heads turned out to be 0.5016 in the first case and 0.5005 in the second.

3. Vestergaard's experiment. From an urn containing equal numbers of white and black balls, 10,000 draws (with the drawn ball returned to the urn each time) yielded 5011 white and 4989 black balls. The frequency of white balls was 5011/10000 = 0.5011, and of black balls 0.4989.

4. V. I. Romanovsky's experiment. Four coins were tossed 20,160 times. The counts and relative frequencies of the various combinations of heads and tails were distributed as follows:

Combination (heads and tails)    Count    Empirical frequency    Theoretical frequency
4 and 0                           1181    0.05858                0.0625
3 and 1                           4909    0.24350                0.2500
2 and 2                           7583    0.37614                0.3750
1 and 3                           5085    0.25224                0.2500
0 and 4                           1402    0.06954                0.0625
Total                            20160    1.0000                 1.0000

The results of experimental tests of the law of large numbers convince us that the experimental frequencies are close to the probabilities.

CENTRAL LIMIT THEOREM

It is easy to prove that the sum of any finite number of independent normally distributed random variables is also distributed according to the normal law.

If independent random variables are not distributed according to the normal law, then, under some rather mild restrictions imposed on them, their sum is nevertheless approximately normally distributed.

This problem was posed and solved mainly by Russian scientists P. L. Chebyshev and his students A. A. Markov and A. M. Lyapunov.

Theorem (Lyapunov).

If the independent random variables X1, X2, ..., Xn have finite mathematical expectations and finite variances, their number is large enough, and as n increases without bound

(b1 + b2 + ... + bn) / (D(X1) + D(X2) + ... + D(Xn))^(3/2) → 0,

where b1, b2, ..., bn are the absolute central moments of the third order, then their sum has, to a sufficient degree of accuracy, a distribution close to the normal law.

(In fact, we present not Lyapunov's theorem itself but one of its corollaries, since this corollary is quite sufficient for practical applications. The condition stated above, which is called the Lyapunov condition, is a stronger requirement than is necessary for the proof of Lyapunov's theorem proper.)

The meaning of the condition is that the action of each term (random variable) is small compared to the total action of all of them. Many random phenomena that occur in nature and in social life proceed exactly according to this pattern. In this regard, Lyapunov's theorem is of exceptionally great importance, and the normal distribution law is one of the basic laws in probability theory.

Suppose, for example, that some quantity is being measured. The deviations of the observed values from its true value (its mathematical expectation) result from the influence of a very large number of factors, each of which generates a small error, and none of which dominates the others. Then the total measurement error is a random variable which, by Lyapunov's theorem, must be distributed according to the normal law.

In artillery firing, under the influence of a very large number of random causes, shells are scattered over a certain area. The random effects on the projectile's trajectory can be considered independent. Each cause produces only a small change in the trajectory compared with the total change due to all causes. Therefore, the deviation of the point where the shell bursts from the target should be expected to be a random variable distributed according to the normal law.
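The following small sketch illustrates this mechanism: a total error is modeled as the sum of many small independent uniform errors (the number of causes and their size are arbitrary assumptions), and the share of results within one and two standard deviations is compared with the values for the normal law.

```python
import random
import statistics

random.seed(2)  # arbitrary seed

# Each observation is distorted by 200 small independent errors, uniform on
# [-0.01, 0.01]; by the central limit theorem their sum is roughly normal.
def total_error(n_causes=200):
    return sum(random.uniform(-0.01, 0.01) for _ in range(n_causes))

errors = [total_error() for _ in range(20_000)]
s = statistics.pstdev(errors)

# For a normal law about 68.3% of values lie within one sigma of the mean
# and about 95.4% within two sigmas.
within_1 = sum(abs(e) <= s for e in errors) / len(errors)
within_2 = sum(abs(e) <= 2 * s for e in errors) / len(errors)
print(f"within 1 sigma: {within_1:.3f} (normal law: 0.683)")
print(f"within 2 sigma: {within_2:.3f} (normal law: 0.954)")
```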

By Lyapunov's theorem we have the right to expect that, for example, the height of an adult male is a random variable distributed according to the normal law. This hypothesis, like those considered in the two previous examples, agrees well with observations. To confirm this, we give the distribution by height of 1000 adult male workers together with the corresponding theoretical numbers of men, i.e., the numbers of men who should fall into each height group under the assumption that the heights of men are distributed according to the normal law.

(Table: distribution of the heights of 1000 adult male workers by 3-cm groups from 143-146 cm to 185-188 cm, with the experimentally observed and the theoretically predicted number of men in each group.)

It would be difficult to expect a more accurate agreement between the experimental data and the theoretical ones.

One can easily prove, as a corollary of Lyapunov's theorem, a proposition that will be needed in what follows to justify the sampling method.

Proposition.

The sum of a sufficiently large number of identically distributed random variables having finite absolute central moments of the third order is distributed, approximately, according to the normal law.

The limit theorems of probability theory, the de Moivre-Laplace theorems, explain the nature of the stability of the frequency of occurrence of an event. This stability consists in the fact that the limiting distribution of the number of occurrences of an event, as the number of trials increases without bound (when the probability of the event is the same in all trials), is the normal distribution.

System of random variables.

The random variables considered above were one-dimensional, i.e., they were determined by a single number. However, there are also random variables determined by two, three, etc. numbers. Such random variables are called two-dimensional, three-dimensional, and so on.

Depending on the type of random variables included in the system, systems can be discrete, continuous, or mixed (if the system includes random variables of different types).

Let us consider systems of two random variables in more detail.

Definition. The distribution law of a system of random variables is a relation establishing the connection between the regions of possible values of the system of random variables and the probabilities of the system's occurrence in these regions.

Example. From an urn containing 2 white and 3 black balls, two balls are drawn. Let X be the number of white balls drawn, and let Y be a second random variable defined on the same experiment.

Let us compute the probabilities needed for the distribution table of the system.

The probability that no white ball is drawn (and hence that two black balls are drawn) is

P(X = 0) = C(3,2) / C(5,2) = 3/10 = 0.3.

The probability that one white ball (and hence one black ball) is drawn is

P(X = 1) = C(2,1) C(3,1) / C(5,2) = 6/10 = 0.6.

The probability that two white balls (and hence no black ball) are drawn is

P(X = 2) = C(2,2) / C(5,2) = 1/10 = 0.1.

Thus the distribution series of the number of white balls drawn is P(X = 0) = 0.3, P(X = 1) = 0.6, P(X = 2) = 0.1; the distribution table of the two-dimensional random variable is compiled by distributing these probabilities over the cells corresponding to the values of Y.
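A short enumeration sketch confirms these probabilities; the urn composition is the one given in the example, and Python is used purely for illustration.

```python
from itertools import combinations
from collections import Counter

# Enumerate all ways to draw 2 balls from an urn with 2 white (W) and 3 black (B)
# balls and tabulate the distribution of X, the number of white balls drawn.
urn = ["W", "W", "B", "B", "B"]
counts = Counter(draw.count("W") for draw in combinations(urn, 2))
total = sum(counts.values())

for x in sorted(counts):
    print(f"P(X = {x}) = {counts[x]}/{total} = {counts[x] / total:.1f}")
# Expected output: P(X = 0) = 0.3, P(X = 1) = 0.6, P(X = 2) = 0.1
```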

Definition. The distribution function of a system of two random variables is the function of two arguments F(x, y) equal to the probability of the joint fulfillment of the two inequalities X < x and Y < y:

F(x, y) = P(X < x, Y < y).


We note the following properties of the distribution function of a system of two random variables:

1) 0 <= F(x, y) <= 1;

2) the distribution function is a non-decreasing function of each argument:

F(x2, y) >= F(x1, y) for x2 > x1,   F(x, y2) >= F(x, y1) for y2 > y1;

3) the following holds:

F(-∞, y) = F(x, -∞) = F(-∞, -∞) = 0;

4) F(x, +∞) = F1(x) is the distribution function of the component X, F(+∞, y) = F2(y) is the distribution function of the component Y, and F(+∞, +∞) = 1;

5) the probability that the random point (X, Y) falls into an arbitrary rectangle with sides parallel to the coordinate axes is calculated by the formula

P(x1 <= X < x2, y1 <= Y < y2) = F(x2, y2) - F(x1, y2) - F(x2, y1) + F(x1, y1).

Distribution density of a system of two random variables.

Definition. The joint distribution density of the probabilities of a two-dimensional random variable (X, Y) is the second mixed partial derivative of the distribution function:

f(x, y) = ∂²F(x, y) / ∂x ∂y.

If the distribution density is known, then the distribution function can be found by the formula

F(x, y) = ∫ from -∞ to x ∫ from -∞ to y f(u, v) dv du.

The two-dimensional distribution density is non-negative, and the double integral of the two-dimensional density over the whole plane is equal to one.

From the known joint distribution density one can find the distribution density of each of the components of the two-dimensional random variable:

f1(x) = ∫ from -∞ to +∞ f(x, y) dy;   f2(y) = ∫ from -∞ to +∞ f(x, y) dx.
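As a numerical illustration of these formulas, here is a small sketch; the joint density f(x, y) = x + y on the unit square is an assumed toy example, chosen only because its marginal densities are easy to compute by hand.

```python
# Joint density f(x, y) = x + y on the unit square (it integrates to 1 there).
def f(x, y):
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def integrate(g, a, b, n=2000):
    # simple midpoint rule on [a, b]
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# The double integral of the density over the plane should be 1.
total = integrate(lambda x: integrate(lambda y: f(x, y), 0.0, 1.0), 0.0, 1.0, n=500)
print(f"double integral of f = {total:.4f}")

# Marginal density of X at x0, compared with the analytic value x0 + 1/2.
x0 = 0.3
marginal = integrate(lambda y: f(x0, y), 0.0, 1.0)
print(f"f1({x0}) = {marginal:.4f} (analytically {x0 + 0.5})")
```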

Conditional laws of distribution.

As shown above, knowing the joint distribution law, one can easily find the distribution laws for each random variable included in the system.

However, in practice the inverse problem arises more often: from the known distribution laws of the individual random variables, find their joint distribution law.

In the general case, this problem is unsolvable, because the distribution law of a random variable says nothing about the relationship of this variable with other random variables.

Moreover, if the random variables are dependent on each other, then the joint distribution law cannot be expressed in terms of the distribution laws of the components, since it must also establish the connection between the components.

All this leads to the need to consider conditional distribution laws.

Definition. The distribution of one random variable included in the system, found under the condition that another random variable has taken a certain value, is called conditional distribution law.

The conditional distribution law can be specified both by the distribution function and by the distribution density.

The conditional distribution densities are calculated by the formulas:

f(x / y) = f(x, y) / f2(y),   f(y / x) = f(x, y) / f1(x).

The conditional distribution density has all the properties of the distribution density of one random variable.

Conditional mathematical expectation.

Definition. The conditional expectation of a discrete random variable Y given X = x (where x is a certain possible value of X) is the sum of the products of all possible values of Y and their conditional probabilities:

M(Y / X = x) = Σj yj p(yj / x).

For continuous random variables:

M(Y / X = x) = ∫ from -∞ to +∞ y f(y / x) dy,

where f(y / x) is the conditional density of the random variable Y given X = x.

The conditional expectation M(Y / x) = f(x) is a function of x; it is called the regression function of Y on X.

Example. Find the conditional expectation of the component Y given X = x1 = 1 for the discrete two-dimensional random variable specified by the table:

Y \ X    x1=1    x2=3    x3=4    x4=8
y1=3     0.15    0.06    0.25    0.04
y2=6     0.30    0.10    0.03    0.07

Solution. P(X = x1) = 0.15 + 0.30 = 0.45, therefore

M(Y / X = 1) = 3 (0.15 / 0.45) + 6 (0.30 / 0.45) = 5.
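A small sketch that carries out this computation directly from the table above; the value 5 follows from the formula for the conditional expectation given earlier.

```python
# Conditional expectation M(Y | X = 1) from the table of joint probabilities.
xs = [1, 3, 4, 8]
ys = [3, 6]
p = {  # joint probabilities p(x, y) taken from the table
    (1, 3): 0.15, (3, 3): 0.06, (4, 3): 0.25, (8, 3): 0.04,
    (1, 6): 0.30, (3, 6): 0.10, (4, 6): 0.03, (8, 6): 0.07,
}

x0 = 1
p_x0 = sum(p[(x0, y)] for y in ys)                      # marginal P(X = 1) = 0.45
m_y_given_x0 = sum(y * p[(x0, y)] / p_x0 for y in ys)   # 3*(0.15/0.45) + 6*(0.30/0.45)
print(f"P(X = {x0}) = {p_x0:.2f},  M(Y | X = {x0}) = {m_y_given_x0:.2f}")  # -> 5.00
```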

The conditional variance and conditional moments of the system of random variables are defined similarly.

Dependent and independent random variables.

Definition. Random variables are called independent, if the distribution law of one of them does not depend on what value the other random variable takes.

The concept of dependence of random variables is very important in probability theory.

Conditional distributions of independent random variables are equal to their unconditional distributions.

Let us define the necessary and sufficient conditions for the independence of random variables.

Theorem. In order for the random variables X and Y to be independent, it is necessary and sufficient that the distribution function of the system (X, Y) be equal to the product of the distribution functions of the components:

F(x, y) = F1(x) F2(y).

A similar theorem can be formulated for the distribution density:

Theorem. In order for the random variables X and Y to be independent, it is necessary and sufficient that the joint distribution density of the system (X, Y) be equal to the product of the distribution densities of the components:

f(x, y) = f1(x) f2(y).

In practice, the correlation moment μxy = M[ (X - M(X)) (Y - M(Y)) ] is computed by the following formulas.

For discrete random variables:

μxy = Σi Σj (xi - M(X)) (yj - M(Y)) pij.

For continuous random variables:

μxy = ∫∫ (x - M(X)) (y - M(Y)) f(x, y) dx dy.

The correlation moment serves to characterize the relationship between random variables. If the random variables are independent, then their correlation moment is zero.

The correlation moment has a dimension equal to the product of the dimensions of the random variables X and Y . This fact is a disadvantage of this numerical characteristic, since with different units of measurement, different correlation moments are obtained, which makes it difficult to compare the correlation moments of different random variables.

In order to eliminate this shortcoming, another characteristic is applied - the correlation coefficient.

Definition. The correlation coefficient rxy of random variables X and Y is the ratio of the correlation moment to the product of the standard deviations of these variables:

rxy = μxy / (σx σy).

The correlation coefficient is a dimensionless quantity. For independent random variables, the correlation coefficient is zero.

Property: The absolute value of the correlation moment of two random variables X and Y does not exceed the geometric mean of their dispersions.

Property: The absolute value of the correlation coefficient does not exceed unity.

Random variables are called correlated if their correlation moment is nonzero, and uncorrelated if their correlation moment is zero.

If random variables are independent, then they are uncorrelated, but from uncorrelation one cannot conclude that they are independent.

If two quantities are dependent, then they can be either correlated or uncorrelated.
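A small numerical illustration of the last two statements, using the assumed toy pair X uniform on (-1, 1) and Y = X², which are dependent (Y is completely determined by X) yet uncorrelated.

```python
import random
import statistics

random.seed(3)  # arbitrary seed

n = 200_000
xs = [random.uniform(-1, 1) for _ in range(n)]
ys = [x * x for x in xs]  # Y is a function of X, so X and Y are dependent

mx, my = statistics.fmean(xs), statistics.fmean(ys)
corr_moment = statistics.fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
print(f"correlation moment of X and Y = X^2: {corr_moment:.5f}")  # close to 0
```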

Often, according to a given distribution density of a system of random variables, one can determine the dependence or independence of these variables.

Along with the correlation coefficient, the degree of dependence of random variables can also be characterized by another quantity, called the covariance coefficient.

Example. If the given distribution density of a system of random variables X and Y factors into the product of a function of x alone and a function of y alone, then X and Y are independent. Of course, they will then also be uncorrelated.

Linear regression.

Consider a two-dimensional random variable ( X , Y ), where X and Y are dependent random variables.

Let us represent one of the random variables approximately as a function of the other. An exact representation is not possible. We shall assume that this function is linear:

g(X) = aX + b.

To determine this function, it remains only to find the constant values a and b.

Definition. The function g(X) is called the best approximation of the random variable Y in the sense of the least squares method if the mathematical expectation

M[ (Y - g(X))² ]

takes the smallest possible value. The function g(x) is then called the mean square regression of Y on X.

Theorem. The linear mean square regression of Y on X is calculated by the formula

g(x) = my + r (σy / σx) (x - mx),

where mx = M(X), my = M(Y), σx and σy are the standard deviations of X and Y, and r = rxy is their correlation coefficient. The quantity σy² (1 - r²) is called the residual variance of the random variable Y relative to the random variable X; it characterizes the magnitude of the error resulting from the replacement of the random variable Y by the linear function g(X) = aX + b.

It can be seen that if r = ±1, then the residual variance is zero; hence the error is zero, and the random variable Y is exactly represented by a linear function of the random variable X.

The line of mean square regression of X on Y is determined similarly by an analogous formula. If X and Y have linear regression functions with respect to each other, then the quantities X and Y are said to be connected by a linear correlation dependence.

Theorem. If a two-dimensional random variable ( X, Y) is normally distributed, then X and Y are connected by a linear correlation dependence.
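The following sketch estimates the regression line from a synthetic sample (the underlying linear model and noise level are assumptions made purely for illustration) using the formula of the theorem above.

```python
import random
import statistics

random.seed(4)  # arbitrary seed

# Synthetic sample: Y = 1.5*X + 3 plus normal noise (illustrative assumption).
n = 10_000
xs = [random.gauss(0, 2) for _ in range(n)]
ys = [1.5 * x + 3 + random.gauss(0, 1) for x in xs]

mx, my = statistics.fmean(xs), statistics.fmean(ys)
sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
r = statistics.fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)]) / (sx * sy)

# Linear mean square regression of Y on X: g(x) = my + r*(sy/sx)*(x - mx)
slope = r * sy / sx
intercept = my - slope * mx
residual_var = sy**2 * (1 - r**2)
print(f"g(x) = {slope:.3f} * x + {intercept:.3f}, residual variance = {residual_var:.3f}")
```

With these assumptions the estimated slope, intercept and residual variance come out close to 1.5, 3 and 1, as expected.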

E.G. Nikiforova


The phenomenon of stabilization of the frequencies of occurrence of random events, discovered on a large and varied material, at first did not have any justification and was perceived as a purely empirical fact. The first theoretical result in this area was the famous Bernoulli theorem published in 1713, which laid the foundation for the laws of large numbers.

Bernoulli's theorem in its content is a limit theorem, i.e., a statement of asymptotic meaning, saying what will happen to the probabilistic parameters with a large number of observations. The progenitor of all modern numerous statements of this type is precisely Bernoulli's theorem.

Today it seems that the mathematical law of large numbers is a reflection of some common property of many real processes.

Wishing to give the law of large numbers the broadest possible scope, corresponding to the far from exhausted potential of its applications, one of the greatest mathematicians of our century, A. N. Kolmogorov, formulated its essence as follows: the law of large numbers is "a general principle by virtue of which the combined action of a large number of random factors leads to a result almost independent of chance."

Thus, the law of large numbers has, as it were, two interpretations. One is mathematical, associated with specific mathematical models, formulations and theorems; the other is more general and goes beyond that framework. The second interpretation is connected with the phenomenon, often noted in practice, of a more or less directed action forming against the background of a large number of hidden or visible acting factors that outwardly show no such consistency. Examples related to the second interpretation are price formation in a free market and the formation of public opinion on a particular issue.

Having noted this general interpretation of the law of large numbers, let us turn to the specific mathematical formulations of this law.

As we said above, the first and fundamentally most important for the theory of probability is Bernoulli's theorem. The content of this mathematical fact, which reflects one of the most important regularities of the surrounding world, is reduced to the following.

Consider a sequence of unrelated (i.e., independent) trials whose conditions are reproduced unchanged from trial to trial. The result of each trial is the occurrence or non-occurrence of the event of interest to us, A.

This procedure (the Bernoulli scheme) can obviously be recognized as typical of many practical fields: "boy or girl" in a sequence of newborns, daily meteorological observations ("it rained or it did not"), inspection of a flow of manufactured products ("good or defective"), and so on.

The frequency p* = t_A / n of occurrence of the event A in n trials (here t_A is the number of occurrences of A in the n trials) has a tendency to stabilize its value as n grows; this is an empirical fact.

Bernoulli's theorem. Choose any arbitrarily small positive number ε. Then

P( | t_A / n - p | < ε ) → 1 as n → ∞,  (9.1)

where p is the probability of the event A in a single trial.

We emphasize that the mathematical fact established by Bernoulli in a certain mathematical model (in the Bernoulli scheme) should not be confused with the empirically established regularity of frequency stability. Bernoulli was not satisfied with the statement of formula (9.1) alone; taking into account the needs of practice, he also gave an estimate of the inequality appearing in this formula. We shall return to this interpretation below.

Bernoulli's law of large numbers has been the subject of research by a large number of mathematicians seeking to refine it. One such refinement was obtained by the English mathematician de Moivre and is now called the de Moivre-Laplace theorem. In the Bernoulli scheme, consider the sequence of normalized quantities

x_n = (t_A - n p) / sqrt(n p q),  where q = 1 - p.  (9.2)

Integral theorem of de Moivre-Laplace. Choose any two numbers x1 and x2 with x1 < x2; then, as n → ∞,

P( x1 <= x_n <= x2 ) → (1 / sqrt(2π)) ∫ from x1 to x2 e^(-t²/2) dt.  (9.3)

If in the right-hand side of formula (9.3) we let the variable x1 tend to minus infinity, the resulting limit, which depends only on x2 (the index 2 can then be dropped), is a distribution function; it is called the standard normal distribution, or the Gauss law.

The right-hand side of formula (9.3) equals γ = F(x2) - F(x1), where F(x2) → 1 as x2 → +∞ and F(x1) → 0 as x1 → -∞. Choosing a sufficiently large x2 > 0 and an x1 < 0 that is sufficiently large in absolute value, we obtain the inequality

P( x1 <= x_n <= x2 ) > γ,

with γ as close to one as we wish. Taking formula (9.2) into account, from this we can extract practically reliable estimates:

p - x_γ sqrt(p q / n) <= t_A / n <= p + x_γ sqrt(p q / n),  (9.5)

which hold with probability γ; here x_γ is found from the tables of the normal distribution.

If the reliability γ = 0.95 (i.e., an error probability of 0.05) seems insufficient to someone, one can play it safe and construct a slightly wider confidence interval using the three sigma rule mentioned above:

p - 3 sqrt(p q / n) <= t_A / n <= p + 3 sqrt(p q / n).

This interval corresponds to the very high confidence level γ = 0.997 (see the normal distribution tables).

Consider the example of tossing a coin. Let us toss a coin n = 100 times. Can it happen that the frequency p* will be very different from the probability p = 0.5 (assuming the coin is symmetric), for example, that it will equal zero? For this, heads must not come up even once. Such an event is theoretically possible, but we have already calculated such probabilities: for this event it equals (1/2)^100, of the order of 10^(-30). This value is extremely small: a number with about 30 zeros after the decimal point. An event with such a probability can safely be considered practically impossible. What deviations of the frequency from the probability are practically possible with a large number of experiments? Using the de Moivre-Laplace theorem, we answer this question as follows: with probability γ = 0.95 the frequency of heads p* fits into the confidence interval

0.5 ± 1.96 sqrt(0.25 / 100) ≈ 0.5 ± 0.1.

If an error of 0.05 does not seem small enough, the number of experiments (coin tosses) must be increased. As n grows, the width of the confidence interval decreases (unfortunately, not as fast as we would like, but inversely proportionally to sqrt(n)). For example, for n = 10,000 we find that p* lies in the confidence interval 0.5 ± 0.01 with confidence probability γ = 0.95.

Thus, we have dealt quantitatively with the question of the approximation of frequency to probability.
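A short sketch of this calculation for several values of n; x_γ = 1.96 corresponds to the confidence level 0.95, as above.

```python
from math import sqrt

# Half-width of the 95% confidence interval for the frequency of heads,
# with p = q = 0.5 (symmetric coin).
x_gamma, p = 1.96, 0.5
for n in (100, 10_000, 1_000_000):
    half_width = x_gamma * sqrt(p * (1 - p) / n)
    print(f"n = {n:>9}: frequency of heads = 0.5 +/- {half_width:.4f}")
```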

Now let's find the probability of an event from its frequency and estimate the error of this approximation.

Suppose we have carried out a large number n of experiments (tossed a coin), found the frequency p* of the event A, and want to estimate its probability p.

From the law of large numbers it follows that

p ≈ p*.  (9.7)

Let us now estimate the practically possible error of the approximate equality (9.7). To do this, we use inequality (9.5) in the form

| p* - p | <= x_γ sqrt( p q / n ).  (9.8)

To find p from p*, inequality (9.8) has to be solved; for this it must be squared and the corresponding quadratic equation solved. As a result we obtain the exact boundaries (9.10), (9.11) of the confidence interval for p.

For an approximate estimate of p from p*, one can replace p by p* on the right-hand side of (9.8), or, in formulas (9.10), (9.11), take into account that n is large. Then we get:

p = p* ± x_γ sqrt( p* (1 - p*) / n ).  (9.12)

Suppose that in n = 400 experiments the frequency value p* = 0.25 was obtained; then at the confidence level γ = 0.95 we find:

p = 0.25 ± 1.96 sqrt(0.25 · 0.75 / 400) ≈ 0.25 ± 0.04.

But what if we need to know the probability more accurately, with an error of, say, no more than 0.01? To do this, you need to increase the number of experiments.

Setting the probability p = 0.25 in formula (9.12), we equate the error to the given value 0.01 and obtain an equation for n:

x_γ sqrt( 0.25 · 0.75 / n ) = 0.01.

Solving this equation (with x_γ rounded to 2), we get n ≈ 7500.
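The same calculations in a short sketch; the required n comes out near 7200-7500, depending on whether x_γ is taken as 1.96 or rounded to 2.

```python
from math import sqrt, ceil

x_gamma = 1.96  # confidence level 0.95

# Estimate p from the frequency p* = 0.25 observed in n = 400 experiments.
n, p_star = 400, 0.25
err = x_gamma * sqrt(p_star * (1 - p_star) / n)
print(f"p = {p_star} +/- {err:.3f}")

# How many experiments are needed for an error of at most 0.01?
target = 0.01
n_needed = ceil(x_gamma**2 * p_star * (1 - p_star) / target**2)
print(f"required n ~ {n_needed}")  # about 7200 with 1.96, about 7500 with x_gamma = 2
```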

Let us now consider one more question: can the deviation of frequency from probability obtained in experiments be explained by random causes, or does this deviation show that the probability is not what we assumed it to be? In other words, does experience confirm the accepted statistical hypothesis or, on the contrary, require it to be rejected?

Suppose, for example, that tossing a coin n = 800 times we obtained the frequency of heads p* = 0.52. We suspect that the coin is not symmetric. Is this suspicion justified? To answer this question we proceed from the assumption that the coin is symmetric (p = 0.5). Let us find the confidence interval (with confidence probability γ = 0.95) for the frequency of heads. If the value p* = 0.52 obtained in the experiment fits into this interval, everything is in order: the accepted hypothesis of the symmetry of the coin does not contradict the experimental data. Formula (9.12) for p = 0.5 gives the interval 0.5 ± 0.035; the obtained value p* = 0.52 fits into this interval, which means the coin has to be "cleared" of suspicions of asymmetry.

Similar methods are used to judge whether various deviations from the mathematical expectation observed in random phenomena are random or "significant". For example, was there an accidental underweight in several samples of packaged goods, or does it indicate a systematic deception of buyers? Did the recovery rate increase by chance in patients who used the new drug, or is it due to the effect of the drug?

The normal law plays a particularly important role in probability theory and its practical applications. We have already seen above that a random variable, the number of occurrences of some event in the Bernoulli scheme, tends to the normal law as n → ∞. However, there is a much more general result.

Central limit theorem. The sum of a large number of independent (or weakly dependent) random variables, comparable with one another in the order of their variances, is distributed according to the normal law, regardless of the distribution laws of the terms. The above statement is a rough qualitative formulation of the central limit theorem. This theorem has many forms, differing in the conditions that the random variables must satisfy in order for their sum to "normalize" as the number of terms increases.

The density of the normal distribution f(x) is expressed by the formula

f(x) = (1 / (σ sqrt(2π))) e^( -(x - a)² / (2σ²) ),  (9.13)

where a is the mathematical expectation of the random variable X and σ = sqrt(D) is its standard deviation.

To calculate the probability of X falling within the interval (x1, x2), the integral

P( x1 < X < x2 ) = ∫ from x1 to x2 f(x) dx  (9.14)

is used. Since the integral (9.14) with the density (9.13) is not expressed in terms of elementary functions ("it cannot be taken"), the tables of the integral distribution function of the standard normal distribution, with a = 0 and σ = 1 (such tables are available in any textbook on probability theory), are used to calculate (9.14):

F(x) = (1 / sqrt(2π)) ∫ from -∞ to x e^(-t²/2) dt.  (9.15)

The probability (9.14) is expressed in terms of (9.15) by the formula

P( x1 < X < x2 ) = F( (x2 - a)/σ ) - F( (x1 - a)/σ ).  (9.16)

Example. Find the probability that a random variable X having a normal distribution with parameters a and σ deviates from its mathematical expectation in absolute value by no more than 3σ.

Using formula (9.16) and the table of the distribution function of the normal law, we get:

P( |X - a| <= 3σ ) = F(3) - F(-3) ≈ 0.9973.
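This value can be checked without tables, using the error function available in the standard library.

```python
from math import erf, sqrt

# Standard normal distribution function via the error function:
# F(x) = (1 + erf(x / sqrt(2))) / 2.
def F(x):
    return (1 + erf(x / sqrt(2))) / 2

print(f"P(|X - a| <= 3*sigma) = {F(3) - F(-3):.4f}")  # ~0.9973
```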

Example. In each of 700 independent trials an event A occurs with constant probability p = 0.35. Find the probability that the event A occurs:

  • 1) exactly 270 times;
  • 2) less than 270 and more than 230 times;
  • 3) more than 270 times.

We find the mathematical expectation a = np = 700 · 0.35 = 245 and the standard deviation σ = sqrt(npq) = sqrt(700 · 0.35 · 0.65) ≈ 12.62 of the random variable X, the number of occurrences of the event A.

For x = 270 the centered and normalized value is

(270 - 245) / 12.62 ≈ 1.98.

From the table of the density of the normal distribution we find f(1.98) ≈ 0.056, so that the probability of exactly 270 occurrences is approximately f(1.98) / 12.62 ≈ 0.0044.

Let us now find P_700( X > 270 ) = 1 - F(1.98) = 1 - 0.97615 = 0.02385.
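A short sketch of all three parts of the example using the normal approximation; the local theorem is used for the probability of exactly 270 occurrences.

```python
from math import erf, sqrt, exp, pi

n, p = 700, 0.35
a, sigma = n * p, sqrt(n * p * (1 - p))   # 245 and ~12.62

def F(x):        # standard normal distribution function
    return (1 + erf(x / sqrt(2))) / 2

def density(x):  # standard normal density
    return exp(-x * x / 2) / sqrt(2 * pi)

def t(k):        # centered and normalized value
    return (k - a) / sigma

print("P(X = 270)       ~", density(t(270)) / sigma)     # local theorem, ~0.0044
print("P(230 < X < 270) ~", F(t(270)) - F(t(230)))       # ~0.86
print("P(X > 270)       ~", 1 - F(t(270)))               # ~0.024
```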

A serious step in the study of the problems of large numbers was made in 1867 by P. L. Chebyshev. He considered a very general case, when nothing is required from independent random variables, except for the existence of mathematical expectations and variances.

Chebyshev's inequality. For an arbitrarily small positive number ε the following inequality holds:

P( | x - E(x) | >= ε ) <= D(x) / ε².

Chebyshev's theorem. If x1, x2, ..., xn are pairwise independent random variables, each of which has a mathematical expectation E(xi) = ai and a variance D(xi) = σi², and the variances are uniformly bounded (σi² <= C, i = 1, 2, ...), then for an arbitrarily small positive number ε the following relation is fulfilled:

P( | (x1 + ... + xn)/n - (a1 + ... + an)/n | < ε ) >= 1 - C / (n ε²).  (9.19)

Consequence. If ai = a and σi² = σ², i = 1, 2, ..., then

P( | (x1 + ... + xn)/n - a | < ε ) >= 1 - σ² / (n ε²).

Problem. How many times must a coin be tossed so that, with probability not less than γ = 0.997, it could be asserted that the frequency of heads lies in the interval (0.499; 0.501)?

Suppose the coin is symmetric, p = q = 0.5. We apply Chebyshev's theorem, formula (9.19), to the random variable X, the frequency of heads in n coin tosses. We have already shown above that X = (X1 + X2 + ... + Xn) / n, where Xi is a random variable taking the value 1 if heads came up and the value 0 if tails came up. Thus:

E(Xi) = p = 0.5,   D(Xi) = p q = 0.25.

We write inequality (9.19) for the event opposite to the event indicated under the probability sign:

P( | (X1 + ... + Xn)/n - p | >= ε ) <= p q / (n ε²).

In our case ε = 0.001 and σ² = p q = 0.25. Substituting these quantities into the last inequality, and taking into account that by the condition of the problem the bound p q / (n ε²) <= 1 - 0.997 = 0.003 must hold, we obtain:

n >= 0.25 / (0.003 · 0.001²) ≈ 83,300,000.

This example illustrates the possibility of using Chebyshev's inequality for estimating the probabilities of certain deviations of random variables (as well as for problems, like this example, connected with the calculation of such probabilities). The advantage of Chebyshev's inequality is that it does not require knowledge of the distribution laws of the random variables. Of course, if such a law is known, then Chebyshev's inequality gives too rough an estimate.

Let us consider the same example, but using the fact that coin tossing is a particular case of the Bernoulli scheme. The number of successes (in the example, the number of heads) obeys the binomial law, and for large n this law can be represented, by the integral theorem of de Moivre-Laplace, as a normal law with mathematical expectation a = np = 0.5n and standard deviation σ = sqrt(npq) = 0.5 sqrt(n). The random variable, the frequency of heads, then has mathematical expectation 0.5 and standard deviation 0.5 / sqrt(n).

Then we have:

P( | p* - 0.5 | < 0.001 ) = 2 F( 0.001 sqrt(n) / 0.5 ) - 1 = 2 F( 0.002 sqrt(n) ) - 1 >= 0.997.

From the last inequality we get F( 0.002 sqrt(n) ) >= 0.9985, and from the normal distribution tables (the three sigma rule) we find 0.002 sqrt(n) >= 3, i.e., n >= 2,250,000.

We see that the normal approximation gives a number of coin tosses, providing the given error in estimating the probability of heads, that is about 37 times smaller than the estimate obtained with Chebyshev's inequality (but Chebyshev's inequality makes it possible to carry out similar calculations even when we have no information on the distribution law of the random variable under study).
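The comparison of the two estimates in a short sketch.

```python
from math import sqrt, ceil

gamma, eps, p = 0.997, 0.001, 0.5
q = 1 - p

# Chebyshev bound: pq / (n * eps^2) <= 1 - gamma
n_chebyshev = ceil(p * q / ((1 - gamma) * eps**2))

# Normal approximation with the three sigma rule: 3 * sqrt(pq / n) <= eps
n_normal = ceil((3 * sqrt(p * q) / eps) ** 2)

print(f"Chebyshev:            n >= {n_chebyshev:,}")   # ~83,333,334
print(f"Normal approximation: n >= {n_normal:,}")      # 2,250,000
print(f"ratio: {n_chebyshev / n_normal:.1f}")          # ~37
```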

Let us now consider an applied problem solved with the help of formula (9.16).

Competition problem. Two competing railway companies each run one train between Moscow and St. Petersburg. The trains are equipped in approximately the same way and depart and arrive at approximately the same time. Suppose that n = 1000 passengers choose a train independently and at random, so as a mathematical model of the passengers' choice of train we use the Bernoulli scheme with n trials and probability of success p = 0.5. The company must decide how many seats to provide on the train, taking into account two mutually contradictory conditions: on the one hand, it does not want empty seats; on the other hand, it does not want people to be dissatisfied over a lack of seats (next time they will prefer the competing firm). Of course, one can provide n = 1000 seats on the train, but then there will certainly be empty seats. The random variable X, the number of passengers on the train, obeys, within the accepted mathematical model and by the integral theorem of de Moivre-Laplace, the normal law with mathematical expectation a = np = n/2 and variance σ² = npq = n/4, respectively. The probability that more than s passengers come to the train is determined by the relation

P( X > s ) = 1 - F( (s - n/2) / (sqrt(n)/2) ).

Let us set the risk level α, i.e., the probability that more than s passengers arrive:

P( X > s ) = α.

From here:

s = n/2 + x_α sqrt(n) / 2,

where x_α is the root of the equation F(x_α) = 1 - α, found from the tables of the distribution function of the normal law.

If, for example, n = 1000 and α = 0.01 (this risk level means that the number of seats s will be sufficient in 99 cases out of 100), then x_α ≈ 2.33 and s = 537 seats. Moreover, if both companies accept the same risk level α = 0.01, then the two trains will have a total of 1074 seats, 74 of which will be empty. Similarly, one can calculate that 514 seats would suffice in 80% of all cases, and 549 seats in 999 cases out of 1000.
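A sketch of this calculation; the inverse of the normal distribution function is computed here by simple bisection, since only its table values are used in the text.

```python
from math import sqrt, erf, ceil

def F(x):
    return (1 + erf(x / sqrt(2))) / 2

def inv_F(prob, lo=-10.0, hi=10.0):
    # crude bisection inverse of the standard normal distribution function
    for _ in range(80):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if F(mid) < prob else (lo, mid)
    return (lo + hi) / 2

n = 1000
for alpha in (0.20, 0.01, 0.001):
    x_a = inv_F(1 - alpha)
    s = ceil(n / 2 + x_a * sqrt(n) / 2)
    print(f"risk {alpha}: s = {s} seats")   # ~514, ~537, ~549
```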

Similar considerations apply to other problems of competitive service. For example, if m cinemas compete for the same n spectators, then p = 1/m should be taken, and the number of seats s in a cinema is determined by an analogous relation.

The total number of empty seats is then equal to

m s - n.

For α = 0.01, n = 1000 and m = 2, 3, 4 the values of this number are approximately 74, 126 and 147, respectively.

Let us consider one more example. Let a train consist of n = 100 wagons. The weight of each wagon is a random variable with mathematical expectation a = 65 tons and standard deviation σ = 9 tons. A locomotive can pull the train if its weight does not exceed 6600 tons; otherwise a second locomotive has to be coupled on. We need to find the probability that this will not be necessary.

weights of individual wagons: having the same mathematical expectation a - 65 and the same variance d- o 2 \u003d 81. According to the rule of mathematical expectations: E(x) - 100 * 65 = 6500. According to the rule of addition of variances: D(x) \u003d 100 x 81 \u003d 8100. Taking the root, we find the standard deviation. In order for one locomotive to be able to pull a train, it is necessary that the weight of the train X turned out to be limiting, i.e., fell within the limits of the interval (0; 6600). The random variable x - the sum of 100 terms - can be considered normally distributed. By formula (9.16) we get:

It follows that the locomotive will "handle" the train with probability approximately 0.864. Let us now reduce the number of wagons in the train by two, i.e., take n = 98. Calculating the probability that the locomotive will "handle" the train now, we obtain a value of the order of 0.99, i.e., a practically certain event, although only two wagons had to be removed to achieve this.
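The probabilities quoted for this example can be checked with the same normal approximation; the snippet below is only a sketch of that calculation (the function name is ours), and the small difference from 0.864 comes from the rounded table values used in the text.

```python
# Sketch: probability that a train of n wagons weighs at most `limit` tons,
# treating the total weight as normal with mean 65*n and variance 81*n.
from math import sqrt
from scipy.stats import norm

def prob_one_locomotive(n: int, limit: float = 6600.0) -> float:
    mean = 65.0 * n
    sd = sqrt(81.0 * n)
    # P(0 < X < limit); the lower bound contributes practically nothing.
    return norm.cdf((limit - mean) / sd) - norm.cdf((0.0 - mean) / sd)

print(prob_one_locomotive(100))  # ~0.87 (the text's 0.864 comes from rounded tables)
print(prob_one_locomotive(98))   # ~0.995, i.e. a practically certain event
```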

So, if we are dealing with sums of a large number of random variables, we can use the normal law. Naturally, the question arises: how many random variables need to be added so that the distribution law of the sum is already "normalized"? It depends on the distribution laws of the terms. There are such intricate laws that normalization occurs only with a very large number of terms, but these laws are invented by mathematicians; nature, as a rule, does not specifically arrange such troubles. Usually, in practice, five or six terms are sufficient for the normal law to be applicable.

The speed with which the distribution law of a sum of identically distributed random variables "normalizes" can be illustrated by random variables with a uniform distribution on the interval (0, 1). The curve of such a distribution has the form of a rectangle, which is quite unlike the normal law. If we add two such independent variables, we get a random variable distributed according to the so-called Simpson's law, whose graph has the form of an isosceles triangle. It does not look like the normal law either, but it is better. If we add three such uniformly distributed random variables, we get a curve composed of three parabolic segments, very similar to the normal curve. And if we add six such random variables, we get a curve that is practically indistinguishable from the normal one. This is the basis of a widely used method for obtaining normally distributed random variables, since all modern computers are equipped with generators of uniformly distributed (0, 1) random numbers.
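This "normalization" is easy to observe empirically; the sketch below is our own illustration (not from the source): it standardizes sums of k independent Uniform(0, 1) variables and checks how closely they satisfy the 68-95 rule of the normal law.

```python
# Sketch: how quickly sums of Uniform(0, 1) variables "normalize".
import numpy as np

rng = np.random.default_rng(0)
for k in (1, 2, 3, 6, 12):
    s = rng.random((200_000, k)).sum(axis=1)
    z = (s - k / 2) / np.sqrt(k / 12.0)      # standardize: mean k/2, variance k/12
    within1 = np.mean(np.abs(z) < 1)
    within2 = np.mean(np.abs(z) < 2)
    print(f"k={k:2d}: P(|Z|<1) ~ {within1:.3f}   P(|Z|<2) ~ {within2:.3f}")
# For the normal law these proportions are about 0.683 and 0.954;
# already at k = 6 the simulated values are very close to them.
```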

The following method is recommended as one practical way of checking whether this approximation may be used. We build a confidence interval for the frequency of the event with level γ = 0.997 according to the three sigma rule:

w - 3√(w(1 - w)/n) < p < w + 3√(w(1 - w)/n), where w = m/n is the observed frequency,

and if both of its ends do not go beyond the segment (0, 1), then the normal law can be used. If either boundary of the confidence interval lies outside the segment (0, 1), then the normal law cannot be used. However, under certain conditions the binomial law for the frequency of a random event, while not tending to the normal law, can tend to another law.

In many applications, the Bernoulli scheme is used as a mathematical model of a random experiment in which the number of trials n is large and the random event is quite rare, i.e., its probability p is small, while λ = np is neither small nor large (roughly of the order of 0.5 to 20). In this case, the following relation holds:

Formula (9.20) is called the Poisson approximation of the binomial law, since the probability distribution on its right-hand side is called Poisson's law. The Poisson distribution is said to be the probability distribution of rare events, since it arises when the limits n → ∞, p → 0 are taken while λ = np does not tend to infinity.

Example. Birthdays. What is the probability P₅₀₀(k) that in a group of 500 people exactly k people were born on New Year's Day? If these 500 people are chosen at random, then the Bernoulli scheme can be applied with probability of success p = 1/365. Then

Probability calculations for various k give the following values: P₁ = 0.3484…; P₂ = 0.2388…; P₃ = 0.1089…; P₄ = 0.0372…; P₅ = 0.0101…; P₆ = 0.0023… The corresponding approximations by the Poisson formula for λ = 500 · 1/365 = 1.37

give the following values: P₁ = 0.3481…; P₂ = 0.2385…; P₃ = 0.1089…; P₄ = 0.0373…; P₅ = 0.0102…; P₆ = 0.0023… All the errors are only in the fourth decimal place.
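These values can be reproduced directly; the sketch below compares the exact binomial probabilities with the Poisson approximation for λ = 500/365, using scipy.stats (the loop range is just illustrative).

```python
# Sketch: binomial probabilities vs the Poisson approximation for the birthday example.
from scipy.stats import binom, poisson

n, p = 500, 1 / 365
lam = n * p                          # about 1.37
for k in range(1, 7):
    exact = binom.pmf(k, n, p)
    approx = poisson.pmf(k, lam)
    print(f"k={k}: binomial = {exact:.4f}   Poisson = {approx:.4f}")
# The two columns agree to about the fourth decimal place, as stated above.
```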

Let us give examples of situations where Poisson's law of rare events can be used.

At a telephone exchange, an incorrect connection occurs with a small probability p, usually p ≈ 0.005. The Poisson formula then allows one to find the probability of a given number of incorrect connections for a total number of connections n ≈ 1000, with λ = np = 1000 · 0.005 = 5.

When baking buns, raisins are placed in the dough. Because of the stirring, the number of raisins in a bun should be expected to follow approximately the Poisson distribution P(k; λ), where λ is the density of raisins in the dough.

A radioactive substance emits particles. The event that the number of particles reaching a given region of space during time t takes a fixed value k obeys Poisson's law.

The number of living cells with altered chromosomes under the influence of X-rays follows the Poisson distribution.

So, the laws of large numbers make it possible to solve the problem of mathematical statistics associated with estimating the unknown probabilities of elementary outcomes of a random experiment. Thanks to this knowledge, the methods of probability theory become practically meaningful and useful. The laws of large numbers also make it possible to obtain information about unknown elementary probabilities in another form, namely the form of testing statistical hypotheses.

Let us consider in more detail the formulation and the probabilistic mechanism for solving problems of testing statistical hypotheses.

The distribution function of a random variable and its properties.

The distribution function of a random variable X is the function F(x) expressing, for each x, the probability that the random variable X takes a value less than x: F(x) = P(X < x).

The function F(x) is sometimes called the integral distribution function or the integral distribution law.

Distribution function properties:

1. The distribution function of a random variable is a non-negative function enclosed between zero and one:

0 ≤ F(x) ≤ 1.

2. The distribution function of a random variable is a non-decreasing function on the whole number axis.

3. At minus infinity the distribution function equals zero, and at plus infinity it equals one, i.e.: F(-∞) = 0, F(+∞) = 1.

4. The probability of a random variable falling into the interval [x₁, x₂) (including x₁) is equal to the increment of its distribution function on this interval, i.e. P(x₁ ≤ X < x₂) = F(x₂) - F(x₁).
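Property 4 is the one used most often in computations; the short sketch below is our own illustration with a standard normal variable, comparing the increment of F with a Monte Carlo estimate.

```python
# Sketch: P(x1 <= X < x2) = F(x2) - F(x1), illustrated for a standard normal X.
import numpy as np
from scipy.stats import norm

x1, x2 = -0.5, 1.2
exact = norm.cdf(x2) - norm.cdf(x1)            # increment of the distribution function

rng = np.random.default_rng(1)
sample = rng.standard_normal(1_000_000)
empirical = np.mean((sample >= x1) & (sample < x2))

print(exact, empirical)   # the two values agree to about three decimal places
```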


Markov and Chebyshev inequality

Markov inequality

Theorem: If a random variable X takes only non-negative values and has a mathematical expectation, then for any positive number A the inequality holds: P(X > A) ≤ M(X)/A.

Since the events X > A and X ≤ A are opposite, replacing P(X > A) by 1 - P(X ≤ A) we arrive at another form of Markov's inequality: P(X ≤ A) ≥ 1 - M(X)/A.

Markov's inequality is applicable to any non-negative random variables.

Chebyshev's inequality

Theorem: For any random variable with mathematical expectation and variance, Chebyshev's inequality is true:

P(|X - a| > ε) ≤ D(X)/ε², or P(|X - a| ≤ ε) ≥ 1 - D(X)/ε², where a = M(X), ε > 0.
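Both inequalities are easy to check empirically; the sketch below is our own illustration using an exponential random variable with mean 2 (so M(X) = 2, D(X) = 4), and it compares the simulated probabilities with the Markov and Chebyshev bounds.

```python
# Sketch: empirical check of the Markov and Chebyshev inequalities
# for X ~ Exponential with mean a = 2 (so M(X) = 2, D(X) = 4).
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=1_000_000)
a, d = 2.0, 4.0

A = 5.0
print("Markov:    P(X > A) =", np.mean(x > A), "<=", a / A)

eps = 3.0
print("Chebyshev: P(|X - a| > eps) =", np.mean(np.abs(x - a) > eps), "<=", d / eps**2)
```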


The law of large numbers "in the form" of Chebyshev's theorem.

Chebyshev's theorem: If the variances of n independent random variables X₁, X₂, …, Xₙ are bounded by the same constant, then as the number n increases without bound, the arithmetic mean of the random variables converges in probability to the arithmetic mean of their mathematical expectations a₁, a₂, …, aₙ, i.e.

lim (n→∞) P(|(X₁ + X₂ + … + Xₙ)/n - (a₁ + a₂ + … + aₙ)/n| < ε) = 1.

The meaning of the law of large numbers is that the average values of random variables tend to their mathematical expectations as n → ∞ in the sense of convergence in probability. The deviation of the average values from the mathematical expectation becomes arbitrarily small with probability close to one if n is large enough. In other words, the probability of any noticeable deviation of the mean from a becomes arbitrarily small as n grows.
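This convergence is easy to see in a simulation; the sketch below is our own example with fair die rolls, whose expectation is 3.5, and it prints the running mean at several sample sizes.

```python
# Sketch: law of large numbers for fair die rolls, M(X) = 3.5.
import numpy as np

rng = np.random.default_rng(3)
rolls = rng.integers(1, 7, size=100_000)                   # values 1..6
running_mean = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:6d}: mean = {running_mean[n - 1]:.4f}")
# The printed means cluster ever more tightly around 3.5 as n increases.
```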



30. Bernoulli's theorem.

Bernoulli's theorem: The frequency of an event in n repeated independent trials, in each of which it can occur with the same probability p, converges in probability, as the number n increases without bound, to the probability p of this event in an individual trial:

lim (n→∞) P(|m/n - p| < ε) = 1.

Bernoulli's theorem is a consequence of Chebyshev's theorem, because the frequency of an event can be represented as the arithmetic mean of n independent alternative random variables that have the same distribution law.

18. Mathematical expectation of a discrete and continuous random variable and their properties.

The mathematical expectation of a random variable is the sum of the products of all its possible values and their corresponding probabilities.

For a discrete random variable: M(X) = Σ xᵢ pᵢ.

For a continuous random variable: M(X) = ∫ x f(x) dx, the integral being taken over the whole real line, where f(x) is the probability density.

Properties of mathematical expectation:

1. The mathematical expectation of a constant value is equal to the constant itself: M(C) = C.

2. The constant factor can be taken out of the expectation sign, i.e. M(kX)=kM(X).

3. The mathematical expectation of the algebraic sum of a finite number of random variables is equal to the same sum of their mathematical expectations, i.e. M(X±Y)=M(X)±M(Y).

4. The mathematical expectation of the product of a finite number of independent random variables is equal to the product of their mathematical expectations: M(XY)=M(X)*M(Y).

5. If all values ​​of a random variable are increased (decreased) by a constant C, then the mathematical expectation of this random variable will increase (decrease) by the same constant C: M(X±C)=M(X)±C.

6. The mathematical expectation of the deviation of a random variable from its mathematical expectation is zero: M(X - M(X)) = 0.
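These properties can be verified exactly for a small discrete distribution; the sketch below is our own illustrative example, computing the expectations directly from the probability tables of two independent variables.

```python
# Sketch: checking linearity and multiplicativity of the expectation
# for two small independent discrete random variables X and Y.
from itertools import product

X = {1: 0.2, 2: 0.5, 3: 0.3}      # value -> probability
Y = {0: 0.4, 4: 0.6}

def expect(dist):
    return sum(v * p for v, p in dist.items())

mx, my = expect(X), expect(Y)
m_sum = sum((x + y) * px * py for (x, px), (y, py) in product(X.items(), Y.items()))
m_prod = sum(x * y * px * py for (x, px), (y, py) in product(X.items(), Y.items()))

print(m_sum, mx + my)                                      # M(X + Y) = M(X) + M(Y)
print(m_prod, mx * my)                                     # M(XY) = M(X) * M(Y)
print(expect({3 * v: p for v, p in X.items()}), 3 * mx)    # M(kX) = k * M(X)
```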

If the phenomenon of stability of averages actually takes place in reality, then in the mathematical model with which we study random phenomena there must be a theorem reflecting this fact.
Under the conditions of this theorem we introduce restrictions on the random variables X₁, X₂, …, Xₙ:

a) each random variable Xᵢ has a mathematical expectation

M(Xᵢ) = a;

b) the variance of each random variable is finite, or, one may say, the variances are bounded from above by the same number C, i.e.

D(Xᵢ) < C, i = 1, 2, …, n;

c) the random variables are pairwise independent, i.e., any two Xᵢ and Xⱼ with i ≠ j are independent.

Then obviously

D(X₁ + X₂ + … + Xₙ) = D(X₁) + D(X₂) + … + D(Xₙ).

Let us formulate the law of large numbers in the Chebyshev form.

Chebyshev's theorem: with an unlimited increase in the number n of independent trials, "the arithmetic mean of the observed values of a random variable converges in probability to its mathematical expectation", i.e., for any positive ε

lim (n→∞) P(|X̄ - a| < ε) = 1. (4.1.1)

The meaning of the expression "the arithmetic mean X̄ converges in probability to a" is that the probability that X̄ differs from a by arbitrarily little approaches 1 without limit as the number n grows.

Proof. For a finite number n of independent trials we apply Chebyshev's inequality to the random variable X̄ = (X₁ + X₂ + … + Xₙ)/n:

P(|X̄ - M(X̄)| < ε) ≥ 1 - D(X̄)/ε². (4.1.2)

Taking into account restrictions a)-c), we calculate M(X̄) and D(X̄):

M(X̄) = M((X₁ + X₂ + … + Xₙ)/n) = (M(X₁) + … + M(Xₙ))/n = na/n = a;

D(X̄) = D((X₁ + X₂ + … + Xₙ)/n) = (D(X₁) + … + D(Xₙ))/n² ≤ nC/n² = C/n.

Substituting M(X̄) and D(X̄) into inequality (4.1.2), we obtain

P(|X̄ - a| < ε) ≥ 1 - C/(nε²).

If in the resulting inequality we take an arbitrarily small ε > 0 and let n → ∞, then we get

lim (n→∞) P(|X̄ - a| < ε) = 1,

which proves Chebyshev's theorem.

An important practical conclusion follows from this theorem: we have the right to replace the unknown value of the mathematical expectation of a random variable by the arithmetic mean obtained from a sufficiently large number of experiments. Moreover, the more experiments are used in the calculation, the more probable (reliable) it is that the error associated with this replacement (X̄ - a) will not exceed a given value ε.

In addition, other practical problems can be solved. For example, from the values of the probability (reliability) P = P(|X̄ - a| < ε) and the maximum allowable error ε one can determine the required number of experiments n; from P and n one can determine ε; from ε and n one can determine the probability of the event |X̄ - a| < ε.

Special case. Suppose that in n trials we observe n values of a random variable X having mathematical expectation M(X) and variance D(X). The obtained values can be regarded as random variables X₁, X₂, X₃, …, Xₙ. This should be understood as follows: the series of n trials is carried out repeatedly, so that as the result of the i-th trial, i = 1, 2, 3, …, n, in each series of trials one or another value of the random variable X appears, not known in advance. Consequently, the i-th value xᵢ of the random variable obtained in the i-th trial changes randomly if we pass from one series of trials to another. Thus every value xᵢ can be considered a random variable Xᵢ.


Assume that the tests meet the following requirements:

1. The trials are independent. This means that the results X₁, X₂, X₃, …, Xₙ of the trials are independent random variables.

2. The trials are carried out under the same conditions; from the point of view of probability theory this means that each of the random variables X₁, X₂, X₃, …, Xₙ has the same distribution law as the original variable X, and therefore M(Xᵢ) = M(X) and D(Xᵢ) = D(X), i = 1, 2, …, n.

Considering the above conditions, we get

P(|X̄ - M(X)| < ε) ≥ 1 - D(X)/(nε²). (4.1.3)

Example 4.1.1. The variance of a random variable X is equal to 4. How many independent experiments are required so that, with a probability of at least 0.9, the arithmetic mean of this random variable will differ from its mathematical expectation by less than 0.5?

Solution. According to the condition of the problem, ε = 0.5 and P(|X̄ - a| < 0.5) ≥ 0.9. Applying formula (4.1.3) to the random variable X, we get

P(|X̄ - M(X)| < ε) ≥ 1 - D(X)/(nε²).

From the relation

1 - D(X)/(nε²) = 0.9

we define

n = D(X)/(0.1 · ε²) = 4/(0.1 · 0.25) = 160.

Answer: it is required to make 160 independent experiments.
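The arithmetic here is a direct inversion of bound (4.1.3); a minimal sketch with the data of example 4.1.1 (D(X) = 4, ε = 0.5, reliability 0.9):

```python
# Sketch: solving 1 - D/(n * eps**2) >= level for n (example 4.1.1).
from math import ceil

D, eps, level = 4.0, 0.5, 0.9
n = D / ((1 - level) * eps**2)
print(ceil(n))   # 160 independent experiments
```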

Assuming that the arithmetic mean X̄ is normally distributed, we get:

P(|X̄ - a| < ε) = 2Φ(ε√n / σ) ≥ 0.9.

From here, using the table of the Laplace function, we get ε√n/σ ≥ 1.645, i.e. √n ≥ 6.58, i.e. n ≥ 44.

Example 4.1.2. The variance of a random variable X is equal to D(X) = 5. 100 independent experiments were carried out, from which the arithmetic mean X̄ was calculated. Instead of the unknown value of the mathematical expectation a, the value X̄ was adopted. Determine the maximum error allowed in this case with a probability of at least 0.8.

Solution. According to the problem, n = 100 and P(|X̄ - a| < ε) ≥ 0.8. We apply formula (4.1.3):

P(|X̄ - a| < ε) ≥ 1 - D(X)/(nε²).

From the relation

1 - D(X)/(nε²) = 0.8

we define ε:

ε² = D(X)/(0.2 · n) = 5/(0.2 · 100) = 0.25.

Consequently, ε = 0.5.

Answer: the maximum error value is ε = 0.5.
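The same bound can be inverted for ε instead of n; a minimal sketch with the data of example 4.1.2:

```python
# Sketch: solving 1 - D/(n * eps**2) >= level for eps (example 4.1.2).
from math import sqrt

D, n, level = 5.0, 100, 0.8
eps = sqrt(D / ((1 - level) * n))
print(eps)   # 0.5
```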

4.2. Law of large numbers in Bernoulli form

Although the concept of probability is the basis of any statistical inference, we can determine the probability of an event directly only in a few cases. Sometimes this probability can be established from considerations of symmetry, equal possibility, etc., but there is no universal method that would allow one to indicate the probability of an arbitrary event. Bernoulli's theorem makes it possible to estimate the probability approximately if repeated independent trials can be carried out for the event A of interest to us. Suppose n independent trials are carried out, in each of which the probability of occurrence of an event A is constant and equal to p.

Bernoulli's theorem. With an unlimited increase in the number n of independent trials, the relative frequency m/n of occurrence of an event A converges in probability to the probability p of occurrence of the event A, i.e.

lim (n→∞) P(|m/n - p| ≤ ε) = 1, (4.2.1)

where ε is an arbitrarily small positive number.

For finite n, Chebyshev's inequality for the random variable m/n (the relative frequency) takes the form:

P(|m/n - p| < ε) ≥ 1 - pq/(nε²). (4.2.2)

Proof. We apply Chebyshev's theorem. Let Xᵢ be the number of occurrences of the event A in the i-th trial, i = 1, 2, …, n. Each of the quantities Xᵢ can take only two values:

Xᵢ = 1 (the event A occurred) with probability p,

Xᵢ = 0 (the event A did not occur) with probability q = 1 - p.

Let Yₙ = (X₁ + X₂ + … + Xₙ)/n. The sum X₁ + X₂ + … + Xₙ is equal to the number m of occurrences of the event A in n trials (0 ≤ m ≤ n), which means that Yₙ = m/n is the relative frequency of occurrence of the event A in n trials. The mathematical expectation and variance of Xᵢ are equal, respectively, to

M(Xᵢ) = 1·p + 0·q = p, D(Xᵢ) = pq.

Example 4.2.1. In order to determine the percentage of defective products, 1000 items were tested according to a sampling-with-replacement scheme. What is the probability that the defect rate determined from this sample will differ in absolute value from the defect rate of the entire batch by no more than 0.01, if it is known that on average there are 500 defective items per 10,000 items?

Solution. According to the condition of the problem, the number of independent trials n= 1000;

p = 500/10000 = 0.05; q = 1 - p = 0.95; ε = 0.01.

Applying formula (4.2.2), we obtain

P(|m/n - p| < 0.01) ≥ 1 - pq/(nε²) = 1 - 0.0475/0.1 = 0.525.

Answer: with a probability of at least 0.525, it can be expected that the sample fraction of defects (the relative frequency of occurrence of defects) will differ from the fraction of defects in the entire batch (from the probability of a defect) by no more than 0.01.

Example 4.2.2. When stamping parts, the probability of a defect is 0.05. How many parts must be checked so that, with a probability of at least 0.95, it can be expected that the relative frequency of defective products will differ from the probability of a defect by less than 0.01?

Solution. According to the problem, p = 0.05; q = 0.95; ε = 0.01;

P(|m/n - p| < 0.01) ≥ 0.95.

From the equality 1 - pq/(nε²) = 0.95 we find n:

n = pq/(0.05 · ε²) = 0.0475/0.000005 = 9500.

Answer: 9500 items need to be checked.
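Both examples follow from bound (4.2.2) by simple arithmetic; the sketch below reproduces the two calculations.

```python
# Sketch: Chebyshev-type bound (4.2.2) for the relative frequency.
from math import ceil

# Example 4.2.1: lower bound on P(|m/n - p| < eps).
p, q, n, eps = 0.05, 0.95, 1000, 0.01
print(1 - p * q / (n * eps**2))                 # ~0.525

# Example 4.2.2: number of parts to inspect for reliability 0.95.
p, q, eps, level = 0.05, 0.95, 0.01, 0.95
print(ceil(p * q / (eps**2 * (1 - level))))     # 9500
```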

Comment. Estimates of the required number of observations obtained by applying Bernoulli's (or Chebyshev's) theorem are greatly exaggerated. There are more precise estimates proposed by Bernstein and Khinchin, but requiring a more complex mathematical apparatus. To avoid exaggeration of estimates, the Laplace formula is sometimes used

P(|m/n - p| < ε) ≈ 2Φ(ε√n / √(pq)).

The disadvantage of this formula is the lack of an estimate of the allowable error.


