18.10.2015, 00:03   #1
Letter frequency statistics for English.

Leaving this here for reference)


Order Of Frequency Of Single Letters: E T A O I N S H R D L U

Order Of Frequency Of Digraphs: th er on an re he in ed nd ha at en es of or nt ea ti to it st io le is ou ar as de rt ve

Order Of Frequency Of Trigraphs: the and tha ent ion tio for nde has nce edt tis oft sth men

Order Of Frequency Of Most Common Doubles: ss ee tt ff ll mm oo

Order Of Frequency Of Initial Letters: T O A W B C D S F M R H I Y E G L N P U J K

Order Of Frequency Of Final Letters: E S T D N R Y F L O G H A K M P U W

One-Letter Words: a, I.

Most Frequent Two-Letter Words: of, to, in, it, is, be, as, at, so, we, he, by, or, on, do, if, me, my, up, an, go, no, us, am

Most Frequent Three-Letter Words: the, and, for, are, but, not, you, all, any, can, had, her, was, one, our, out, day, get, has, him, his, how, man, new, now, old, see, two, way, who, boy, did, its, let, put, say, she, too, use

Most Frequent Four-Letter Words: that, with, have, this, will, your, from, they, know, want, been, good, much, some, time
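For anyone who wants to rebuild lists like these from their own corpus, here is a minimal sketch in Python. The function name `ngram_counts` and the `across_words` switch are my own assumptions, not anything from the quoted source. The switch is there because trigraphs such as "edt" and "sth" in the list above look like they were counted across word boundaries, and I can't confirm how the original lists were produced.

```python
from collections import Counter
import re

def ngram_counts(text, n, across_words=False):
    """Count letter n-grams in text (hypothetical helper).

    across_words=False: n-grams never span a word boundary.
    across_words=True:  spaces and punctuation are dropped first,
                        so n-grams may span adjacent words.
    """
    words = re.findall(r"[a-z]+", text.lower())
    chunks = ["".join(words)] if across_words else words
    counts = Counter()
    for chunk in chunks:
        for i in range(len(chunk) - n + 1):
            counts[chunk[i:i + n]] += 1
    return counts

sample = "the theme of the other thesis"
print(ngram_counts(sample, 1).most_common(3))   # single letters
print(ngram_counts(sample, 2).most_common(3))   # digraphs
print(ngram_counts(sample, 3, across_words=True).most_common(3))  # trigraphs
```

Feed it a large enough text and `most_common()` gives the ordering directly.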

The inventor of Morse code, Samuel Morse (1791-1872), needed to know which letters were used most often so that he could give the simplest codes to the most frequently used letters. He did it simply by counting the number of letters in sets of printers' type. The figures he came up with were:

Count    Letters            Count    Letters
12,000   E                  2,500    F
 9,000   T                  2,000    W, Y
 8,000   A, I, N, O, S      1,700    G, P
 6,400   H                  1,600    B
 6,200   R                  1,200    V
 4,400   D                    800    K
 4,000   L                    500    Q
 3,400   U                    400    J, X
 3,000   C, M                 200    Z
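To compare Morse's type counts with percentage tables, the counts can be turned into shares of the total. A small sketch, with the counts copied from the table above; the names `morse_counts` and `morse_share` are mine, and it assumes that letters listed together each have the full count shown.

```python
# Morse's type counts from the table above.
# Assumption: letters grouped on one row each have that row's count.
morse_counts = {
    "E": 12000, "T": 9000,
    "A": 8000, "I": 8000, "N": 8000, "O": 8000, "S": 8000,
    "H": 6400, "R": 6200, "D": 4400, "L": 4000, "U": 3400,
    "C": 3000, "M": 3000, "F": 2500, "W": 2000, "Y": 2000,
    "G": 1700, "P": 1700, "B": 1600, "V": 1200, "K": 800,
    "Q": 500, "J": 400, "X": 400, "Z": 200,
}

total = sum(morse_counts.values())

# Share of each letter as a percentage of all pieces of type.
morse_share = {letter: 100.0 * count / total
               for letter, count in morse_counts.items()}

for letter, share in sorted(morse_share.items(), key=lambda kv: -kv[1]):
    print(f"{letter} {share:.2f}%")
```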

However, this gives the frequency of letters in English text, which is dominated by a relatively small number of common words. For word games, it is often the frequency of letters in English vocabulary, regardless of word frequency, which is of more interest. We did an analysis of the letters occurring in the words listed in the main entries of the Concise Oxford Dictionary (11th edition revised, 2004) and came up with the following table:

Letter  Frequency  Ratio      Letter  Frequency  Ratio
E       11.1607%   56.88      M       3.0129%    15.36
A        8.4966%   43.31      H       3.0034%    15.31
R        7.5809%   38.64      G       2.4705%    12.59
I        7.5448%   38.45      B       2.0720%    10.56
O        7.1635%   36.51      F       1.8121%     9.24
T        6.9509%   35.43      Y       1.7779%     9.06
N        6.6544%   33.92      W       1.2899%     6.57
S        5.7351%   29.23      K       1.1016%     5.61
L        5.4893%   27.98      V       1.0074%     5.13
C        4.5388%   23.13      X       0.2902%     1.48
U        3.6308%   18.51      Z       0.2722%     1.39
D        3.3844%   17.25      J       0.1965%     1.00
P        3.1671%   16.14      Q       0.1962%     (1)

The third column represents proportions, taking the least common letter (q) as equal to 1. The letter E is over 56 times more common than Q in forming individual English words.
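The third column is just each percentage divided by the percentage for Q. A quick sketch to check that, using only a few rows from the table (the dictionary name `cod_percent` is my own, and the subset is chosen just for illustration):

```python
# A few rows from the Concise Oxford Dictionary table (percent of letters).
cod_percent = {
    "E": 11.1607,
    "A": 8.4966,
    "R": 7.5809,
    "Z": 0.2722,
    "J": 0.1965,
    "Q": 0.1962,
}

baseline = cod_percent["Q"]  # least common letter, taken as 1

# Ratio of each letter's share to Q's share, rounded as in the table.
ratios = {letter: round(pct / baseline, 2)
          for letter, pct in cod_percent.items()}

for letter, ratio in ratios.items():
    print(letter, ratio)
```

The rounded results for these rows match the third column of the table.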

The frequency of letters at the beginnings of words is different again. There are more English words beginning with the letter 's' than with any other letter. (This is mainly because clusters such as 'sc', 'sh', 'sp', and 'st' act almost like independent letters.) The letter 'e' only comes about halfway down the order, and the letter 'x' unsurprisingly comes last.
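The difference between counting over running text and counting over vocabulary is easy to see in code. This is a sketch under my own assumptions: `initial_letter_counts` is a hypothetical helper, and the sample sentence is made up purely for illustration. With `unique=True` each distinct word is counted once, which is the dictionary-style count; with `unique=False` common words dominate, which is the text-style count.

```python
from collections import Counter
import re

def initial_letter_counts(text, unique=False):
    """Count first letters of words (hypothetical helper).

    unique=False: every occurrence counts (frequency in running text).
    unique=True:  each distinct word counts once (frequency in vocabulary).
    """
    words = re.findall(r"[a-z]+", text.lower())
    if unique:
        words = set(words)
    return Counter(word[0] for word in words)

sample = "the cat sat on the mat and the sun set"
in_text = initial_letter_counts(sample)
in_vocab = initial_letter_counts(sample, unique=True)
print(in_text.most_common(3))
print(in_vocab.most_common(3))
```

In the sample, "the" is repeated, so 't' scores high in running text but only once in the vocabulary count, while 's' starts three distinct words.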

May Belarus flourish!
18.10.2015, 16:13   #2
Good figures. It's a pity the lists are incomplete and come without counts.
Ideally there would be statistics like this with actual counts and WITHOUT STOP WORDS.
In principle, the counts can be reconstructed using Zipf's law. The accuracy would be acceptable.
I actually want to rework my generator of available names so that it is based on statistics rather than on my own sense of readability, but I don't think I'll get to it any time soon, if I get to it at all)
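A rough sketch of the Zipf idea, in case someone wants to try it. Everything here is an assumption: the function name `zipf_estimates`, the exponent `s=1.0`, and the total of 1000 are made up for illustration, and the sketch treats the ranked list as if it were complete. Zipf's law is usually stated for word ranks, so this is applied to the word lists, and the result is an estimate, not measured data.

```python
def zipf_estimates(ranked_items, total, s=1.0):
    """Spread `total` over items so that the count at rank r is
    proportional to 1 / r**s (hypothetical helper, Zipf-style estimate)."""
    weights = [1.0 / rank ** s for rank in range(1, len(ranked_items) + 1)]
    norm = sum(weights)
    return {item: total * w / norm for item, w in zip(ranked_items, weights)}

# First few three-letter words from the list, in the order given there.
three_letter = ["the", "and", "for", "are", "but", "not", "you", "all"]

# The total of 1000 occurrences is an arbitrary illustrative number.
estimates = zipf_estimates(three_letter, total=1000)
for word, est in estimates.items():
    print(f"{word} {est:.1f}")
```

With `s=1.0` the second-ranked word gets half the count of the first, the third gets a third, and so on.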

In any situation the choice is always yours. You either walk in the rain or just get wet in it.
18.10.2015, 23:45   #3
The most useful analysis in a case like this would be to take ALL the abbreviations in the world and build the statistics from them. Which, of course, is unrealistic)

19.11.2015, 19:18   #4
The magical meanings of Latin letters.

