Letter | Frequency (%) |
---|---|
a | 6.2193% |
á | 2.2355% |
b | 1.5582% |
c | 1.6067% |
č | 0.9490% |
d | 3.6019% |
ď | 0.0222% |
e | 7.6952% |
é | 1.3346% |
ě | 1.6453% |
f | 0.2732% |
g | 0.2729% |
h | 1.2712% |
ch | 1.1709% |
i | 4.3528% |
í | 3.2699% |
j | 2.1194% |
k | 3.7367% |
l | 3.8424% |
m | 3.2267% |
n | 6.5353% |
ň | 0.0814% |
o | 8.6664% |
ó | 0.0313% |
p | 3.4127% |
q | 0.0013% |
r | 3.6970% |
ř | 1.2166% |
s | 4.5160% |
š | 0.8052% |
t | 5.7268% |
ť | 0.0426% |
u | 3.1443% |
ú | 0.1031% |
ů | 0.6948% |
v | 4.6616% |
w | 0.0088% |
x | 0.0755% |
y | 1.9093% |
ý | 1.0721% |
z | 2.1987% |
ž | 0.9952% |
Relative letter frequencies (%)
Bigraphs
ST, PR, SK, CH, DN, TR
Trigraphs
PRO, UNI, OST, STA, ANI, OVA, YCH, STI, PRI, PRE, OJE, REN, IST, STR, EHO, TER, RED, ICH
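Lists like the two above can be produced by counting every run of n adjacent letters in a text. The source does not say how its lists were compiled, so the following is only a minimal sketch under stated assumptions: the class name `NGramCounter` is hypothetical, text is upper-cased, and n-grams are assumed not to span word boundaries.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class NGramCounter {

    /** Counts all n-grams of adjacent letters; any non-letter ends the current word. */
    static Map<String, Integer> count(String text, int n) {
        String upper = text.toUpperCase(Locale.ROOT);
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder window = new StringBuilder();

        for (int i = 0; i < upper.length(); i++) {
            char c = upper.charAt(i);
            if (!Character.isLetter(c)) {
                window.setLength(0); // word boundary: start a fresh window
                continue;
            }
            window.append(c);
            if (window.length() > n) {
                window.deleteCharAt(0); // keep only the last n letters
            }
            if (window.length() == n) {
                counts.merge(window.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Bigraphs (n = 2) and trigraphs (n = 3) of a short sample phrase
        System.out.println(count("prostor pro strom", 2));
        System.out.println(count("prostor pro strom", 3));
    }
}
```

Sorting the resulting map by value in descending order would give a ranking comparable to the lists above, provided the input corpus is large enough.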
Code
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.Map;
    import java.util.TreeMap;

    /**
     * Prints out frequencies of input characters (in percent)
     * @param source input file
     * @param encoding encoding of the file
     */
    public static void count(File source, String encoding) throws IOException {
        // TreeMap keeps the characters sorted by their char value
        Map<Character, Integer> occurrences = new TreeMap<>();
        // Total number of characters read (line terminators are not counted)
        int counter = 0;

        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(source), encoding))) {
            String s;
            while ((s = reader.readLine()) != null) {
                for (int i = 0; i < s.length(); i++) {
                    counter++;
                    // Insert 1 for a new character, otherwise increment its count
                    occurrences.merge(s.charAt(i), 1, Integer::sum);
                }
            }
        }

        // Print each character with its share of all characters, in percent
        for (Map.Entry<Character, Integer> entry : occurrences.entrySet()) {
            System.out.println(entry.getKey() + ": " + (entry.getValue() / (double) counter * 100));
        }
    }
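The `count` method above tallies every single character, including spaces and punctuation, so on its own it cannot produce the `ch` row of the table. The sketch below shows one way to count letters only and to treat `ch` as a single letter. It is an illustration under assumptions, not the method used for the table: the class name `CzechLetterCounter` is hypothetical, every `c` directly followed by `h` is assumed to be the digraph, and upper and lower case are merged.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class CzechLetterCounter {

    /** Returns letter frequencies in percent, counting "ch" as one letter. */
    static Map<String, Double> frequencies(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new TreeMap<>();
        int total = 0;

        for (int i = 0; i < lower.length(); i++) {
            char c = lower.charAt(i);
            if (!Character.isLetter(c)) {
                continue; // skip spaces, digits and punctuation
            }
            String letter = String.valueOf(c);
            // Assumption: "c" immediately followed by "h" is always the digraph "ch"
            if (c == 'c' && i + 1 < lower.length() && lower.charAt(i + 1) == 'h') {
                letter = "ch";
                i++; // consume the 'h' as part of the digraph
            }
            counts.merge(letter, 1, Integer::sum);
            total++;
        }

        // Convert absolute counts to percentages of all counted letters
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            result.put(e.getKey(), e.getValue() * 100.0 / total);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(frequencies("Chci chleba"));
    }
}
```

The `TreeMap` orders its keys by Unicode value, not by Czech alphabetical order; a `java.text.Collator` for the Czech locale could be passed to the map if the output should follow the order of the table.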
Sources
- KRÁLÍK, Jan. Czech Alphabet. The Czech Language [online]. 2001 [cit. 2012-09-18]. Available at WWW: http://www.czech-language.cz/alphabet/alph-prehled.html