Finding files in directory...
Files found in directory: 11579
Loading files info...
[10000/11579] 86.3632% | 1000 items in 0.114279 sec | 1.14279s | Total: 1.32323s
Files info loaded
Detecting language of files...
[10000/11579] 86.3632% | 1000 items in 0.0068935 sec | 0.068935s | Total: 0.0798198s
ru: 11560 99.8359%
en: 1 0.00863632%
total: 11579
Detecting language of files done
Categorizing started...
Categorizing en files
other: 1 files ./source_dir/20200214/14/2638876896949915693.html
Preparing files...
files readed
[other] words: 58
[other] words after min filter: 56
[other] pairs built
[other] grouped
all threads done
[1/1] 100% | 1000 items in 1.982 sec | 0.001982s ( groups: 0.001971s/99.445% ) | Total: 0.001982s
done
Categorizing ru files
[5000/11560] 43.2526% | 1000 items in 0.231133 sec | 1.15566s | Total: 2.6719s
[10000/11560] 86.5052% | 1000 items in 0.239613 sec | 2.39614s | Total: 2.76993s
entertainment: 1510 files ./source_dir/20200214/08/4265183765121971895.html
sports: 925 files ./source_dir/20200214/12/13991501620720024.html
society: 5152 files ./source_dir/20200214/15/7204039799081050050.html
science: 1098 files ./source_dir/20200214/10/5071570187983611913.html
other: 295 files ./source_dir/20200214/14/5884236726315539249.html
technology: 1090 files ./source_dir/20200214/15/1360813307680623435.html
economy: 1490 files ./source_dir/20200214/11/1157119135192720645.html
Preparing files...
files readed
[technology] words: 30017
[other] words: 10572
[entertainment] words: 39026
[economy] words: 30235
[science] words: 28756
[society] words: 61834
[other] words after min filter: 10571
[sports] words: 21682
[technology] words after min filter: 12649
[science] words after min filter: 12212
[sports] words after min filter: 9138
[entertainment] words after min filter: 15933
[economy] words after min filter: 14089
[other] pairs built
[other] grouped
[society] words after min filter: 32220
[sports] pairs built
[sports] grouped
[science] pairs built
[technology] pairs built
[science] grouped
[technology] grouped
[entertainment] pairs built
[economy] pairs built
[entertainment] grouped
[economy] grouped
[society] pairs built
[society] grouped
all threads done
10514 ./source_dir/20200214/09/9118270336134658106.html
11409 ./source_dir/20200214/12/4773894379368937889.html
11444 ./source_dir/20200214/10/9173763593309557437.html
10891 ./source_dir/20200214/10/1157119136385582949.html
11331 ./source_dir/20200214/13/3273163147746842856.html
10926 ./source_dir/20200214/09/1894541394677411613.html
----------------------------------
1998 ./source_dir/20200214/15/4773894379525400766.html
2448 ./source_dir/20200214/11/5116951207728311987.html
2748 ./source_dir/20200214/15/7912524225517994349.html
2172 ./source_dir/20200214/10/7924146019272960283.html
----------------------------------
977 ./source_dir/20200214/09/4775004977765990848.html
1465 ./source_dir/20200214/09/4775004977287150676.html
969 ./source_dir/20200214/09/4775004977075920253.html
477 ./source_dir/20200214/09/4775004976222910550.html
275 ./source_dir/20200214/09/4775004976276812309.html
349 ./source_dir/20200214/09/4775004976128943852.html
----------------------------------
3786 ./source_dir/20200214/10/1395614417633861246.html
4038 ./source_dir/20200214/17/652432217408074669.html
4276 ./source_dir/20200214/14/5245498076308274680.html
3887 ./source_dir/20200214/10/4521806110950635868.html
----------------------------------
3013 ./source_dir/20200214/12/7138596758441784874.html
3175 ./source_dir/20200214/11/6269024783524649317.html
----------------------------------
8426 ./source_dir/20200214/15/1894541394421482208.html
8616 ./source_dir/20200214/16/4311596829867560174.html
8388 ./source_dir/20200214/15/121480015735476884.html
8884 ./source_dir/20200214/15/2038992938683054463.html
8219 ./source_dir/20200214/14/4764842325053012262.html
8202 ./source_dir/20200214/13/770303467444676195.html
7876 ./source_dir/20200214/15/4799432765934168691.html
7265 ./source_dir/20200214/17/1894541393431369783.html
6193 ./source_dir/20200214/14/7267840628449533362.html
6615 ./source_dir/20200214/15/3531949787757397384.html
5731 ./source_dir/20200214/13/1818969637326721521.html
5914 ./source_dir/20200214/14/7113642701978408440.html
7164 ./source_dir/20200214/15/6827084786017887988.html
7826 ./source_dir/20200214/13/2576472881826808150.html
5557 ./source_dir/20200214/12/6827084786240431167.html
4497 ./source_dir/20200214/15/2857286274642814158.html
4932 ./source_dir/20200214/12/4230842155668939114.html
4976 ./source_dir/20200214/14/1688417001513078900.html
4849 ./source_dir/20200214/14/4773894379182369067.html
8619 ./source_dir/20200214/13/1839581755845507030.html
6854 ./source_dir/20200214/13/4265183765477744733.html
4444 ./source_dir/20200214/15/1193690860845703341.html
8429 ./source_dir/20200214/15/3344377948272258251.html
----------------------------------
9889 ./source_dir/20200214/11/1851638045735421791.html
10264 ./source_dir/20200214/10/7916917744315257887.html
10382 ./source_dir/20200214/10/4307021368311650566.html
9866 ./source_dir/20200214/11/8639226688021596849.html
10052 ./source_dir/20200214/09/4853830194939075350.html
----------------------------------
[11560/11560] 100% | 1000 items in 0.746707 sec | 8.63193s ( groups: 8.56235s/99.1939% ) | Total: 8.63193s
done
Categorizing done