Arabic / اَللُّغَةُ اَلْعَرَبِيَّة - text corpora

Optilon
Site Admin
Posts: 50
Joined: Mon Aug 31, 2020 8:36 am

Post by Optilon »

For Arabic, I used the following text corpora from uni-leipzig:
https://wortschatz.uni-leipzig.de/en/download/arabic
ara_news_2016_10K
ara_news_2017_10K
ara_wikipedia_2012_10K
ara_wikipedia_2016_10K
ara-ae_web_2017_10K
ara-eg_web_2015_10K
ara-ps_newscrawl_2012_10K
ara-sy_newscrawl_2012_10K
ara-tn_newscrawl_2012_10K
https://wortschatz.uni-leipzig.de/en/do ... ian-arabic:
arz_wikipedia_2016_10K
---
total: 100K
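
A minimal sketch of how the ten corpora listed above could be merged into a single input text. The file name pattern `<name>-sentences.txt`, the `<number><TAB><sentence>` line format, and the helper names `read_sentences` and `merge_corpora` are assumptions for illustration; they are not taken from the post.

```python
from pathlib import Path

# Corpus names as listed in the post. The "<name>-sentences.txt" file name and
# the "<number><TAB><sentence>" line format are assumptions, not from the post.
CORPORA = [
    "ara_news_2016_10K",
    "ara_news_2017_10K",
    "ara_wikipedia_2012_10K",
    "ara_wikipedia_2016_10K",
    "ara-ae_web_2017_10K",
    "ara-eg_web_2015_10K",
    "ara-ps_newscrawl_2012_10K",
    "ara-sy_newscrawl_2012_10K",
    "ara-tn_newscrawl_2012_10K",
    "arz_wikipedia_2016_10K",
]


def read_sentences(path):
    """Yield the sentence text of each line, dropping a leading '<number><TAB>'."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\r\n")
            if not line:
                continue
            head, sep, tail = line.partition("\t")
            yield tail if sep else head


def merge_corpora(corpus_dir, out_path, names=CORPORA):
    """Write all sentences into one file, one per line; return the sentence count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for name in names:
            source = Path(corpus_dir) / f"{name}-sentences.txt"
            for sentence in read_sentences(source):
                out.write(sentence + "\n")
                count += 1
    return count
```

With all ten files in one directory, `merge_corpora("corpora", "merged.txt")` would be expected to return about 100,000, matching the total above (the directory and output names here are placeholders).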
---
Arabic: Arabic 100k - 10 files
---
Character frequency with symbols:
arabic-characterfrequency-with-symbols.png
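
A character-frequency table like the one in this chart can be approximated with a few lines of Python. This is only a sketch: the function name `char_frequencies` is hypothetical, and whether the original count ignored whitespace is an assumption.

```python
from collections import Counter


def char_frequencies(text, skip_whitespace=True):
    """Return (character, count, percent) tuples, most frequent first."""
    counts = Counter(
        ch for ch in text if not (skip_whitespace and ch.isspace())
    )
    total = sum(counts.values())
    # most_common() is empty for empty input, so no division by zero occurs.
    return [(ch, n, 100.0 * n / total) for ch, n in counts.most_common()]


if __name__ == "__main__":
    # Symbols and diacritics are counted too, as in a "with symbols" chart.
    for ch, n, pct in char_frequencies("السلام عليكم"):
        print(f"{ch}\t{n}\t{pct:.2f}%")
```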
---
Arabic optimized with optS1V1.cfg:
arabicS1V1arabic.png
---
Maximum possible hand alternation is 63.60% (approximate, for the 26 most frequent letters):
max hand alternation arabic.png
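
To illustrate what such a figure measures: hand alternation is the share of letter pairs whose two letters are typed by different hands, and the maximum is taken over all ways of splitting the letters between the hands. The sketch below estimates it with a simple local search. This is not necessarily how the 63.60% was obtained; local search can stop below the true maximum, hand balance is ignored, and all function names are hypothetical.

```python
from collections import Counter


def bigram_counts(text, letters):
    """Count adjacent character pairs where both characters are in `letters`."""
    pairs = Counter()
    for a, b in zip(text, text[1:]):
        if a in letters and b in letters:
            pairs[(a, b)] += 1
    return pairs


def alternation(pairs, left):
    """Share of counted pairs whose two letters sit on different hands."""
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    alt = sum(n for (a, b), n in pairs.items() if (a in left) != (b in left))
    return alt / total


def best_split(pairs, letters):
    """Move one letter at a time to the other hand while alternation improves."""
    letters = sorted(letters)
    left = set(letters[::2])  # arbitrary but deterministic starting split
    improved = True
    while improved:
        improved = False
        for ch in letters:
            trial = left ^ {ch}  # flip `ch` to the other hand
            if alternation(pairs, trial) > alternation(pairs, left):
                left = trial
                improved = True
    return left, alternation(pairs, left)


def top_letters(text, n=26):
    """The n most frequent alphabetic characters in `text`."""
    counts = Counter(ch for ch in text if ch.isalpha())
    return {ch for ch, _ in counts.most_common(n)}
```

For a corpus string `text`, `best_split(bigram_counts(text, top_letters(text)), top_letters(text))` returns one hand assignment and its alternation rate.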
---
Transliteration chart for Arabic. For letters with a green background, I am very sure that the transliteration is correct; I am not sure about the ones with an orange background. Their transliteration was chosen so that the most frequent Arabic letters correspond to the most frequently used letters in Latin alphabets, with vowel-like letters preferentially assigned to vowels and consonant-like letters to consonants. Letters beginning with a small "s" are typed with Shift + letter.
arabic transliteration chart.png
arabic transliteration.ods
---
Transliteration tool for Arabic (ا|ل|ي|م|و|ن|ر|ت|ب|ع|ة|د|ف|ه|س|ق|ك|ح|أ|ج|ى|ش|ط|ص|خ|ض|إ|ز|ذ|ث|ئ|غ|ء|ظ|ؤ):
arabictoroman.ps1
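
The attached script is PowerShell and is not reproduced here. As a rough illustration of the same idea, here is a table-lookup sketch in Python. The letter pairs are read off the key legends of the optAR-S2-V1 layout data further down in this thread, on the assumption that they match the transliteration chart; the "s" prefix for Shift letters follows the chart description above. The real `arabictoroman.ps1` may use different pairs or a different output encoding.

```python
# Assumed pairs, taken from the optAR-S2-V1 key legends (not from the .ods chart).
BASE = {
    "ش": "w", "ك": "k", "أ": "c", "ة": "o", "ح": "g", "ف": "f", "د": "d",
    "ر": "r", "ق": "q", "س": "s", "ي": "i", "ا": "a", "و": "u", "ع": "e",
    "ب": "b", "م": "m", "ل": "l", "ن": "n", "ت": "t", "ص": "v", "ى": "y",
    "ه": "h", "خ": "x", "ط": "p", "ز": "z", "ج": "j",
}

# Letters on the Shift layer, written as "s" + base key (assumed notation).
SHIFTED = {
    "ذ": "sd", "ئ": "si", "إ": "sa", "ؤ": "su",
    "غ": "se", "ض": "sb", "ث": "st", "ظ": "sz",
}

TABLE = {**BASE, **SHIFTED}


def romanize(text):
    """Replace each mapped Arabic letter; leave everything else unchanged."""
    return "".join(TABLE.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(romanize("كتب"))  # the three letters map to k, t, b
```

Two limitations of this sketch: the hamza ء has no key in that layout data, so it passes through unchanged, and the "s" prefix makes the output ambiguous to reverse (for example, "sd" could also be س followed by د).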
---
Conversion to lowercase: viewtopic.php?f=12&t=8
characterfrequency.ods
---
Arabic romanized with diacritics and symbols:
arabic-roman-characterfrequency-with-symbols.png
---
I uploaded the configuration files used for the optimization so that it can be reproduced later:
Link: viewtopic.php?f=12&t=20

./opt -2 arabic2020.txt -i 20000 -K optS1V1.cfg

./opt -2 arabicroman2020.txt -i 20000 -K optS1V1.cfg

echo ARABIC:;./opt -2 arabicroman2020.txt -r bsptast.txt -K controlS1V1.cfg;
---
First optimization result:
arabicromanS1V1.png

qwertyuiop■☻asdfghjkl▓█▒░zxcvbnm,. 	QWERTY
▓,.pyfgcrl■☻aoeuidhtns█▒░qjkxbmwvz 	DVORAK-EN
wflcgqku,y■☻rsntdoeaih█▒░vmpbzx.j▓ 	optLAT-S1-V1
wqrdfgockv■☻snlmbeuait█▒░j.zpxh,y▓ 	optAR-S1-V1
▓pfocqdeg█■☻bmautrlinsv▒zyjw.h,kx░ 	ar-lulua
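
The five layout strings above can be checked and compared programmatically. The sketch below assumes that every string lists the same 34 key positions in the same order and that the block symbols (■ ☻ ▓ █ ▒ ░) stand for keys that carry no letter; neither assumption is stated in the post, and the helper name is hypothetical.

```python
# Layout strings copied from the table above (name -> 34 key symbols).
LAYOUTS = {
    "QWERTY": "qwertyuiop■☻asdfghjkl▓█▒░zxcvbnm,.",
    "DVORAK-EN": "▓,.pyfgcrl■☻aoeuidhtns█▒░qjkxbmwvz",
    "optLAT-S1-V1": "wflcgqku,y■☻rsntdoeaih█▒░vmpbzx.j▓",
    "optAR-S1-V1": "wqrdfgockv■☻snlmbeuait█▒░j.zpxh,y▓",
    "ar-lulua": "▓pfocqdeg█■☻bmautrlinsv▒zyjw.h,kx░",
}


def unchanged_positions(a, b):
    """Number of positions that hold the same symbol in both layouts."""
    return sum(1 for x, y in zip(a, b) if x == y)


if __name__ == "__main__":
    reference = LAYOUTS["optAR-S1-V1"]
    for name, keys in LAYOUTS.items():
        # Every string should be a permutation of the same symbol set.
        assert set(keys) == set(reference) and len(keys) == len(reference)
        print(f"{name}: {unchanged_positions(keys, reference)} keys shared with optAR-S1-V1")
```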

Re: Arabic / اَللُّغَةُ اَلْعَرَبِيَّة - text corpora

Post by Optilon »

Further optimizations:
Arabic layout with normal shift keys (2 shift keys): optAR-S2-V1
arabic-S2-V1-layout.png

[{c:"#282828",t:"#3386ff\n\n#47c930\n\n\n\n\n\n\n#ffffff",f:6,fa:[0,0,0,0,0,0,0,0,0,9],w:1.5,h:1.5},"\n\nF1\n\n\n\n\n\n\nEsc",{w:1.5,h:1.5},"\n\nF2\n\n\n\n\n\n\n1",{w:1.5,h:1.5},"\n\nF3\n\n\n\n\n\n\n2",{w:1.5,h:1.5},"\n\nF4\n\n\n\n\n\n\n3",{w:1.5,h:1.5},"\n\nF5\n\n\n\n\n\n\n4",{w:1.5,h:1.5},"\n\nF6\n\n\n\n\n\n\n5",{w:1.5,h:1.5},"\n\nF7\n\n\n\n\n\n\n6",{w:1.5,h:1.5},"\n\nF8\n\n\n\n\n\n\n7",{w:1.5,h:1.5},"\n\nF9\n\n\n\n\n\n\n8",{w:1.5,h:1.5},"\n\nF10\n\n\n\n\n\n\n9",{w:1.5,h:1.5},"\n\nF11\n\n\n\n\n\n\n0",{t:"#3386ff\n\n#47c930\n#c20f3d\n\n\n\n\n\n#ffffff",w:1.5,h:1.5},"\n\nF12\nInsert\n\n\n\n\n\nDel"],
[{y:0.5,t:"#ffffff",a:7,fa:[9],w:1.5,h:1.5},"Tab",{t:"#3386ff\n#ff8414",a:4,fa:[9,5,0,0,0,0,0,0,0,9],w:1.5,h:1.5},"\nW\n\n\n\n\n\n\n\nش",{w:1.5,h:1.5},"\nK\n\n\n\n\n\n\n\nك",{w:1.5,h:1.5},"\nC\n\n\n\n\n\n\n\nأ",{w:1.5,h:1.5},"\nO\n\n\n\n\n\n\n\nة",{w:1.5,h:1.5},"\nG\n\n\n\n\n\n\n\nح",{w:1.5,h:1.5},"\nF\n\n\n\n\n\n\n\nف",{fa:[0,5,0,0,0,0,0,0,0,9],w:1.5,h:1.5},"ذ\nD\n\n\n\n\n\n\n\nد",{w:1.5,h:1.5},"\nR\n\n\n\n\n\n\n\nر",{w:1.5,h:1.5},"\nQ\n\n\n\n\n\n\n\nق",{w:1.5,h:1.5},"\n,\n\n\n\n\n\n\n\n،",{t:"#ffffff",a:7,fa:[9],w:1.5,h:1.5},"Back"],
[{y:0.5,t:"#47c930",w:1.5,h:1.5},"Sym",{t:"#3386ff\n#ff8414",a:4,fa:[9,5,0,0,0,0,0,0,0,9],w:1.5,h:1.5},"\nS\n\n\n\n\n\n\n\nس",{fa:[0,5,0,0,0,0,0,0,0,9],w:1.5,h:1.5},"ئ\nI\n\n\n\n\n\n\n\nي",{w:1.5,h:1.5},"إ\nA\n\n\n\n\n\n\n\nا",{w:1.5,h:1.5,n:true},"ؤ\nU\n\n\n\n\n\n\n\nو",{w:1.5,h:1.5},"غ\nE\n\n\n\n\n\n\n\nع",{w:1.5,h:1.5},"ض\nB\n\n\n\n\n\n\n\nب",{w:1.5,h:1.5,n:true},"\nM\n\n\n\n\n\n\n\nم",{w:1.5,h:1.5},"\nL\n\n\n\n\n\n\n\nل",{w:1.5,h:1.5},"\nN\n\n\n\n\n\n\n\nن",{w:1.5,h:1.5},"ث\nT\n\n\n\n\n\n\n\nت",{t:"#ffffff",a:7,fa:[9],w:1.5,h:1.5},"Enter"],
[{y:0.5,t:"#3386ff",w:3,h:1.5},"Shift",{t:"#3386ff\n#ff8414",a:4,fa:[9,5,0,0,0,0,0,0,0,9],w:1.5,h:1.5},"\nV\n\n\n\n\n\n\n\nص",{w:1.5,h:1.5},"\nY\n\n\n\n\n\n\n\nى",{w:1.5,h:1.5},"\nH\n\n\n\n\n\n\n\nه",{w:1.5,h:1.5},"\nX\n\n\n\n\n\n\n\nخ",{w:1.5,h:1.5},"\nP\n\n\n\n\n\n\n\nط",{fa:[0,5,0,0,0,0,0,0,0,9],w:1.5,h:1.5},"ظ\nZ\n\n\n\n\n\n\n\nز",{w:1.5,h:1.5},"\n.\n\n\n\n\n\n\n\n.",{w:1.5,h:1.5},"\nJ\n\n\n\n\n\n\n\nج",{t:"#3386ff",a:7,fa:[9],w:3,h:1.5},"Shift"],
[{y:0.5,t:"#ffffff",w:1.5,h:1.5},"Ctrl",{w:1.5,h:1.5},"❖",{w:1.5,h:1.5},"Alt",{t:"#c20f3d",w:1.5,h:1.5},"Char",{t:"#ffffff",w:6,h:1.5},"Space",{t:"#f5ef36\n#ff8414\n\n\n\n\n#ff8414",a:5,fa:[9,3,0,0,0,0,9],w:1.5,h:1.5},"\n(Toggle)\n\n\n\n\nAlt Gr",{t:"#ffffff\n\n#47c930",a:4,fa:[9,3,0,0,0,0,9,0,0,9],w:1.5,h:1.5},"\n\n←\n\n\n\n\n\n\n←",{w:1.5,h:0.75},"\n\n↑\n\n\n\n\n\n\n↑",{w:1.5,h:1.5},"\n\n→\n\n\n\n\n\n\n→"],
[{y:-0.25,x:15,w:1.5,h:0.75},"\n\n↓\n\n\n\n\n\n\n↓"]
For 100% Arabic:
arabicromanS2V1-ARABIC.png
For 50% Arabic and 50% English:
arabicromanS2V1-AR50EN50.png
For 100% English:
arabicromanS2V1-ENGLISH.png