Portuguese / Português - text corpora
Posted: Sun Oct 04, 2020 8:16 am
For Portuguese, I used the following text corpora from uni-leipzig:
https://wortschatz.uni-leipzig.de/en/do ... portuguese:
por-an_web_2015_10K
por-br_newscrawl_2011_10K
por-cv_web_2015_10K
por-mo_newscrawl_2011_10K
por-mo_web_2016_10K
por-mz_web_2016_10K
por_newscrawl_2018_10K
por-pt_newscrawl_2011_10K
por-pt_web_2015_10K
por_wikipedia_2016_10K
---
total: 100K
---
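The ten 10K samples listed above add up to the 100K sentences used. Below is a minimal sketch of how they might be merged into a single corpus file. It assumes each download provides a file named `<corpus>-sentences.txt` with tab-separated `id<TAB>sentence` lines; that layout is an assumption and is not stated in this post. The function name `merge_sentence_files` and the output path are hypothetical.

```python
from pathlib import Path

# Corpus names exactly as listed in the post.
CORPORA = [
    "por-an_web_2015_10K",
    "por-br_newscrawl_2011_10K",
    "por-cv_web_2015_10K",
    "por-mo_newscrawl_2011_10K",
    "por-mo_web_2016_10K",
    "por-mz_web_2016_10K",
    "por_newscrawl_2018_10K",
    "por-pt_newscrawl_2011_10K",
    "por-pt_web_2015_10K",
    "por_wikipedia_2016_10K",
]


def merge_sentence_files(corpus_dir, out_path, names=CORPORA):
    """Write the sentences of all corpora to out_path; return the sentence count."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for name in names:
            # Assumed file name pattern: <corpus>-sentences.txt
            src = Path(corpus_dir) / f"{name}-sentences.txt"
            with open(src, encoding="utf-8") as f:
                for line in f:
                    line = line.rstrip("\n")
                    if not line:
                        continue  # skip blank lines
                    # Drop the leading id column if a tab is present,
                    # otherwise keep the whole line as the sentence.
                    _, sep, sentence = line.partition("\t")
                    out.write((sentence if sep else line) + "\n")
                    count += 1
    return count
```

With all ten files in place, `merge_sentence_files("corpora", "portuguese.txt")` should return a count close to 100000 if every sample really holds 10K sentences.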
Portuguese:
Portuguese Google Drive - the corpora are in the folder "sentences"
---
Conversion tool for diacritics (ñ|á|é|í|ó|ú|Ñ|Á|É|Í|Ó|Ú):
https://drive.google.com/drive/folders/ ... sp=sharing
---
Conversion to lowercase:
Spanish with diacritics: ---
Spanish with converted diacritics (ñ|á|é|í|ó|ú|Ñ|Á|É|Í|Ó|Ú): ---
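The two conversion steps above could be sketched roughly as follows. The post does not say what the linked conversion tool actually outputs, so this sketch assumes the simplest reading: each listed character (ñ|á|é|í|ó|ú|Ñ|Á|É|Í|Ó|Ú) is replaced by its base letter. The names `DIACRITIC_MAP`, `to_lowercase` and `convert_diacritics` are hypothetical. Other accented letters that occur in Portuguese text (for example ã, õ, ç) are not in the list and are left untouched here.

```python
import unicodedata

# Characters listed in the post, mapped to their base letters.
# ASSUMPTION: the real conversion tool may produce a different target form.
DIACRITIC_MAP = str.maketrans("ñáéíóúÑÁÉÍÓÚ", "naeiouNAEIOU")


def to_lowercase(text):
    """Convert the text to lowercase; accented letters keep their accents."""
    return text.lower()


def convert_diacritics(text):
    """Replace only the characters listed in the post with base letters."""
    # Normalize to NFC first so decomposed sequences (letter + combining
    # accent) become the single code points that the mapping expects.
    return unicodedata.normalize("NFC", text).translate(DIACRITIC_MAP)
```

For example, `to_lowercase("ÁRVORE")` gives `"árvore"`, and `convert_diacritics("árvore")` gives `"arvore"`.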
I uploaded the configuration files I used for the optimization so that it can be reproduced later:
Link: https://opt-in-layout.org/viewtopic.php?f=12&t=20
Code:
./opt -2 spanish2020.txt -i 20000 -K optS1V1.cfg
Code:
./opt -2 spanish2020.txt -i 20000 -K optS2V1.cfg
Code:
echo SPANISH:;./opt -2 spanish2020.txt -r bsptast.txt -K controlS1V1.cfg;