Name | Last modified | Size |
---|---|---|
suc.saldo | 2016-06-06 15:26 | 398K |
swectors-300dim.txt.bz2 | 2016-07-16 04:30 | 187M |
create_vectors.py | 2016-10-25 10:07 | 1.5K |
sentences/ | 2020-06-22 14:43 | - |
This is the code used for the paper "Towards a Standard Dataset of Swedish Word Vectors"; the PDF can be found here.
The script takes four parameters: method (cbow or sgns), dimensionality, window size, and number of iterations.
For example:
python3 create_vectors.py cbow 300 10 40
python3 create_vectors.py sgns 50 10 5
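As an illustration of the four positional parameters, here is a minimal sketch of how such a command line could be parsed. It is not taken from create_vectors.py; the function name `parse_args` and the argument names are assumptions chosen for this example.

```python
import argparse


def parse_args(argv):
    """Parse the four positional parameters described above (hypothetical names)."""
    parser = argparse.ArgumentParser(
        description="Sketch of a command line like the one create_vectors.py accepts"
    )
    # Training method: continuous bag-of-words or skip-gram with negative sampling.
    parser.add_argument("method", choices=["cbow", "sgns"])
    # Number of elements in each word vector.
    parser.add_argument("dimensionality", type=int)
    # Context window size.
    parser.add_argument("window", type=int)
    # Number of training iterations.
    parser.add_argument("iterations", type=int)
    return parser.parse_args(argv)


args = parse_args(["cbow", "300", "10", "40"])
print(args.method, args.dimensionality, args.window, args.iterations)
```

With this sketch, an unknown method such as `glove` is rejected by argparse before any training would start.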
A text file is created in which the first field of each row is a unique word and the rest of the row holds the elements of its vector, separated by spaces.
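The row format described above can be read back with a few lines of Python. The following is a minimal sketch, assuming UTF-8 text and no header row (a word2vec-style "count dimension" first line would need to be skipped separately); the function names `load_vectors` and `load_vectors_file` are made up for this example.

```python
import bz2
import io


def load_vectors(lines):
    """Parse rows of the form 'word v1 v2 ... vn' into a dict of float lists."""
    vectors = {}
    for line in lines:
        parts = line.rstrip("\n").split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed rows
        word = parts[0]
        # Ignore empty fields caused by trailing or doubled spaces.
        vectors[word] = [float(v) for v in parts[1:] if v]
    return vectors


def load_vectors_file(path):
    """Open a plain or bz2-compressed vector file (UTF-8 is an assumption)."""
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return load_vectors(handle)


# Small in-memory sample with made-up values, in place of a real file.
sample = io.StringIO("hund 0.1 -0.2 0.3\nkatt 0.0 0.5 -1.5\n")
vecs = load_vectors(sample)
print(len(vecs), vecs["hund"])
```

Calling `load_vectors_file("swectors-300dim.txt.bz2")` would read the compressed file directly, provided it follows the same row format.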
The training set is located in 'sentences', where each row corresponds to one sentence. A sample of Göteborgsposten-2013 (100k rows) is included.
The file suc.saldo is passed as the oracle to qvec_cca.py:
./qvec_cca.py --in_vectors /path/to/vecs --in_oracle suc.saldo