Index of /divisions/hcs/nlplab/swectors

Name                     Last modified     Size  Description

Parent Directory                           -
create_vectors.py        2016-10-25 10:07  1.5K
sentences/               2020-06-22 14:43  -
suc.saldo                2016-06-06 15:26  398K
swectors-300dim.txt.bz2  2016-07-16 04:30  187M



This is the code used for the paper "Towards a Standard Dataset of Swedish Word Vectors". A PDF of the paper can be found here.

Creating Swedish word vectors

The script, create_vectors.py, takes four parameters: method (cbow or sgns), dimensionality, window size, and number of iterations.

For example:

python3 create_vectors.py cbow 300 10 40
python3 create_vectors.py sgns 50 10 5
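
To make the order of the four parameters concrete, here is a minimal sketch of how such a command line could be parsed. It is not taken from the actual script; the function name parse_args and the argument names are assumptions made for illustration only.

```python
import argparse


def parse_args(argv=None):
    """Parse the four positional parameters in the order given above.

    Hypothetical sketch: the real script may read its arguments differently.
    """
    parser = argparse.ArgumentParser(description="Illustrative argument parsing")
    parser.add_argument("method", choices=["cbow", "sgns"])  # training method
    parser.add_argument("dimensionality", type=int)          # vector length
    parser.add_argument("window", type=int)                  # context window size
    parser.add_argument("iterations", type=int)              # passes over the data
    return parser.parse_args(argv)


# Same values as the first example command.
args = parse_args(["cbow", "300", "10", "40"])
print(args.method, args.dimensionality, args.window, args.iterations)
```

With this layout, an unknown method name is rejected by argparse before any training starts.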

A text file is created in which the first field of each row is a unique word and the rest of the row holds the elements of that word's vector, separated by spaces.
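
The format described above can be read back with a few lines of Python. The following is a rough sketch, not part of the original code: the function names read_vectors and load_vectors are made up here, and it assumes UTF-8 encoding and no header row, neither of which is stated above.

```python
import bz2
import io


def read_vectors(lines):
    """Map each word to its vector, given rows of 'word v1 v2 ... vn'."""
    vectors = {}
    for line in lines:
        fields = line.rstrip("\n").split(" ")
        if len(fields) < 2:
            continue  # skip empty or malformed rows
        # First field is the word; the remaining non-empty fields are floats.
        vectors[fields[0]] = [float(x) for x in fields[1:] if x]
    return vectors


def load_vectors(path):
    """Read a vector file from disk, decompressing it if it ends in .bz2."""
    opener = bz2.open if path.endswith(".bz2") else open
    # Assumption: the file is UTF-8 encoded.
    with opener(path, "rt", encoding="utf-8") as handle:
        return read_vectors(handle)


# Small in-memory example with made-up values.
sample = io.StringIO("hund 0.1 -0.2 0.3\nkatt 0.4 0.5 -0.6\n")
vectors = read_vectors(sample)
print(vectors["hund"])
```

For the compressed file in this directory, the call would be load_vectors("swectors-300dim.txt.bz2"), which keeps all vectors in memory at once.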


The training set is located in 'sentences'; each row corresponds to one sentence. A sample of Göteborgsposten-2013 (100k rows) is included.
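
A one-sentence-per-row layout like this can be streamed without loading everything into memory. The sketch below is an illustration only: the function name iter_sentences is made up, and it assumes UTF-8 files with tokens separated by whitespace, which is not stated above. The demo writes its own temporary file rather than reading the real data.

```python
import os
import tempfile


def iter_sentences(directory):
    """Yield each non-empty row of every file in 'directory' as a token list."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # ignore subdirectories
        # Assumption: UTF-8 text, tokens separated by whitespace.
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                tokens = line.split()
                if tokens:
                    yield tokens


# Demo on a temporary directory with two made-up sentences.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "sample.txt"), "w", encoding="utf-8") as handle:
        handle.write("Det här är en mening .\nHär är en till .\n")
    sentences = list(iter_sentences(demo_dir))

print(len(sentences))
```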


Tools and instructions on how to use QVEC-CCA can be found here. For a quick start, simply download the file and add 'suc.saldo', then use the following line to evaluate a set of vectors:
./ --in_vectors /path/to/vecs --in_oracle suc.saldo


Per Fallgren

Jesper Segeblad

Marco Kuhlmann