Text Curation for Clustering of Free-Text Survey Responses
Anton Gefvert
Advanced level (30 credits)
13:15, Alan Turing (in English)
[Abstract] When issuing surveys, offering free-text answer fields is only feasible when the number of respondents is small, because summarizing the answers becomes unmanageable for a large number of responses. Using NLP techniques to cluster and summarize these answers would let a wider range of survey creators include free-text answers in their surveys without making their workload too large. Academic work in this domain is sparse, especially for smaller languages such as Swedish.
The Swedish company iMatrics is regularly hired to do this kind of summarizing, specifically for workplace-related surveys. Their clustering method has so far been semi-automatic, requiring both manual preprocessing and manual postprocessing.
This thesis explores whether more advanced unsupervised NLP text representation methods, namely SentenceBERT and Sent2Vec, can improve on these results and reduce the manual work needed for this task. Specifically, we try to answer three questions. First, do the methods show good results? Second, can they remove the time-consuming postprocessing step of merging a large number of clusters into a smaller number? Lastly, can we find a model whose unsupervised (internal) evaluation metrics correlate with its real-world usability, which would indicate that these metrics can be used to optimize the model for new data?
To answer these questions, we train and employ several Sent2Vec, SentenceBERT, and traditional baseline models, which are then compared using both internal and external metrics. A manual evaluation procedure is performed to assess the real-world usability of the clusterings, both to see how well the models perform and to check whether this result correlates with the internal metrics of the clusterings. The results indicate that improving the text representation step is not sufficient to fully automate this task. Some of the models show promise in the human evaluation, but given the unsupervised nature of the problem and the large variance between models, it is difficult to predict their performance on new data. Thus, the models can improve the workflow, but the need for manual work remains.
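The abstract compares clusterings with internal metrics without naming them. As an illustration, the sketch below computes the silhouette coefficient, one common internal clustering metric, assuming Euclidean distance between sentence embeddings such as those SentenceBERT or Sent2Vec produce. The metric choice, the function name `silhouette_score`, and the toy vectors are assumptions for illustration, not the thesis's implementation.

```python
import math
from collections import defaultdict

def silhouette_score(points, labels):
    """Mean silhouette coefficient using Euclidean distance.

    points: list of equal-length float vectors (e.g. sentence embeddings)
    labels: one cluster id per point; at least two distinct clusters required
    """
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # common convention: a point alone in its cluster scores 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Toy 2-D "embeddings": two tight, well-separated groups
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(silhouette_score(embeddings, [0, 0, 1, 1]))  # high: labels match the groups
print(silhouette_score(embeddings, [0, 1, 0, 1]))  # negative: labels mix the groups
```

Scores near 1 indicate compact, well-separated clusters, while negative scores mean points lie closer to another cluster than to their own. Whether such scores track human judgements of usability is what the manual evaluation described above is meant to check.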
Reducing Power Consumption For Signal Computation In Radio Access Networks: Optimization With Linear Programming and Graph Attention Networks
Martin Kristoffer Nordberg
Advanced level (30 credits)
10:00, John Von Neumann (in English)
[Abstract] The radio access network (RAN) is a vital part of a mobile telecommunication system, handling the connection between user equipment, such as a mobile device, and the core network. The newer generation of RAN has enabled greater virtualization through the Cloud-RAN architecture. This virtualization facilitates the use of commercial off-the-shelf (COTS) servers in place of specialized hardware servers and makes it easier to scale network capacity up or down according to traffic demand. In particular, when traffic demand is low, energy can be saved by switching off unnecessary servers.
This thesis investigates how to efficiently identify the servers needed to meet traffic demand in a network consisting of both COTS servers and specialized hardware servers while reducing the network's energy consumption. We model the problem as a constrained optimization problem and generate problem instances with varying topologies, server profiles, and traffic demands. These instances are then solved using two differently configured mixed integer linear programming (MILP) solvers and a greedy method. One solver, used as the baseline, is configured to scale traffic horizontally across the servers; the other is configured to minimize the network's energy consumption.
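The abstract mentions a greedy method without describing it. A minimal sketch of one plausible greedy heuristic follows, assuming a hypothetical server model in which each switched-on server draws a fixed idle power plus a power cost per unit of carried traffic. Servers are switched on in order of their average cost per traffic unit at full load until the demand is covered. The `Server` fields and function names are illustrative, and the sketch ignores the topology and placement constraints the thesis's formulation includes.

```python
from dataclasses import dataclass

@dataclass
class Server:
    # Hypothetical server profile for illustration only
    name: str
    capacity: float        # maximum traffic units the server can carry
    idle_power: float      # watts drawn whenever the server is switched on
    power_per_unit: float  # extra watts per unit of carried traffic

def full_load_cost(s: Server) -> float:
    """Average watts per traffic unit when the server runs at full load."""
    return (s.idle_power + s.power_per_unit * s.capacity) / s.capacity

def greedy_select(servers, demand):
    """Switch on servers, cheapest per unit first, until the demand is covered.

    Returns (list of (server name, assigned load), total power in watts).
    Raises ValueError if the combined capacity cannot cover the demand.
    """
    remaining = demand
    plan, total_power = [], 0.0
    for s in sorted(servers, key=full_load_cost):
        if remaining <= 0:
            break
        load = min(s.capacity, remaining)
        plan.append((s.name, load))
        total_power += s.idle_power + s.power_per_unit * load
        remaining -= load
    if remaining > 0:
        raise ValueError("demand exceeds total server capacity")
    return plan, total_power


servers = [
    Server("cots-1", capacity=100, idle_power=50, power_per_unit=0.5),
    Server("cots-2", capacity=100, idle_power=50, power_per_unit=0.5),
    Server("hw-1", capacity=300, idle_power=120, power_per_unit=0.2),
]
plan, watts = greedy_select(servers, demand=350)
print(plan, watts)
```

Greedy selection of this kind is fast but not guaranteed to be optimal; an exact MILP solver can find cheaper server combinations, at a higher computational cost.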
We also investigate how the problem can be reduced by using a graph attention network (GAT) to identify servers that are not needed to meet the current traffic demand. GATs are neural networks specifically designed for graph-structured data. We train a GAT and use its predictions to remove servers from a problem instance before solving it with MILP. A random predictor serves as a baseline for comparison with the GAT's predictions.
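The attention mechanism that gives GATs their name can be sketched for a single head, following the original GAT formulation: each node scores its neighbours with e_ij = LeakyReLU(aᵀ[W h_i ‖ W h_j]), normalizes the scores with a softmax over its neighbourhood (including itself), and aggregates the transformed neighbour features with those weights. The toy server graph, node features, and weights below are assumptions; the output nonlinearity, multiple heads, and the training that would turn these features into keep/remove predictions for servers are omitted.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, h):
    # W is a list of rows; returns the matrix-vector product W @ h
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def gat_layer(features, neighbors, W, a):
    """One single-head GAT layer, without the output nonlinearity.

    features:  node id -> input feature vector h_i (length F)
    neighbors: node id -> list of neighbour ids (a self-loop is added here)
    W:         shared weight matrix with F' rows and F columns
    a:         attention vector of length 2F'
    Returns (new feature vectors, attention coefficients per node).
    """
    Wh = {v: matvec(W, h) for v, h in features.items()}
    out, attention = {}, {}
    for i in features:
        nbrs = [i] + [j for j in neighbors.get(i, []) if j != i]
        # unnormalized scores e_ij = LeakyReLU(a^T [W h_i || W h_j]);
        # Wh[i] + Wh[j] is list concatenation
        scores = [leaky_relu(sum(ak * x for ak, x in zip(a, Wh[i] + Wh[j])))
                  for j in nbrs]
        # softmax over the neighbourhood, shifted by the max for stability
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = {j: e / total for j, e in zip(nbrs, exps)}
        attention[i] = alpha
        # attention-weighted sum of transformed neighbour features
        out[i] = [sum(alpha[j] * Wh[j][k] for j in nbrs) for k in range(len(Wh[i]))]
    return out, attention


# Toy graph: three servers in a line, features = [load, capacity] (made up)
features = {"s1": [0.2, 1.0], "s2": [0.9, 0.5], "s3": [0.1, 0.8]}
neighbors = {"s1": ["s2"], "s2": ["s1", "s3"], "s3": ["s2"]}
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # untrained toy weights (F'=3, F=2)
a = [0.1, -0.2, 0.3, 0.4, 0.0, -0.1]      # toy attention vector, length 2F'
h_new, att = gat_layer(features, neighbors, W, a)
print(att["s2"])
```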
Our results show that the MILP method generates the best solutions, but it suffers from relatively slow computation times that grow quickly as the problem size increases. The GAT model shows promising results in predicting which servers should be included, making it possible to reduce the problem and solve it faster with MILP.
Crawling records on the InterPlanetary Name System
Axel Gard
Advanced level (30 credits)
13:15, Alan Turing (in English)
[Abstract] This thesis studies the characteristics of data hosted on the InterPlanetary Name System (IPNS), which is part of the InterPlanetary File System (IPFS). From these records, information such as file names, locations, and sizes was investigated. Data was collected on the number of peers hosting each record, thereby determining how decentralized the records are on the network. Data on how often content on the network changes was also collected and investigated. In addition to evaluating records, a search engine was prototyped to show how the data can be integrated into a system. A large part of the network was crawled. The rate of change was found to be high, most peers were found to host HTML files, and most content identifiers found were hosted by more than one peer. Consequently, a search engine needs to support text file formats and revisit peers regularly to stay up to date with the records.
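The two network-level measurements in this abstract, how many peers host each content identifier and how often IPNS records change, reduce to simple aggregations over crawl results. The sketch below assumes a hypothetical crawl output: a list of (content identifier, peer ID) observations and, for the rate of change, two snapshots mapping IPNS names to the content path each resolved to at crawl time. The function names and placeholder identifiers are illustrative, not the thesis's crawler.

```python
from collections import defaultdict

def peers_per_cid(observations):
    """Count distinct peers per content identifier.

    observations: iterable of (cid, peer_id) pairs seen while crawling;
    repeated sightings of the same pair are counted once.
    """
    peers = defaultdict(set)
    for cid, peer in observations:
        peers[cid].add(peer)
    return {cid: len(p) for cid, p in peers.items()}

def share_multi_hosted(counts):
    """Fraction of content identifiers hosted by more than one peer."""
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n > 1) / len(counts)

def change_rate(snapshot_a, snapshot_b):
    """Fraction of IPNS names seen in both crawls whose resolved value changed.

    snapshot_a, snapshot_b: dict mapping IPNS name -> resolved content path
    """
    common = snapshot_a.keys() & snapshot_b.keys()
    if not common:
        return 0.0
    changed = sum(1 for name in common if snapshot_a[name] != snapshot_b[name])
    return changed / len(common)


# Placeholder identifiers, not real CIDs or peer IDs
obs = [("cidA", "peer1"), ("cidA", "peer2"), ("cidB", "peer1"), ("cidA", "peer1")]
counts = peers_per_cid(obs)
print(counts, share_multi_hosted(counts))

crawl_1 = {"name1": "/ipfs/x", "name2": "/ipfs/y"}
crawl_2 = {"name1": "/ipfs/x", "name2": "/ipfs/z", "name3": "/ipfs/w"}
print(change_rate(crawl_1, crawl_2))
```

Names that appear in only one crawl are excluded from the change rate here, which is one possible design choice; a crawler could instead treat disappearing records as changes.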
How to measure the true end user impact of an Energy Performance feature in a mobile network
Diba Rezaie
Advanced level (30 credits)
13:15, Alan Turing (in English)
[Abstract] The Information and Communication Technology (ICT) industry is one of the most energy-consuming industries in the world. With the number of mobile users worldwide growing rapidly every year, it is more important than ever for all industries to adopt energy-efficient methods that reduce greenhouse gas emissions. This thesis looks at Ericsson, one of the largest companies in ICT, and at how its energy-efficiency features in LTE affect end users. The experiment was conducted in an Ericsson laboratory in Lund. While the Quality of Service (QoS) measurements (latency, throughput, etc.) showed degraded results with the different energy-efficiency features enabled, a mean opinion score (MOS) evaluation showed that the end users were less affected while browsing different sites and streaming video in 720p. Although the experiment was performed on a small scale (4 user equipments and 3 end users), the results were promising enough to justify conducting it on a larger scale in the future. With enough data, Ericsson and other ICT companies could convince mobile operators to enable more energy-efficiency features (without impacting end users), thereby contributing to the fight against climate change.
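A mean opinion score is the arithmetic mean of subjects' ratings, typically on a five-point absolute category rating scale (1 = bad, 5 = excellent) as described in ITU-T P.800. The abstract does not state which scale the experiment used, so the sketch below assumes this one and uses made-up ratings rather than the thesis's data. It also reports the sample standard deviation, a reminder of how uncertain a MOS based on only three end users is.

```python
from statistics import mean, stdev

def mos(ratings):
    """Mean opinion score: arithmetic mean of ratings on an assumed 1-5 scale."""
    if not ratings:
        raise ValueError("no ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    return mean(ratings)

def compare_conditions(ratings_by_condition):
    """Return {condition: (MOS, sample std dev, or None if fewer than 2 ratings)}."""
    return {
        cond: (mos(r), stdev(r) if len(r) > 1 else None)
        for cond, r in ratings_by_condition.items()
    }


# Made-up ratings from three hypothetical end users per condition
ratings = {"features_off": [5, 4, 5], "features_on": [4, 4, 5]}
for cond, (score, sd) in compare_conditions(ratings).items():
    print(cond, round(score, 2), sd)
```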