# Publications

### 2019

(J) Christos Diou, Ioannis Sarafis, Vasileios Papapanagiotou, Ioannis Ioakimidis and Anastasios Delopoulos "A methodology for obtaining objective measurements of population obesogenic behaviors in relation to the environment" Statistical Journal of the IAOS, 35, (4), pp. 677-690, 2019 Dec

The way we eat and what we eat, the way we move and the way we sleep significantly impact the risk of becoming obese. These aspects of behavior decompose into several personal behavioral elements, including our food choices, eating place preferences, transportation choices, sleeping periods and duration, etc. Most of these elements are highly correlated in a causal way with the conditions of our local urban, social, regulatory and economic environment. To this end, the H2020 project “BigO: Big Data Against Childhood Obesity” (http://bigoprogram.eu) aims to create new sources of evidence together with exploration tools, assisting the Public Health Authorities in their effort to tackle childhood obesity. In this paper, we present the technology-based methodology that has been developed in the context of BigO in order to: (a) objectively monitor a matrix of a population’s obesogenic behavioral elements using commonly available wearable sensors (accelerometers, gyroscopes, GPS), embedded in smartphones and smartwatches; (b) acquire information about the environment from open and online data sources; (c) provide aggregation mechanisms to correlate the population behaviors with the environmental characteristics; (d) ensure the privacy protection of the participating individuals; and (e) quantify the quality of the collected big data.

@article{DiouIAOS2019,
  author={Christos Diou and Ioannis Sarafis and Vasileios Papapanagiotou and Ioannis Ioakimidis and Anastasios Delopoulos},
  title={A methodology for obtaining objective measurements of population obesogenic behaviors in relation to the environment},
  journal={Statistical Journal of the IAOS},
  volume={35},
  number={4},
  pages={677-690},
  year={2019},
  month={12},
  date={2019-12-10},
  url={https://arxiv.org/pdf/1911.08315.pdf},
  doi={10.3233/SJI-190537}
}

(C) Ioannis Sarafis, Christos Diou, Ioannis Ioakimidis and Anastasios Delopoulos "Assessment of In-Meal Eating Behaviour using Fuzzy SVM" 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2019 Jul

Certain patterns of eating behaviour during meals have been identified as risk factors for long-term abnormal eating development in healthy individuals and, eventually, can affect body weight. To detect early signs of problematic eating behaviour, this paper proposes a novel method for building behaviour assessment models. The goal of the models is to predict whether the in-meal eating behaviour resembles patterns associated with obesity, eating disorders, or low-risk behaviours. The models are trained using meals recorded with a plate scale from a reference population and labels annotated by a domain expert. In addition, the domain expert assigned scores that characterise the degree of any exhibited abnormal patterns. To improve model effectiveness, we use the domain expert’s scores to create training error regularisation weights that alter the importance of each training instance for its class during model training. The behaviour assessment models are based on the SVM algorithm and the fuzzy SVM algorithm for their instance-weighted variation. Experiments conducted on meals recorded from 120 individuals show that: (a) the proposed approach can produce effective models for eating behaviour classification (for individuals), or for ranking (for populations); and (b) the instance-weighted fuzzy SVM models achieve significant performance improvements compared to the non-weighted, standard SVM models.

@conference{sarafis2019assessment,
  author={Ioannis Sarafis and Christos Diou and Ioannis Ioakimidis and Anastasios Delopoulos},
  title={Assessment of In-Meal Eating Behaviour using Fuzzy SVM},
  booktitle={41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)},
  year={2019},
  month={07},
  date={2019-07-27},
  url={https://mug.ee.auth.gr/wp-content/uploads/sarafis2019assessment.pdf},
  doi={10.1109/EMBC.2019.8857606}
}

(C) Ioannis Sarafis, Christos Diou and Anastasios Delopoulos "Behaviour Profiles for Evidence-based Policies Against Obesity" 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2019 Jul

Obesity is a preventable disease that affects the health of a significant percentage of the population, reduces life expectancy and encumbers health care systems. The obesity epidemic is not caused by isolated factors; it is the result of multiple behavioural patterns and complex interactions with the living environment. Therefore, in-depth understanding of population behaviour is essential in order to create successful policies against obesity prevalence. To this end, the BigO system facilitates the collection, processing and modelling of behavioural data at population level to provide evidence for effective policy and intervention design. In this paper, we introduce the behaviour profiles mechanism of BigO that produces comprehensive models for the behavioural patterns of individuals, while maintaining high levels of privacy protection. We give examples of the proposed mechanism from real-world data and we discuss its uses in supporting various types of evidence-based policy design.

@conference{sarafis2019behaviour,
  author={Ioannis Sarafis and Christos Diou and Anastasios Delopoulos},
  title={Behaviour Profiles for Evidence-based Policies Against Obesity},
  booktitle={41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)},
  year={2019},
  month={07},
  date={2019-07-26},
  url={https://mug.ee.auth.gr/wp-content/uploads/sarafis2019behaviour.pdf},
  doi={10.1109/EMBC.2019.8857161}
}

### 2018

(J) Ioannis Sarafis, Christos Diou and Anastasios Delopoulos "Span error bound for weighted SVM with applications in hyperparameter selection (preprint)" CoRR, abs/1809.06124, 2018 Sep

Weighted SVM (or fuzzy SVM) is the most widely used SVM variant, owing its effectiveness to the use of instance weights. Proper selection of the instance weights can lead to increased generalization performance. In this work, we extend the span error bound theory to weighted SVM and we introduce effective hyperparameter selection methods for the weighted SVM algorithm. The significance of the presented work is that it enables the application of the span bound and the span-rule with weighted SVM. The span bound is an upper bound of the leave-one-out error that can be calculated using a single trained SVM model. This is important since the leave-one-out error is an almost unbiased estimator of the test error. Similarly, the span-rule gives the actual value of the leave-one-out error. Thus, one can apply the span bound and the span-rule as computationally lightweight alternatives to the leave-one-out procedure for hyperparameter selection. The main theoretical contributions are: (a) we prove the necessary and sufficient condition for the existence of the span of a support vector in weighted SVM; and (b) we prove the extension of the span bound and the span-rule to weighted SVM. We experimentally evaluate the span bound and the span-rule for hyperparameter selection and we compare them with other methods that are applicable to weighted SVM: K-fold cross-validation and the $\xi - \alpha$ bound. Experiments on 14 benchmark data sets and data sets with importance scores for the training instances show that: (a) the condition for the existence of the span in weighted SVM is satisfied almost always; (b) the span-rule is the most effective method for weighted SVM hyperparameter selection; (c) the span-rule is the best predictor of the test error in the mean square error sense; and (d) the span-rule is efficient and, for certain problems, it can be calculated faster than K-fold cross-validation.

@article{Sarafis2018CoRR,
  author={Ioannis Sarafis and Christos Diou and Anastasios Delopoulos},
  title={Span error bound for weighted SVM with applications in hyperparameter selection (preprint)},
  journal={CoRR},
  volume={abs/1809.06124},
  year={2018},
  month={09},
  date={2018-09-17},
  url={https://arxiv.org/pdf/1809.06124.pdf}
}

### 2017

(C) Christos Diou, Ioannis Sarafis, Ioannis Ioakimidis and Anastasios Delopoulos "Data-driven assessments for sensor measurements of eating behavior" Biomedical & Health Informatics (BHI), 2017 IEEE EMBS International Conference on, pp. 129-132, 2017 Jan

Two major challenges in sensor-based measurement and assessment of healthy eating behavior are (a) choosing the behavioral indicators to be measured, and (b) interpreting the measured values. While much of the work towards solving these problems belongs in the domain of behavioral science, there are several areas where technology can help. This paper outlines an approach for representing and interpreting eating and activity behavior based on sensor measurements and data available from a reference population. The main idea is to assess the “similarity” of an individual's behavior to previous data recordings of a relevant reference population. Thus, by appropriate selection of the indicators and reference data, it is possible to perform comparative behavioral evaluation and support decisions, even in cases where no clear medical guidelines for the indicator values exist. We examine the simple, univariate case (one indicator) and then extend these ideas to the multivariate problem (several indicators) using one-class SVM to measure the distance from the reference population.

@inproceedings{diou2017data,
  author={Christos Diou and Ioannis Sarafis and Ioannis Ioakimidis and Anastasios Delopoulos},
  title={Data-driven assessments for sensor measurements of eating behavior},
  booktitle={Biomedical \& Health Informatics (BHI), 2017 IEEE EMBS International Conference on},
  pages={129-132},
  year={2017},
  month={01},
  date={2017-01-01},
  url={http://ieeexplore.ieee.org/document/7897222/}
}

### 2016

(J) Ioannis Sarafis, Christos Diou and Anastasios Delopoulos "Online training of concept detectors for image retrieval using streaming clickthrough data" Engineering Applications of Artificial Intelligence, 51, pp. 150-162, 2016 Jan

Clickthrough data from image search engines provide a massive and continuously generated source of user feedback that can be used to model how the search engine users perceive the visual content. Image clickthrough data have been successfully used to build concept detectors without any manual annotation effort, although the generated annotations suffer from labeling errors. Previous research efforts therefore focused on modeling the sample uncertainty in order to improve concept detector effectiveness. In this paper, we study the problem in an online learning setting using streaming clickthrough data where each click is treated separately when it becomes available; the concept detector model is therefore continuously updated without batch retraining. We argue that sample uncertainty can be incorporated in the online learning setting by exploiting the repetitions of incoming clicks at the classifier level, where these act as an implicit importance weighting mechanism. For online concept detector training we use the LASVM algorithm. The inferred weighting approximates the solution of batch-trained concept detectors using weighted SVM variants that are known to achieve improved performance and high robustness to noise compared to the standard SVM. Furthermore, we evaluate methods for selecting negative samples using a small number of candidates sampled locally from the incoming stream of clicks. The selection criteria aim at drastically improving the performance and the convergence speed of the online concept detectors. To validate our arguments we conduct experiments for 30 concepts on the Clickture-Lite dataset. The experimental results demonstrate that: (a) the proposed online approach produces effective and noise-resilient concept detectors that can take advantage of streaming clickthrough data and achieve performance that is equivalent to Fuzzy SVM concept detectors with sample weights and 78.6% improved compared to standard SVM concept detectors; and (b) the selection criteria speed up convergence and improve effectiveness compared to random negative sampling even for a small number of available clicks (up to 134% after 100 clicks).

@article{Sarafis2016Online,
  author={Ioannis Sarafis and Christos Diou and Anastasios Delopoulos},
  title={Online training of concept detectors for image retrieval using streaming clickthrough data},
  journal={Engineering Applications of Artificial Intelligence},
  volume={51},
  pages={150-162},
  year={2016},
  month={01},
  date={2016-01-29},
  url={http://www.sciencedirect.com/science/article/pii/S095219761600021X},
  doi={10.1016/j.engappai.2016.01.017},
  keywords={Clickthrough data;Online learning;Image retrieval;Label noise;Fuzzy SVM;LASVM}
}

### 2015

(J) Ioannis Sarafis, Christos Diou and Anastasios Delopoulos "Building effective SVM concept detectors from clickthrough data for large-scale image retrieval" International Journal of Multimedia Information Retrieval, 4, (2), pp. 129-142, 2015 Jun

Clickthrough data is a source of information that can be used for automatically building concept detectors for image retrieval. Previous studies, however, have shown that in many cases the resulting training sets suffer from severe label noise that has a significant impact on the SVM concept detector performance. This paper evaluates and proposes a set of strategies for automatically building effective concept detectors from clickthrough data. These strategies focus on: (1) automatic training set generation; (2) assignment of label confidence weights to the training samples; and (3) using these weights at the classifier level to improve concept detector effectiveness. For training set selection and in order to assign weights to individual training samples, three Information Retrieval (IR) models are examined: vector space models, BM25 and language models. Three SVM variants that take into account importance at the classifier level are evaluated and compared to the standard SVM: the Fuzzy SVM, the Power SVM, and the Bilateral-weighted Fuzzy SVM. Experiments conducted on the MM Grand Challenge dataset (consisting of 1M images and 82.3M unique clicks) for 40 concepts demonstrate that (1) on average, all weighted SVM variants are more effective than the standard SVM; (2) the vector space model produces the best training sets and best weights; (3) the Bilateral-weighted Fuzzy SVM produces the best results but is very sensitive to weight assignment; and (4) the Fuzzy SVM is the most robust training approach for varying levels of label noise.

@article{Sarafis2015Building,
  author={Ioannis Sarafis and Christos Diou and Anastasios Delopoulos},
  title={Building effective SVM concept detectors from clickthrough data for large-scale image retrieval},
  journal={International Journal of Multimedia Information Retrieval},
  volume={4},
  number={2},
  pages={129-142},
  year={2015},
  month={06},
  date={2015-06-01},
  url={http://link.springer.com/article/10.1007/s13735-015-0080-5},
  doi={10.1007/s13735-015-0080-5}
}

### 2014

(C) Ioannis Sarafis, Christos Diou and Anastasios Delopoulos "Building Robust Concept Detectors from Clickthrough Data: A Study in the MSR-Bing Dataset" 2014 9th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP), pp. 66-71, 2014 Nov

In this paper we extend our previous work on strategies for automatically constructing noise-resilient SVM detectors from clickthrough data for large-scale concept-based image retrieval. First, search log data is used in conjunction with Information Retrieval (IR) models to score images with respect to each concept. The IR models evaluated in this work include Vector Space Models (VSM), BM25 and Language Models (LM). The scored images are then used to create training sets for SVM and appropriate sample weights for two SVM variants: the Fuzzy SVM (FSVM) and the Power SVM (PSVM). These SVM variants incorporate weights for each individual training sample and can therefore be used to model label uncertainty at the classifier level. Experiments on the MSR-Bing Image Retrieval Grand Challenge dataset (consisting of 1M images and 82.3M unique clicks) show that FSVM is the most robust SVM algorithm for handling label noise and that the highest performance is achieved with weights derived from VSM. These results extend our previous findings on the value of FSVM from professional image archives to large-scale general-purpose search engines, and furthermore identify VSM as the most appropriate sample weighting model.

@inproceedings{Sarafis2014Building,
  author={Ioannis Sarafis and Christos Diou and Anastasios Delopoulos},
  title={Building Robust Concept Detectors from Clickthrough Data: A Study in the MSR-Bing Dataset},
  booktitle={2014 9th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP)},
  pages={66-71},
  year={2014},
  month={11},
  date={2014-11-01},
  url={http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6978955},
  doi={10.1109/SMAP.2014}
}

(C) Ioannis Sarafis, Christos Diou, Theodora Tsikrika and Anastasios Delopoulos "Weighted SVM from clickthrough data for image retrieval" 2014 IEEE International Conference on Image Processing (ICIP), pp. 3013-3017, 2014 Aug

In this paper we propose a novel approach to training noise-resilient concept detectors from clickthrough data collected by image search engines. We take advantage of the query logs to automatically produce concept detector training sets; these, however, suffer from label noise, i.e., erroneously assigned labels. We explore two alternative approaches for handling noisy training data at the classifier level by training concept detectors with two SVM variants: the Fuzzy SVM and the Power SVM. Experimental results on images collected from a professional image search engine indicate that (1) Fuzzy SVM outperforms both SVM and Power SVM and is the most effective approach towards handling label noise; and (2) the performance gain of Fuzzy SVM compared to SVM increases progressively with the noise level in the training sets.

@inproceedings{Sarafis2014Weighted,
  author={Ioannis Sarafis and Christos Diou and Theodora Tsikrika and Anastasios Delopoulos},
  title={Weighted SVM from clickthrough data for image retrieval},
  booktitle={2014 IEEE International Conference on Image Processing (ICIP)},
  pages={3013-3017},
  year={2014},
  month={08},
  date={2014-08-01},
  url={http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=7025609},
  doi={10.1109/ICIP.2014.7025609}
}