AI-Driven Classifier Building Pipeline

Maintain any classifier throughout the lifetime of its usage

A basic requirement for supervised machine learning is reliable ground truth data. Although machine learning techniques are advancing rapidly, they cannot perform well if the ground truth data that powers the system is inadequate or faulty. Building clean, high-quality datasets is therefore critical, but it is both time-consuming and expensive. The standard practice today is to process and label the data on a crowdsourcing platform such as Amazon Mechanical Turk (MTurk) or CrowdFlower, an approach that requires considerable care and expertise.

This project aims to automate this process with an AI-Driven Classifier Building Pipeline, which uses machine learning to produce high-quality data labels and, in turn, better classifiers. The pipeline works with different data modalities, such as text and images.

To create the initial dataset, we use the Semantic Pipeline built at the LSIR lab, which gives access to multiple social media data sources. In the first round, our platform selects images at random from the image collection and presents them for labeling. The classifier is then trained on the newly labeled images and used to identify which of the remaining images are worth labeling. In subsequent rounds, the system no longer draws images at random; instead, it prioritizes the images the classifier is least confident about.
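
The selection rule described above can be sketched as follows. This is a minimal illustration, not the platform's code: it assumes a classifier that returns class probabilities for each image and uses least-confidence scoring (1 minus the top class probability), which is one common uncertainty measure; the page does not say which measure the platform uses. The names `select_batch` and `least_confidence` and the toy scores are hypothetical, and retraining the classifier between rounds is left out.

```python
import random
from typing import Callable, List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def select_batch(
    pool: List[str],
    predict_proba: Callable[[str], Sequence[float]],
    batch_size: int,
    first_round: bool,
    rng: random.Random,
) -> List[str]:
    """Choose the next images to present for labeling (hypothetical helper)."""
    if first_round:
        # No trained classifier yet: sample uniformly at random.
        return rng.sample(pool, min(batch_size, len(pool)))
    # Later rounds: rank by uncertainty, most uncertain first.
    ranked = sorted(pool, key=lambda item: least_confidence(predict_proba(item)), reverse=True)
    return ranked[:batch_size]


# Toy, made-up classifier outputs for a two-class task.
toy_scores = {
    "img_a.jpg": [0.95, 0.05],
    "img_b.jpg": [0.55, 0.45],
    "img_c.jpg": [0.20, 0.80],
    "img_d.jpg": [0.51, 0.49],
}

pool = list(toy_scores)
print(select_batch(pool, toy_scores.__getitem__, batch_size=2, first_round=False, rng=random.Random(0)))
# ['img_d.jpg', 'img_b.jpg']
```

In a full loop, the annotators' labels for the returned batch would be added to the training set, the classifier retrained, and `select_batch` called again on the shrinking pool.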

Instead of showing images one by one in the user interface, we show multiple images per page in a clustered layout, so that similar images appear near each other and can be labeled as a group rather than individually. By labeling 10 to 30 items at a time, we aim to accelerate the labeling process dramatically and to build larger datasets.
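
A sketch of the clustered page layout, under stated assumptions: each image is represented by a feature vector (for example, an embedding from a pretrained image model), and plain k-means with farthest-point initialization stands in for whatever grouping method the platform actually uses. Images are ordered by cluster and then split into pages, so similar images land next to each other. `paginate_by_cluster`, `kmeans`, `farthest_point_init`, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def farthest_point_init(points: Sequence[Vector], k: int) -> List[Vector]:
    """Deterministic seeding: start at the first point, then repeatedly add the
    point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means; returns a cluster index for each point."""
    centroids = farthest_point_init(points, k)
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def paginate_by_cluster(
    ids: Sequence[str], embeddings: Sequence[Vector], k: int, page_size: int = 30
) -> List[List[str]]:
    """Order image ids by cluster, then split them into pages of at most page_size."""
    labels = kmeans(embeddings, k)
    # Stable sort on the cluster label keeps the original order within a cluster.
    ordered = [img for _, img in sorted(zip(labels, ids), key=lambda pair: pair[0])]
    return [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]


# Toy 2-D "embeddings": three cat images near the origin, three dog images far away.
ids = ["cat1", "dog1", "cat2", "dog2", "cat3", "dog3"]
embeddings = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
print(paginate_by_cluster(ids, embeddings, k=2, page_size=3))
# [['cat1', 'cat2', 'cat3'], ['dog1', 'dog2', 'dog3']]
```

Ordering by cluster and then chunking keeps similar images adjacent, but a page can still straddle two clusters; starting a new page at each cluster boundary is an alternative when clusters are large.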

Using our platform, it is possible to maintain any classifier throughout the lifetime of its usage and to improve its accuracy iteratively, focusing labeling effort on the examples the classifier is least sure about.

Suggested Reading

https://www.epfl.ch/labs/lsir/ai-driven-classifier-building-pipeline/
