Can we share models trained on our grid data?

Kathrin Grosse – IBM Research

The relationship between data volume and accuracy is well established: more training data typically yields higher performance. When data is scarce, however, sharing it raises difficult questions about who gains access. Publishing a trained model instead of the data itself appears to sidestep this problem. In this talk, we examine the ways in which data can nevertheless be extracted from a trained model: membership inference, for example, uses a model's confidence values to determine whether a given point was part of the training set, while model inversion uses backpropagation to reconstruct the original training data. By analyzing these privacy attacks, we conclude what data a shared model may leak.
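To make the first attack concrete, here is a minimal sketch of confidence-based membership inference: models are usually more confident on points they were trained on, so a high top-class probability is weak evidence of membership. The function name `is_member`, the threshold, and the scikit-learn-style `predict_proba` classifier are illustrative assumptions, not the speaker's method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def is_member(model, x, threshold=0.9):
    """Guess membership: high top-class confidence -> likely a training point."""
    confidence = model.predict_proba(x.reshape(1, -1)).max()
    return confidence >= threshold

# Toy demonstration: train on half the data, run the attack on both halves.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)
train, held_out = X[:100], X[100:]
model = LogisticRegression().fit(train, y[:100])

train_hits = np.mean([is_member(model, x) for x in train])
out_hits = np.mean([is_member(model, x) for x in held_out])
print(f"flagged as members: train={train_hits:.2f}, held-out={out_hits:.2f}")
```

In practice the threshold is calibrated on shadow models; the gap between the two rates above is what quantifies the leakage.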
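The second attack can be sketched in the same spirit: model inversion runs gradient ascent on the *input* (via backpropagation through a fixed model) to find an input that maximally activates a target class, which can resemble the training data for that class. This assumes a PyTorch classifier; the function `invert` and its parameters are hypothetical.

```python
import torch

def invert(model, target_class, input_shape, steps=500, lr=0.1):
    """Reconstruct a class-representative input by gradient ascent."""
    x = torch.zeros(1, *input_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(x)
        loss = -logits[0, target_class]  # maximize the target-class logit
        loss.backward()                  # gradients flow back to the input
        opt.step()
    return x.detach()

# Usage with a hypothetical model:
net = torch.nn.Sequential(torch.nn.Linear(8, 2))
recovered = invert(net, target_class=0, input_shape=(8,))
```

For grid data, `recovered` would be a synthetic measurement vector; how closely it matches real training records is exactly the leakage question the talk addresses.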
