You have a black box that contains secrets you want to protect. Here I assume the black box is a model with many inputs and, for simplicity, a single output. Many people are allowed to supply inputs to the model and receive its output, but you do not want anyone (e.g., an adversary) to learn how the model maps inputs to outputs. There are simple defences, such as perturbing the model weights at inference time or adjusting the predicted value within reasonable bounds. These obscure the importance of features, but they degrade the model for all users. I propose an approach in which most users get the correct model, while adversaries, identified by patterns in their input requests, are tricked into receiving an incorrect mapping from inputs to outputs.
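The idea can be sketched as a thin gating layer in front of the model. The detection heuristic below is an illustrative assumption, not part of the proposal: clients whose successive queries differ in only one feature (a pattern typical of feature-importance probing) accumulate suspicion, and once flagged they are silently routed to a decoy model.

```python
import numpy as np

class GatedModel:
    """Serve the true model to ordinary users and a decoy to suspected adversaries.

    Detection heuristic (an assumption for illustration): successive queries
    from the same client that differ in exactly one feature suggest systematic
    probing of feature importances.
    """

    def __init__(self, true_model, decoy_model, flag_threshold=10):
        self.true_model = true_model
        self.decoy_model = decoy_model
        self.flag_threshold = flag_threshold
        self.last_query = {}  # client_id -> previous input vector
        self.suspicion = {}   # client_id -> count of single-feature deltas

    def predict(self, client_id, x):
        x = np.asarray(x, dtype=float)
        prev = self.last_query.get(client_id)
        if prev is not None and np.count_nonzero(x != prev) == 1:
            # Query differs from the previous one in a single feature:
            # characteristic of an extraction probe, so raise suspicion.
            self.suspicion[client_id] = self.suspicion.get(client_id, 0) + 1
        self.last_query[client_id] = x
        if self.suspicion.get(client_id, 0) >= self.flag_threshold:
            return self.decoy_model(x)  # flagged client gets the wrong mapping
        return self.true_model(x)       # ordinary users get the real model
```

With this structure, ordinary users with diverse queries never trip the threshold and always see the true model, while a client sweeping one feature at a time is eventually handed decoy outputs without any visible change in the API.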
Effective start/end date: 30/10/20 → 15/02/21