38TB of Microsoft data leaked by AI researchers on company’s GitHub page - Hindustan Times

38TB of Microsoft data ‘accidentally exposed’ by AI researchers on company’s GitHub page: Report

Sep 19, 2023 05:27 PM IST

The data trove included backups of two former employees’ workstations, including keys, passwords, and more than 30,000 private team messages.

Reports suggest that Microsoft researchers accidentally leaked 38TB of confidential information onto the company’s GitHub page, where it could be downloaded by potentially anyone. The incident reportedly happened in June 2023.

Representational Image(REUTERS)

According to a report in Digital Trends, the data trove included backups of two former employees’ workstations, including keys, passwords, and more than 30,000 private team messages.

The report in Digital Trends further stated that the leaked data was accidentally included in a tranche of open-source training data that visitors were encouraged to download, meaning it could have fallen into the wrong hands again and again.

While data breaches are embarrassing for any company, this one is even worse because it came from Microsoft’s AI researchers. The data was shared using a Shared Access Signature (SAS) token, an Azure feature that lets users share data from Azure Storage accounts.

Visitors to the repository were told to download the training data from a provided URL. However, the web address granted access to much more than just the planned training data, and allowed users to browse files and folders that were not intended to be publicly accessible.

Meanwhile, Wiz, the American cloud security startup, added that the Azure access token allowed full-control permissions, which meant that anyone who visited the URL could delete and overwrite the files they found, going far beyond just viewing them. Wiz further warned that the consequences of such a leak were more serious because the repository contained AI training data.

As the repository was full of AI training data, the intention was for users to download it and feed it into a script, thereby improving their own AI models.

Wiz added: “Yet because it was open to manipulation thanks to its wrongly configured permissions, an attacker could have injected malicious code into all the AI models in this storage account, and every user who trusts Microsoft’s GitHub repository would’ve been infected by it.”
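The over-broad access described above is visible in the shared URL itself, because a SAS token carries its permissions, scope and expiry as query parameters. The following is a minimal Python sketch of how such a URL could be checked before it is published. The parameter names `sp` (permissions), `se` (expiry), `sr` and `srt` (scope) are Azure's SAS query fields; the function name, the seven-day threshold and the example URL are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

# Permission letters that let a holder change or remove data.
RISKY_PERMISSIONS = {"w": "write", "d": "delete", "a": "add", "c": "create"}


def audit_sas_url(url, now=None, max_days=7):
    """Return a list of warnings for a SAS URL; an empty list means no findings.

    Hypothetical helper: max_days is an arbitrary illustrative threshold.
    """
    now = now or datetime.now(timezone.utc)
    params = parse_qs(urlsplit(url).query)
    warnings = []

    # 'sp' lists the granted permissions as single letters, e.g. "rwdl".
    granted = params.get("sp", [""])[0]
    for letter, name in RISKY_PERMISSIONS.items():
        if letter in granted:
            warnings.append(f"grants {name} permission")

    # 'srt' appears on account-level tokens; 'sr=c' means a whole container.
    if "srt" in params:
        warnings.append("account-level token")
    elif params.get("sr", [""])[0] == "c":
        warnings.append("scoped to a whole container")

    # 'se' is the expiry time as an ISO 8601 timestamp in UTC.
    expiry_raw = params.get("se", [None])[0]
    if expiry_raw is None:
        warnings.append("no expiry found")
    else:
        expiry = datetime.fromisoformat(expiry_raw.replace("Z", "+00:00"))
        if expiry.tzinfo is None:
            expiry = expiry.replace(tzinfo=timezone.utc)
        if (expiry - now).days > max_days:
            warnings.append(f"expires in more than {max_days} days")

    return warnings


if __name__ == "__main__":
    # Made-up URL for illustration only; not the token from the incident.
    example = (
        "https://storage.example.com/models"
        "?sv=2021-08-06&sr=c&sp=rwdl&se=2051-01-01T00:00:00Z&sig=REDACTED"
    )
    for warning in audit_sas_url(example):
        print("WARNING:", warning)
```

A check like this only inspects what the URL declares. It cannot tell whether a token has already been shared elsewhere, which is the tracking gap Wiz highlights.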

The Wiz report further added that creating SAS tokens leaves no paper trail, so even an administrator would not know that a token existed or where it had circulated. Wiz said it reported the issue to Microsoft in June 2023, the SAS token was replaced in July, and Microsoft completed its investigation in August. The lapse was made public once the issue was fully resolved. More details are available in the Wiz report.
