Approaches to this problem
When developing a new approach, it is helpful to do a competitive analysis. Informative examples:
- My son’s data is spread across hundreds of systems.
- There is a poorly understood shadow data ecosystem, in which data is bundled, aggregated, sold, and bought by hundreds of shady vendors.
- Research data frequently wanders among grad student laptops, with no controls or auditability.
- A significant portion of computers have malware. If any of the above computers are compromised, so is the student data on them.
Many other examples are easy to find.
There are also many models of research data access, including:
- Visiting isolated computers in a secure facility (e.g. FSRDC)
- Sending data to institutions, with specific requirements on the institution itself and contractual requirements on how the data is managed
- Keeping data in a central location, but accessing it remotely
- Keeping data in a central location, and sending algorithms to run over the data (MRF)
- Sending data to institutions with weak requirements
- Allowing access with a click-through license
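To make the "send algorithms to the data" model concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `DataCustodian` class, the `MIN_GROUP_SIZE` threshold, and the record fields are hypothetical and do not describe any particular system. The idea is only that the researcher submits a statistic, the custodian runs it, and nothing but aggregates leaves the central location.

```python
from statistics import mean

MIN_GROUP_SIZE = 5  # assumed suppression threshold, chosen for illustration


class DataCustodian:
    """Hypothetical custodian: holds records centrally, runs submitted statistics."""

    def __init__(self, records):
        self._records = list(records)  # raw rows are never returned to callers

    def run(self, group_key, value_key, statistic):
        """Apply `statistic` to each group's values and return only aggregates."""
        groups = {}
        for record in self._records:
            groups.setdefault(record[group_key], []).append(record[value_key])

        results = {}
        for group, values in groups.items():
            if len(values) < MIN_GROUP_SIZE:
                continue  # suppress small groups that could single out a student
            result = statistic(list(values))
            # Accept only a single number, so a submitted function cannot
            # simply hand back the raw values.
            if isinstance(result, bool) or not isinstance(result, (int, float)):
                raise TypeError("statistic must return a single number")
            results[group] = {"n": len(values), "value": result}
        return results


# Made-up records for illustration only.
records = (
    [{"school": "A", "score": s} for s in (70, 80, 90, 85, 75)]
    + [{"school": "B", "score": s} for s in (60, 95)]
)
custodian = DataCustodian(records)
summary = custodian.run("school", "score", mean)
```

Here school B is suppressed because it has only two students. A real deployment would need far more than this: sandboxing of submitted code, review of outputs, and limits on repeated queries, since a determined analyst can still leak information through a single number.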
Data can also be de-identified to different extents.
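As a rough illustration of what "different extents" can mean, the sketch below applies three increasingly aggressive treatments to one record. The record layout, field names, and function names are all hypothetical, and the three levels are an assumption of this example rather than a standard taxonomy.

```python
import hashlib
import hmac

# Hypothetical record; field names and values are made up for illustration.
RECORD = {
    "student_id": "S-10442",
    "name": "Jane Doe",
    "birth_date": "2011-03-17",
    "zip_code": "02139",
    "score": 87,
}


def pseudonymize(record, secret_key):
    """Level 1: drop the name and replace the ID with a keyed hash."""
    out = dict(record)
    digest = hmac.new(secret_key, record["student_id"].encode(), hashlib.sha256)
    out["student_id"] = digest.hexdigest()[:16]  # stable token, useful for linking
    del out["name"]
    return out


def generalize(record, secret_key):
    """Level 2: pseudonymize, then coarsen quasi-identifiers."""
    out = pseudonymize(record, secret_key)
    out["birth_year"] = out.pop("birth_date")[:4]  # keep the year only
    out["zip3"] = out.pop("zip_code")[:3]  # keep the first three digits only
    return out


def strip_to_measure(record):
    """Level 3: keep only the measurement itself."""
    return {"score": record["score"]}


key = b"example-key-kept-by-the-data-holder"  # assumed secret, never shared
level1 = pseudonymize(RECORD, key)
level2 = generalize(RECORD, key)
level3 = strip_to_measure(RECORD)
```

Each level trades analytic usefulness for lower re-identification risk. None of them is a guarantee: coarsened quasi-identifiers can still be combined with outside data to re-identify individuals.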
More conservative approaches include:
- Not collecting data beyond the minimum required for a web service to work
- Storing data semi-locally (e.g. local servers for a school or district)
- Storing data fully locally (e.g. applications on laptops, where no data leaves)
What are the lessons learned? What worked well? What didn’t? How should we manage student data in the future?