Commit cef471d4 authored by Mark Hoeber

Merge pull request #3038 from edx/ahodges/documentation/dataczar

AN-167 New chapter on role & skills of data czar and research team
parents 5471bab8 8d827332
......@@ -51,6 +51,7 @@ These documents describe how we store course structure, student state/progress,
:maxdepth: 2
internal_data_formats/change_log.rst
internal_data_formats/data_czar.rst
internal_data_formats/sql_schema.rst
internal_data_formats/discussion_data.rst
internal_data_formats/wiki_data.rst
......
......@@ -10,6 +10,8 @@ Change Log
* - Date
- Change
* - 28 Mar 2014
- Added the :ref:`Data_Czar` chapter.
* - 24 Mar 2014
- Added the ``user_api_usercoursetag`` table to the :ref:`Student_Info` chapter and the ``assigned_user_to_partition`` and ``child_render`` event types to the :ref:`Tracking Logs` chapter.
* - 19 Mar 2014
......
.. _Data_Czar:
####################################################
Data Czar/Data Team Selection and Responsibilities
####################################################
A data czar is the single representative at a partner institution who has the
credentials to download and decrypt edX data packages. The data czar is
responsible for transferring data securely to researchers and other interested
parties after it is received. Due to the sensitivity of this data, the
responsibility for these activities is restricted to one individual. At each
partner institution, the data czar is the primary point of contact for
information about edX data.
* :ref:`Skills_Experience_Data_Czar`
* :ref:`Getting_Credentials_Data_Czar`
* :ref:`Resources_Information`
At some institutions, only the data czar works on research projects that use
the course data in edX data packages. At other institutions, the data czar
works with a team of additional contributors, or is responsible only for
making a secure transfer of the data to the research team. Typically, the data
team includes members in the following roles (or a data czar with these skill
sets):
* Database administrators work with the SQL and NoSQL data files and write
queries on the data.
* Statisticians and data analysts mine the data.
* Educational researchers pose questions and interpret the results of queries on the data.
See :ref:`Skills_Experience_Contributors`.
All of the individuals who are permitted to access the data should be trained
in, and comply with, their institution's secure data handling protocols.
.. _Skills_Experience_Data_Czar:
**************************************
Skills and Experience of Data Czars
**************************************
The individuals who are selected by a partner institution to be edX data czars
typically have experience working with sensitive student data, are familiar
with encryption/decryption and file transfer protocols, and can validate,
copy, move, and store large files. The data czar is responsible for ensuring
compliance with your institution's and country's regulations with respect to
the sharing of this data.
=====================
General Skills
=====================
- Ability to set up and manage data access.
- Knowledge of general data privacy and security best practices.
- Experience with management of sensitive student data.
=====================
Technical Skills
=====================
- Familiarity with PGP and GPG encryption and decryption.
- Ability to download large files from Amazon Web Services (AWS) Simple Storage
Service (S3).
- Experience working with archive files in TAR, GZ, and ZIP formats.
- Familiarity with SQL and NoSQL (MongoDB) databases.
- Familiarity with CSV and JSON file formats.
- Experience copying, moving, and storing large files in bulk.
- Ability to validate the data and files received and distributed.
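Two of the skills above, validating received files and working with TAR and GZ archives, can be illustrated with a minimal sketch. It assumes only the Python standard library. The function names ``sha256_of_file`` and ``list_tar_members`` are hypothetical, and whether a reference checksum accompanies a given data package is not stated here, so treat the comparison step as an assumption.

```python
import hashlib
import tarfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks.

    Chunked reads keep memory use flat even for very large files.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_tar_members(path):
    """Return the member names of a TAR archive (plain or compressed).

    Mode "r:*" lets tarfile detect gzip or other compression itself.
    """
    with tarfile.open(path, "r:*") as archive:
        return archive.getnames()
```

Computing the digest before and after a transfer, and comparing the two values, is one simple way to confirm that a large file was copied intact.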
.. _Getting_Credentials_Data_Czar:
**************************************
Getting Credentials for Data Czars
**************************************
The designated data czar at each institution works with an edX Program Manager
to set up a public/private key pair for GNU Privacy Guard (GnuPG).
* The edX Analytics team creates an account on the Amazon Web Services (AWS)
Simple Storage Service (S3), and provides the Program Manager with the
public key for account access.
* When a data package is available, the data czar downloads it from S3 and
decrypts it using the private key.
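The download and decrypt steps above might be scripted roughly as follows. This is a sketch under stated assumptions, not the edX procedure: it assumes that the AWS command line interface and GnuPG are installed, that AWS credentials are already configured, and that the private key is in the local GnuPG keyring. The bucket, key, and file names passed in are hypothetical placeholders.

```python
import subprocess


def s3_download_command(bucket, key, destination):
    """Build an AWS CLI command that copies one S3 object to a local path."""
    return ["aws", "s3", "cp", f"s3://{bucket}/{key}", destination]


def gpg_decrypt_command(encrypted_path, output_path):
    """Build a GnuPG command that decrypts a file with a key from the keyring."""
    return ["gpg", "--output", output_path, "--decrypt", encrypted_path]


def fetch_and_decrypt(bucket, key, encrypted_path, output_path):
    """Download an encrypted package, then decrypt it.

    check=True raises CalledProcessError if either tool exits with an error,
    so a failed download is never followed by a decrypt attempt.
    """
    subprocess.run(s3_download_command(bucket, key, encrypted_path), check=True)
    subprocess.run(gpg_decrypt_command(encrypted_path, output_path), check=True)
```

Building each command as a list, rather than as a single shell string, avoids quoting problems with unusual file names.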
For detailed information on this procedure, see the *How Do I get my Research
Data Package?* article on the Open edX Analytics wiki_.
.. _wiki: https://edx-wiki.atlassian.net/wiki/pages/viewpage.action?pageId=36044863
.. _Resources_Information:
**************************************
Resources and Information
**************************************
The edX Analytics team adds every data czar to a Google Group and mailing
list_ called course-data.
.. _list: http://groups.google.com/a/edx.org/forum/#!forum/course-data
EdX also hosts an **Open edX Analytics** `wiki <http://edx-wiki.atlassian.net/wiki/display/OA/Open+edX+Analytics+Home>`__ that is available to the
public. The wiki provides links to the engineering roadmap, information about
operational issues, and release notes describing past releases.
.. _Skills_Experience_Contributors:
*************************************************
Skills and Experience of Other Contributors
*************************************************
In addition to the data czar, each partner institution assembles a team of
contributors to its research projects. This team can include database
administrators, software engineers, data specialists, and educational
researchers. The team can be large or small, but collectively its members need
to be able to work with SQL and NoSQL databases, write queries, and convert
the data from raw formats into standard research packages, such as CSV files,
spreadsheets, or other desired formats.
=====================
General Skills
=====================
- Attention to detail.
- Experience setting up and testing a data conversion pipeline.
- Ability to identify interesting features in a complex and rich data set.
- Familiarity with anonymization and obfuscation techniques.
- Familiarity with data privacy and security best practices.
- Experience managing sensitive student data.
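As one illustration of the obfuscation techniques mentioned above, the sketch below replaces a user identifier with a keyed hash (HMAC-SHA256). This is a generic pseudonymization approach, not a method prescribed by edX. The function name ``pseudonymize`` and the secret key are hypothetical, and a keyed hash alone does not guarantee anonymity if other identifying fields remain in the data.

```python
import hashlib
import hmac


def pseudonymize(user_id, secret_key):
    """Map a user ID to a stable pseudonym using HMAC-SHA256.

    The same ID and key always give the same pseudonym, so records can still
    be joined across files; without the key, the mapping cannot be rebuilt
    by hashing a list of guessed IDs.
    """
    message = str(user_id).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()
```

The secret key should be stored separately from the pseudonymized data and handled under the same institutional protocols as the original identifiers.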
=====================
Technical Skills
=====================
- Familiarity with CSV, MongoDB, JSON, Unicode, XML, and HTML.
- Ability to set up, query, and administer both SQL and NoSQL databases.
- Experience with console/bash scripts.
- Basic or advanced scripting (for example, using Python or Ruby) to convert,
join, and aggregate data from different data sources, handle JSON
serialization, and deal with Unicode encoding issues.
- Experience with data mining and data aggregation across a rich, varied data set.
- Ability to write parsing scripts that properly handle JSON serialization and
Unicode.
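A parsing script of the kind described above could look like the following minimal sketch. It assumes that the input is a sequence of text lines with one JSON object per line. The field names used in the usage note (``username``, ``event_type``) are hypothetical examples, not a description of the actual package layout.

```python
import csv
import io
import json


def events_to_csv(lines, fields):
    """Convert JSON-lines records to CSV text, keeping only the given fields.

    Blank lines, malformed JSON, and non-object records are skipped. Escaped
    sequences such as \\u00e9 are decoded by the JSON parser, so the CSV
    contains the actual Unicode characters.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip rows that are not valid JSON
        if not isinstance(record, dict):
            continue  # skip valid JSON that is not an object
        # Missing fields become empty cells rather than raising an error.
        writer.writerow({name: record.get(name, "") for name in fields})
    return buffer.getvalue()
```

For example, ``events_to_csv(open(path, encoding="utf-8"), ["username", "event_type"])`` would read a file explicitly as UTF-8. Stating the encoding, rather than relying on the platform default, is what keeps non-ASCII names intact.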