In this proposal, we document our blueprint for building a community-centred data hub for law and justice in India. We are interested in partnering with others to build a community of people who want to use and understand data on law and justice, and we aim to contribute especially to projects on data literacy. We look forward to conversations with interested collaborators.

Why build a Hub?

Data on law and justice plays a crucial role in the development of human and ecological well-being. It intersects with a variety of disciplines, including technology, design, ecology, and sociology. At a time when unprecedented amounts of actionable but siloed data are being generated across various human endeavours, it becomes important to create multi-disciplinary platforms that empower those working in the field of law and justice to leverage this data by making it accessible and usable, while at the same time engaging critically with the nature of the data. Doing so will help users bring about sustainable change through data-driven advocacy, legislative reform, and systemic and institutional change, while ensuring that the citizenry can deepen their understanding and awareness of how their institutions function.

Our mission with the Hub is to build not only a community of users who contribute data sets, but also one that facilitates collaboration and exchange in building projects and partnerships. To this end, we are inspired by the Northeast Big Data Innovation Hub for its initiatives in raising awareness around data literacy, responsible data science, and data privacy and security. We are also concerned with the usability of such platforms and intend to introduce design solutions, drawing inspiration from platforms like DataHub, which focus on usability for people not familiar with working with data by providing tutorials on how to use their tools. Impact Hub has caught our attention for the methodology it has used to build and sustain a community, and we are interested in building a collaborative community that is mutually beneficial for all those working in the field.

How does the Hub work?

Non-technologists and domain experts in law are often unable to harness the full potential of data in their work because contemporary data hubs are not equipped to process and combine fragmented datasets from multiple sources. The Hub seeks to empower multi-disciplinary teams to collaborate and create new datasets related to the law and justice sectors through a structured process of (re)combining or (re)mixing existing data sets. It will provide the data processing, analysis, and visualisation GUI tools required for merging datasets procured from multiple sources. Such value-adding capabilities encourage and incentivise teams and individuals to collaborate and bring their own datasets into the mix. An emerging dataset is credible because it is reviewed by each domain expert in the project. This collaborative process also ensures consensus around which datasets and methodologies to use. All such decisions would be recorded in the history of remixed datasets as metadata for other users to understand and scrutinise the sources and methods used in creating them, which further ensures the credibility and accuracy of the data. Like GitHub, the Hub will provide the necessary controls and security protocols that allow teams to keep in-progress and resulting datasets for private or public use, depending on project requirements.
For private datasets, teams can control access, edit, view, and share options. The Hub will provide a fully searchable repository of open datasets, which will become more diverse, less fragmented, and more specific in scope as new and remixed datasets are contributed back into the Hub's public repository. Use of personal information of users is subject to the Hub's privacy policy.

Key features of the Hub

Data Repository

The Hub will include a curated repository of open data sets related to the law and justice sectors, cleaned and categorised so that they are ready for further processing and combining with other data sets.

Collaboration

The Hub would be structured to support multi-disciplinary teams of users who can start a project to which they can upload their own data sets and/or use data sets from the public repository for further processing and remixing. Like GitHub, it will support per-project task management features to converse about and manage how the data sets are processed.

Data Mixing

The Hub will provide in-built tools to clean, analyse, and merge two or more data sets within a project. Data processing features such as data normalisation, unit and format conversions, filtering, aggregation, validation, statistical analysis, and smart tagging, among other features, aid in merging two or more data sets. A history of the processes applied to the data will be saved as metadata for future reference. This will help trace how a data set was created, the sources used, and the decisions the user made in processing the data. In the future, the Hub can support more complex data processing methods through community-developed plugins.
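To make the Data Mixing idea concrete, the sketch below shows how a remix operation might merge two small data sets and record that decision as metadata alongside the result. The `remix` function, the metadata fields, and the example figures are illustrative assumptions, not a finalised Hub API.

```python
# Minimal sketch of a Hub "remix" step that records provenance while merging
# two data sets. Names and metadata fields are illustrative, not the Hub's API.
from datetime import datetime, timezone

import pandas as pd


def remix(left: pd.DataFrame, right: pd.DataFrame, on: str, history: list) -> pd.DataFrame:
    """Merge two data sets on a shared key and log the decision as metadata."""
    merged = left.merge(right, on=on, how="inner")
    history.append({
        "step": "merge",
        "key": on,
        "input_rows": [len(left), len(right)],  # row counts of the two sources
        "rows_out": len(merged),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return merged


# Example: combining court case counts with district population figures.
cases = pd.DataFrame({"district": ["A", "B"], "pending_cases": [1200, 800]})
population = pd.DataFrame({"district": ["A", "B"], "population": [500000, 350000]})

history: list = []
combined = remix(cases, population, on="district", history=history)
combined["cases_per_lakh"] = combined["pending_cases"] / combined["population"] * 100000
print(combined)
print(history)  # the saved processing history other users can scrutinise
```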
A visual programming language for the Hub will allow users to collaboratively and visually construct the data processing steps required to combine or remix data sets.
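Underneath such a visual editor, each block a user drags onto the canvas could be represented as a named step that the Hub replays in order. The operation registry and step format below are assumptions made purely to illustrate that idea.

```python
# Illustrative sketch of the step list a visual pipeline editor could produce:
# each node is a named operation, and the Hub replays them in sequence.
import pandas as pd

OPERATIONS = {
    "filter": lambda df, column, value: df[df[column] == value],
    "rename": lambda df, columns: df.rename(columns=columns),
    "aggregate": lambda df, by, agg: df.groupby(by, as_index=False).agg(agg),
}


def run_pipeline(df: pd.DataFrame, steps: list) -> pd.DataFrame:
    """Apply each visually constructed step to the data set in order."""
    for step in steps:
        params = {k: v for k, v in step.items() if k != "op"}
        df = OPERATIONS[step["op"]](df, **params)
    return df


# A pipeline a user might assemble by dragging blocks in the visual editor.
steps = [
    {"op": "filter", "column": "state", "value": "Karnataka"},
    {"op": "aggregate", "by": "year", "agg": {"pending_cases": "sum"}},
]
df = pd.DataFrame({"state": ["Karnataka", "Kerala"], "year": [2019, 2019],
                   "pending_cases": [100, 50]})
print(run_pipeline(df, steps))
```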
Privacy & Sharing

The Hub will provide privacy controls for the project as well as the data sets within it. Users can choose to keep their data sets private or public as the project progresses through different stages of completion. They can choose to open up their data sets to the community so that they can be forked (for further remixing by other users), just like code on GitHub.

Community

The Hub aims to provide community spaces where new remixed data sets that are public are discussed and critiqued for feedback as they are developed. Additionally, we aim to curate offline components such as hackathons and immersive exhibitions to improve awareness about the implications of data for justice. The Hub's main objective is data co-creation and distribution, so our focus is to ensure that datasets are interoperable; easy to modify, compile, remix/adapt, and redistribute without technological barriers; and flexible with fair-use restrictions. The Hub's design will support data synchronisation and reconciliation of conflicting data changes, and maintain a strong change history and version control system. Creators/users will be required to submit machine-readable data sets with associated metadata, ensuring maximum sharing and reuse of data and leading to high impact of the data in the wider community. The Hub would enable fast search and discovery of such data sets' content and metadata through the platform as well as through an API. The user community will be encouraged to categorise and improve the quality of metadata through automatically targeted calls to action sent to users with expertise in the domains of the data set(s) in question. A community-wide task board that leverages collaborative filtering will help the community build consensus on which data sets need to be worked on, create interest and ownership, and drive the scope of work. A council of moderators will ensure regular review of the uploaded data sets, alongside features such as rating/voting, reporting abuse, and commenting on each uploaded data set to highlight issues and discuss resolutions.
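The forking and visibility behaviour described under Privacy & Sharing above could look roughly like the record below. The field names and methods are illustrative assumptions rather than the Hub's actual schema.

```python
# Rough sketch of a data set record carrying visibility controls and fork
# lineage, in the spirit of forking code on GitHub. Field names are assumptions.
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class DatasetRecord:
    name: str
    owner_team: str
    visibility: str = "private"           # "private" or "public"
    forked_from: Optional[str] = None     # lineage back to the source data set
    history: list = field(default_factory=list)

    def publish(self) -> None:
        """Open the data set to the community for remixing."""
        self.visibility = "public"

    def fork(self, new_owner: str) -> "DatasetRecord":
        """Create a copy another team can remix, preserving lineage."""
        if self.visibility != "public":
            raise PermissionError("Only public data sets can be forked.")
        return replace(self, owner_team=new_owner, forked_from=self.name,
                       visibility="private", history=list(self.history))
```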
Keeping the Hub updated
The platform aims to leverage the community as well as technology to maintain and keep the data on the platform relevant. It will drive consensus building, encourage user feedback, and bring 'under-utilised' datasets to the attention of the community or a project. The Hub will feature a community-wide task board with collaborative filtering that helps the community prioritise and determine which data set(s) need to be worked on or created. We envision a homepage that is used extensively to promote such practices through editorials around interesting, creative, and empirical data-driven research projects, featured active projects, calls for collaboration, featured data sets, and featured ongoing discussions. The Hub will be equipped to detect when a source data set has been updated and inform the relevant users and projects about such updates so they can take the necessary actions. In the longer term, we are interested in exploring automated updates of source data that is refreshed periodically.
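One simple way the Hub could detect that a source data set has changed is to compare a freshly downloaded copy against the checksum recorded at import time, as sketched below. The function names and the notification hook are placeholders assumed for illustration.

```python
# Minimal sketch of source-update detection, assuming each source data set is
# fetched from a URL and compared against the checksum stored at import time.
import hashlib

import requests


def source_has_changed(url: str, recorded_sha256: str) -> bool:
    """Re-download the source and compare its checksum with the stored one."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    current = hashlib.sha256(response.content).hexdigest()
    return current != recorded_sha256


def notify_projects(dataset_id: str, project_ids: list) -> None:
    """Placeholder: alert every project that remixes this data set."""
    for project_id in project_ids:
        print(f"Project {project_id}: source data set {dataset_id} has been updated.")
```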
In addition to the above, we aim to design a number of features and initiatives to enable wider community growth. Here are some:
A community-wide task board with voting to drive consensus around topics of interest, priority, and urgency that could benefit from the community working together to produce new data sets (a simple ranking sketch follows this list)
Workshops, trainings, and hackathons on data literacy in the context of how it is enabled via the Hub, topic-based hackathons to create new data sets, and a workshop methodology for running Hub workshops in your own city
Training materials and support to encourage interested users to take the initiative in raising awareness of data literacy and the responsible use, privacy, and security of data
An active editorial component to inform the community of featured projects, discussions, and pressing needs through the home page and other sections of the Hub
A reputation system that will allow contributors to be seen and appreciated by the community
A visualisation system that will allow users to showcase their output data set through visual data-stories
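The voting-driven task board and the reputation system mentioned above could fit together in a very simple way: each vote is weighted by the voter's reputation and topics are ranked by total score. The sketch below is an assumption about how that prioritisation might work, not a committed design.

```python
# Simple sketch of ranking proposed data set topics by reputation-weighted votes.
from collections import defaultdict


def rank_topics(votes: list, reputation: dict) -> list:
    """Aggregate (user, topic) votes, weighting each vote by user reputation."""
    scores = defaultdict(float)
    for user, topic in votes:
        scores[topic] += reputation.get(user, 1.0)  # unknown users count as 1.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


votes = [("asha", "undertrial detention data"), ("ravi", "legal aid spending"),
         ("asha", "legal aid spending"), ("meera", "undertrial detention data")]
reputation = {"asha": 2.0, "ravi": 1.0, "meera": 1.5}
print(rank_topics(votes, reputation))  # highest-priority topics first
```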
How can you be involved?

If you are an organisation or individual interested in collaborating with us or commenting on the proposal, please get in touch with us at siddharth.justiceadda@gmail.com or varsha.justiceadda@gmail.com. This proposal was a finalist at the Agami Data for Justice Challenge 2019. The concepts described in this document are licensed under Creative Commons CC BY-NC-SA 4.0 International.