This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools.
We are social scientists applying a range of quantitative and qualitative methods to the study of online communities. We seek to understand both how and why some attempts at collaborative production — like Wikipedia and Linux — build large volunteer communities and high quality work products
Our research is particularly focused on how the design of communication and information technologies shape fundamental social outcomes with broad theoretical and practical implications — like an individual’s decision to join a community, contribute to a public good, or a group’s ability to make decisions democratically.
Our research is deeply interdisciplinary, most frequently consists of “big data” quantitative analyses, and lies at the intersection of communication, sociology, and human-computer interaction.
“La ciencia de los datos puede ayudar a formar a ciudadanos capaces de leer las noticias sin dejarse engañar por gráficos y datos maliciosos”.
“El tratamiento de datos para la resolución de problemas reales y la toma de decisiones debería ser una competencia transversal básica en todos los niveles educativos”, sentencia Teresa Sancho, directora del grado en Ciencia de Datos de la Universitat Oberta de Catalunya (UOC). “No es necesario crear una asignatura para ello; basta con introducir la perspectiva propia de la ciencia de datos en las distintas asignaturas que existen”.
NumPy (short for Numerical Python) is one of the top libraries equipped with useful resources to help data scientists turn Python into a powerful scientific analysis and modelling tool. The popular open source library is available under the BSD license. It is the foundational Python library for performing tasks in scientific computing. NumPy is part of a bigger Python-based ecosystem of open source tools called SciPy.
Pandas is another great library that can enhance your Python skills for data science. Just like NumPy, it belongs to the family of SciPy open source software and is available under the BSD free software license.
Matplotlib is also part of the SciPy core packages and offered under the BSD license. It is a popular Python scientific library used for producing simple and powerful visualizations. You can use the Python framework for data science for generating creative graphs, charts, histograms, and other shapes and figures—without worrying about writing many lines of code. For example, let’s see how the Matplotlib library can be used to create a simple bar chart.