Open source science literature review

Below is a chronological list of articles pertaining to Open Source Science in Software and Hardware.


P Rice, I Longden and A Bleasby, (2000), EMBOSS: The European Molecular Biology Open Software Suite, Trends in Genetics, Volume 16, No. 6, pp. 276-277

  • Abstract

EMBOSS is "The European Molecular Biology Open Software Suite". EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community. The software automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web. Also, as extensive libraries are provided with the package, it is a platform to allow other scientists to develop and release software in true open source spirit. EMBOSS also integrates a range of currently available packages and tools for sequence analysis into a seamless whole. EMBOSS breaks the historical trend towards commercial software packages.


Akinobu Lee, Kiyohiro Shikano and Tatsuya Kawahara. (2001). Julius - an Open Source Real-Time Large Vocabulary Recognition Engine. In Eurospeech 2001 - Scandinavia, 1691-1694.

  • Abstract

Julius is a high-performance, two-pass LVCSR decoder for researchers and developers. Based on word 3-gram and context-dependent HMM, it can perform almost real-time decoding on most current PCs on a 20k-word dictation task. Major search techniques are fully incorporated, such as tree lexicon, N-gram factoring, cross-word context dependency handling, enveloped beam search, Gaussian pruning, Gaussian selection, etc. Besides search efficiency, it is also carefully modularized to be independent of model structures, and various HMM types are supported, such as shared-state triphones and tied-mixture models, with any number of mixtures, states, or phones. Standard formats are adopted to cope with other free modeling toolkits. The main platform is Linux and other Unix workstations, and it partially works on Windows. Julius is distributed under an open license together with its source code, and has been used by many researchers and developers in Japan.


Chihiro Watanabe, Youichirou S. Tsuji, and Charla Griffy-Brown, “Patent statistics: deciphering a 'real' versus a 'pseudo' proxy of innovation,” Technovation 21, no. 12 (December 2001): 783-790, doi:10.1016/S0166-4972(01)00025-6.

  • Abstract

Patent statistics have fascinated economists concerned about innovation for a long time. However, fundamental questions remain as to whether or not patent statistics represent the real state of innovation. As Griliches pointed out, substantial questions involve: What aspects of economic activities do patent statistics actually capture? And, what would we like them to measure? He pointed out that these statistics can be a mirage appearing to provide a great number of objective and reliable proxies for innovation.

This paper aims to address some of these questions by making a comparative evaluation of the representability of patent statistics in four levels of the innovation process, using as examples research and development (R&D) in Japan's printer and photovoltaic solar cell (PV) industries over the last two decades. Furthermore, this research provides a new set of patent statistics which could be considered a more reliable proxy for innovation.


Brian Bruns, “Open sourcing nanotechnology research and development: issues and opportunities,” Nanotechnology 12 (2001): 198-210.

  • This is an excellent paper examining the viability of open source design in the nanotech industry. Important things to learn from the open source software (OSS) successes are the bazaar-style design process and the gift culture it created. Concerns regarding the tragedy of the anticommons provide reason to examine alternative research methods within nanotechnology. The paper discusses various licenses possible for nanotechnology and identifies this as an area where more research should be done. Various business models are highlighted, including the 'producer coalition', and the reader is reminded that there are various levels of openness that firms could adopt depending on their business. A survey of the nanotech industry is presented, and it is important to note that many nanotechnology firms get funding from the US government, which favours strong IP and patenting laws.


Mark D. Wilkinson, (2002), BioMOBY: An open source biological web services proposal, Briefings in Bioinformatics, 3 (4): 331-341. doi: 10.1093/bib/3.4.331

  • Abstract

BioMOBY is an Open Source research project which aims to generate an architecture for the discovery and distribution of biological data through web services; data and services are decentralised, but the availability of these resources, and the instructions for interacting with them, are registered in a central location called MOBY Central. BioMOBY adds to the web services paradigm, as exemplified by Universal Description, Discovery and Integration (UDDI), by having an object-driven registry query system with object and service ontologies. This allows users to traverse expansive and disparate data sets where each possible next step is presented based on the data object currently in-hand. Moreover, a path from the current data object to a desired final data object could be automatically discovered using the registry. Native BioMOBY objects are lightweight XML, and make up both the query and the response of a simple object access protocol (SOAP) transaction.


Stefan Koch and Georg Schneider, (2002), Effort, co-operation and co-ordination in an open source software project: GNOME, Information Systems Journal, Volume 12, Issue 1, pages 27–42, DOI: 10.1046/j.1365-2575.2002.00110.x


  • Abstract

This paper presents results from research into open source projects from a software engineering perspective. The research methodology employed relies on public data retrieved from the CVS repository of the GNOME project and relevant discussion groups. This methodology is described, and results concerning the special characteristics of open source software development are given. These data are used for a first approach to estimating the total effort to be expended.
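
  • Illustrative sketch (added note, not material from the paper): effort estimates of the kind described above are often derived by feeding repository size data into a conventional model such as basic COCOMO; the coefficients below are the standard organic-mode values, and the lines-of-code figure is a made-up placeholder, not a number from the GNOME study.

    # Rough effort estimate from repository size using the basic COCOMO model
    # (organic-mode coefficients a=2.4, b=1.05; result in person-months).
    # The LOC value is a hypothetical placeholder, not a figure from the paper.
    def cocomo_effort_person_months(loc: int, a: float = 2.4, b: float = 1.05) -> float:
        kloc = loc / 1000.0
        return a * kloc ** b

    if __name__ == "__main__":
        loc_from_repository = 1_500_000  # hypothetical total lines of code
        effort = cocomo_effort_person_months(loc_from_repository)
        print(f"Estimated effort: {effort:.0f} person-months")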


Alessandro Cimatti et al., (2002), NuSMV 2: An OpenSource Tool for Symbolic Model Checking, Lecture Notes in Computer Science, Volume 2404, pp. 241-268, DOI: 10.1007/3-540-45657-0_29

  • Abstract

This paper describes version 2 of the NuSMV tool (computer aided verification). NuSMV is a symbolic model checker originated from the reengineering, reimplementation and extension of SMV, the original BDD-based model checker developed at CMU. The NuSMV project aims at the development of a state-of-the-art symbolic model checker, designed to be applicable in technology transfer projects: it is a well structured, open, flexible and documented platform for model checking, and is robust and close to industrial systems standards.


R. Lougee-Heimer, (2003), The Common Optimization INterface for Operations Research: Promoting open-source software in the operations research community, IBM Journal of Research and Development, Volume: 47 , Issue: 1, p. 57- 66

  • Abstract

The Common Optimization INterface for Operations Research (COIN-OR, http://www.coin-or.org/) is an initiative to promote open-source software for the operations research (OR) community. In OR practice and research, software is fundamental. The dependence of OR on software implies that the ways in which software is developed, managed, and distributed can have a significant impact on the field. Open source is a relatively new software development and distribution model which offers advantages over current practices. Its viability depends on the precise definition of open source, on the culture of a distributed developer community, and on a version-control system which makes distributed development possible. In this paper, we review open-source philosophy and culture, and present the goals and status of COIN-OR.


M.K. Smith et al., (2003), DSpace: An Open Source Dynamic Digital Repository, D-Lib Magazine, Volume 9 Number 1. DOI: 10.1045/january2003-smith

  • Abstract

For the past two years the Massachusetts Institute of Technology (MIT) Libraries and Hewlett-Packard Labs have been collaborating on the development of an open source system called DSpace™ that functions as a repository for the digital research and educational material produced by members of a research university or organization. Running such an institutionally-based, multidisciplinary repository is increasingly seen as a natural role for the libraries and archives of research and teaching organizations. As their constituents produce increasing amounts of original material in digital formats—much of which is never published by traditional means—the repository becomes vital to protect the significant assets of the institution and its faculty. The first part of this article describes the DSpace system including its functionality and design, and its approach to various problems in digital library and archives design. The second part discusses the implementation of DSpace at MIT, plans for federating the system, and issues of sustainability.


Natalya F. Noy et al., (2003), Protégé-2000: An Open-Source Ontology-Development and Knowledge-Acquisition Environment, AMIA 2003 Open Source Expo

  • Abstract

Protégé-2000 is an open-source tool that assists users in the construction of large electronic knowledge bases. It has an intuitive user interface that enables developers to create and edit domain ontologies. Numerous plugins provide alternative visualization mechanisms, enable management of multiple ontologies, allow the use of inference engines and problem solvers with Protégé ontologies, and provide other functionality. The Protégé user community has more than 7000 members.


T. Staples et al., (2003), The Fedora Project: An open-source Digital Object Repository Management System, D-Lib Magazine, April 2003, v. 9, no. 4

  • About

Using a grant from the Andrew W. Mellon Foundation, the University of Virginia Library has released an open-source digital object repository management system. The Fedora Project, a joint effort of the University of Virginia and Cornell University, has now made available the first version of a system based on the Flexible Extensible Digital Object Repository Architecture, originally developed at Cornell.

Fedora repositories can provide the foundation for a variety of information management schemes, including digital library systems. At the University of Virginia, Fedora is being used to build a large-scale digital library that will soon have millions of digital resources of all media and content types. A consortium of institutions that include the Library of Congress, Northwestern University, and Tufts University is also currently testing the program. They are building test beds drawn from their own digital collections that they will use to evaluate the software and give feedback to the project.


S. Dudoit, R. C. Gentleman and J. Quackenbush, (2003), Open Source Software for the Analysis of Microarray Data, BioTechniques 34, pp. 45-51

  • Abstract

DNA microarray assays represent the first widely used application that attempts to build upon the information provided by genome projects in the study of biological questions. One of the greatest challenges with working with microarrays is collecting, managing, and analyzing data. Although several commercial and noncommercial solutions exist, there is a growing body of freely available, open source software that allows users to analyze data using a host of existing techniques and to develop their own and integrate them within the system. Here we review three of the most widely used and comprehensive systems, the statistical analysis tools written in R through the Bioconductor project (http://www.bioconductor.org), the Java®-based TM4 software system available from The Institute for Genomic Research (http://www.tigr.org/software), and BASE, the Web-based system developed at Lund University (http://base.thep.lu.se).


F. Meyer et al., (2003), GenDB—an open source genome annotation system for prokaryote genomes, Nucleic Acids Research, 31(8): 2187–2195.

  • Abstract

The flood of sequence data resulting from the large number of current genome projects has increased the need for a flexible, open source genome annotation system, which so far has not existed. To account for the individual needs of different projects, such a system should be modular and easily extensible. We present a genome annotation system for prokaryote genomes, which is well tested and readily adaptable to different tasks. The modular system was developed using an object-oriented approach, and it relies on a relational database backend. Using a well defined application programmers interface (API), the system can be linked easily to other systems. GenDB supports manual as well as automatic annotation strategies. The software currently is in use in more than a dozen microbial genome annotation projects. In addition to its use as a production genome annotation system, it can be employed as a flexible framework for the large-scale evaluation of different annotation strategies. The system is open source.


S.M. Maurer, (2003), New Institutions for Doing Science: From Databases to Open Source Biology, European Policy for Intellectual Property Conference, University of Maastricht, The Netherlands, November 24-25, 2003

  • Abstract

Recently, several authors have suggested that a new method of doing science called “open source biology” is about to emerge. However, very little has been written about how such an institution would differ from existing research institutions. Scientific databases provide a natural model. During the 1990s, scientists experimented with several new database initiatives designed to reconcile private support with the ideals of open science. Despite significant controversy, this paper argues that private/public transactions that unambiguously promote academic science should be encouraged. In principle, research communities can also organize database collaborations to pursue social and political goals. Examples include discouraging software patents, promoting “green” investment, and improving internet security. Finally, the new field of computational genomics blurs the traditional line between database creation and product development. This paper describes how traditional database institutions can be modified and extended to discover pharmaceuticals. The proposed institution (“open source drug discovery”) would be particularly useful for combating Third World diseases. Success would demonstrate that the open source institution is not limited to computer science and can develop products other than software.


M. Dougiamas & P. Taylor, (2003). Moodle: Using Learning Communities to Create an Open Source Course Management System. In D. Lassner & C. McNaught (Eds.), Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2003, pp. 171-178

  • Abstract

This paper summarizes a PhD research project that has contributed towards the development of Moodle - a popular open-source course management system (moodle.org). In this project we applied theoretical perspectives such as "social constructionism" and "connected knowing" to the analysis of our own online classes as well as the growing learning community of other Moodle users. We used the mode of participatory action research, including techniques such as case studies, ethnography, learning environment surveys and design methodologies. This ongoing analysis is being used to guide the development of Moodle as a tool for improving processes within communities of reflective inquiry. At the time of writing (April 2003), Moodle has been translated into twenty-seven languages and is being used by many hundreds of educators around the world, including universities, schools and independent teachers.


Arnaud Delorme and Scott Makeig, (2003). EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods, Volume 134, Issue 1, Pages 9–21

  • Abstract

We have developed a toolbox and graphic user interface, EEGLAB, running under the cross-platform MATLAB environment (The Mathworks, Inc.) for processing collections of single-trial and/or averaged EEG data of any number of channels. Available functions include EEG data, channel and event information importing, data visualization (scrolling, scalp map and dipole model plotting, plus multi-trial ERP-image plots), preprocessing (including artifact rejection, filtering, epoch selection, and averaging), independent component analysis (ICA) and time/frequency decompositions including channel and component cross-coherence supported by bootstrap statistical methods based on data resampling. EEGLAB functions are organized into three layers. Top-layer functions allow users to interact with the data through the graphic interface without needing to use MATLAB syntax. Menu options allow users to tune the behavior of EEGLAB to available memory. Middle-layer functions allow users to customize data processing using command history and interactive ‘pop’ functions. Experienced MATLAB users can use EEGLAB data structures and stand-alone signal processing functions to write custom and/or batch analysis scripts. Extensive function help and tutorial information are included. A ‘plug-in’ facility allows easy incorporation of new EEG modules into the main menu. EEGLAB is freely available (http://www.sccn.ucsd.edu/eeglab/) under the GNU public license for noncommercial use and open source development, together with sample data, user tutorial and extensive documentation.


Richard C. Atkinson et al., (2003), “INTELLECTUAL PROPERTY RIGHTS: Public Sector Collaboration for Agricultural IP Management,” Science 301, no. 5630 (July 11, 2003): 174-175, doi:10.1126/science.1085553.

  • Abstract

The fragmented ownership of rights to intellectual property (IP) in agricultural biotechnology leads to situations where no single public-sector institution can provide a complete set of IP rights to ensure freedom to operate with a particular technology. This situation causes obstacles to the distribution of improved staple crops for humanitarian purposes in the developing world and specialty crops in the developed world. This Policy Forum describes an initiative by the major agricultural universities in the United States and other public-sector institutions to establish a new paradigm in the management of IP to facilitate commercial development of such crops.


Antoine Rosset, Luca Spadola and Osman Ratib, (2004), OsiriX: An Open-Source Software for Navigating in Multidimensional DICOM Images, Journal of Digital Imaging, Volume 17, Number 3, pp. 205-216, DOI: 10.1007/s10278-004-1014-6

  • Abstract

A multidimensional image navigation and display software was designed for display and interpretation of large sets of multidimensional and multimodality images such as combined PET-CT studies. The software is developed in Objective-C on a Macintosh platform under the MacOS X operating system using the GNUstep development environment. It also benefits from the extremely fast and optimized 3D graphic capabilities of the OpenGL graphic standard widely used for computer games and optimized for taking advantage of any hardware graphic accelerator boards available. In the design of the software special attention was given to adapt the user interface to the specific and complex tasks of navigating through large sets of image data. An interactive jog-wheel device widely used in the video and movie industry was implemented to allow users to navigate in the different dimensions of an image set much faster than with a traditional mouse or on-screen cursors and sliders. The program can easily be adapted for very specific tasks that require a limited number of functions, by adding and removing tools from the program's toolbar and avoiding an overwhelming number of unnecessary tools and functions. The processing and image rendering tools of the software are based on the open-source libraries ITK and VTK. This ensures that all new developments in image processing that could emerge from other academic institutions using these libraries can be directly ported to the OsiriX program. OsiriX is provided free of charge under the GNU open-source licensing agreement at http://homepage.mac.com/rossetantoine/osirix.


Stephen M. Maurer, Arti Rai, and Andrej Sali, (2004), Finding Cures for Tropical Diseases: Is Open Source an Answer?, PLoS Medicine 1, no. 3 (December 2004): 183-186.

  • Abstract

This paper shows that the current models for encouraging pharmaceutical companies to research and develop drugs for tropical diseases that affect poor people are not working. These approaches are (1) asking governments and NGOs to subsidize drug prices, and (2) creating non-profit venture capital firms. The paper proposes an open-source model for developing these drugs through a website (www.tropicaldisease.org) and describes how scientists could use chat pages and shared databases to make discoveries.

  • Scientists working on this database would not be paid in money; instead they would gain stature and enhance their reputations, much as contributors do in the hacker community. The drugs would not be patented, in order to ensure that retail costs remained low. Companies and universities would allow their workers to volunteer, and would even donate databases and resources, because the value of their IP lies in North American and European medicines.


R. Craig, J. P. Cortens and R. C. Beavis, (2004), Open Source System for Analyzing, Validating, and Storing Protein Identification Data, Journal of Proteome Research, 3 (6), pp 1234–1242, DOI: 10.1021/pr049882h


  • Abstract

This paper describes an open-source system for analyzing, storing, and validating proteomics information derived from tandem mass spectrometry. It is based on a combination of data analysis servers, a user interface, and a relational database. The database was designed to store the minimum amount of information necessary to search and retrieve data obtained from the publicly available data analysis servers. Collectively, this system was referred to as the Global Proteome Machine (GPM). The components of the system have been made available as open source development projects. A publicly available system has been established, comprised of a group of data analysis servers and one main database server.


M.J.L. de Hoon, S. Imoto, J. Nolan and S. Miyano, (2004), Open Source Clustering Software, Bioinformatics, 20 (9): 1453-1454. DOI: 10.1093/bioinformatics/bth078

  • Summary

We have implemented k-means clustering, hierarchical clustering and self-organizing maps in a single multipurpose open-source library of C routines, callable from other C and C++ programs. Using this library, we have created an improved version of Michael Eisen's well-known Cluster program for Windows, Mac OS X and Linux/Unix. In addition, we generated a Python and a Perl interface to the C Clustering Library, thereby combining the flexibility of a scripting language with the speed of C.
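
  • Illustrative sketch (added note, not the library's own API): a minimal NumPy version of the k-means step that the C Clustering Library provides, shown only to indicate what such a call computes; in practice one would call the optimized C routines through the Python interface the paper describes (distributed as Pycluster/Bio.Cluster).

    # Minimal k-means sketch in plain NumPy (illustrative only; the paper's
    # library implements this and more in optimized C, callable from Python).
    import numpy as np

    def kmeans(data, k, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        centers = data[rng.choice(len(data), size=k, replace=False)]
        for _ in range(n_iter):
            # assign each row to its nearest centre (squared Euclidean distance)
            labels = np.argmin(((data[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
            # move each centre to the mean of its assigned rows
            new_centers = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                                    else centers[j] for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return labels, centers

    expression = np.random.default_rng(1).normal(size=(50, 4))  # toy "expression matrix"
    labels, centers = kmeans(expression, k=3)
    print(labels[:10])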


Willie Walker et al., (2004), Sphinx-4: a flexible open source framework for speech recognition, Technical Report

  • Abstract

Sphinx-4 is a flexible, modular and pluggable framework to help foster new innovations in the core research of hidden Markov model (HMM) speech recognition systems. The design of Sphinx-4 is based on patterns that have emerged from the design of past systems as well as new requirements based on areas that researchers currently want to explore. To exercise this framework, and to provide researchers with a "research-ready" system, Sphinx-4 also includes several implementations of both simple and state-of-the-art techniques. The framework and the implementations are all freely available via open source.


Burr Settles, (2005), ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text. Bioinformatics, 21 (14): 3191-3192. doi: 10.1093/bioinformatics/bti475

  • Summary

ABNER (A Biomedical Named Entity Recognizer) is an open source software tool for molecular biology text mining. At its core is a machine learning system using conditional random fields with a variety of orthographic and contextual features. The latest version is 1.5, which has an intuitive graphical interface and includes two modules for tagging entities (e.g. protein and cell line) trained on standard corpora, for which performance is roughly state of the art. It also includes a Java application programming interface allowing users to incorporate ABNER into their own systems and train models on new corpora.


P.A. Cook et al., (2005), Camino: Open-source diffusion-MRI reconstruction and processing, The Insight Journal - 2005 MICCAI Open-Source Workshop.

  • Abstract

Camino is an open-source, object-oriented software package for processing diffusion MRI data. Camino implements a data processing pipeline, which allows for easy scripting and flexible integration with other software. This paper summarises the features of Camino at each stage of the pipeline from the raw data to the statistics used by clinicians and researchers. The paper also discusses the role of Camino in the paper "An Automated Approach to Connectivity-based Partitioning of Brain Structures".


Stein Aerts et al., (2005), TOUCAN 2: the all-inclusive open source workbench for regulatory sequence analysis, Nucleic Acids Research, Volume 33, Issue suppl 2, pp. 393-396, doi: 10.1093/nar/gki354

  • Abstract

We present the second and improved release of the TOUCAN workbench for cis-regulatory sequence analysis. TOUCAN implements and integrates fast state-of-the-art methods and strategies in gene regulation bioinformatics, including algorithms for comparative genomics and for the detection of cis-regulatory modules. This second release of TOUCAN has become open source and thereby carries the potential to evolve rapidly. The main goal of TOUCAN is to allow a user to come to testable hypotheses regarding the regulation of a gene or of a set of co-regulated genes. TOUCAN can be launched from this location: http://www.esat.kuleuven.ac.be/~saerts/software/toucan.php.


Will Schroeder et al., (2005), The ITK Software Guide, Second Edition, Updated for ITK version 2.4, Kitware, Inc.

  • Abstract

The Insight Toolkit (ITK) is an open-source software toolkit for performing registration and segmentation. Segmentation is the process of identifying and classifying data found in a digitally sampled representation. Typically the sampled representation is an image acquired from such medical instrumentation as CT or MRI scanners. Registration is the task of aligning or developing correspondences between data. For example, in the medical environment, a CT scan may be aligned with an MRI scan in order to combine the information contained in both. ITK is implemented in C++. It is cross-platform, using a build environment known as CMake to manage the compilation process in a platform-independent way. In addition, an automated wrapping process (Cable) generates interfaces between C++ and interpreted programming languages such as Tcl, Java, and Python. This enables developers to create software using a variety of programming languages. ITK's C++ implementation style is referred to as generic programming, which is to say that it uses templates so that the same code can be applied generically to any class or type that happens to support the operations used. Such C++ templating means that the code is highly efficient, and that many software problems are discovered at compile-time, rather than at run-time during program execution. Because ITK is an open-source project, developers from around the world can use, debug, maintain, and extend the software. ITK uses a model of software development referred to as Extreme Programming. Extreme Programming collapses the usual software creation methodology into a simultaneous and iterative process of design-implement-test-release. The key features of Extreme Programming are communication and testing. Communication among the members of the ITK community is what helps manage the rapid evolution of the software. Testing is what keeps the software stable. In ITK, an extensive testing process (using a system known as Dart) is in place that measures the quality on a daily basis. The ITK Testing Dashboard is posted continuously, reflecting the quality of the software at any moment. This book is a guide to using and developing with ITK. The sample code in the directory provides a companion to the material presented here. The most recent version of this document is available online at http://www.itk.org/ItkSoftwareGuide.pdf.


AG González, (2005), Open science: open source licenses in scientific research, North Carolina Journal of Law & Technology, Vol. 7, Issue 2.

  • Abstract

In recent years, there has been growing interest in the area of open source software (OSS) as an alternative economic model. However, the success of the OSS mindshare and collaborative online experience has wider implications to many other fields of human endeavour than the mere licensing of computer programmes. There are a growing number of institutions interested in using OSS licensing schemes to distribute creative works, scientific research and even to publish online journals through open access licenses (OA).

There appears to be growing concern in the scientific community about the trend to fence and protect scientific research through intellectual property, particularly by the abuse of patent applications for biotechnology research. The OSS experience represents a successful model that demonstrates that IP licenses could eventually be used to protect against the misuse and misappropriation of basic scientific research. This would be done by translating existing OSS licenses to protect scientific research. Some efforts are already paying dividends in areas such as scientific publishing, evidenced by the growing number of OA journals. However, the process of translating software licenses to areas other than publishing has been more difficult. OSS and open access licenses work best with works subject to copyright protection because copyright subsists in an original work as soon as it is created. However, it has been more difficult to generate a license that covers patented works because patents are only awarded through a lengthy application and registration process. If the open science experiment is to work, it needs the intervention of the legal community to draft new licenses that may apply to scientific research. This work will look at the issue of such open science licenses, paying special care as to how the system can best be exported to scientific research based on OSS and OA ideals.


Ethan G Cerami et al., (2006), cPath: open source software for collecting, storing, and querying biological pathways, BMC Bioinformatics 2006, 7:497 doi:10.1186/1471-2105-7-497


  • Background

Biological pathways, including metabolic pathways, protein interaction networks, signal transduction pathways, and gene regulatory networks, are currently represented in over 220 diverse databases. These data are crucial for the study of specific biological processes, including human diseases. Standard exchange formats for pathway information, such as BioPAX, CellML, SBML and PSI-MI, enable convenient collection of this data for biological research, but mechanisms for common storage and communication are required.

Results: We have developed cPath, an open source database and web application for collecting, storing, and querying biological pathway data. cPath makes it easy to aggregate custom pathway data sets available in standard exchange formats from multiple databases, present pathway data to biologists via a customizable web interface, and export pathway data via a web service to third-party software, such as Cytoscape, for visualization and analysis. cPath is software only, and does not include new pathway information. Key features include: a built-in identifier mapping service for linking identical interactors and linking to external resources; built-in support for PSI-MI and BioPAX standard pathway exchange formats; a web service interface for searching and retrieving pathway data sets; and thorough documentation. The cPath software is freely available under the LGPL open source license for academic and commercial use.

Conclusion: cPath is a robust, scalable, modular, professional-grade software platform for collecting, storing, and querying biological pathways. It can serve as the core data handling component in information systems for pathway visualization, analysis and modeling.
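
  • Illustrative sketch (hypothetical, not cPath's documented interface): the abstract mentions exporting pathway data to third-party tools via a web service; the snippet below only shows the general pattern of fetching a record over HTTP from a local installation, with the base URL and query parameters as made-up placeholders.

    # Generic pattern for pulling pathway data from a web service so another
    # tool (e.g. a visualizer) can consume it. The base URL and parameters are
    # HYPOTHETICAL placeholders, not cPath's documented API.
    import urllib.parse
    import urllib.request

    BASE_URL = "http://localhost:8080/cpath/webservice"  # placeholder for a local install

    def fetch_pathways(keyword: str) -> str:
        params = urllib.parse.urlencode({"cmd": "search", "q": keyword, "format": "xml"})
        with urllib.request.urlopen(f"{BASE_URL}?{params}") as resp:
            return resp.read().decode("utf-8")

    print(fetch_pathways("glycolysis")[:200])  # example keyword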


D. Krajzewicz, M. Bonert and P. Wagner, (2006), The open source traffic simulation package SUMO, RoboCup 2006 Infrastructure Simulation Competition

  • Abstract

Since the year 2000, the Institute of Transportation Research (IVF) at the German Aerospace Centre (DLR) has been developing a microscopic traffic simulation package. The complete package is offered as open source to establish the software as a common testbed for algorithms and models from traffic research. Since the year 2003 the IVF has also been working on a virtual traffic management centre and, in conjunction with this, on traffic management. Several large-scale projects have been carried out since then, most importantly INVENT, where modern traffic management methods were evaluated, and the online simulation and prediction of traffic during the World Youth Day (Weltjugendtag) 2005 in Cologne, Germany. This publication briefly describes the simulation package together with the projects mentioned above to show how SUMO can be used to simulate large-scale traffic scenarios. Additionally, it is pointed out how SUMO may be used as a testbed for automatic management algorithms with minor effort in developing extensions.


Alan MacCormack, John Rusnak and Carliss Y. Baldwin, (2006). Exploring the Structure of Complex Software Designs: An Empirical Study of Open Source and Proprietary Code. Management Science, Vol. 52, No. 7, pp. 1015-1030

  • Abstract

This paper reports data from a research project which seeks to characterize the differences in design structure between complex software products. In particular, we adopt a technique based upon Design Structure Matrices (DSMs) to map the dependencies between different elements of a design, and then develop metrics that allow us to compare the structures of these different DSMs. We demonstrate the power of this approach in two ways: First, we compare the design structures of two complex software products – the Linux operating system and the Mozilla web browser – that were developed via contrasting modes of organization: specifically, open source versus proprietary development. We find significant differences in their designs, consistent with an interpretation that Linux possesses a more “modular” architecture. We then track the evolution of Mozilla, paying particular attention to a major “re-design” effort that took place several months after its release as an open source product. We show that this effort resulted in a design structure that was significantly more modular than its predecessor, and indeed, more modular than that of a comparable version of Linux.

Our findings demonstrate that it is possible to characterize the structure of complex product designs and draw meaningful conclusions about the precise ways in which they differ. We provide a description of a set of tools that will facilitate this analysis for software products, which should prove fruitful for researchers and practitioners alike. Empirically, the data we provide, while exploratory, is consistent with a view that different modes of organization may tend to produce designs possessing different architectural characteristics. However, we also find that purposeful efforts to re-design a product’s architecture can have a significant impact on the structure of a design, at least for products of comparable complexity to the ones we examine here.
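
  • Illustrative sketch (added note; an example of the general DSM approach, not the authors' exact metric): starting from a dependency matrix, one common way to quantify coupling is to take the transitive closure and measure what share of element pairs are connected directly or indirectly.

    # Toy Design Structure Matrix (DSM) analysis: fraction of element pairs
    # connected directly or indirectly. The matrix is made up for illustration.
    import numpy as np

    dsm = np.array([          # dsm[i, j] = 1 means element i depends on element j
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 0],
    ])

    n = dsm.shape[0]
    reach = (dsm > 0).astype(int)
    for _ in range(n):                      # transitive closure by repeated propagation
        reach = ((reach + reach @ reach) > 0).astype(int)

    coupling = reach.sum() / (n * n)        # share of (i, j) pairs linked by some path
    print(f"Share of directly or indirectly coupled pairs: {coupling:.2f}")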


S. Kerrien et al., (2006), IntAct—open source resource for molecular interaction data, Nucleic Acids Research 35 (suppl 1): D561-D565. doi: 10.1093/nar/gkl958

  • Abstract

IntAct is an open source database and software suite for modeling, storing and analyzing molecular interaction data. The data available in the database originates entirely from published literature and is manually annotated by expert biologists to a high level of detail, including experimental methods, conditions and interacting domains. The database features over 126 000 binary interactions extracted from over 2100 scientific publications and makes extensive use of controlled vocabularies. The web site provides tools allowing users to search, visualize and download data from the repository. IntAct supports and encourages local installations as well as direct data submission and curation collaborations. IntAct source code and data are freely available from http://www.ebi.ac.uk/intact.


Andrea Bonaccorsi, Silvia Giannangeli and Cristina Rossi, (2006), Entry strategies under competing standards: Hybrid business models in the open source software industry, Management Science, Vol. 52, No. 7, pp. 1085-1098

  • Abstract

The paper analyses the entry strategies of software firms that adopt the Open Source production model. A new definition of business model is proposed. Empirical evidence, based on an exploratory survey of 146 Italian software firms, shows that firms adapted to an environment dominated by incumbent standards by combining Open Source and proprietary software. The paper examines the determinants of business models and discusses the stability of hybrid models in the evolution of the industry.


C. Steinbeck et al., (2006), Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics, Current Pharmaceutical Design, Volume 12, Number 17, June 2006, pp. 2111-2120. DOI: 10.2174/138161206777585274


  • Abstract

The Chemistry Development Kit (CDK) provides methods for common tasks in molecular informatics, including 2D and 3D rendering of chemical structures, I/O routines, SMILES parsing and generation, ring searches, isomorphism checking, structure diagram generation, etc. Implemented in Java, it is used both for server-side computational services, possibly equipped with a web interface, as well as for applications and client-side applets. This article introduces the CDK's new QSAR capabilities and the recently introduced interface to statistical software.


Pierre Azoulay, Andrew Stellman and Joshua Graff Zivin, (2006), PublicationHarvester: An open-source software tool for science policy research, Research Policy, Volume 35, Issue 7, Pages 970–974

  • Abstract

We present PublicationHarvester, an open-source software tool for gathering publication information on individual life scientists. The software interfaces with MEDLINE, and allows the end-user to specify up to four MEDLINE-formatted names for each researcher. Using these names along with a user-specified search query, PublicationHarvester generates yearly publication counts, optionally weighted by Journal Impact Factors. These counts are further broken down by order on the authorship list (first, last, second, next-to-last, middle) and by publication type (clinical trials, regular journal articles, reviews, letters/editorials, etc.). The software also generates a keywords report at the scientist-year level, using the medical subject headings (MeSH) assigned by the National Library of Medicine to each publication indexed by MEDLINE. The software, source code, and user manual can be downloaded at http://www.stellman-greene.com/PublicationHarvester/.
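
  • Illustrative sketch (added note, not PublicationHarvester's own code): the same kind of MEDLINE query can be issued against NCBI's public E-utilities; the author name and year range below are placeholders.

    # Count PubMed publications per year for one MEDLINE-formatted author name
    # via NCBI E-utilities (illustrative; not the tool's implementation).
    import json
    import urllib.parse
    import urllib.request

    ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

    def pubmed_count(author: str, year: int) -> int:
        term = f"{author}[Author] AND {year}[dp]"
        params = urllib.parse.urlencode({"db": "pubmed", "term": term, "retmode": "json"})
        with urllib.request.urlopen(f"{ESEARCH}?{params}") as resp:
            return int(json.load(resp)["esearchresult"]["count"])

    for year in range(2000, 2004):
        print(year, pubmed_count("Doe J", year))  # hypothetical author name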


Jeffrey A. Roberts, Il-Horn Hann and Sandra A. Slaughter. (2006), Understanding the Motivations, Participation, and Performance of Open Source Software Developers: A Longitudinal Study of the Apache Projects. Management Science, Vol. 52, No. 7, pp. 984-999 DOI: 10.1287/mnsc.1060.0554

  • Abstract

Understanding what motivates participation is a central theme in the research on open source software (OSS) development. Our study contributes by revealing how the different motivations of OSS developers are interrelated, how these motivations influence participation leading to performance, and how past performance influences subsequent motivations. Drawing on theories of intrinsic and extrinsic motivation, we develop a theoretical model relating the motivations, participation, and performance of OSS developers. We evaluate our model using survey and archival data collected from a longitudinal field study of software developers in the Apache projects. Our results reveal several important findings. First, we find that developers' motivations are not independent but rather are related in complex ways. Being paid to contribute to Apache projects is positively related to developers' status motivations but negatively related to their use-value motivations. Perhaps surprisingly, we find no evidence of diminished intrinsic motivation in the presence of extrinsic motivations; rather, status motivations enhance intrinsic motivations. Second, we find that different motivations have an impact on participation in different ways. Developers' paid participation and status motivations lead to above-average contribution levels, but use-value motivations lead to below-average contribution levels, and intrinsic motivations do not significantly impact average contribution levels. Third, we find that developers' contribution levels positively impact their performance rankings. Finally, our results suggest that past-performance rankings enhance developers' subsequent status motivations.


Janos Demeter et al., (2007), The Stanford Microarray Database: implementation of new analysis tools and open source release of software, Nucleic Acids Research (2007) 35 (suppl 1): D766-D770. doi: 10.1093/nar/gkl1019

  • Abstract

The Stanford Microarray Database (SMD; http://smd.stanford.edu/) is a research tool and archive that allows hundreds of researchers worldwide to store, annotate, analyze and share data generated by microarray technology. SMD supports most major microarray platforms, and is MIAME-supportive and can export or import MAGE-ML. The primary mission of SMD is to be a research tool that supports researchers from the point of data generation to data publication and dissemination, but it also provides unrestricted access to analysis tools and public data from 300 publications. In addition to supporting ongoing research, SMD makes its source code fully and freely available to others under an Open Source license, enabling other groups to create a local installation of SMD. In this article, we describe several data analysis tools implemented in SMD and we discuss features of our software release.


Evren Sirin et al., (2007), Pellet: A practical OWL-DL reasoner, Journal of Web Semantics (Software Engineering and the Semantic Web), Volume 5, Issue 2, June 2007, Pages 51–53

  • Abstract

In this paper, we present a brief overview of Pellet: a complete OWL-DL reasoner with acceptable to very good performance, extensive middleware, and a number of unique features. Pellet is the first sound and complete OWL-DL reasoner with extensive support for reasoning with individuals (including nominal support and conjunctive query), user-defined datatypes, and debugging support for ontologies. It implements several extensions to OWL-DL including a combination formalism for OWL-DL ontologies, a non-monotonic operator, and preliminary support for OWL/Rule hybrid reasoning. Pellet is written in Java and is open source.


Scott L. Delp et al., (2007). OpenSim: Open-Source Software to Create and Analyze Dynamic Simulations of Movement, IEEE Transactions on Biomedical Engineering, Volume 54, Issue 11, pages 1940-1950, doi: 10.1109/TBME.2007.901024

  • Abstract

We have developed a freely available, open-source software system (OpenSim) that lets users develop models of musculoskeletal structures and create dynamic simulations of a wide variety of movements. We are using this system to simulate the dynamics of individuals with pathological gait and to explore the biomechanical effects of treatments. Dynamic simulations of movement allow one to study neuromuscular coordination, analyze athletic performance, and estimate internal loading of the musculoskeletal system. Simulations can also be used to identify the sources of pathological movement and establish a scientific basis for treatment planning. OpenSim provides a platform on which the biomechanics community can build a library of simulations that can be exchanged, tested, analyzed, and improved through a multi-institutional collaboration. Developing software that enables a concerted effort from many investigators poses technical and sociological challenges. Meeting those challenges will accelerate the discovery of principles that govern movement control and improve treatments for individuals with movement pathologies.


T. D. Crawford et al., (2007), PSI3: An open-source Ab Initio electronic structure package, Journal of Computational Chemistry, Volume 28, Issue 9, pages 1610–1616, 15 July 2007, DOI: 10.1002/jcc.20573

  • Abstract

PSI3 is a program system and development platform for ab initio molecular electronic structure computations. The package includes mature programming interfaces for parsing user input, accessing commonly used data such as basis-set information or molecular orbital coefficients, and retrieving and storing binary data (with no software limitations on file sizes or file-system-sizes), especially multi-index quantities such as electron repulsion integrals. This platform is useful for the rapid implementation of both standard quantum chemical methods, as well as the development of new models. Features that have already been implemented include Hartree-Fock, multiconfigurational self-consistent-field, second-order Møller-Plesset perturbation theory, coupled cluster, and configuration interaction wave functions. Distinctive capabilities include the ability to employ Gaussian basis functions with arbitrary angular momentum levels; linear R12 second-order perturbation theory; coupled cluster frequency-dependent response properties, including dipole polarizabilities and optical rotation; and diagonal Born-Oppenheimer corrections with correlated wave functions. This article describes the programming infrastructure and main features of the package. PSI3 is available free of charge through the open-source, GNU General Public License. © 2007 Wiley Periodicals, Inc. J Comput Chem, 2007


L. T. Kell et al., (2007). FLR: an open-source framework for the evaluation and development of management strategies, ICES Journal of Marine Science, 64 (4): 640-646. doi: 10.1093/icesjms/fsm012

  • Abstract

The FLR framework (Fisheries Library for R) is a development effort directed towards the evaluation of fisheries management strategies. The overall goal is to develop a common framework to facilitate collaboration within and across disciplines (e.g. biological, ecological, statistical, mathematical, economic, and social) and, in particular, to ensure that new modelling methods and software are more easily validated and evaluated, as well as becoming widely available once developed. Specifically, the framework details how to implement and link a variety of fishery, biological, and economic software packages so that alternative management strategies and procedures can be evaluated for their robustness to uncertainty before implementation. The design of the framework, including the adoption of object-orientated programming, its feasibility to be extended to new processes, and its application to new management approaches (e.g. ecosystem effects of fishing), is discussed. The importance of open source for promoting transparency and allowing technology transfer between disciplines and researchers is stressed.


A. F. Albuquerque et al., (2007), The ALPS project release 1.3: Open-source software for strongly correlated systems, Journal of Magnetism and Magnetic Materials, Volume 310, Issue 2, Part 2, March 2007, Pages 1187–1193

  • Abstract

We present release 1.3 of the ALPS (Algorithms and Libraries for Physics Simulations) project, an international open-source software project to develop libraries and application programs for the simulation of strongly correlated quantum lattice models such as quantum magnets, lattice bosons, and strongly correlated fermion systems. Development is centered on common XML and binary data formats, on libraries to simplify and speed up code development, and on full-featured simulation programs. The programs enable non-experts to start carrying out numerical simulations by providing basic implementations of the important algorithms for quantum lattice models: classical and quantum Monte Carlo (QMC) using non-local updates, extended ensemble simulations, exact and full diagonalization (ED), as well as the density matrix renormalization group (DMRG). Changes in the new release include a DMRG program for interacting models, support for translation symmetries in the diagonalization programs, the ability to define custom measurement operators, and support for inhomogeneous systems, such as lattice models with traps. The software is available from our web server at http://alps.comp-phys.org/.


Morgan L. Maeder et al., (2008), Rapid “Open-Source” Engineering of Customized Zinc-Finger Nucleases for Highly Efficient Gene Modification, Molecular Cell, Volume 31, Issue 2, 294-301, doi:10.1016/j.molcel.2008.06.016

  • Summary

Custom-made zinc-finger nucleases (ZFNs) can induce targeted genome modifications with high efficiency in cell types including Drosophila, C. elegans, plants, and humans. A bottleneck in the application of ZFN technology has been the generation of highly specific engineered zinc-finger arrays. Here we describe OPEN (Oligomerized Pool ENgineering), a rapid, publicly available strategy for constructing multifinger arrays, which we show is more effective than the previously published modular assembly method. We used OPEN to construct 37 highly active ZFN pairs which induced targeted alterations with high efficiencies (1%–50%) at 11 different target sites located within three endogenous human genes (VEGF-A, HoxB13, and CFTR), an endogenous plant gene (tobacco SuRA), and a chromosomally integrated EGFP reporter gene. In summary, OPEN provides an “open-source” method for rapidly engineering highly active zinc-finger arrays, thereby enabling broader practice, development, and application of ZFN technology for biological research and gene therapy.


R. C. G. Holland et al., (2008), BioJava: an open-source framework for bioinformatics, Bioinformatics, Volume 24, Issue 18, pp. 2096-2097.

  • Summary

BioJava is a mature open-source project that provides a framework for processing of biological data. BioJava contains powerful analysis and statistical routines, tools for parsing common file formats and packages for manipulating sequences and 3D structures. It enables rapid bioinformatics application development in the Java programming language. BioJava is an open-source project distributed under the Lesser GPL (LGPL). BioJava can be downloaded from the BioJava website (http://www.biojava.org).


Marc Sturm et al., (2008), OpenMS – An open-source software framework for mass spectrometry, BMC Bioinformatics 9:163, doi:10.1186/1471-2105-9-163

  • Abstract

Background: Mass spectrometry is an essential analytical technique for high-throughput analysis in proteomics and metabolomics. The development of new separation techniques, precise mass analyzers and experimental protocols is a very active field of research. This leads to more complex experimental setups yielding ever increasing amounts of data. Consequently, analysis of the data is currently often the bottleneck for experimental studies. Although software tools for many data analysis tasks are available today, they are often hard to combine with each other or not flexible enough to allow for rapid prototyping of a new analysis workflow.

Results: We present OpenMS, a software framework for rapid application development in mass spectrometry. OpenMS has been designed to be portable, easy-to-use and robust while offering a rich functionality ranging from basic data structures to sophisticated algorithms for data analysis. This has already been demonstrated in several studies.

Conclusion: OpenMS is available under the Lesser GNU Public License (LGPL) from the project website at http://www.openms.de


M. Neteler and H. Mitasova, (2008), Open Source GIS: A GRASS GIS Approach, 3rd ed., Springer, 2008, 406 pp., 80 illus. Originally published as Volume 773 in the series The International Series in Engineering and Computer Science.

  • Includes a rich set of practical examples, extensively tested by users, with data and changes related to software updates readily available on a related website
  • The vector data architecture in the third edition is completely new, and database support has been added
  • Vector data covers polygons, lines and sites in a new way and includes database management

With this third edition of Open Source GIS: A GRASS GIS Approach, we enter the new era of GRASS6, the first release that includes substantial new code developed by the International GRASS Development Team. The dramatic growth in open source software libraries has made GRASS6 development more efficient, and has enhanced GRASS interoperability with a wide range of open source and proprietary geospatial tools.


M. Quigley et al., (2009), ROS: an open-source Robot Operating System, ICRA Workshop on Open Source Software

  • Abstract

This paper gives an overview of ROS, an open-source robot operating system. ROS is not an operating system in the traditional sense of process management and scheduling; rather, it provides a structured communications layer above the host operating systems of a heterogeneous compute cluster. In this paper, we discuss how ROS relates to existing robot software frameworks, and briefly overview some of the available application software which uses ROS.
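
  • Illustrative sketch (added note; a minimal example of the publish/subscribe communication style ROS provides, using the rospy client library; node and topic names are arbitrary examples).

    #!/usr/bin/env python
    # Minimal ROS 1 node that publishes string messages on a topic.
    # Requires a ROS installation providing rospy and std_msgs.
    import rospy
    from std_msgs.msg import String

    def talker():
        rospy.init_node("talker")
        pub = rospy.Publisher("chatter", String, queue_size=10)
        rate = rospy.Rate(1)  # publish once per second
        while not rospy.is_shutdown():
            pub.publish(String(data="hello from an open-source robot"))
            rate.sleep()

    if __name__ == "__main__":
        try:
            talker()
        except rospy.ROSInterruptException:
            pass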


D. Nurmi et al., (2009). The Eucalyptus Open-source Cloud-computing System, 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, Pages: 124-131

  • Abstract

Cloud computing systems fundamentally provide access to large pools of data and computational resources through a variety of interfaces similar in spirit to existing grid and HPC resource management and programming systems. These types of systems offer a new programming target for scalable application developers and have gained popularity over the past few years. However, most cloud computing systems in operation today are proprietary, rely upon infrastructure that is invisible to the research community, or are not explicitly designed to be instrumented and modified by systems researchers. In this work, we present Eucalyptus - an open-source software framework for cloud computing that implements what is commonly referred to as infrastructure as a service (IaaS); systems that give users the ability to run and control entire virtual machine instances deployed across a variety of physical resources. We outline the basic principles of the Eucalyptus design, detail important operational aspects of the system, and discuss architectural trade-offs that we have made in order to allow EUCALYPTUS to be portable, modular and simple to use on infrastructure commonly found within academic settings. Finally, we provide evidence that EUCALYPTUS enables users familiar with existing grid and HPC systems to explore new cloud computing functionality while maintaining access to existing, familiar application development software and grid middleware.


P. D. Schloss et al., (2009). Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities. Applied and Environmental Microbiology, vol. 75, no. 23, pp. 7537-7541. doi: 10.1128/AEM.01541-09

  • Abstract

Mothur aims to be a comprehensive software package that allows users to use a single piece of software to analyze community sequence data. It builds upon previous tools to provide a flexible and powerful software package for analyzing sequencing data. As a case study, we used mothur to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the α and β diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments. This analysis of more than 222,000 sequences was completed in less than 2 h with a laptop computer.



P. Giannozzi et al., (2009). QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials, Journal of Physics: Condensed Matter, Volume 21, Number 39, doi:10.1088/0953-8984/21/39/395502

  • Abstract

QUANTUM ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling, based on density-functional theory, plane waves, and pseudopotentials (norm-conserving, ultrasoft, and projector-augmented wave). QUANTUM ESPRESSO stands for opEn Source Package for Research in Electronic Structure, Simulation, and Optimization. It is freely available to researchers around the world under the terms of the GNU General Public License. QUANTUM ESPRESSO builds upon newly restructured electronic-structure codes that have been developed and tested by some of the original authors of novel electronic-structure algorithms and applied in the last twenty years by some of the leading materials modeling groups worldwide. Innovation and efficiency are still its main focus, with special attention paid to massively-parallel architectures, and a great effort being devoted to user friendliness. QUANTUM ESPRESSO is evolving towards a distribution of independent and inter-operable codes in the spirit of an open-source project, where researchers active in the field of electronic-structure calculations are encouraged to participate in the project by contributing their own codes or by implementing their own ideas into existing codes.


John H. Barton, (2009), “Patenting and Access to Clean Energy Technologies in Developing Countries,” WIPO Magazine, March 2009

  • The paper examines other questions of importance to developing nations, including the benefits of strengthening IP protection in order to make foreign investors more willing to transfer technology, and the question of whether local trade barriers are proving helpful or harmful in developing these industries. The author concludes with specific suggestions for developing countries themselves, lenders and donors, and international negotiations. The development and diffusion of renewable energy technologies is only one part of the challenge of bringing down emissions from the energy sector. Much needs to be done to harvest the largest potential in energy efficiency improvements. Nevertheless, it is our hope that this study will contribute to informing policy processes and negotiations related to technological cooperation and intellectual property in the energy, climate change and trade arenas.


M. Valiev et al., (2010), NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations, Computer Physics Communications, Volume 181, Issue 9, Pages 1477–1489

  • Abstract

The latest release of NWChem delivers an open-source computational chemistry package with extensive capabilities for large scale simulations of chemical and biological systems. Utilizing a common computational framework, diverse theoretical descriptions can be used to provide the best solution for a given scientific problem. Scalable parallel implementations and modular software design enable efficient utilization of current computational architectures. This paper provides an overview of NWChem focusing primarily on the core theoretical modules provided by the code and their parallel performance.


Michael Woelfle, Piero Olliaro & Matthew H. Todd. (2011), Open science is a research accelerator. Commentary, Nature Chemistry, 3, 745–748, doi:10.1038/nchem.1149

  • Synopsis

An open-source approach to the problem of producing an off-patent drug in enantiopure form serves as an example of how academic and industrial researchers can join forces to make new scientific discoveries that could have a huge impact on human health. This Commentary describes a case study — a chemical project where open-source methodologies were employed to accelerate the process of discovery.
