Open source science literature review

From Appropedia


Below is a roughly chronological list of articles pertaining to open source science in software and hardware.

Chemistry / Biology / Medicine / Molecular Modeling

P. Rice, I. Longden and A. Bleasby, (2000), EMBOSS: The European Molecular Biology Open Software Suite, Trends in Genetics, Volume 16, No. 6

  • Abstract

EMBOSS is "The European Molecular Biology Open Software Suite". EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community. The software automatically copes with data in a variety of formats and even allows transparent retrieval of sequence data from the web. Also, as extensive libraries are provided with the package, it is a platform to allow other scientists to develop and release software in true open source spirit. EMBOSS also integrates a range of currently available packages and tools for sequence analysis into a seamless whole. EMBOSS breaks the historical trend towards commercial software packages.

Stefan Steiniger and Erwan Bocher, (2009), An overview on current free and open source desktop GIS developments, International Journal of Geographical Information Science, Volume 23, Issue 10, DOI:10.1080/13658810802634956

  • Abstract

Over the past few years the world of free and open source geospatial software has experienced some major changes. For instance, the website currently lists 330 GIS‐related projects. Besides the advent of new software projects and the growth of established projects, a new organisation known as the OSGeo Foundation has been established to offer a point of contact. This paper will give an overview on existing free and open source desktop GIS projects. To further the understanding of the open source software development, we give a brief explanation of associated terms and introduce the two most established software license types: the General Public License (GPL) and the Lesser General Public License (LGPL). After laying out the organisational structures, we describe the different desktop GIS software projects in terms of their main characteristics. Two main tables summarise information on the projects and functionality of the currently available software versions. Finally, the advantages and disadvantages of open source software, with an emphasis on research and teaching, are discussed.

Mark D. Wilkinson, (2002), BioMOBY: An open source biological web services proposal, Brief Bioinform, 3 (4): 331-341. doi: 10.1093/bib/3.4.331

  • Abstract

BioMOBY is an Open Source research project which aims to generate an architecture for the discovery and distribution of biological data through web services; data and services are decentralised, but the availability of these resources, and the instructions for interacting with them, are registered in a central location called MOBY Central. BioMOBY adds to the web services paradigm, as exemplified by Universal Data Discovery and Integration (UDDI), by having an object-driven registry query system with object and service ontologies. This allows users to traverse expansive and disparate data sets where each possible next step is presented based on the data object currently in-hand. Moreover, a path from the current data object to a desired final data object could be automatically discovered using the registry. Native BioMOBY objects are lightweight XML, and make up both the query and the response of a simple object access protocol (SOAP) transaction.
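The registry idea in this abstract (list the possible next steps for the data object in hand, and discover a path to a desired final object) can be illustrated with a small graph search. This is only a sketch under stated assumptions: the registry contents, service names, and data type names below are hypothetical and are not real BioMOBY services or ontology terms, and breadth-first search is one plausible way to do the path discovery, not necessarily what MOBY Central does.

```python
from collections import deque

# Hypothetical registry: (service name, input data type, output data type).
# These entries are illustrative assumptions, not real BioMOBY registrations.
REGISTRY = [
    ("get_sequence", "GeneID", "DNASequence"),
    ("translate", "DNASequence", "ProteinSequence"),
    ("blast", "ProteinSequence", "BlastReport"),
    ("get_annotation", "GeneID", "Annotation"),
]


def next_steps(data_type, registry=REGISTRY):
    """Return (service, output type) pairs that accept the given data type."""
    return [(name, out) for name, inp, out in registry if inp == data_type]


def find_path(start, goal, registry=REGISTRY):
    """Breadth-first search for the shortest chain of services from start to goal.

    Returns a list of service names, an empty list if start == goal,
    or None if no chain of registered services connects the two types.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        current, path = queue.popleft()
        if current == goal:
            return path
        for name, out in next_steps(current, registry):
            if out not in seen:
                seen.add(out)
                queue.append((out, path + [name]))
    return None


if __name__ == "__main__":
    print(next_steps("GeneID"))
    print(find_path("GeneID", "BlastReport"))
```

Calling `next_steps` corresponds to presenting the user with the services applicable to the current object, while `find_path` chains them automatically.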

F. Meyer et al., (2003), GenDB—an open source genome annotation system for prokaryote genomes, Nucleic Acids Res, 2003 April 15; 31(8): 2187–2195.

  • Abstract

The flood of sequence data resulting from the large number of current genome projects has increased the need for a flexible, open source genome annotation system, which so far has not existed. To account for the individual needs of different projects, such a system should be modular and easily extensible. We present a genome annotation system for prokaryote genomes, which is well tested and readily adaptable to different tasks. The modular system was developed using an object-oriented approach, and it relies on a relational database backend. Using a well defined application programmers interface (API), the system can be linked easily to other systems. GenDB supports manual as well as automatic annotation strategies. The software currently is in use in more than a dozen microbial genome annotation projects. In addition to its use as a production genome annotation system, it can be employed as a flexible framework for the large-scale evaluation of different annotation strategies. The system is open source.

S.M. Maurer, (2003), New Institutions for Doing Science: From Databases to Open Source Biology, European Policy for Intellectual Property Conference, University of Maastricht, The Netherlands, November 24-25, 2003

  • Abstract

Recently, several authors have suggested that a new method of doing science called “open source biology” is about to emerge. However, very little has been written about how such an institution would differ from existing research institutions. Scientific databases provide a natural model. During the 1990s, scientists experimented with several new database initiatives designed to reconcile private support with the ideals of open science. Despite significant controversy, this paper argues that private/public transactions that unambiguously promote academic science should be encouraged. In principle, research communities can also organize database collaborations to pursue social and political goals. Examples include discouraging software patents, promoting “green” investment, and improving internet security. Finally, the new field of computational genomics blurs the traditional line between database creation and product development. This paper describes how traditional database institutions can be modified and extended to discover pharmaceuticals. The proposed institution (“open source drug discovery”) would be particularly useful for combating Third World diseases. Success would demonstrate that the open source institution is not limited to computer science and can develop products other than software.

    • Great article: many good references to Third World applications of drug discovery, and a good literature review of open source databases.

Arnaud Delorme and Scott Makeig, (2003). EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods, Volume 134, Issue 1, Pages 9–21

  • Abstract

We have developed a toolbox and graphic user interface, EEGLAB, running under the cross-platform MATLAB environment (The Mathworks, Inc.) for processing collections of single-trial and/or averaged EEG data of any number of channels. Available functions include EEG data, channel and event information importing, data visualization (scrolling, scalp map and dipole model plotting, plus multi-trial ERP-image plots), preprocessing (including artifact rejection, filtering, epoch selection, and averaging), independent component analysis (ICA) and time/frequency decompositions including channel and component cross-coherence supported by bootstrap statistical methods based on data resampling. EEGLAB functions are organized into three layers. Top-layer functions allow users to interact with the data through the graphic interface without needing to use MATLAB syntax. Menu options allow users to tune the behavior of EEGLAB to available memory. Middle-layer functions allow users to customize data processing using command history and interactive ‘pop’ functions. Experienced MATLAB users can use EEGLAB data structures and stand-alone signal processing functions to write custom and/or batch analysis scripts. Extensive function help and tutorial information are included. A ‘plug-in’ facility allows easy incorporation of new EEG modules into the main menu. EEGLAB is freely available under the GNU public license for noncommercial use and open source development, together with sample data, user tutorial and extensive documentation.

Richard C. Atkinson et al., (2003), “INTELLECTUAL PROPERTY RIGHTS: Public Sector Collaboration for Agricultural IP Management,” Science 301, no. 5630 (July 11, 2003): 174-175, doi:10.1126/science.1085553.

  • Abstract

The fragmented ownership of rights to intellectual property (IP) in agricultural biotechnology leads to situations where no single public-sector institution can provide a complete set of IP rights to ensure freedom to operate with a particular technology. This situation causes obstacles to the distribution of improved staple crops for humanitarian purposes in the developing world and specialty crops in the developed world. This Policy Forum describes an initiative by the major agricultural universities in the United States and other public-sector institutions to establish a new paradigm in the management of IP to facilitate commercial development of such crops.

Clement J. McDonald et al., (2003), OpenSource software in medical informatics—why, how and what, International Journal of Medical Informatics, Volume 69, Issues 2–3, Working Conference on Health Information Systems, March 2003, Pages 175–184

  • Abstract

‘OpenSource’ is a 20–40 year old approach to licensing and distributing software that has recently burst into public view. Against conventional wisdom this approach has been wildly successful in the general software market—probably because the openness lets programmers the world over obtain, critique, use, and build upon the source code without licensing fees. Linux, a UNIX-like operating system, is the best known success. But computer scientists at the University of California, Berkeley began the tradition of software sharing in the mid 1970s with BSD UNIX and distributed the major internet network protocols as source code without a fee. Medical informatics has its own history of OpenSource distribution: Massachusetts General's COSTAR and the Veterans Administration's VISTA software have been distributed as source code at no cost for decades. Bioinformatics, our sister field, has embraced the OpenSource movement and developed rich libraries of open-source software. OpenSource has now gained a tiny foothold in health care (OSCAR GEHR, OpenEMed). Medical informatics researchers and funding agencies should support and nurture this movement. In a world where open-source modules were integrated into operational health care systems, informatics researchers would have real world niches into which they could engraft and test their software inventions. This could produce a burst of innovation that would help solve the many problems of the health care system. We at the Regenstrief Institute are doing our part by moving all of our development to the open-source model.

Warren L. DeLano, (2005), The case for open-source software in drug discovery, Drug Discovery Today, Volume 10, Issue 3, 1 February 2005, Pages 213–217

  • Abstract

Widespread adoption of open-source software for network infrastructure, web servers, code development, and operating systems leads one to ask how far it can go. Will ‘opensource’ spread broadly, or will it be restricted to niches frequented by hopeful hobbyists and midnight hackers? Here we identify reasons for the success of open-source software and predict how consumers in drug discovery will benefit from new open-source products that address their needs with increased flexibility and in ways complementary to proprietary options.

P. D. Schloss et al., (2009). Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities. Applied and Environmental Microbiology, vol. 75 no. 23 pp. 7537-7541. doi: 10.1128/AEM.01541-09

  • Abstract

Mothur aims to be a comprehensive software package that allows users to use a single piece of software to analyze community sequence data. It builds upon previous tools to provide a flexible and powerful software package for analyzing sequencing data. As a case study, we used mothur to trim, screen, and align sequences; calculate distances; assign sequences to operational taxonomic units; and describe the α and β diversity of eight marine samples previously characterized by pyrosequencing of 16S rRNA gene fragments. This analysis of more than 222,000 sequences was completed in less than 2 h with a laptop computer.
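To make the α and β diversity terms in this abstract concrete, here is a minimal sketch that computes one common α-diversity measure (the Shannon index) and one common β-diversity measure (Bray-Curtis dissimilarity) from per-OTU sequence counts. The function names and sample counts are hypothetical, the choice of these two measures is an assumption, and this is not mothur's implementation.

```python
import math


def shannon_index(counts):
    """Alpha diversity: Shannon index H = -sum(p_i * ln p_i) over OTUs present.

    Assumes counts holds non-negative integers with at least one positive entry.
    """
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples.

    a and b are OTU count vectors in the same OTU order; 0 means identical
    composition and 1 means no shared OTUs.
    """
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1.0 - 2.0 * shared / (sum(a) + sum(b))


if __name__ == "__main__":
    # Hypothetical OTU count vectors for two samples (same OTU order).
    sample_a = [10, 10, 0]
    sample_b = [5, 5, 10]
    print(shannon_index(sample_a), shannon_index(sample_b))
    print(bray_curtis(sample_a, sample_b))
```

In a pipeline like the one described, these measures would be computed only after sequences have been trimmed, aligned, and assigned to operational taxonomic units.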

R.C. Gentleman et al., (2004), Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5(10):R80. Epub 2004 Sep 15.

  • Abstract

The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples.

Stephen M. Maurer, Arti Rai, and Andrej Sali, (2004), Finding Cures for Tropical Diseases: Is Open Source an Answer?, PLoS Medicine 1, no. 3 (December 2004): 183-186.

  • Abstract

This paper argues that the current models for encouraging pharmaceutical companies to research and develop drugs for tropical diseases that affect poor people are not working. These models are 1) asking governments and NGOs to subsidize drug prices for developing countries, and 2) creating non-profit venture capital firms. It proposes an open-source model for developing these drugs through a website, and describes how scientists could use chat pages and shared databases to make discoveries.

  • Scientists working on this database would not be paid in money; instead they would gain stature and enhance their reputations, a motivation similar to that of the hacker community. The drugs would not be patented, in order to keep retail costs low. Companies and universities would allow their workers to volunteer, and would even donate databases and resources, because the value of their IP lies in North American and European medicines.

R. Craig, J.P. Cortens and R.C. Beavis, (2004), Open Source System for Analyzing, Validating, and Storing Protein Identification Data, Journal of Proteome Research, 3 (6), pp 1234–1242, DOI: 10.1021/pr049882h

  • Abstract

This paper describes an open-source system for analyzing, storing, and validating proteomics information derived from tandem mass spectrometry. It is based on a combination of data analysis servers, a user interface, and a relational database. The database was designed to store the minimum amount of information necessary to search and retrieve data obtained from the publicly available data analysis servers. Collectively, this system was referred to as the Global Proteome Machine (GPM). The components of the system have been made available as open source development projects. A publicly available system has been established, comprised of a group of data analysis servers and one main database server.

Ethan G Cerami et al., (2006), cPath: open source software for collecting, storing, and querying biological pathways, BMC Bioinformatics 2006, 7:497 doi:10.1186/1471-2105-7-497

  • Background

Biological pathways, including metabolic pathways, protein interaction networks, signal transduction pathways, and gene regulatory networks, are currently represented in over 220 diverse databases. These data are crucial for the study of specific biological processes, including human diseases. Standard exchange formats for pathway information, such as BioPAX, CellML, SBML and PSI-MI, enable convenient collection of this data for biological research, but mechanisms for common storage and communication are required.

Results: We have developed cPath, an open source database and web application for collecting, storing, and querying biological pathway data. cPath makes it easy to aggregate custom pathway data sets available in standard exchange formats from multiple databases, present pathway data to biologists via a customizable web interface, and export pathway data via a web service to third-party software, such as Cytoscape, for visualization and analysis. cPath is software only, and does not include new pathway information. Key features include: a built-in identifier mapping service for linking identical interactors and linking to external resources; built-in support for PSI-MI and BioPAX standard pathway exchange formats; a web service interface for searching and retrieving pathway data sets; and thorough documentation. The cPath software is freely available under the LGPL open source license for academic and commercial use.

Conclusion: cPath is a robust, scalable, modular, professional-grade software platform for collecting, storing, and querying biological pathways. It can serve as the core data handling component in information systems for pathway visualization, analysis and modeling.

Burr Settles, (2005), ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text. Bioinformatics, 21 (14): 3191-3192. doi: 10.1093/bioinformatics/bti475

  • Summary

ABNER (A Biomedical Named Entity Recognizer) is an open source software tool for molecular biology text mining. At its core is a machine learning system using conditional random fields with a variety of orthographic and contextual features. The latest version is 1.5, which has an intuitive graphical interface and includes two modules for tagging entities (e.g. protein and cell line) trained on standard corpora, for which performance is roughly state of the art. It also includes a Java application programming interface allowing users to incorporate ABNER into their own systems and train models on new corpora.
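The "orthographic and contextual features" mentioned in this summary can be sketched as a per-token feature function of the kind typically fed to a conditional random field. The feature names, the word-shape encoding, and the one-token context window below are illustrative assumptions, not ABNER's actual feature set; the sketch covers feature extraction only, not the CRF model or its training.

```python
import re


def word_shape(token):
    """Collapse a token to a coarse shape: A for upper, a for lower, 0 for digit."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    return re.sub(r"[0-9]", "0", shape)


def orthographic_features(token):
    """Surface-form features computed from the token alone."""
    return {
        "lower": token.lower(),
        "is_capitalized": token[:1].isupper(),
        "is_all_caps": token.isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
        "has_hyphen": "-" in token,
        "shape": word_shape(token),
    }


def token_features(tokens, i):
    """Orthographic features for tokens[i] plus the neighbouring words as context."""
    feats = {f"cur.{k}": v for k, v in orthographic_features(tokens[i]).items()}
    feats["prev.lower"] = tokens[i - 1].lower() if i > 0 else "<START>"
    feats["next.lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>"
    return feats


if __name__ == "__main__":
    sentence = ["IL-2", "binds", "the", "p53", "protein"]
    for i in range(len(sentence)):
        print(token_features(sentence, i))
```

Features such as digits, hyphens, and mixed case are useful here because gene and protein names (for example "IL-2" or "p53") often differ orthographically from ordinary English words.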

P.A. Cook et al., (2005), Camino: Open-source diffusion-MRI reconstruction and processing, The Insight Journal - 2005 MICCAI Open-Source Workshop.

  • Abstract

Camino is an open-source, object-oriented software package for processing diffusion MRI data. Camino implements a data processing pipeline, which allows for easy scripting and flexible integration with other software. This paper summarises the features of Camino at each stage of the pipeline from the raw data to the statistics used by clinicians and researchers. The paper also discusses the role of Camino in the paper "An Automated Approach to Connectivity-based Partitioning of Brain Structures".

Stein Aerts et al., (2005), TOUCAN 2: the all-inclusive open source workbench for regulatory sequence analysis, Nucleic Acids Research, Volume 33, Issue suppl 2, pp. 393-396, doi: 10.1093/nar/gki354

  • Abstract

We present the second and improved release of the TOUCAN workbench for cis-regulatory sequence analysis. TOUCAN implements and integrates fast state-of-the-art methods and strategies in gene regulation bioinformatics, including algorithms for comparative genomics and for the detection of cis-regulatory modules. This second release of TOUCAN has become open source and thereby carries the potential to evolve rapidly. The main goal of TOUCAN is to allow a user to come to testable hypotheses regarding the regulation of a gene or of a set of co-regulated genes.

S. Kerrien et al., (2006), IntAct—open source resource for molecular interaction data, Nucleic Acids Research 35 (suppl 1): D561-D565. doi: 10.1093/nar/gkl958

  • Abstract

IntAct is an open source database and software suite for modeling, storing and analyzing molecular interaction data. The data available in the database originates entirely from published literature and is manually annotated by expert biologists to a high level of detail, including experimental methods, conditions and interacting domains. The database features over 126 000 binary interactions extracted from over 2100 scientific publications and makes extensive use of controlled vocabularies. The web site provides tools allowing users to search, visualize and download data from the repository. IntAct supports and encourages local installations as well as direct data submission and curation collaborations. IntAct source code and data are freely available.

C. Steinbeck et al., (2006), Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics, Current Pharmaceutical Design, Volume 12, Number 17, June 2006, pp. 2111-2120(10)

  • Abstract

The Chemistry Development Kit (CDK) provides methods for common tasks in molecular informatics, including 2D and 3D rendering of chemical structures, I/O routines, SMILES parsing and generation, ring searches, isomorphism checking, structure diagram generation, etc. Implemented in Java, it is used both for server-side computational services, possibly equipped with a web interface, as well as for applications and client-side applets. This article introduces the CDK's new QSAR capabilities and the recently introduced interface to statistical software.

T.D. Crawford et al., (2007), PSI3: An open-source Ab Initio electronic structure package, Journal of Computational Chemistry, Volume 28, Issue 9, pages 1610–1616, 15 July 2007, DOI: 10.1002/jcc.20573

  • Abstract

PSI3 is a program system and development platform for ab initio molecular electronic structure computations. The package includes mature programming interfaces for parsing user input, accessing commonly used data such as basis-set information or molecular orbital coefficients, and retrieving and storing binary data (with no software limitations on file sizes or file-system-sizes), especially multi-index quantities such as electron repulsion integrals. This platform is useful for the rapid implementation of both standard quantum chemical methods, as well as the development of new models. Features that have already been implemented include Hartree-Fock, multiconfigurational self-consistent-field, second-order Møller-Plesset perturbation theory, coupled cluster, and configuration interaction wave functions. Distinctive capabilities include the ability to employ Gaussian basis functions with arbitrary angular momentum levels; linear R12 second-order perturbation theory; coupled cluster frequency-dependent response properties, including dipole polarizabilities and optical rotation; and diagonal Born-Oppenheimer corrections with correlated wave functions. This article describes the programming infrastructure and main features of the package. PSI3 is available free of charge through the open-source, GNU General Public License.

Scott L. Delp et al., (2007). OpenSim: Open-Source Software to Create and Analyze Dynamic Simulations of Movement, IEEE Transactions on Biomedical Engineering, Volume 54, Issue 11, pages 1940-1950, doi: 10.1109/TBME.2007.901024

  • Abstract

We have developed a freely available, open-source software system (OpenSim) that lets users develop models of musculoskeletal structures and create dynamic simulations of a wide variety of movements. We are using this system to simulate the dynamics of individuals with pathological gait and to explore the biomechanical effects of treatments. Dynamic simulations of movement allow one to study neuromuscular coordination, analyze athletic performance, and estimate internal loading of the musculoskeletal system. Simulations can also be used to identify the sources of pathological movement and establish a scientific basis for treatment planning. OpenSim provides a platform on which the biomechanics community can build a library of simulations that can be exchanged, tested, analyzed, and improved through a multi-institutional collaboration. Developing software that enables a concerted effort from many investigators poses technical and sociological challenges. Meeting those challenges will accelerate the discovery of principles that govern movement control and improve treatments for individuals with movement pathologies.

L. T. Kell et al., (2007). FLR: an open-source framework for the evaluation and development of management strategies, ICES Journal of Marine Science, 64 (4): 640-646. doi: 10.1093/icesjms/fsm012

  • Abstract

The FLR framework (Fisheries Library for R) is a development effort directed towards the evaluation of fisheries management strategies. The overall goal is to develop a common framework to facilitate collaboration within and across disciplines (e.g. biological, ecological, statistical, mathematical, economic, and social) and, in particular, to ensure that new modelling methods and software are more easily validated and evaluated, as well as becoming widely available once developed. Specifically, the framework details how to implement and link a variety of fishery, biological, and economic software packages so that alternative management strategies and procedures can be evaluated for their robustness to uncertainty before implementation. The design of the framework, including the adoption of object-orientated programming, its feasibility to be extended to new processes, and its application to new management approaches (e.g. ecosystem effects of fishing), is discussed. The importance of open source for promoting transparency and allowing technology transfer between disciplines and researchers is stressed.

Ola Spjuth et al., (2007), Bioclipse: an open source workbench for chemo- and bioinformatics, BMC Bioinformatics, 8:59 doi:10.1186/1471-2105-8-59

  • Abstract

Background: There is a need for software applications that provide users with a complete and extensible toolkit for chemo- and bioinformatics accessible from a single workbench. Commercial packages are expensive and closed source, hence they do not allow end users to modify algorithms and add custom functionality. Existing open source projects are more focused on providing a framework for integrating existing, separately installed bioinformatics packages, rather than providing user-friendly interfaces. No open source chemoinformatics workbench has previously been published, and no successful attempts have been made to integrate chemo- and bioinformatics into a single framework.

Results: Bioclipse is an advanced workbench for resources in chemo- and bioinformatics, such as molecules, proteins, sequences, spectra, and scripts. It provides 2D-editing, 3D-visualization, file format conversion, calculation of chemical properties, and much more; all fully integrated into a user-friendly desktop application. Editing supports standard functions such as cut and paste, drag and drop, and undo/redo. Bioclipse is written in Java and based on the Eclipse Rich Client Platform with a state-of-the-art plugin architecture. This gives Bioclipse an advantage over other systems as it can easily be extended with functionality in any desired direction.

Conclusion: Bioclipse is a powerful workbench for bio- and chemoinformatics as well as an advanced integration platform. The rich functionality, intuitive user interface, and powerful plugin architecture make Bioclipse the most advanced and user-friendly open source workbench for chemo- and bioinformatics. Bioclipse is released under Eclipse Public License (EPL), an open source license which sets no constraints on external plugin licensing; it is totally open for both open source plugins as well as commercial ones. Bioclipse is freely available.

Morgan L. Maeder et al., (2008), Rapid “Open-Source” Engineering of Customized Zinc-Finger Nucleases for Highly Efficient Gene Modification, Molecular Cell, Volume 31, Issue 2, 294-301, doi:10.1016/j.molcel.2008.06.016

  • Summary

Custom-made zinc-finger nucleases (ZFNs) can induce targeted genome modifications with high efficiency in cell types including Drosophila, C. elegans, plants, and humans. A bottleneck in the application of ZFN technology has been the generation of highly specific engineered zinc-finger arrays. Here we describe OPEN (Oligomerized Pool ENgineering), a rapid, publicly available strategy for constructing multifinger arrays, which we show is more effective than the previously published modular assembly method. We used OPEN to construct 37 highly active ZFN pairs which induced targeted alterations with high efficiencies (1%–50%) at 11 different target sites located within three endogenous human genes (VEGF-A, HoxB13, and CFTR), an endogenous plant gene (tobacco SuRA), and a chromosomally integrated EGFP reporter gene. In summary, OPEN provides an “open-source” method for rapidly engineering highly active zinc-finger arrays, thereby enabling broader practice, development, and application of ZFN technology for biological research and gene therapy.

R. C. G. Holland et al., (2008), BioJava: an open-source framework for bioinformatics, Bioinformatics, Volume 24, Issue 18, pp. 2096-2097.

  • Summary

BioJava is a mature open-source project that provides a framework for processing of biological data. BioJava contains powerful analysis and statistical routines, tools for parsing common file formats and packages for manipulating sequences and 3D structures. It enables rapid bioinformatics application development in the Java programming language. BioJava is an open-source project distributed under the Lesser GPL (LGPL). BioJava can be downloaded from the BioJava website.

Marc Sturm et al., (2008), OpenMS – An open-source software framework for mass spectrometry, BMC Bioinformatics 9:163 doi:10.1186/1471-2105-9-163

  • Abstract

Background: Mass spectrometry is an essential analytical technique for high-throughput analysis in proteomics and metabolomics. The development of new separation techniques, precise mass analyzers and experimental protocols is a very active field of research. This leads to more complex experimental setups yielding ever increasing amounts of data. Consequently, analysis of the data is currently often the bottleneck for experimental studies. Although software tools for many data analysis tasks are available today, they are often hard to combine with each other or not flexible enough to allow for rapid prototyping of a new analysis workflow.

Results: We present OpenMS, a software framework for rapid application development in mass spectrometry. OpenMS has been designed to be portable, easy-to-use and robust while offering a rich functionality ranging from basic data structures to sophisticated algorithms for data analysis. This has already been demonstrated in several studies.

Conclusion: OpenMS is available under the Lesser GNU Public License (LGPL) from the project website.

Bernard Munos, (2006), Can open-source R&D reinvigorate drug research?, Nature Reviews Drug Discovery 5, 723-729 (September 2006), doi:10.1038/nrd2131

  • Abstract

The low number of novel therapeutics approved by the US FDA in recent years continues to cause great concern about productivity and declining innovation. Can open-source drug research and development, using principles pioneered by the highly successful open-source software movement, help revive the industry?

Brendan MacLean et al., (2010), Skyline: an open source document editor for creating and analyzing targeted proteomics experiments, Bioinformatics, Volume 26, Issue 7, pp. 966-968

Summary: Skyline is a Windows client application for targeted proteomics method creation and quantitative data analysis. It is open source and freely available for academic and commercial use. The Skyline user interface simplifies the development of mass spectrometer methods and the analysis of data from targeted proteomics experiments performed using selected reaction monitoring (SRM). Skyline supports using and creating MS/MS spectral libraries from a wide variety of sources to choose SRM filters and verify results based on previously observed ion trap data. Skyline exports transition lists to and imports the native output files from Agilent, Applied Biosystems, Thermo Fisher Scientific and Waters triple quadrupole instruments, seamlessly connecting mass spectrometer output back to the experimental design document. The fast and compact Skyline file format is easily shared, even for experiments requiring many sample injections. A rich array of graphs displays results and provides powerful tools for inspecting data integrity as data are acquired, helping instrument operators to identify problems early. The Skyline dynamic report designer exports tabular data from the Skyline document model for in-depth analysis with common statistical tools.

Availability: Single-click, self-updating web installation is available at the project web site. This web site also provides access to instructional videos, a support board, an issues list and a link to the source code project.

M. Valiev et al., (2010), NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations, Computer Physics Communications, Volume 181, Issue 9, Pages 1477–1489

  • Abstract

The latest release of NWChem delivers an open-source computational chemistry package with extensive capabilities for large scale simulations of chemical and biological systems. Utilizing a common computational framework, diverse theoretical descriptions can be used to provide the best solution for a given scientific problem. Scalable parallel implementations and modular software design enable efficient utilization of current computational architectures. This paper provides an overview of NWChem focusing primarily on the core theoretical modules provided by the code and their parallel performance.

Declan Butler, (2010), Open-source science takes on neglected disease: Chemist launches collaborative project to make more potent form of much-needed drug, Nature, doi:10.1038/news.2010.50

  • Background

A chemist — and social entrepreneur — in Australia is launching an open-source research project to develop a more potent form of a front-line drug against the debilitating neglected tropical disease schistosomiasis.

Matthew Todd of the University of Sydney hopes to persuade research chemists across the world to share laboratory time and expertise in a collaborative effort to find a cheap and efficient synthesis of the drug praziquantel. All results will be published in almost real time on the project's website — free of intellectual property restrictions — and later in journals, with substantial contributors becoming authors on any resulting papers. "My funded project is intended to be the kernel, to which anyone can add," Todd says. He hopes that the project will become a successful example of open-source science, and open-source 'wet lab' chemistry in particular, a concept that has been slow to take off.

Energy / Physics / Nanotechnology[edit | edit source]

Chihiro Watanabe, Youichirou S. Tsuji, and Charla Griffy-Brown, (2001), “Patent statistics: deciphering a 'real' versus a 'pseudo' proxy of innovation,” Technovation 21, no. 12 (December 2001): 783-790, doi:10.1016/S0166-4972(01)00025-6.

  • Abstract

Patent statistics have fascinated economists concerned about innovation for a long time. However, fundamental questions remain as to whether or not patent statistics represent the real state of innovation. As Griliches pointed out, substantial questions involve: What aspects of economic activities do patent statistics actually capture? And, what would we like them to measure? He pointed out that these statistics can be a mirage appearing to provide a great number of objective and reliable proxies for innovation.

This paper aims to address some of these questions by making a comparative evaluation of the representability of patent statistics in four levels of the innovation process, using as examples research and development (R&D) in Japan's printer and photovoltaic solar cell (PV) industries over the last two decades. Furthermore, this research provides a new set of patent statistics which could be considered a more reliable proxy for innovation.

Jaroslav Hofierka and Marcel Suri, (2002), The solar radiation model for Open source GIS: implementation and applications, Proceedings of the Open source GIS - GRASS users conference 2002 - Trento, Italy, 11-13 September 2002

  • Conclusions

The r.sun is a complex and flexible solar radiation model, fully integrated within the open source environment of GRASS GIS. It calculates all three components of solar irradiance/irradiation (beam, diffuse and reflected) for clear-sky as well as overcast conditions. The implemented equations follow the latest European research in solar radiation modelling. Integration in GRASS GIS enables the use of interpolation tools that are necessary for data preparation. The model is especially appropriate for modelling of large areas with complex terrain because all spatially variable solar parameters can be defined as raster maps. The model can be used easily for long-term calculations at different map scales, from continental to detailed. Two operational modes enable the user to account for temporal variability of solar radiation within a day or within a year (using shell scripts). These features offer a wide variety of possible applications, as documented in the two examples. Open source code enables modifications and improvements in future, according to research development in solar radiation modelling or to better fit specific user needs.

A.F. Albuquerque et al., (2007), The ALPS project release 1.3: Open-source software for strongly correlated systems, Journal of Magnetism and Magnetic Materials, Volume 310, Issue 2, Part 2, March 2007, Pages 1187–1193

  • Abstract

We present release 1.3 of the ALPS (Algorithms and Libraries for Physics Simulations) project, an international open-source software project to develop libraries and application programs for the simulation of strongly correlated quantum lattice models such as quantum magnets, lattice bosons, and strongly correlated fermion systems. Development is centered on common XML and binary data formats, on libraries to simplify and speed up code development, and on full-featured simulation programs. The programs enable non-experts to start carrying out numerical simulations by providing basic implementations of the important algorithms for quantum lattice models: classical and quantum Monte Carlo (QMC) using non-local updates, extended ensemble simulations, exact and full diagonalization (ED), as well as the density matrix renormalization group (DMRG). Changes in the new release include a DMRG program for interacting models, support for translation symmetries in the diagonalization programs, the ability to define custom measurement operators, and support for inhomogeneous systems, such as lattice models with traps. The software is available from our web server.

Brian Bruns, “Open sourcing nanotechnology research and development: issues and opportunities,” Nanotechnology 12 (2001): 198-210.

  • This is an excellent paper examining the viability of open source design in the nanotech industry. Important things to learn from the open source software (OSS) successes are the bazaar-style design process and the gift culture it created. Concerns regarding the tragedy of the anti-commons provide reason to examine alternative research methods within nanotechnology. The paper discusses various licenses possible for nanotechnology and identifies this as an area where more research should be done. Various business models are highlighted, including the “producer coalition”, and the paper reminds the reader that there are various levels of openness that firms could adopt depending on their business. A survey of the nanotech industry is included; it is important to note that many nanotechnology firms get funding from the US government, which favours strong IP and patenting laws.

P. Giannozzi et al., (2009). QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials, Journal of Physics: Condensed Matter, Volume 21, Number 39, doi:10.1088/0953-8984/21/39/395502

  • Abstract

QUANTUM ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling, based on density-functional theory, plane waves, and pseudopotentials (norm-conserving, ultrasoft, and projector-augmented wave). QUANTUM ESPRESSO stands for opEn Source Package for Research in Electronic Structure, Simulation, and Optimization. It is freely available to researchers around the world under the terms of the GNU General Public License. QUANTUM ESPRESSO builds upon newly restructured electronic-structure codes that have been developed and tested by some of the original authors of novel electronic-structure algorithms and applied in the last twenty years by some of the leading materials modeling groups worldwide. Innovation and efficiency are still its main focus, with special attention paid to massively-parallel architectures, and a great effort being devoted to user friendliness. QUANTUM ESPRESSO is evolving towards a distribution of independent and inter-operable codes in the spirit of an open-source project, where researchers active in the field of electronic-structure calculations are encouraged to participate in the project by contributing their own codes or by implementing their own ideas into existing codes.

John H. Barton, (2009), “Patenting and Access to Clean Energy Technologies in Developing Countries,” WIPO Magazine, March 2009

  • The paper examines other questions of importance to developing nations including the benefits of strengthening IP protection in order to make foreign investors more willing to transfer technology and asking whether or not local trade barriers are proving helpful or harmful in developing these industries. The author concludes with specific suggestions for developing countries themselves, lenders and donors, and international negotiations. The development and diffusion of renewable energy technologies is only one part of the challenge of bringing down emissions from the energy sector. Much needs to be done to harvest the largest potential in energy efficiency improvements. Nevertheless, it is our hope that this study will contribute to informing policy processes and negotiations related to technological cooperation and intellectual property in the energy, climate change and trade arenas.

F. Alet et al., (2005), The ALPS project: open source software for strongly correlated systems, Journal of the Physical Society of Japan. DOI: 10.1143/JPSJS.74S.30

  • Abstract

We present the ALPS (Algorithms and Libraries for Physics Simulations) project, an international open source software project to develop libraries and application programs for the simulation of strongly correlated quantum lattice models such as quantum magnets, lattice bosons, and strongly correlated fermion systems. Development is centered on common XML and binary data formats, on libraries to simplify and speed up code development, and on full-featured simulation programs. The programs enable non-experts to start carrying out numerical simulations by providing basic implementations of the important algorithms for quantum lattice models: classical and quantum Monte Carlo (QMC) using non-local updates, extended ensemble simulations, exact and full diagonalization (ED), as well as the density matrix renormalization group (DMRG). The software is available from our web server.

Computer / Information Sciences & Systems / Modeling / Programming / Processing[edit | edit source]

Stefan Koch and Georg Schneider, (2002), Effort, co-operation and co-ordination in an open source software project: GNOME, Information Systems Journal, Volume 12, Issue 1, pages 27–42, DOI: 10.1046/j.1365-2575.2002.00110.x

  • Abstract

This paper presents results from research into open source projects from a software engineering perspective. The research methodology employed relies on public data retrieved from the CVS repository of the GNOME project and relevant discussion groups. This methodology is described, and results concerning the special characteristics of open source software development are given. These data are used for a first approach to estimating the total effort to be expended.

Alessandro Cimatti et al., (2002), NuSMV 2: An OpenSource Tool for Symbolic Model Checking, Computer Science, Volume 2404/2002, 241-268, DOI: 10.1007/3-540-45657-0_29

  • Abstract

This paper describes version 2 of the NuSMV tool (computer aided verification). NuSMV is a symbolic model checker originated from the reengineering, reimplementation and extension of SMV, the original BDD-based model checker developed at CMU. The NuSMV project aims at the development of a state-of-the-art symbolic model checker, designed to be applicable in technology transfer projects: it is a well structured, open, flexible and documented platform for model checking, and is robust and close to industrial systems standards.

Carolina Cruz-Neira et al., (2002), VR Juggler -- An Open Source Platform for Virtual Reality Applications, in 40th AIAA Aerospace Sciences Meeting and Exhibit, 2002

  • Abstract

This paper describes VR Juggler, an Open Source platform used to develop and run virtual reality applications. We emphasize VR Juggler's ability to provide a uniform VR application environment and to allow extendibility to new devices without affecting existing applications. These features enable VR applications to evolve alongside other technologies with minimal or no new developmental efforts.

R. Lougee-Heimer, (2003), The Common Optimization INterface for Operations Research: Promoting open-source software in the operations research community, IBM Journal of Research and Development, Volume: 47 , Issue: 1, p. 57- 66

  • Abstract

The Common Optimization INterface for Operations Research (COIN-OR) is an initiative to promote open-source software for the operations research (OR) community. In OR practice and research, software is fundamental. The dependence of OR on software implies that the ways in which software is developed, managed, and distributed can have a significant impact on the field. Open source is a relatively new software development and distribution model which offers advantages over current practices. Its viability depends on the precise definition of open source, on the culture of a distributed developer community, and on a version-control system which makes distributed development possible. In this paper, we review open-source philosophy and culture, and present the goals and status of COIN-OR.

M.K. Smith et al., (2003), DSpace: An Open Source Dynamic Digital Repository, D-Lib Magazine, Volume 9 Number 1. DOI: 10.1045/january2003-smith

  • Abstract

For the past two years the Massachusetts Institute of Technology (MIT) Libraries and Hewlett-Packard Labs have been collaborating on the development of an open source system called DSpace™ that functions as a repository for the digital research and educational material produced by members of a research university or organization. Running such an institutionally-based, multidisciplinary repository is increasingly seen as a natural role for the libraries and archives of research and teaching organizations. As their constituents produce increasing amounts of original material in digital formats, much of which is never published by traditional means, the repository becomes vital to protect the significant assets of the institution and its faculty. The first part of this article describes the DSpace system including its functionality and design, and its approach to various problems in digital library and archives design. The second part discusses the implementation of DSpace at MIT, plans for federating the system, and issues of sustainability.

T. Staples et al., (2003), The Fedora Project : An open-source Digital Object Repository Management System, D-Lib Magazine, April 2003, v. 9, no. 4

  • About

Using a grant from the Andrew W. Mellon Foundation, the University of Virginia Library has released an open-source digital object repository management system. The Fedora Project, a joint effort of the University of Virginia and Cornell University, has now made available the first version of a system based on the Flexible Extensible Digital Object Repository Architecture, originally developed at Cornell.

Fedora repositories can provide the foundation for a variety of information management schemes, including digital library systems. At the University of Virginia, Fedora is being used to build a large-scale digital library that will soon have millions of digital resources of all media and content types. A consortium of institutions that include the Library of Congress, Northwestern University, and Tufts University is also currently testing the program. They are building test beds drawn from their own digital collections that they will use to evaluate the software and give feedback to the project.

S. Dudoit, R. C. Gentleman and J. Quackenbush, (2003), Open Source Software for the Analysis of Microarray Data, BioTechniques 34, pp. 45-51

  • Abstract

DNA microarray assays represent the first widely used application that attempts to build upon the information provided by genome projects in the study of biological questions. One of the greatest challenges with working with microarrays is collecting, managing, and analyzing data. Although several commercial and noncommercial solutions exist, there is a growing body of freely available, open source software that allows users to analyze data using a host of existing techniques and to develop their own and integrate them within the system. Here we review three of the most widely used and comprehensive systems: the statistical analysis tools written in R through the Bioconductor project, the Java®-based TM4 software system available from The Institute for Genomic Research, and BASE, the Web-based system developed at Lund University.

M. Dougiamas & P. Taylor, (2003). Moodle: Using Learning Communities to Create an Open Source Course Management System. In D. Lassner & C. McNaught (Eds.), Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2003, pp. 171-178

  • Abstract

This paper summarizes a PhD research project that has contributed towards the development of Moodle - a popular open-source course management system. In this project we applied theoretical perspectives such as "social constructionism" and "connected knowing" to the analysis of our own online classes as well as the growing learning community of other Moodle users. We used the mode of participatory action research, including techniques such as case studies, ethnography, learning environment surveys and design methodologies. This ongoing analysis is being used to guide the development of Moodle as a tool for improving processes within communities of reflective inquiry. At the time of writing (April 2003), Moodle has been translated into twenty-seven languages and is being used by many hundreds of educators around the world, including universities, schools and independent teachers.

Antoine Rosset, Luca Spadola and Osman Ratib, (2004), OsiriX: An Open-Source Software for Navigating in Multidimensional DICOM Images, Journal of Digital Imaging, Volume 17, Number 3, pp. 205-216, DOI: 10.1007/s10278-004-1014-6

  • Abstract

A multidimensional image navigation and display software was designed for display and interpretation of large sets of multidimensional and multimodality images such as combined PET-CT studies. The software is developed in Objective-C on a Macintosh platform under the MacOS X operating system using the GNUstep development environment. It also benefits from the extremely fast and optimized 3D graphic capabilities of the OpenGL graphic standard widely used for computer games optimized for taking advantage of any hardware graphic accelerator boards available. In the design of the software special attention was given to adapt the user interface to the specific and complex tasks of navigating through large sets of image data. An interactive jog-wheel device widely used in the video and movie industry was implemented to allow users to navigate in the different dimensions of an image set much faster than with a traditional mouse or on-screen cursors and sliders. The program can easily be adapted for very specific tasks that require a limited number of functions, by adding and removing tools from the program's toolbar and avoiding an overwhelming number of unnecessary tools and functions. The processing and image rendering tools of the software are based on the open-source libraries ITK and VTK. This ensures that all new developments in image processing that could emerge from other academic institutions using these libraries can be directly ported to the OsiriX program. OsiriX is provided free of charge under the GNU open-source licensing agreement.

M.J.L. de Hoon, S. Imoto, J. Nolan and S. Miyano, (2004), Open Source Clustering Software, Bioinformatics, 20 (9): 1453-1454. DOI: 10.1093/bioinformatics/bth078

  • Summary

We have implemented k-means clustering, hierarchical clustering and self-organizing maps in a single multipurpose open-source library of C routines, callable from other C and C++ programs. Using this library, we have created an improved version of Michael Eisen's well-known Cluster program for Windows, Mac OS X and Linux/Unix. In addition, we generated a Python and a Perl interface to the C Clustering Library, thereby combining the flexibility of a scripting language with the speed of C.
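For readers unfamiliar with the first of the three methods named in this summary, the following is a minimal pure-Python sketch of k-means (Lloyd's algorithm). It is not the C Clustering Library's code or API; the function name `k_means`, its parameters, and the choice to pass initial centroids explicitly (so the run is deterministic) are assumptions made for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, centroids, max_iter=100):
    """Illustrative Lloyd's algorithm with caller-supplied initial centroids.

    `points` and `centroids` are sequences of equal-length numeric tuples.
    Returns (labels, centroids). Hypothetical helper, not a library API.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its previous centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return labels, centroids
```

Calling `k_means([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])` groups the first two points and the last two points into separate clusters. A production library would typically add random restarts and alternative distance measures.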

Edgar Gabriel et al., (2004), Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, 2004, Volume 3241/2004, 353-377, DOI: 10.1007/978-3-540-30218-6_19

  • Abstract

A large number of MPI implementations are currently available, each of which emphasizes different aspects of high-performance computing or is intended to solve a specific research problem. The result is a myriad of incompatible MPI implementations, all of which require separate installation, and the combination of which presents significant logistical challenges for end users. Building upon prior research, and influenced by experience gained from the code bases of the LAM/MPI, LA-MPI, and FT-MPI projects, Open MPI is an all-new, production-quality MPI-2 implementation that is fundamentally centered around component concepts. Open MPI provides a unique combination of novel features previously unavailable in an open-source, production-quality implementation of MPI. Its component architecture provides both a stable platform for third-party research as well as enabling the run-time composition of independent software add-ons. This paper presents a high-level overview of the goals, design, and implementation of Open MPI.

Peter Christen, Tim Churches and Markus Hegland, (2004), Febrl – A Parallel Open Source Data Linkage System, Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, 2004, Volume 3056/2004, 638-647, DOI: 10.1007/978-3-540-24775-3_75

  • Abstract

In many data mining projects information from multiple data sources needs to be integrated, combined or linked in order to allow more detailed analysis. The aim of such linkages is to merge all records relating to the same entity, such as a patient or a customer. Most of the time the linkage process is challenged by the lack of a common unique entity identifier, and thus becomes non-trivial. Linking today's large data collections becomes increasingly difficult using traditional linkage techniques. In this paper we present an innovative data linkage system called Febrl, which includes a new probabilistic approach for improved data cleaning and standardisation, innovative indexing methods, a parallelisation approach which is implemented transparently to the user, and a data set generator which allows the random creation of records containing names and addresses. Implemented as open source software, Febrl is an ideal experimental platform for new linkage algorithms and techniques.

Will Schroeder, (2005), The ITK Software Guide Second Edition Updated for ITK version 2.4, Computer and Information Science, Volume: 525, Issue: 1-3, Publisher: Citeseer, Pages: 53-58

  • Abstract

The Insight Toolkit (ITK) is an open-source software toolkit for performing registration and segmentation. Segmentation is the process of identifying and classifying data found in a digitally sampled representation. Typically the sampled representation is an image acquired from such medical instrumentation as CT or MRI scanners. Registration is the task of aligning or developing correspondences between data. For example, in the medical environment, a CT scan may be aligned with an MRI scan in order to combine the information contained in both. ITK is implemented in C++. It is cross-platform, using a build environment known as CMake to manage the compilation process in a platform-independent way. In addition, an automated wrapping process (Cable) generates interfaces between C++ and interpreted programming languages such as Tcl, Java, and Python. This enables developers to create software using a variety of programming languages. ITK's C++ implementation style is referred to as generic programming, which is to say that it uses templates so that the same code can be applied generically to any class or type that happens to support the operations used. Such C++ templating means that the code is highly efficient, and that many software problems are discovered at compile-time, rather than at run-time during program execution. Because ITK is an open-source project, developers from around the world can use, debug, maintain, and extend the software. ITK uses a model of software development referred to as Extreme Programming. Extreme Programming collapses the usual software creation methodology into a simultaneous and iterative process of design-implement-test-release. The key features of Extreme Programming are communication and testing. Communication among the members of the ITK community is what helps manage the rapid evolution of the software. Testing is what keeps the software stable.
In ITK, an extensive testing process (using a system known as Dart) is in place that measures the quality on a daily basis. The ITK Testing Dashboard is posted continuously, reflecting the quality of the software at any moment. This book is a guide to using and developing with ITK. The sample code in the directory provides a companion to the material presented here. The most recent version of this document is available online.

B. Alpern et al., (2005), The Jikes Research Virtual Machine project: Building an open-source research community, IBM Systems Journal, Volume: 44 , Issue: 2, pp 399-417

  • Abstract

This paper describes the evolution of the Jikes™ Research Virtual Machine project from an IBM internal research project, called Jalapeño, into an open-source project. After summarizing the original goals of the project, we discuss the motivation for releasing it as an open-source project and the activities performed to ensure the success of the project. Throughout, we highlight the unique challenges of developing and maintaining an open-source project designed specifically to support a research community.

Suresh Thummalapenta and Tao Xie, (2007), Parseweb: a programmer assistant for reusing open source code on the web, ASE '07, Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering, Pages 204-213, doi:10.1145/1321631.1321663

  • Abstract

Programmers commonly reuse existing frameworks or libraries to reduce software development efforts. One common problem in reusing the existing frameworks or libraries is that the programmers know what type of object they need, but do not know how to get that object with a specific method sequence. To help programmers address this issue, we have developed an approach that takes queries of the form "Source object type → Destination object type" as input, and suggests relevant method-invocation sequences that can serve as solutions that yield the destination object from the source object given in the query. Our approach interacts with a code search engine (CSE) to gather relevant code samples and performs static analysis over the gathered samples to extract required sequences. As code samples are collected on demand through CSE, our approach is not limited to queries of any specific set of frameworks or libraries. We have implemented our approach with a tool called PARSEWeb, and conducted four different evaluations to show that our approach is effective in addressing programmers' queries. We also show that PARSEWeb performs better than existing related tools: Prospector and Strathcona.

D. Krajzewicz, M. Bonert and P. Wagner, (2006), RoboCup 2006 Infrastructure Simulation Competition

  • Abstract

Since the year 2000, the Institute of Transportation Research (IVF) at the German Aerospace Centre (DLR) is developing a microscopic, traffic simulation package. The complete package is offered as open source to establish the software as a common testbed for algorithms and models from traffic research. Since the year 2003 the IVF also works on a virtual traffic management centre and in conjunction with this on traffic management. Several large-scale projects have been done since this time, most importantly INVENT where modern traffic management methods have been evaluated and the online-simulation and prediction of traffic during the world youth day (Weltjugendtag) 2005 in Cologne/Germany. This publication briefly describes the simulation package together with the projects mentioned above to show how SUMO can be used to simulate large-scale traffic scenarios. Additionally, it is pointed out how SUMO may be used as a testbed for automatic management algorithms with minor effort in developing extensions.

Alan MacCormack, John Rusnak and Carliss Y. Baldwin, (2006). Exploring the Structure of Complex Software Designs: An Empirical Study of Open Source and Proprietary Code. Management Science, Vol. 52, No. 7, pp. 1015-1030

  • Abstract

This paper reports data from a research project which seeks to characterize the differences in design structure between complex software products. In particular, we adopt a technique based upon Design Structure Matrices (DSMs) to map the dependencies between different elements of a design then develop metrics that allow us to compare the structures of these different DSMs. We demonstrate the power of this approach in two ways: First, we compare the design structures of two complex software products – the Linux operating system and the Mozilla web browser – that were developed via contrasting modes of organization: specifically, open source versus proprietary development. We find significant differences in their designs, consistent with an interpretation that Linux possesses a more “modular” architecture. We then track the evolution of Mozilla, paying particular attention to a major “re-design” effort that took place several months after its release as an open source product. We show that this effort resulted in a design structure that was significantly more modular than its predecessor, and indeed, more modular than that of a comparable version of Linux.

Our findings demonstrate that it is possible to characterize the structure of complex product designs and draw meaningful conclusions about the precise ways in which they differ. We provide a description of a set of tools that will facilitate this analysis for software products, which should prove fruitful for researchers and practitioners alike. Empirically, the data we provide, while exploratory, is consistent with a view that different modes of organization may tend to produce designs possessing different architectural characteristics. However, we also find that purposeful efforts to re-design a product’s architecture can have a significant impact on the structure of a design, at least for products of comparable complexity to the ones we examine here.
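To make the Design Structure Matrix idea concrete, here is a minimal sketch. It builds a DSM from a list of dependencies and computes one possible modularity metric: the density of the transitive closure, a quantity sometimes called propagation cost. The abstract does not spell out which metrics the authors use, so the metric choice, the function names, and the toy element names are all assumptions.

```python
def build_dsm(elements, dependencies):
    """Return a square 0/1 matrix; dsm[i][j] == 1 if elements[i] depends on elements[j]."""
    index = {name: i for i, name in enumerate(elements)}
    n = len(elements)
    dsm = [[0] * n for _ in range(n)]
    for source, target in dependencies:
        dsm[index[source]][index[target]] = 1
    return dsm


def visibility(dsm):
    """Transitive closure of the DSM (Warshall's algorithm).

    Each element is counted as reaching itself, which is an assumption
    of this sketch.
    """
    n = len(dsm)
    reach = [[1 if i == j else dsm[i][j] for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach


def propagation_cost(dsm):
    """Fraction of element pairs linked directly or indirectly; lower suggests more modular."""
    n = len(dsm)
    return sum(map(sum, visibility(dsm))) / (n * n)


if __name__ == "__main__":
    # Hypothetical four-file system: a depends on b, and b depends on c.
    dsm = build_dsm(["a", "b", "c", "d"], [("a", "b"), ("b", "c")])
    print(propagation_cost(dsm))
```

Comparing this single number across two code bases of similar size gives a rough sense of how far a change in one element could ripple through the rest.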

Pierre Azoulay, Andrew Stellman and Joshua Graff Zivin, (2006), PublicationHarvester: An open-source software tool for science policy research, Research Policy, Volume 35, Issue 7, Pages 970–974

  • Abstract

We present PublicationHarvester, an open-source software tool for gathering publication information on individual life scientists. The software interfaces with MEDLINE, and allows the end-user to specify up to four MEDLINE-formatted names for each researcher. Using these names along with a user-specified search query, PublicationHarvester generates yearly publication counts, optionally weighted by Journal Impact Factors. These counts are further broken-down by order on the authorship list (first, last, second, next-to-last, middle) and by publication type (clinical trials, regular journal articles, reviews, letters/editorials, etc.) The software also generates a keywords report at the scientist-year level, using the medical subject headings (MeSH) assigned by the National Library of Medicine to each publication indexed by MEDLINE. The software, source code, and user manual can be downloaded at
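The counting scheme in this abstract can be sketched as follows. This is not PublicationHarvester's code or data model: the record fields, the impact-factor table, and the rule for resolving overlapping author positions (for example, the second of three authors is also next-to-last) are assumptions made for illustration.

```python
from collections import defaultdict


def position_label(position, n_authors):
    """Classify a 1-based author position using the categories named in the abstract.

    Ties are resolved in the order first, last, second, next-to-last;
    that precedence is an assumption of this sketch.
    """
    if position == 1:
        return "first"
    if position == n_authors:
        return "last"
    if position == 2:
        return "second"
    if position == n_authors - 1:
        return "next-to-last"
    return "middle"


def yearly_counts(publications, impact_factors=None):
    """Return {(year, position label): count}.

    When impact_factors is given, each publication contributes its
    journal's impact factor instead of 1. Journals missing from the
    table contribute 0.
    """
    counts = defaultdict(float)
    for pub in publications:
        weight = 1.0
        if impact_factors is not None:
            weight = impact_factors.get(pub["journal"], 0.0)
        label = position_label(pub["position"], pub["n_authors"])
        counts[(pub["year"], label)] += weight
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical records for one researcher.
    pubs = [
        {"year": 2001, "journal": "J1", "position": 1, "n_authors": 3},
        {"year": 2001, "journal": "J2", "position": 1, "n_authors": 2},
        {"year": 2002, "journal": "J1", "position": 5, "n_authors": 5},
    ]
    print(yearly_counts(pubs))
    print(yearly_counts(pubs, {"J1": 2.5, "J2": 1.5}))
```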

Kevin Crowston, James Howison and Hala Annabi, (2006), Information systems success in free and open source software development: theory and measures, Software Process: Improvement and Practice, Special Issue on Free or Open Source Software Development (F/OSSD) Projects, Volume 11, Issue 2, pages 123–148, March/April 2006, DOI: 10.1002/spip.259

  • Abstract

Information systems success is one of the most widely used dependent variables in information systems (IS) research, but research on free/libre and open source software (FLOSS) often fails to appropriately conceptualize this important concept. In this article, we reconsider what success means within a FLOSS context. We first review existing models of IS success and success variables used in FLOSS research and assess them for their usefulness, practicality and fit to the FLOSS context. Then, drawing on a theoretical model of group effectiveness in the FLOSS development process, as well as an on-line discussion with developers, we present additional concepts that are central to an appropriate understanding of success for FLOSS.

In order to examine the practicality and validity of this conceptual scheme, the second half of our article presents an empirical study that demonstrates operationalizations of the chosen measures and assesses their internal validity. We use data from SourceForge to measure the project's effectiveness in team building, the speed of the project at responding to bug reports and the project's popularity. We conclude by discussing the implications of this study for our proposed extension of IS success in the context of FLOSS development and highlight future directions for research. Copyright © 2006 John Wiley & Sons, Ltd.

Janos Demeter et al., (2007), The Stanford Microarray Database: implementation of new analysis tools and open source release of software, Nucleic Acids Research (2007) 35 (suppl 1): D766-D770. doi: 10.1093/nar/gkl1019

  • Abstract

The Stanford Microarray Database (SMD) is a research tool and archive that allows hundreds of researchers worldwide to store, annotate, analyze and share data generated by microarray technology. SMD supports most major microarray platforms, and is MIAME-supportive and can export or import MAGE-ML. The primary mission of SMD is to be a research tool that supports researchers from the point of data generation to data publication and dissemination, but it also provides unrestricted access to analysis tools and public data from 300 publications. In addition to supporting ongoing research, SMD makes its source code fully and freely available to others under an Open Source license, enabling other groups to create a local installation of SMD. In this article, we describe several data analysis tools implemented in SMD and we discuss features of our software release.

M. Neteler and H. Mitasova, (2008), Open Source GIS: A GRASS GIS Approach (hardcover). Originally published as volume 773 in the series The International Series in Engineering and Computer Science, 3rd ed., 2008, XX, 406 p., 80 illus.

  • Includes rich set of practical examples extensively tested by users with data and changes, related to software updates readily available on a related website
  • The vector data architecture in the third edition is completely new; database support added
  • Vector data covers both polygons, lines and sites in a new way and includes database management

With this third edition of Open Source GIS: A GRASS GIS Approach, we enter the new era of GRASS6, the first release that includes substantial new code developed by the International GRASS Development Team. The dramatic growth in open source software libraries has made GRASS6 development more efficient, and has enhanced GRASS interoperability with a wide range of open source and proprietary geospatial tools.

Dominic Widdows and Kathleen Ferraro, (2008), Semantic vectors: a scalable open source package and online technology management application, Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), European Language Resources Association (ELRA), May, 2008

  • Abstract

This paper describes the open source SemanticVectors package that efficiently creates semantic vectors for words and documents from a corpus of free text articles. We believe that this package can play an important role in furthering research in distributional semantics, and (perhaps more importantly) can help to significantly reduce the current gap that exists between good research results and valuable applications in production software. Two clear principles that have guided the creation of the package so far include ease-of-use and scalability. The basic package installs and runs easily on any Java-enabled platform, and depends only on Apache Lucene. Dimension reduction is performed using Random Projection, which enables the system to scale much more effectively than other algorithms used for the same purpose. This paper also describes a trial application in the Technology Management domain, which highlights some user-centred design challenges which we believe are also key to successful deployment of this technology.
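The dimension-reduction step named in this abstract, Random Projection, can be sketched in a few lines. The sketch does not use the SemanticVectors API (the package is written in Java and built on Apache Lucene); the function names, the ternary +1/0/-1 entries, and the toy vocabulary are assumptions chosen to show the general idea: each term gets a fixed random low-dimensional vector, and a document vector is the count-weighted sum of its terms' vectors.

```python
import math
import random


def random_projection_matrix(n_terms, dim, seed=0):
    """Return one random vector of +1/0/-1 entries per term (half the entries are 0 on average)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 0, 0, 1)) for _ in range(dim)] for _ in range(n_terms)]


def document_vector(term_counts, term_index, projection):
    """Project a bag-of-words document into the reduced space."""
    dim = len(projection[0])
    vector = [0.0] * dim
    for term, count in term_counts.items():
        row = projection[term_index[term]]
        for d in range(dim):
            vector[d] += count * row[d]
    return vector


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    # Hypothetical vocabulary and documents.
    vocabulary = ["open", "source", "robot", "traffic", "speech", "grammar"]
    term_index = {term: i for i, term in enumerate(vocabulary)}
    projection = random_projection_matrix(len(vocabulary), dim=8)
    doc_a = document_vector({"open": 2, "source": 2, "robot": 1}, term_index, projection)
    doc_b = document_vector({"open": 1, "source": 1, "speech": 3}, term_index, projection)
    print(cosine(doc_a, doc_b))
```

Because the projection needs no matrix factorisation, documents can be added one at a time, which is consistent with the scalability claim in the abstract.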

M. Quigley et al., (2009), ROS: an open-source Robot Operating System, Conference Paper - ICRA Workshop on Open Source Software

  • Abstract

This paper gives an overview of ROS, an open-source robot operating system. ROS is not an operating system in the traditional sense of process management and scheduling; rather, it provides a structured communications layer above the host operating systems of a heterogeneous compute cluster. In this paper, we discuss how ROS relates to existing robot software frameworks, and briefly overview some of the available application software which uses ROS.

D. Nurmi et al., (2009), The Eucalyptus Open-source Cloud-computing System, 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, Pages: 124-131

  • Abstract

Cloud computing systems fundamentally provide access to large pools of data and computational resources through a variety of interfaces similar in spirit to existing grid and HPC resource management and programming systems. These types of systems offer a new programming target for scalable application developers and have gained popularity over the past few years. However, most cloud computing systems in operation today are proprietary, rely upon infrastructure that is invisible to the research community, or are not explicitly designed to be instrumented and modified by systems researchers. In this work, we present Eucalyptus - an open-source software framework for cloud computing that implements what is commonly referred to as infrastructure as a service (IaaS); systems that give users the ability to run and control entire virtual machine instances deployed across a variety of physical resources. We outline the basic principles of the Eucalyptus design, detail important operational aspects of the system, and discuss architectural trade-offs that we have made in order to allow EUCALYPTUS to be portable, modular and simple to use on infrastructure commonly found within academic settings. Finally, we provide evidence that EUCALYPTUS enables users familiar with existing grid and HPC systems to explore new cloud computing functionality while maintaining access to existing, familiar application development software and grid middleware.

Darren Kessner et al., (2008), ProteoWizard: open source software for rapid proteomics tools development, Bioinformatics, Volume 24, Issue 21, pp. 2534-2536

  • Abstract from Bioinformatics

Summary: The ProteoWizard software project provides a modular and extensible set of open-source, cross-platform tools and libraries. The tools perform proteomics data analyses; the libraries enable rapid tool creation by providing a robust, pluggable development framework that simplifies and unifies data file access, and performs standard proteomics and LCMS dataset computations. The library contains readers and writers of the mzML data format, which has been written using modern C++ techniques and design principles and supports a variety of platforms with native compilers. The software has been specifically released under the Apache v2 license to ensure it can be used in both academic and commercial projects. In addition to the library, we also introduce a rapidly growing set of companion tools whose implementation helps to illustrate the simplicity of developing applications on top of the ProteoWizard library.

Availability: Cross-platform software that compiles using native compilers (i.e. GCC on Linux, MSVC on Windows and XCode on OSX) is available for download free of charge, at This website also provides code examples, and documentation. It is our hope the ProteoWizard project will become a standard platform for proteomics development; consequently, code use, contribution and further development are strongly encouraged.

Fowler, James E. (2000), QccPack: An open-source software library for quantization, compression, and coding, In Applications of Digital Image Processing XXIII, A. G. Tescher, Ed., San Diego, CA, August 2000, Proc. SPIE 4115, pp. 294-301.

  • Abstract

We describe the QccPack software package, an open-source collection of library routines and utility programs for quantization, compression, and coding of data. QccPack is being written to expedite data-compression research and development by providing general and reliable implementations of common compression techniques. Functionality of the current release includes entropy coding, scalar quantization, vector quantization, adaptive vector quantization, wavelet transforms and subband coding, error-correcting codes, image-processing support, and general vector-math, matrix-math, file-I/O, and error-message routines. All QccPack functionality is accessible via library calls; additionally, many utility programs provide command-line access. The QccPack software package, downloadable free of charge from the QccPack Web page, is published under the terms of the GNU General Public License and the GNU Library General Public License, which guarantee source-code access as well as allow redistribution and modification. Additionally, there exist optional modules that implement certain patented algorithms. These modules are downloadable separately and are typically issued under licenses that permit only non-commercial use.

Speech / Vocabulary / Ontology

Akinobu Lee, Kiyohiro Shikano and Tatsuya Kawahara. (2001). Julius - an Open Source Real-Time Large Vocabulary Recognition Engine. In Eurospeech 2001 - Scandinavia, 1691-1694.

  • Abstract

Julius is a high-performance, two-pass LVCSR decoder for researchers and developers. Based on word 3-gram and context-dependent HMM, it can perform almost real-time decoding on most current PCs in 20k word dictation task. Major search techniques are fully incorporated such as tree lexicon, N-gram factoring, cross-word context dependency handling, enveloped beam search, Gaussian pruning, Gaussian selection, etc. Besides search efficiency, it is also modularized carefully to be independent from model structures, and various HMM types are supported such as shared-state triphones and tied-mixture models, with any number of mixtures, states, or phones. Standard formats are adopted to cope with other free modeling toolkit. The main platform is Linux and other Unix workstations, and partially works on Windows. Julius is distributed with open license together with source codes, and has been used by many researchers and developers in Japan.

Willie Walker et al., (2004), Sphinx-4: a flexible open source framework for speech recognition, Technical Report

  • Abstract

Sphinx-4 is a flexible, modular and pluggable framework to help foster new innovations in the core research of hidden Markov model (HMM) speech recognition systems. The design of Sphinx-4 is based on patterns that have emerged from the design of past systems as well as new requirements based on areas that researchers currently want to explore. To exercise this framework, and to provide researchers with a "research-ready" system, Sphinx-4 also includes several implementations of both simple and state-of-the-art techniques. The framework and the implementations are all freely available via open source.

Natalya F. Noy et al., (2003), Protégé-2000: An Open-Source Ontology-Development and Knowledge-Acquisition Environment, AMIA 2003 Open Source Expo

  • Abstract

Protégé-2000 is an open-source tool that assists users in the construction of large electronic knowledge bases. It has an intuitive user interface that enables developers to create and edit domain ontologies. Numerous plugins provide alternative visualization mechanisms, enable management of multiple ontologies, allow the use of inference engines and problem solvers with Protégé ontologies, and provide other functionality. The Protégé user community has more than 7000 members.

Evren Sirin et al., (2007), Pellet: A practical OWL-DL reasoner, Software Engineering and the Semantic Web, Volume 5, Issue 2, June 2007, Pages 51–53

  • Abstract

In this paper, we present a brief overview of Pellet: a complete OWL-DL reasoner with acceptable to very good performance, extensive middleware, and a number of unique features. Pellet is the first sound and complete OWL-DL reasoner with extensive support for reasoning with individuals (including nominal support and conjunctive query), user-defined datatypes, and debugging support for ontologies. It implements several extensions to OWL-DL including a combination formalism for OWL-DL ontologies, a non-monotonic operator, and preliminary support for OWL/Rule hybrid reasoning. Pellet is written in Java and is open source.

J. Atserias, B. Casas, E. Comelles, M. Gonzalez, L. Padro and M. Padro, FreeLing 1.3: Syntactic and semantic services in an open-source NLP library, TALP Research Center, Universitat Politecnica de Catalunya, Barcelona, Spain

  • Abstract

This paper describes version 1.3 of the FreeLing suite of NLP tools. FreeLing was first released in February 2004 providing morphological analysis and PoS tagging for Catalan, Spanish, and English. From then on, the package has been improved and enlarged to cover more languages (i.e. Italian and Galician) and offer more services: Named entity recognition and classification, chunking, dependency parsing, and WordNet based semantic annotation. FreeLing is not conceived as end-user oriented tool, but as library on top of which powerful NLP applications can be developed. Nevertheless, sample interface programs are provided, which can be straightforwardly used as fast, flexible, and efficient corpus processing tools.

A remarkable feature of FreeLing is that it is distributed under a free-software LGPL license, thus enabling any developer to adapt the package to his needs in order to get the most suitable behaviour for the application being developed.

FreeLing Home

Zhifei Li et al., (2009), Joshua: an open source toolkit for parsing-based machine translation, StatMT '09: Proceedings of the Fourth Workshop on Statistical Machine Translation, Pages 135-139

  • Abstract

We describe Joshua, an open source toolkit for statistical machine translation. Joshua implements all of the algorithms required for synchronous context free grammars (SCFGs): chart-parsing, n-gram language model integration, beam- and cube-pruning, and k-best extraction. The toolkit also implements suffix-array grammar extraction and minimum error rate training. It uses parallel and distributed computing techniques for scalability. We demonstrate that the toolkit achieves state of the art translation performance on the WMT09 French-English translation task.

General Open Source Science

AG González, (2005), Open science: open source licenses in scientific research, North Carolina Journal of Law & Technology, Vol. 7, Issue 2.

  • Abstract

In recent years, there has been growing interest in the area of open source software (OSS) as an alternative economic model. However, the success of the OSS mindshare and collaborative online experience has wider implications to many other fields of human endeavour than the mere licensing of computer programmes. There are a growing number of institutions interested in using OSS licensing schemes to distribute creative works, scientific research and even to publish online journals through open access licenses (OA).

There appears to be growing concern in the scientific community about the trend to fence and protect scientific research through intellectual property, particularly by the abuse of patent applications for biotechnology research. The OSS experience represents a successful model that demonstrates that IP licenses could eventually be used to protect against the misuse and misappropriation of basic scientific research. This would be done by translating existing OSS licenses to protect scientific research. Some efforts are already paying dividends in areas such as scientific publishing, evidenced by the growing number of OA journals. However, the process of translating software licenses to areas other than publishing has been more difficult. OSS and open access licenses work best with works subject to copyright protection because copyright subsists in an original work as soon as it is created. However, it has been more difficult to generate a license that covers patented works because patents are only awarded through a lengthy application and registration process. If the open science experiment is to work, it needs the intervention of the legal community to draft new licenses that may apply to scientific research. This work will look at the issue of such open science licenses, paying special care as to how the system can best be exported to scientific research based on OSS and OA ideals.

    • Looks really interesting; get a copy.

Michael Woelfle, Piero Olliaro & Matthew H. Todd. (2011), Open science is a research accelerator. Commentary, Nature Chemistry, 3, 745–748, doi:10.1038/nchem.1149

  • Synopsis

An open-source approach to the problem of producing an off-patent drug in enantiopure form serves as an example of how academic and industrial researchers can join forces to make new scientific discoveries that could have a huge impact on human health. This Commentary describes a case study — a chemical project where open-source methodologies were employed to accelerate the process of discovery.

Andrea Bonaccorsi, Silvia Giannangeli and Cristina Rossi, (2006), Entry strategies under competing standards: Hybrid business models in the open source software industry, Management Science, Vol. 52, No. 7, pp. 1085-1098

  • Abstract

The paper analyses the entry strategies of software firms that adopt the Open Source production model. A new definition of business model is proposed. Empirical evidence, based on an exploratory survey of 146 Italian software firms, shows that firms adapted to an environment dominated by incumbent standards by combining Open Source and proprietary software. The paper examines the determinants of business models and discusses the stability of hybrid models in the evolution of the industry.

Jeffrey A. Roberts, Il-Horn Hann and Sandra A. Slaughter. (2006), Understanding the Motivations, Participation, and Performance of Open Source Software Developers: A Longitudinal Study of the Apache Projects. Management Science, Vol. 52, No. 7, pp. 984-999 DOI: 10.1287/mnsc.1060.0554

  • Abstract

Understanding what motivates participation is a central theme in the research on open source software (OSS) development. Our study contributes by revealing how the different motivations of OSS developers are interrelated, how these motivations influence participation leading to performance, and how past performance influences subsequent motivations. Drawing on theories of intrinsic and extrinsic motivation, we develop a theoretical model relating the motivations, participation, and performance of OSS developers. We evaluate our model using survey and archival data collected from a longitudinal field study of software developers in the Apache projects. Our results reveal several important findings. First, we find that developers' motivations are not independent but rather are related in complex ways. Being paid to contribute to Apache projects is positively related to developers' status motivations but negatively related to their use-value motivations. Perhaps surprisingly, we find no evidence of diminished intrinsic motivation in the presence of extrinsic motivations; rather, status motivations enhance intrinsic motivations. Second, we find that different motivations have an impact on participation in different ways. Developers' paid participation and status motivations lead to above-average contribution levels, but use-value motivations lead to below-average contribution levels, and intrinsic motivations do not significantly impact average contribution levels. Third, we find that developers' contribution levels positively impact their performance rankings. Finally, our results suggest that past-performance rankings enhance developers' subsequent status motivations.