Contents
About the author
Preface
Introduction
1. Polymers
2. Compression of data
3. Natural language compression
4. Formal language compression
5. Types of compression program
6. Algorithmic compression
7. Chemical formulae
8. Fischer projection
9. Compression of polymers
10. Line notation systems and compression
11. Current trends in research
12. Big data
Conclusion
Summary
References
Appendices
Appendix A: A new foundation for information
Appendix B: Compression and geometric data
Appendix C: The analysis of binary, ternary and quaternary based systems for communications theory
Appendix D: The use of a radix 5 base for transmission and storage of information
Appendix E: A comparison of a radix 2 and a radix 5 based systems
Appendix F: Random and non-random sequential strings using a radix 5 based system
Appendix G: A comparison of compression values of binary and ternary base systems
Appendix H: Patterns within pattern-less sequences
Appendix I: A radix 4 based system for use in theoretical genetics
Appendix J: A compression program for chemical, biological and nanotechnologies
Appendix K: Statistical physics and the fundamentals of minimum description length and minimum message length
Index
Preface
The monograph addresses the use of algorithmic complexity to compress polymer strings, reducing their redundancy while keeping the numerical quality intact. A description of the types of polymers and their uses is followed by a chapter on the various types of compression systems that can be used to compress polymer chains into manageable units. The work is intended for graduate and post-graduate university students in the physical sciences and engineering.
Introduction
The book examines algorithmic compression techniques for use on polymer chains. Polymer chains are linear sequences of units that contain redundancies in the form of sub-groups of common or like-natured units. These sub-groups can be compressed to save space and to present a perceptual change for analysis of the whole linear sequential segment. The author has previous publications in the area of algorithmic complexity and applies that work to polymers in this monograph (Tice, 2009 and 2010).
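As a rough illustration of the idea that runs of like units in a linear chain can be compressed, the sketch below applies simple run-length encoding to a sequence of monomer labels. Run-length encoding is swapped in here as a generic stand-in and is not presented as the technique developed in this monograph; the labels 'A' and 'B' and the function names are hypothetical.

```python
from itertools import groupby


def run_length_encode(chain):
    """Collapse each run of identical adjacent units into a (unit, count) pair."""
    return [(unit, len(list(run))) for unit, run in groupby(chain)]


def run_length_decode(pairs):
    """Expand (unit, count) pairs back into the original list of units."""
    return [unit for unit, count in pairs for _ in range(count)]


# Hypothetical chain: 'A' and 'B' stand for two monomer types (an assumption).
chain = list("AAAABBBAAB")
encoded = run_length_encode(chain)
print(encoded)  # [('A', 4), ('B', 3), ('A', 2), ('B', 1)]

# The encoding is lossless: decoding restores the original chain.
print(run_length_decode(encoded) == chain)  # True
```

The ten-unit chain is reduced to four pairs, and the decoding step shows that no sequence information is lost.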
The book is arranged into chapters that address polymers, chemical processes, algorithmic complexity and then applied aspects of algorithmic complexity to polymer chains. Each chapter is a self-contained section that addresses that particular topic and lends itself well to understanding the application and theory of polymer compression techniques.
The book addresses large-chain and multiple-sequence polymers, which are currently found in daunting amounts in ‘Big Data’. A chapter and extensive appendix sections address the ‘real world’ problems of massive data sets of polymer information.
Polymers
A polymer is a chemical material, or materials, that consists of repeating structural components formed through the process of polymerization (Wikipedia, ‘Polymer’, 2013: 1). The word ‘polymer’ comes from the Greek words ‘poly’, meaning ‘many’, and ‘meros’, meaning ‘parts’ (Borchardt, 1997: 1230).
Polymer chains, in which molecules are linked together to form a connected sequence, have several types of geometry beyond the linear chain (Hiemenz and Lodge, 2007: 7). Branched and cross-linked polymer chains are common and arise from the ‘backbone’ of a linear molecule (Hiemenz and Lodge, 2007: 7). Repeated branching upon branching of a molecule results in a network type of geometry that is termed cross-linked (Hiemenz and Lodge, 2007: 8). Some multi-branched molecules have discrete units and are termed hyper-branched polymers, while other multi-branched polymers are known as dendrimers, or tree-like molecules (Hiemenz and Lodge, 2007: 8).
Co-polymers are polymer chains that have more than one type of repeating unit, while a polymer chain that has only a single type of repeating unit is termed a homo-polymer (Hiemenz and Lodge, 2007: 9). A co-polymer is also a series of monomers that repeat in a chain and are bounded by each of their original monomer states (Hiemenz and Lodge, 2007: 9). The tertiary structure of a polymer is the ‘overall’ shape of the molecule, and the formal polymer nomenclature of the International Union of Pure and Applied Chemistry (IUPAC) uses the structure of the monomer or repeat unit as a system of identification (Hiemenz and Lodge, 2007: 18).
Monomers and repeat units of monomers are the primary descriptive quality of polymers and are categorized in the nomenclature according to the type of structures involved in the monomers (Wikipedia, ‘Polymer’, 2013: 6). Polymers built from a single type of repeating monomer are known as homopolymers, while those built from a mixture of repeating monomers are known as co-polymers (Wikipedia, ‘Polymer’, 2013: 6).
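The distinction between homopolymers and co-polymers can be sketched as a count of the distinct repeat-unit types in a chain. This is a minimal illustration only, assuming a chain is represented as a sequence of monomer labels; the function name `classify_chain` and the labels are hypothetical and do not come from the cited sources.

```python
def classify_chain(chain):
    """Label a chain by how many distinct repeat-unit types it contains."""
    kinds = set(chain)
    if not kinds:
        raise ValueError("chain must contain at least one repeat unit")
    # One distinct unit type: homopolymer. More than one: co-polymer.
    return "homopolymer" if len(kinds) == 1 else "co-polymer"


# Hypothetical monomer labels, used only for illustration.
print(classify_chain(["A", "A", "A", "A"]))  # homopolymer
print(classify_chain(["A", "B", "A", "B"]))  # co-polymer
```

The sketch ignores branching and the order of units; it captures only the single-type versus mixed-type criterion described above.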