Synthetic Data and Generative AI

Welcome! Click any timeline box for multiple links documenting years of computing, machine learning, and AI. Uniquely framed to highlight generative AI and synthetic data. Math, demos, research, and more! The Timeline is best viewed from a computer, but all boxes are clickable from your device. 3D option is available on computer.

Click "Continue" to hide this box. You can come back to this intro box by clicking "ABOUT THIS TIMELINE" in your top menu bar.

The 'Special Box' on this timeline contains links to more resources:
- Synthetic data and code
- How-tos for building your own site like this one
- More source collections at multiple levels, across areas of interest

Visit www.chalkboardwisdom.org/cyber-synth for guidance, documentation, and more resource materials.

Details:
1. Each box within the timeline is a self-contained learning module containing external links.
2. The lower right-hand corner of the timeline has a circular tool icon. Click it to perform a keyword search on the timeline. For keyword hints, visit www.chalkboardwisdom.org.
3. Stick to basic keywords as seen on the accompanying site page at chalkboardwisdom.org/cyber-synth.
4. The lower left of the screen has a circle that toggles 3D and 2D views.
5. Try "Game Mode" to quiz yourself on what year an event happened in this timeline!
---
The Designated Community of Interest (i.e., intended audience) is Grade 5 and above for basic comprehension, with a difficulty scalable from everyday user to advanced domain specialist.

This is a living timeline with twice-yearly scheduled audits to refresh, modify, add, and remove data sources as needed. To report an error on this timeline, please visit chalkboardwisdom.org/about.

Thank you to the University of Maryland DCIP Program, ECS Federal internal Alpha Theta Data group, and the innumerable sources of information curated into the space. I am humbled by the immensity of knowledge, curiosity, and innovation across the minds keen to use technology to benefit society.

1943-04-04 15:34:10

1943: A Logical Calculus

A logical calculus is a formalization of a meaningful logical theory. The derivable objects of a logical calculus are interpreted as statements, formed from the simplest ones by means of propositional connectives and quantifiers. Logical calculus deals with validity and satisfiability rather than truth or falsity, which are at the root of formal systems.

1945-02-26 15:59:57

Rise of Alan Turing

Alan Turing, famed creator of the Turing Test, helped break the German Enigma cipher during WWII. His Turing Test, also called "The Imitation Game," assesses a machine's ability to exhibit intelligent behavior.

1948-01-01 00:00:01

N-Grams foundational research

Claude Shannon, considered "The Founder of Information Theory," was among the first to put math behind letters: he measured how often short runs of N consecutive letters appear in text. He called them N-grams.
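
Here is a minimal sketch of what counting letter N-grams can look like in Python. It is an illustration only, not Shannon's own procedure: the function name `letter_ngrams` and the sample sentence are made up for this example, and keeping spaces as a "letter" is an assumption.

```python
from collections import Counter


def letter_ngrams(text: str, n: int) -> Counter:
    """Count every run of n consecutive characters in text.

    Hypothetical helper for illustration: keeps letters and spaces,
    lowercases everything, and drops all other characters.
    """
    cleaned = "".join(ch for ch in text.lower() if ch.isalpha() or ch == " ")
    # Slide a window of width n across the cleaned text, one step at a time.
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


# Count 2-letter runs (bigrams) in a made-up sample sentence.
counts = letter_ngrams("the theory of the thing", 2)
print(counts["th"], counts["he"])
```

Pairs that show up often (like "th" in English) hint at which letter is likely to come next, which is the basic idea behind predicting text from counts.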

1950-01-01 00:00:01

Terry Winograd's SHRDLU system

This first-of-its-kind system, built to respond to natural language commands (developed at MIT between 1968 and 1970), was named after the keyboard layout of Linotype typesetting machines, where the second column of keys spelled S-H-R-D-L-U!

1950-01-01 00:00:01

Computing Machinery and Intelligence

Alan Turing publishes this foundational paper, laying the groundwork for the concept of AI and language processing

1956-06-17 06:16:53

The Dartmouth Conference (Workshop)

Math guys formally meet up and geek out for a month. The result? Artificial Intelligence recognized as a Field of Study

1960-01-01 00:00:01

A "very general new theory" of Inductive Inference

What is "thinking" and what is the difference between a living thing and a thinking machine? Ray Solomonoff, the father of Algorithmic Probability, asked questions for which we are still seeking answers.

1962-11-05 03:41:57

ENIAC: The First Computer

Learn about the history of the computer (and the secret "computers") with another wonderful computer history-focused timeline!

1965-01-01 00:00:01

Karen Spärck Jones

Karen Spärck Jones contributed to the fields of information retrieval and natural language processing, and later to representation in AI.

1965-01-23 21:08:15

The Primal Sketch

David Marr publishes "Primal Sketch: On Computational Mechanisms for Perceptual Organization"

1968-03-08 22:24:04

The Art of Computer Programming

Donald Knuth was all about the bits and bytes in his famed work, "The Art of Computer Programming," which helped set the stage for modern computing.

1969-02-06 17:41:06

Perceptrons

Marvin Minsky and Seymour Papert publish a book exploring neural networks to set the foundations for "AI" moving forward

1980-02-01 23:00:10

DARPA Research Funding Ramps up

Government + Defense = $$$! DARPA's Strategic Computing Initiative was an ambitious beginning. The Defense Advanced Research Projects Agency (DARPA) has been a leader in generating groundbreaking research and development (R&D) that facilitated the advancement and application of rule-based and statistical-learning-based AI technologies.

1980-04-10 12:14:56

Back Propagation

A key algorithm for training neural networks appears across many published works by David Rumelhart, Geoffrey Hinton, and Ronald Williams. The 1982 paper is linked, and here is a 1986 paper: https://www.nature.com/articles/323533a0

1992-06-22 14:32:30

Recommendations for you: GroupLens

Recommender systems emerged in the early 1990s, and you can still see GroupLens! But more often you'll see recommender systems at work in Netflix :)

1996-05-01 00:00:00

Ask Jeeves!

Later Ask.com, this natural language query service (i.e., one that takes questions the way we ask them, in full sentences) is more a product engine than a knowledge engine.

1997-05-01 00:00:01

IBM Deep Blue

The true power of computing, and the ever humble qualities of chess champions, were on display as a computer developed by IBM defeated chess world champion, Garry Kasparov.

1997-05-01 00:00:01

Stanford Parser

A full guide to the Stanford Parser, conjunction-junction style! To "parse" means to take clumps of words apart to quantify them and improve language modeling.

1997-07-03 18:39:37

Deep Learning is Published

Ian Goodfellow, Yoshua Bengio, and Aaron Courville publish a textbook that is free to read online (MIT Press, 2016), about two years after GANs made their debut with Goodfellow as the face of the innovation.

1999-02-01 23:00:10

The Automated Postal Recognition System

Zip Code Recognition using Backpropagation (pardon the 10 sec ad when you click) enabled faster mail delivery...win for USPS!

2005-06-06 14:07:15

Fortune's Formula eBook summary!

This wonderful book sample gives a taste of the historical figures and tech developments behind the quantifiable predictions that can be used to "hack" the market.

2005-11-08 21:45:59

Amazon Mechanical Turk

Amazon Mechanical Turk (MTurk) is a marketplace for completing virtual tasks that require human intelligence.

2006-04-08 08:03:31

Geoffrey Hinton

The resurgence of Back-Prop for deep learning! Geoffrey Hinton first presented his work on training deep neural networks at the NIPS (Neural Information Processing Systems) conference in 2006. This presentation marked a significant milestone in the resurgence of interest in deep learning techniques. Learn more about Hinton's work on Capsule Networks linked here.

2006-08-30 22:28:29

Dimensionality Reduction

Geoffrey Hinton and Ruslan Salakhutdinov publish "Reducing the Dimensionality of Data with Neural Networks." Here's a recent step-by-step application: https://ujangriswanto08.medium.com/a-step-by-step-guide-to-blind-visualization-of-gene-expression-data-using-t-sne-7f92e8a7fcd4

2006-11-29 10:19:54

Google Translate Launches

With its initial launch in 2006, this Google service improved significantly over the ensuing 10 years. See More for a link to using it.

2010-06-17 04:21:50

Siri as we have come to know and love her

Siri is an integral part of Apple's ecosystem. It set the stage for the wider adoption of voice-based AI interactions in our daily lives. Link is the Apple release from 2011, introducing Siri!

2011-11-29 10:19:54

IBM Watson wins Jeopardy!

It might not come as any surprise that IBM, the company behind Deep Blue, would build a system to beat the Jeopardy! champions!

2012-01-16 22:24:14

ImageNet Classification with Deep Convolutional Neural Networks

According to the authors, Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, a large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes, and it employed a recently developed regularization method called "dropout" that proved to be very effective.
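
To make "dropout" a little more concrete, here is a minimal sketch in plain Python. It shows the commonly used "inverted dropout" variant, which is an assumption on our part and not necessarily the exact scheme in the paper; the function name `dropout` and its parameters are hypothetical and chosen only for illustration.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Randomly zero out activations during training (inverted dropout).

    Illustrative sketch only: each value is dropped with probability
    drop_prob, and survivors are scaled up by 1 / (1 - drop_prob) so the
    expected total stays the same. At test time values pass through as-is.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


# Example: drop about half of ten made-up activations.
rng = random.Random(0)
print(dropout([1.0] * 10, drop_prob=0.5, rng=rng))
```

Because a different random subset of units is silenced on every training step, the network cannot lean too heavily on any single unit, which is why dropout helps against overfitting.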

2013-04-01 23:00:20

Word2Vec

A "neural network" based model for word representation, revolutionizing language processing and powering subsequent advancements in large language models (LLMs)

2013-11-18 01:49:21

VAEs: Variational Inference

Diederik Kingma and Max Welling publish Variational Inference and Deep Learning: A New Synthesis. This helps advance the field of VAEs and Bayesian Deep Learning.

2014-01-01 00:00:01

HMM Hidden Markov Models

HMMs play a fundamental role in language models today. First introduced in the 1960s, they have broad applicability across ML.

2014-01-01 00:00:01

Generative Adversarial Networks (GAN) Published

Ian Goodfellow et al. publish a framework for training generative models (GANs) using a game-theoretic approach. Goodfellow later co-authored the Deep Learning textbook.

2014-06-07 00:30:06

Synthetic Data: The early days and onwards

To augment the synthetic data contribution, linked is a good curated timeline of events about the rise of synth data!

2014-11-04 10:19:54

Alexa? Aaallleexxaa??

Amazon releases Alexa, an AI-powered virtual assistant, bringing language technology into consumer devices on a massive scale. As for the name... Alexa has grown in popularity, with an estimated 3.71 of every 100,000 Americans sharing it.

2015-01-01 00:00:01

Overcoming Troubles with Language Models

Maybe GANs can help your LLM! A human must "make them fail" to strengthen language models. Thoughtful article.

2015-01-01 00:00:01

The Alan Turing Institute

AI was added in 2017 to the Institute's primary goal, and its resources contain abundant research, learning, and collaboration for every level.

2015-12-11 21:01:09

OpenAI is founded

Created to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return... Hence, "Open!" OpenAI's founding co-chairs were Sam Altman and Elon Musk.

2016-05-10 01:41:46

Tesla Autopilot

Tesla's Autopilot technology primarily relies on neural networks and deep learning algorithms, which are part of the broader field of artificial intelligence.

2017-11-19 09:28:35

ImageNet Challenges

ImageNet is a large database of over 14 million images that was designed for computer vision research. Check out the ImageNet challenges here!

2018-01-01 00:00:01

The Transformers Arrive: GPT 1

Generative Pre-trained Transformer GPT makes its debut! A breakthrough in LLMs, GPT can generate coherent and contextually relevant text.

2018-07-02 10:47:20

Promising models for sign language AI

The researchers focus on integrating CNNs into an HMM, because sign language is a hybrid of language and images. Since sign recognition draws on language modeling as well as computer vision, and also relies on gestures to convey sentiment and tone, there may be ways to further improve communication via sign.

2018-08-28 06:43:15

NVIDIA StyleGAN

First proposed in this 2018 research, StyleGAN was later paired with Contrastive Language-Image Pre-training (CLIP) models, giving a text-driven method that allows shifting a generative model to new domains without having to collect even a single image.

2019-01-03 02:26:42

BERT (no Ernie, just Google)

Google researchers introduce Pre-training (the kindergarten version) of Deep Bidirectional Transformers for Language Understanding.

2020-01-03 01:17:23

Get image datasets!

Source for images to use when playing with GANs and other programmer toys

2020-02-01 19:46:03

GPT and COVID Come Together

In January 2020, the first cases of COVID-19 were reported, and in the same month, OpenAI announced its use of PyTorch to speed its own advancements. By September of 2020, both were in full swing!

2020-04-08 21:46:28

The Utility and Risk of Synthetic Data

GitHub documentation and datasets for using microdata to assess the risk and utility of synthetic datasets.

2020-06-08 19:46:03

Turing Project from Microsoft

Microsoft releases its own LLM to compete with GPT and contribute to the field as a whole.

2020-06-23 22:45:43

Something to Consider: Instead of loss functions, amplify rewards for ML!

This relatively recent paper from a Swiss author discusses "upside-down reinforcement learning"

2020-08-01 18:30:38

What's the difference?

It's intimidating to "parse out" (pun intended) the meaning of all this. A helpful article and infographic are here.

2020-09-14 05:37:40

Anonos and Statice for Full-Service Synthetic

These two merged companies focus on providing custom, full-service synthetic datasets for multiple industries. This sits at one end of the monetized generative AI spectrum; at the other end, you take your own data and a code notebook and go to town!