element14 Community
Engagement
  • Author: tariq.ahmad
  • Date Created: 9 Sep 2019 7:11 PM
  • Last Updated: 5 Nov 2019 10:23 PM
  • Views: 11168
  • Likes: 20
  • Comments: 146

Project14 | Vision Thing: Build Things Using Graphics, AI, Computer Vision, & Beyond!


Vision Thing

Enter Your Project for a chance to win an Oscilloscope Grand Prize Package for the Most Creative Vision Thing Project!

Back to The Project14 homepage

Project14 Home
Monthly Themes
Monthly Theme Poll

In the Comments Below:  What Are Your Ideas for Vision Thing Projects?

We're Giving Away Beaglebone AI Boards for Project Proposals that Use Them!

 


 


The Most Creative Vision Thing Project Wins a Keysight DSOX1102G Oscilloscope!

 

We are offering up to 20 FREE Beaglebone AI Boards in exchange for Vision Thing projects that use them!

Beaglebone AI Cooling Cape Addon Available from mayermakes:  BB AI cooling Addon board available

 

 

The theme this month is Vision Thing, and it comes from suggestions by dougw , vimarsh_ , and aabhas.  We're also challenging you to Project14 | Vision Thing: Beaglebone AI Your Vision Thing Project!  There's a lot of variety in how you can implement your project, and it's a great opportunity to do something creative that stretches the imagination of what hardware can do.  Your project can be a vision-based project involving anything related to Computer Vision and Machine Learning: camera vision and AI projects, Deep Learning, or hardware such as the Nvidia Jetson Nano, a Pi with an Intel Compute Stick, an Edge TPU, etc., as vimarsh_ and aabhas suggested.  Or it can be a graphics project: adding a graphical display to a microcontroller, image processing on a microcontroller, image recognition by interfacing a camera to a microcontroller, or FPGA camera interfacing/image processing/graphical display, as dougw suggested.  While this can be an intimidating subject, it's a great learning opportunity, and you can use the latest cutting-edge hardware to create something beautiful. The grand prize for this competition is a Keysight DSOX1102G Oscilloscope.  It will be awarded for the most creative Vision Thing project.  The oscilloscope, as you will see, has played a prominent role in the history of computer graphics.  Also, 3 First Place winners will win a Beagleboard Blue plus a $100 Shopping Cart for their project!

 

The Most Creative Vision Thing Wins a Keysight DSOX1102G Oscilloscope!

This theme is open ended and explores how all these different technologies are interconnected. Artificial Intelligence (AI) is a broad term that includes both Machine Learning (ML) and Deep Learning (DL).  AI involves any technique that enables computers to mimic human behavior.  Machine Learning is a subset of Artificial Intelligence, consisting of more advanced techniques and models to enable computers to figure out things from data and deliver AI applications. Deep learning is a subset of Machine Learning that makes the computation of multi-layer neural networks possible, delivering high accuracy in tasks such as speech recognition, language translation, object detection, and many other breakthroughs. The CPU (central processing unit) has often been called the brains of the PC. Increasingly, that brain is being enhanced by another part of the PC, the GPU. The GPU goes well beyond basic graphics controller functions, and is a programmable and powerful computational device in its own right. While the GPU’s advanced capabilities were originally used primarily for 3D game rendering, those capabilities are being harnessed more broadly to accelerate computational workloads.  Computer Vision is a field of study that seeks to develop techniques to help computers “see” and understand the content of digital images such as photographs and videos. It could broadly be considered a subset of AI and machine learning.

 

A vision processing unit (VPU) is an emerging class of microprocessors that aims to accelerate machine learning and artificial intelligence technologies. The Intel Neural Compute Stick is an example of a piece of hardware that you can add to the Raspberry Pi to give it this type of processing power.  Vision processing units are distinct from video processing units (which are specialized for video encoding and decoding) in their suitability for running machine vision algorithms such as CNNs (convolutional neural networks), SIFT (scale-invariant feature transform), and similar. They are distinct from GPUs, which contain specialized hardware for rasterization and texture mapping (for 3D graphics), and whose memory architecture is optimised for manipulating bitmap images in off-chip memory (reading textures and modifying frame buffers, with random access patterns). The Embedded Vision/Vector Engine (EVE) is a specialized, fully programmable processor that accelerates computer vision algorithms. The architecture's principal aim is to enable low-latency, low-power, high-performance vision algorithms in cost-sensitive embedded markets. EVE's memory architecture is unique and differentiated relative to standard processor architectures, allowing a high degree of sustained internal memory bandwidth for compute-intensive algorithms. Its custom pipelines and units allow it to accelerate and harness the high levels of data parallelism found in computer vision algorithms.

 

 

Graphics and Image Processing

The Oscilloscope Creates the First Computer Graphics

 


In 1950, Ben F. Laposky created the first computer graphic using an electronic (analog) machine: an oscilloscope.  His electronic oscilloscope imagery was produced by manipulating electronic beams displayed across the fluorescent face of the oscilloscope's cathode-ray tube, then recorded onto high-speed film using special lenses.  In 1957 he added tinted filters to imbue the photographs with striking colors. He would set up as many as 70 controls on up to 60 oscilloscopes to create his designs.  The resulting mathematical curves were similar to Lissajous waveforms, and basic waves were manipulated to create elegantly rhythmic designs called "oscillons", in a process he described as "analogous to the production of music by an orchestra."
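The mathematical curves behind Laposky's oscillons can be sketched in a few lines. The following is an illustrative Python snippet (not from the original article): it samples points of a Lissajous figure, where the frequency ratio and phase offset play the role of Laposky's oscillator settings.

```python
import math

def lissajous(a, b, delta, n=1000):
    """Sample n points of the Lissajous curve x = sin(a*t + delta), y = sin(b*t).

    The frequency ratio a:b and the phase delta control the figure's shape,
    much as Laposky's oscillator settings shaped his oscillons.
    """
    points = []
    for i in range(n):
        t = 2 * math.pi * i / (n - 1)
        points.append((math.sin(a * t + delta), math.sin(b * t)))
    return points

# A 3:2 frequency ratio with a 90-degree phase offset gives a classic
# interlocking figure; feeding x and y to an oscilloscope in X-Y mode
# would trace it on the screen.
curve = lissajous(3, 2, math.pi / 2)
```

Plotting the `(x, y)` pairs (or feeding them to a scope's X and Y channels) reproduces the kind of figure Laposky photographed.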


 

Oscillon photographs were often accompanied by the kind of electronic synth music made popular by Robert Moog, a contemporary of Laposky.  Laposky's visual art was shown at more than 200 exhibitions in the US and abroad before the emergence of computer graphics in the mid-1960s.

 

 

 

 

"Electronic Abstractions are a new kind of abstract art. They are beautiful design compositions formed by the combination of electrical wave forms as displayed on a cathode-ray oscilloscope. The exhibit consists of 50 photographs of these patterns. A wide variety of shapes and textures is included. The patterns all have an abstract quality, yet retain a geometrical precision. They are related to various mathematical curves, the intricate tracings of the geometric lathes and pendulum patterns, but show possibilities far beyond these sources of design." — Sanford Museum, Gallery notes for Electronic Abstractions, 1952

 

The Whirlwind, first demonstrated in 1951, was the first computer capable of displaying real time text and graphics, using a large oscilloscope screen.  Development of the Whirlwind began in 1945 under the leadership of Jay Forrester at MIT, as part of the Navy's Airplane Stability and Control Analyzer (ASCA) project. Whirlwind received positional data related to an aircraft from a radar station in Massachusetts. The Whirlwind programmers had created a series of data points, displayed on the screen, that represented the eastern coast of Massachusetts, and when data was received from radar, a symbol representing the aircraft was superimposed over the geographic drawing on the screen of a CRT.


In the early 50s, Robert Everett designed an input device, called a light gun or light pen, to give operators a way of requesting identification information about an aircraft. When the light gun was pointed at the symbol for a plane on the screen, an event was sent to Whirlwind, which then sent text about the plane's identification, speed, and direction to be displayed on the screen. The Whirlwind computer was ultimately adopted by the U.S. Air Force for use in its new SAGE (Semi-Automatic Ground Environment) air defense system, which became operational in 1958 with more advanced display capabilities.  The oscilloscope was also used to create the world's first interactive video game, Tennis for Two.  It was created on an oscilloscope by William Higinbotham in 1958 and used to entertain guests at the Brookhaven National Laboratory by simulating a tennis game.  It was largely unknown outside of research and academic settings. Three years later, Steve Russell, a student at MIT, invented Spacewar! on a PDP-1, and because it was the first game to achieve mainstream success, it is popularly referred to as the first video game, despite the fact that Tennis for Two came out first.  The first CRT display was a converted oscilloscope used to play Spacewar!. The first trackball (and thus, the first mouse) was a Spacewar! control at MIT. It is said that Ken Thompson salvaged a little-used PDP-7 to run his own space game, Space Travel, and the operating system he built there became UNIX.

 

The Rise of Graphics Processing Units (GPU)

 

A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device.  It is specifically designed to perform the complex mathematical and geometric calculations necessary to render graphics. In 1994, Sony coined the term GPU for its PlayStation to describe the Toshiba-designed 32-bit chip used to handle graphics, control of the frame buffer, and drawing of polygons and textures.  In 1999, NVIDIA popularized the term GPU as an acronym for graphics processing unit.

 

The foundation for what we know as 3D graphics was laid in the latter half of the 1970s. The first Atari computers, the 8-bit Atari 400 and Atari 800, introduced special integrated circuits for the display and acceleration of 2D graphics. ANTIC processed 2D display instructions using direct memory access (DMA). Like most video co-processors, it could generate playfield graphics (background, title screens, scoring display), while the CTIA generated colors and moveable objects. CTIA was later replaced by GTIA (George's Television Interface Adapter).  Jay Miner, who designed the ANTIC and CTIA, later led chip development for the Commodore Amiga.  The Amiga was the first mass-produced computer equipped with a special 2D accelerator (called blitter).


In 1984 IBM introduced its first GPU called Professional Graphics Controller (PGC) or Professional Graphics Adapter (PGA). In essence, it was basically an expansion card that could accelerate 3D graphics as well as 2D graphics. It consisted of three separate boards that were connected together, and it had its own CPU along with dedicated RAM (an Intel 8088 CPU and 320KB RAM). The PGC supported resolutions of up to 640 x 480 pixels, with 256 colors simultaneously shown on the display and a refresh rate of 60 frames per second. Its price was $4,290 when it was first introduced. This specific GPU didn't manage to achieve notable commercial success, however, the PGC is still considered an important milestone in the history of GPUs.

 


In 1985 the Amiga revolutionized the graphics market with its advanced design and circuitry. The specially designed chips that fully handled the creation and acceleration of graphics in the Amiga not only relieved the CPU of this task, but also gave the home computer very high graphics capabilities. You could say the Commodore Amiga was one of the first commercial computers equipped with what is now considered a GPU.  Later, the fifth-generation gaming consoles, the PlayStation and Nintendo 64, were both equipped with 3D GPUs.  In 1999 Nvidia introduced the GeForce 256, the successor to the RIVA TNT2. The GeForce 256 included a hardware Transform and Lighting (T&L) engine and took the burden of creating complex graphics effects off the main CPU. It was significantly faster than the previous generation, with the performance difference reaching 50 percent in most games. In addition, it was the first GPU to fully support Direct3D. The integration of the T&L engine in GPUs allowed Nvidia to enter the professional CAD market as well, with the professional Quadro GPU line.

 

The modern era of the GPU began in 2007, with both Nvidia and ATI (since acquired by AMD) packing graphics cards with ever-more capabilities. The two companies took different tracks to general purpose computing on GPUs (GPGPU). In 2007, Nvidia released its CUDA development environment, the earliest widely adopted programming model for GPU computing. Two years later, OpenCL became widely supported. This framework allows for the development of code for both GPUs and CPUs with an emphasis on portability. Thus, GPUs became a more generalized computing device. In 2010, Nvidia collaborated with Audi, using Tegra GPUs to power the cars' dashboards and enhance navigation and entertainment systems. These advances in in-vehicle graphics processing helped push self-driving technology forward.

 


 

The CPU (central processing unit) has often been called the brains of the PC. Increasingly, that brain is being enhanced by another part of the PC, the GPU. The GPU goes well beyond basic graphics controller functions, and is a programmable and powerful computational device in its own right. While the GPU’s advanced capabilities were originally used primarily for 3D game rendering, those capabilities are being harnessed more broadly to accelerate computational workloads in areas such as financial modeling, cutting-edge scientific research and oil and gas exploration.

Artificial Intelligence, ML, and DL

Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL)

 


Artificial Intelligence (AI) is a broad term that includes both Machine Learning (ML) and Deep Learning (DL).  AI involves any technique that enables computers to mimic human behavior. At the dawn of computing in 1950, Alan Turing proposed a thought experiment known as the Turing Test, a method of inquiry for determining whether or not a computer is capable of thinking like a human being. Turing argued that it could be: if humans can use available information and reason to solve problems, why shouldn't a machine be able to do the same?  He explored these questions in his paper "Computing Machinery and Intelligence", which discusses building intelligent machines and testing their intelligence.  What stopped him from following through with his hypothesis was that the technology of the time had not caught up with his ideas: before 1949, computers could not store commands, they could only execute them.  The proof of concept for Turing's thesis was funded by the RAND (Research and Development) Corporation and realized five years later with the Logic Theorist program by Allen Newell, Cliff Shaw, and Herbert A. Simon.  This is considered the first AI program, as it was designed to mimic the problem-solving skills of humans.  The term "AI" was not coined until 1955, by John McCarthy, who along with Marvin Minsky hosted the Dartmouth Summer Research Project on Artificial Intelligence (DSRPAI) in 1956, where the Logic Theorist was presented during an open-ended discussion with top researchers. Alan Turing, Marvin Minsky, John McCarthy, Allen Newell, and Herbert A. Simon are widely considered to be the "founding fathers" of AI.

 

Although you'll sometimes hear the term AI used interchangeably with Machine Learning and Deep Learning, they are not the same thing.  Machine Learning is a subset of Artificial Intelligence, consisting of more advanced techniques and models that enable computers to figure things out from data and deliver AI applications.  It has been described as the science of getting a computer to act without being explicitly programmed.  An artificial neural network (ANN) is one of the main tools used in machine learning.  It is a computational model based on the structure and functions of biological neural networks, intended to replicate the way humans learn. While concepts such as deep learning are relatively new, the mathematical theory behind them dates back to 1943, even before the Turing Test, and the work of Warren McCulloch and Walter Pitts.  It was then that they created the first mathematical model of a neural network in their publication "A Logical Calculus of the Ideas Immanent in Nervous Activity", where they proposed a combination of mathematics and algorithms aimed at mimicking human thought processes.  Understanding how this works requires a vastly oversimplified, and not entirely accurate, picture of how a biological neuron works, which is fine because at a high level this is more or less what happens: a neuron receives input from its dendrites, processes it (similar to a CPU) in its soma, and passes the output through its axon (similar in shape and function to a cable) to the synapse, the point of connection to other neurons.  The McCulloch-Pitts neuron is still the standard, even as it has evolved past its original limitations.
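As an illustration (the function name, weights, and thresholds below are our own, chosen for clarity), the McCulloch-Pitts model reduces the neuron to a thresholded weighted sum of binary inputs:

```python
def mcp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts neuron: fire (1) if the weighted sum of the binary
    inputs meets the threshold; otherwise stay silent (0)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# With unit weights and threshold 2, the neuron computes logical AND:
assert mcp_neuron([1, 1], [1, 1], 2) == 1
assert mcp_neuron([1, 0], [1, 1], 2) == 0
# Lowering the threshold to 1 turns the same neuron into logical OR:
assert mcp_neuron([0, 1], [1, 1], 1) == 1
```

Note that in the original 1943 model the weights are fixed by hand; the neuron does not yet "learn" — that is exactly the gap the perceptron later filled.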

 

 

One of those evolutions, which allowed the original neural network model to "learn", was the perceptron, invented by Frank Rosenblatt in 1958 at the Cornell Aeronautical Laboratory and funded by the United States Office of Naval Research.  The perceptron was intended to be a machine rather than a program.  It was first implemented in software for the IBM 704 but subsequently built in custom hardware as the "Mark 1 perceptron". It was designed for image recognition: it consisted of an array of 400 photocells, randomly connected to the "neurons". Weights were encoded in potentiometers, and weight updates during learning were performed by electric motors.
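Rosenblatt's learning rule can be sketched in plain Python. This is a minimal illustrative implementation (the function name and the toy OR task are our own, not the Mark 1's actual task), nudging the weights toward each misclassified sample:

```python
def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Rosenblatt's perceptron rule: for each misclassified sample, move the
    weights and bias a small step (lr) in the direction that fixes the error.
    `samples` are feature tuples, `labels` are 0/1 class labels."""
    n = len(samples[0])
    w = [0.0] * n   # one weight per input feature
    b = 0.0         # bias term
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
            err = y - pred   # +1, 0, or -1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Learning the linearly separable OR function from four examples:
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 1]
w, b = train_perceptron(X, y)
```

Because a single perceptron draws one straight decision boundary, it converges on linearly separable tasks like OR but can never learn XOR — the limitation that later motivated multi-layer networks.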

 


 

Just as Machine Learning is a subset of AI, Deep Learning is a subset of Machine Learning.  Deep Learning makes the computation of multi-layer neural networks feasible, delivering high accuracy in tasks such as speech recognition, language translation, object detection, and many other breakthroughs.  Deep learning can automatically learn, extract, and translate features from data sets such as images, videos, or text, without hand-coded rules.

ML, DL, and Computer Vision

Teaching Computers to "See" and "Understand"

 

Computer Vision is a field of study that seeks to develop techniques to help computers "see" and understand the content of digital images such as photographs and videos. It could broadly be considered a subset of AI and machine learning. The world we live in is full of cameras and video. Nearly everyone has a smartphone with a camera they can use to take pictures and post them on Instagram, Facebook, or YouTube.  YouTube may be the second-largest search engine: hundreds of hours of video are uploaded to it every minute, and billions of videos are watched every day.  The Internet is made up of text and images.  While indexing and searching text is straightforward, indexing and searching images requires algorithms that know what the images contain. For a long time, indexing images for search depended on the meta descriptions supplied by the person who uploaded them. The goal of computer vision is to understand the content of digital images. This involves developing methods that attempt to reproduce the capability of human vision: you need to get machines to see and understand the content of digital images and extract a description from them.

 

Computer vision is the automated extraction of information from images. Information can mean anything from 3D models, camera position, object detection and recognition to grouping and searching image content.  There's also the complexity inherent in the visual world.


Computer vision remains one of the most popular applications of artificial intelligence. Computer vision-based AI techniques include image classification, object detection, and object segmentation. It is used for everything from face recognition-based user authentication to inventory tracking in warehouses to vehicle detection on roads.  Computer vision uses advanced neural networks and deep learning algorithms such as Convolutional Neural Networks (CNN), the Single Shot Multibox Detector (SSD), and Generative Adversarial Networks (GAN). Applying these algorithms requires a thorough understanding of neural network architecture, advanced mathematics, and image processing techniques.

 

Computer Vision and Machine Learning


Machine learning and computer vision are two fields that have become closely associated with one another. Machine learning has been used effectively in computer vision for acquisition, image processing, and object focusing. Computer vision can be broken down into a digital image or video, a sensing device, an interpreting device, and the interpretation. Machine learning comes into play at the interpreting-device and interpretation stages: analysis of the digital recordings is done using machine learning techniques.

 

For the average Machine Learning developer, the CNN remains a complex branch of AI. Apart from the knowledge and understanding of the algorithms, CNNs demand high-end, expensive infrastructure for training the models, which is out of reach for most developers. Even after managing to train and evaluate a model, developers find model deployment a challenge. Trained CNN models are often deployed on edge devices that don't have the resources required to perform inferencing - the process of classifying and detecting images at run time. Edge devices are complemented by purpose-built AI chips that accelerate inferencing, which come with their own software drivers and an interfacing layer.  Microsoft and Qualcomm have partnered to simplify training and deploying computer vision-based AI models with their Vision AI Developer Kit. Developers can use Microsoft's cloud-based AI and IoT services on Azure to train models while deploying them on the smart camera edge device powered by Qualcomm's AI accelerator.

 

Recognition in Computer Vision

 

Recognition in computer vision involves object recognition, identification, and detection. Some of the specialized tasks of recognition include optical character recognition, image retrieval, and facial recognition.

 

  • Object Recognition – most commonly applied to face detection and recognition, this involves finding and identifying objects in a digital image or video. This can be approached by computer vision through either machine learning or deep learning.
    • Machine Learning Approach – object recognition via machine learning requires you to define features before classification.  This is commonly done with the scale-invariant feature transform (SIFT), which extracts key points of objects and stores them in a database. When a new image is to be categorized, SIFT checks its key points against those stored in the database.
    • Deep Learning Approach – this does not require features to be specifically defined. A common approach here is the convolutional neural network (CNN), a type of deep learning algorithm that takes an input image and assigns importance (learnable weights and biases) to various aspects/objects in the image in order to differentiate them from one another.  It is inspired by the biological neural networks of the brain. ImageNet, a visual database designed for object recognition, is the best-known benchmark for this approach; performance on it is said to be close to that of humans.
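The core operation of a CNN — sliding a small learnable kernel over the image — can be sketched without any ML library. The following is a minimal pure-Python illustration (the kernel values are chosen by hand to detect a vertical edge; a real CNN learns them from data):

```python
def convolve2d(image, kernel):
    """Valid-mode 2D convolution (strictly, cross-correlation, as in most
    CNN libraries): slide the kernel over the image, taking dot products."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

# A vertical-edge kernel responds strongly where intensity jumps left to right.
img = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
edge_kernel = [
    [-1, 1],
    [-1, 1],
]
response = convolve2d(img, edge_kernel)  # large values mark the edge column
```

In a real CNN, many such kernels are stacked in layers with nonlinearities between them, and training adjusts the kernel values so each layer detects progressively more abstract features.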


 

Motion Analysis in Computer Vision

 

Motion Analysis in computer vision involves a digital video that is processed to produce information. Simple processing can detect motion of an object. More complex processing tracks an object over time and can determine the direction of the motion. It has applications in motion capture, sports, and gait analysis.

 

  • Motion capture –  (sometimes referred to as mo-cap or mocap) is the process of recording the movement of objects or people. Markers are worn near joints to identify motion. It has applications in animation, sports, the military, robotics, computer vision, and gait analysis. Typically, visual appearance is not included and only the movements of the actors are recorded.  Motion capture was used in Star Wars by Andy Serkis for Supreme Leader Snoke and by Lupita Nyong'o for Maz Kanata.
  • Gait analysis –  involves the systematic study of animal locomotion, more specifically the study of human motion, using the eye and the brain of observers, augmented by instrumentation for measuring body movements, body mechanics, and the activity of the muscles. A typical gait analysis laboratory has several cameras (video or infrared) placed around a walkway or a treadmill, which are linked to a computer. The subject wears markers at various reference points of the body and as they move, a computer calculates the trajectory of each marker in three dimensions. It can be applied in sports biomechanics.
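The "simple processing" that detects motion between consecutive frames can be sketched as frame differencing. This is an illustrative example on tiny made-up grayscale frames (the function names, threshold, and pixel values are our own):

```python
def motion_mask(prev_frame, curr_frame, threshold=20):
    """Simplest motion detection: mark each pixel whose grayscale intensity
    changed by more than `threshold` between two consecutive frames."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
        for prow, crow in zip(prev_frame, curr_frame)
    ]

def moved(mask, min_pixels=1):
    """Report motion if enough pixels changed (filters out sensor noise)."""
    return sum(sum(row) for row in mask) >= min_pixels

prev = [[10, 10, 10],
        [10, 10, 10]]
curr = [[10, 200, 10],   # one bright pixel appears: an object enters the scene
        [10, 10, 10]]
mask = motion_mask(prev, curr)
```

Tracking direction over time, as the paragraph above describes, amounts to following where the changed region of the mask moves from frame to frame.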


 

Computer vision is used in sports to improve broadcast experience, athlete training, analysis and interpretation, and decision making. Video tracking and object recognition are ideal for tracking the movement of players. Motion analysis methods are also used to assist in motion tracking. Deep learning using convolutional neural networks is used to analyze the data.  It is also used in autonomous vehicles such as a self-driving car. Cameras are placed on top of the car to provide 360 degrees field of vision for up to 250 meters of range. The cameras aid in lane finding, road curvature estimation, obstacle detection, traffic sign detection, and much more.

 

 

Your Chance to Win

 

Be Original
Stick to the Theme
  • You could come up with a clever name that makes your project memorable!
    • This project is your baby! Part of the fun of bringing something new into the world is coming up with a name.
  • Your project could introduce something new or that is not commercially available or affordable!
  • If you have an idea for a project that doesn't fit the current theme then submit your idea in the comments section of the monthly poll.
List the Steps
Submit Video Proof
  • Provide the steps you took to complete your project (text, video, or images).
    • This could be a step by step how-to-guide, vlog, schematics, coding, napkin drawings, voice narration, or whatever you think will be useful!
  • If it doesn't work that's fine, this is more about the journey than the end product.
  • A short video is all that is required but you can shoot as much video as you like.
  • You are encouraged to be creative and have as much fun as possible!

 

Your Project Examples

 

Vision Thing
Traffic Predictor #5 - Machine Learning and Building a case for the kit
Spider-Man: Into the Maker-Verse


 

 

 

Your Prizes

 

  • One Grand Prize Winner Wins a Keysight DSOX1102G Oscilloscope!
  • 3 First Place Winners Win Beagleboard Blue + a $100 Shopping Cart!

 

 

 

Your Project, Your Ideas!

 

About Project14
Directions

Every month you'll have a new poll where you'll get to decide an upcoming project competition, based on your interests, that will take place a couple of months in advance. Themes are broad in scope so that everyone can participate regardless of skill set.

 

What are Monthly Themes?

  • Every month (around the 14th of each month) a new theme will be posted on Project14.
  • Submit your ideas (proposals) for your projects to get feedback from the rest of the community.
  • Submit a project entry in the Theme space once you start working on it.

 

What are Monthly Theme Polls?

  • Every month (around the 14th of each month) there is a project theme poll.
  • Vote on which project competition you want to see as the next theme.
    • The theme voted on in the previous poll determines the upcoming theme.
    • If you submit an idea for a theme that is not used, it can still appear in a future poll.
  • Theme ideas come from the comments section of the project theme poll.

Step 1: Log in or register on element14, it's easy and free.

Step 2: Post in the comments section below to begin a discussion of your idea. Videos, pictures, and text are all welcome forms of submission.

Step 3: Submit a blog post of your progress on your project by the end of the month.  You are free to submit as many blog entries as you like until the beginning of the next theme.

 

Be sure to include video proof of your project!

 

Visit Vision Thing or tag your project blog VisionThingCH

 

You have until November 18th End of Day to submit your completed project!

 

A jury consisting of your peers will judge project submissions!

 

In the Comments Below:  What Are Your Ideas for Vision Thing Projects?

 

Use Whatever Boards You Like, but We're Giving Away BeagleBone AI Boards for Project Proposals That Use Them!

Attachments:
imageProject14_Vision_Thing_TermsandConditions.pdf
  • rand
  • eve’
  • vpu
  • visionthingch
  • keysight
  • language translation
  • ann
  • frank rosenblatt
  • cnn
  • research and development corporation
  • object detection
  • imagenet
  • artificial neural network
  • gait analysis
  • artificial intelligence
  • embedded vision engine
  • motion capture
  • fpga
  • turing test
  • autonomous vehicles
  • marvin minsky
  • oscilloscope
  • raspberry pi
  • perceptron
  • keysight dsox011g oscillocope
  • optical character recognition
  • machine learning
  • ai
  • gpu
  • mcculloch-pitts neurons
  • single shot multibox detector
  • cpu
  • field programmable logic array
  • scale-invariant feature transform
  • speech recognition
  • deep learning
  • beagleboard
  • computer vision
  • sift
  • logic theorist
  • convolutional neural networks

Top Comments

  • e14phil
    e14phil over 5 years ago +15
    Hello All! On the recommendation of tariq.ahmad I have come to you, the wonderful people of the Vision Thing Comments Section, to ask for a volunteer. Are you interested and well versed in Computer…
  • fyaocn
    fyaocn over 5 years ago +12
    My proposal for BeagleBone AI is an AI-powered visual lock. I have used the BeagleBone Green dev board before, one of the most powerful SBCs at that time. TI BBB is always known for being robust and reliable even…
  • Fred27
    Fred27 over 5 years ago in reply to dubbie +11
    Why has nobody developed a system that will let you vacuum up Lego pieces and sort them by image recognition? Lego themselves announced it as an April Fools' joke, but it sounds plausible. Not sure I could do it…
  • dubbie
    dubbie over 5 years ago

    I have the desire to have a go at this, but sadly, seem to be lacking in the technical and programming capability. I'll have to have a think to see if there is anything I can come up with that I can actually do.

     

    Dubbie

  • shabaz
    shabaz over 5 years ago in reply to dubbie

    Hi Dubbie,

     

    Your projects are always super-interesting! An SBC might be a good way for doing cool video or graphics related stuff. With reduced programming, Python + PyGame could be an option. There's information here about how to do animation, and control via keyboard, in case it helps devise some project (it's written for Pi, but would work on BBB and BBB-AI too, and could be tested with Linux on a PC):

    Working with Sprites: Building Street Fighter with the Pi

    Another cool thing could be to use the new BBB-AI classification functionality, to control your robot projects : )

    There are some kids' toys that follow lines drawn on paper, changing direction when they see a certain colour, but the BBB-AI could take it further maybe (e.g. a train set that can actually see level crossings being down, or Lego characters on the track : ) There's a hackster.io link to an example of classification that looks really impressive here: Any info on using the BeagleBone AI's Embedded Vision Engines?

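shabaz's Python + PyGame suggestion above can be sketched even without a display. The snippet below shows only the keyboard-driven sprite movement logic, with made-up key names and screen bounds for illustration; in a real PyGame program the key states would come from pygame.key.get_pressed() inside the game loop.

```python
# Core sprite-movement logic, kept PyGame-free so it runs headless.
# The "keys" dict stands in for pygame.key.get_pressed() results.

def move_sprite(pos, keys, speed=5, bounds=(640, 480)):
    """Return a new (x, y) position, clamped to the screen bounds."""
    x, y = pos
    if keys.get("left"):
        x -= speed
    if keys.get("right"):
        x += speed
    if keys.get("up"):
        y -= speed
    if keys.get("down"):
        y += speed
    # Keep the sprite on screen.
    x = max(0, min(bounds[0], x))
    y = max(0, min(bounds[1], y))
    return (x, y)

print(move_sprite((100, 100), {"right": True, "up": True}))  # (105, 95)
```

In PyGame proper, this function would be called once per frame inside the event loop, followed by a blit of the sprite at the returned position.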
  • dubbie
    dubbie over 5 years ago in reply to shabaz

    Shabaz,

     

    Thanks for this, these are all good ideas. I have been thinking I might have to learn Python, everyone says it is easy (maybe it is). If I can get some Lego involved that would be great. Many possibilities to think about.

     

    Dubbie

  • Fred27
    Fred27 over 5 years ago in reply to dubbie

    Why has nobody developed a system that will let you vacuum up Lego pieces and sort them by image recognition? Lego themselves announced it as an April Fools' joke, but it sounds plausible. Not sure I could do it, but those with more image processing experience might.

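As a rough illustration of the sorting decision in Fred27's idea, here is a hedged Python sketch. It assumes an image-recognition step has already measured a brick's stud dimensions; the bin layout and brick sizes are invented for the example.

```python
# Hypothetical bin layout: measured (long, short) stud counts -> bin name.
BINS = {
    (2, 2): "bin_A",
    (4, 2): "bin_B",
    (8, 2): "bin_C",
}

def choose_bin(studs_long, studs_wide):
    """Map a measured brick size to a bin, defaulting to a reject bin."""
    # Normalise orientation so a 2x4 and a 4x2 land in the same bin.
    key = (max(studs_long, studs_wide), min(studs_long, studs_wide))
    return BINS.get(key, "bin_reject")

print(choose_bin(2, 4))  # bin_B
print(choose_bin(3, 2))  # bin_reject
```

The hard part, of course, is the recognition step that produces the stud counts, not this lookup.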
  • genebren
    genebren over 5 years ago in reply to Fred27

    David,

     

    Interesting idea.  One of my industrial/medical projects involved building cell sorters.  These devices would create a travel path for cells (in fluid), that would pass by a camera or other sensors to examine the cells and sort them (more of a go/no go test) and divert the cells (flow diverter or electrostatic attraction/repulsion) into different flows.  Sort of the same idea, just on a larger scale.

     

    Gene

  • vimarsh_
    vimarsh_ over 5 years ago in reply to Fred27

    I think it is possible.

    Basically, I think we need a fixed-camera system where the Lego pieces are on the floor and a vacuum-equipped robotic arm can move in the x, y, and z directions. The system would identify the different Lego bricks (4x2, 3x2, 2x2, 8x2, etc., plus all the characters and wheels) and map each piece's position in the video frame to (I think) a stepper motor contraption, so the arm can move down, pick the piece up, and place it in the appropriate bin.

    I would say it is possible; though complicated, it is very interesting and fun.

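The frame-to-gantry mapping vimarsh_ outlines could look something like this in Python; the camera resolution, work-area size, and steps-per-millimetre figures are placeholder values you would calibrate on real hardware (a real overhead camera would also need lens-distortion correction).

```python
# Placeholder calibration constants -- measure these on real hardware.
FRAME_W, FRAME_H = 640, 480      # camera resolution in pixels
AREA_W_MM, AREA_H_MM = 300, 225  # work area covered by the camera frame
STEPS_PER_MM = 80                # stepper steps per millimetre of travel

def pixel_to_steps(px, py):
    """Convert a pixel coordinate to (x_steps, y_steps) for an X/Y gantry."""
    x_mm = px / FRAME_W * AREA_W_MM
    y_mm = py / FRAME_H * AREA_H_MM
    return (round(x_mm * STEPS_PER_MM), round(y_mm * STEPS_PER_MM))

print(pixel_to_steps(320, 240))  # centre of frame -> (12000, 9000)
```

The z axis only needs a fixed "down to floor" travel, so a simple linear x/y map like this covers most of the positioning problem.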
  • dubbie
    dubbie over 5 years ago in reply to Fred27

    What a great idea. Still, it might lead to more Lego being distributed around the floor, just for the fun of seeing it all vacuumed up again.

     

    Dubbie

  • weiwei2
    weiwei2 over 5 years ago in reply to Fred27

    I am a Lego fan and think sorting Lego would be fun... haha

    but I am thinking of other applications

element14 Community

element14 is the first online community specifically for engineers. Connect with your peers and get expert answers to your questions.

An Avnet Company © 2025 Premier Farnell Limited. All Rights Reserved.

Premier Farnell Ltd, registered in England and Wales (no 00876412), registered office: Farnell House, Forge Lane, Leeds LS12 2NE.

ICP 备案号 10220084.
