Speak the AI Language

"This article is a summarization of a 40-minute presentation addressing the first major hurdle for enterprises in Moving Forward with AI, the knowledge and communication gap."

Gaps between business and IT are common in transformation projects, but they become even more pronounced with artificial intelligence (AI). As a revolutionary and complex technology, AI can quickly become overwhelming for non-technical stakeholders. This article aims to bridge the knowledge gap by introducing key concepts and terminology, empowering decision-makers to better understand the potential and limitations of AI. With this knowledge, they can confidently drive AI strategy and collaborate more effectively with IT teams.


I.  Artificial intelligence is a new, advanced way of programming.

Since the introduction of computers in the 1950s, mathematicians and statisticians have been hard at work applying logic and probability theory to let machines perform tasks that require human-like intelligence, e.g., acquire and apply knowledge, learn from experience, understand natural languages, recognize patterns, and make decisions. But it took research breakthroughs from the 1980s onward and, since the 2010s, advances in processing and memory power (e.g., graphics processing units from Nvidia) and big data from the internet, scientific and public databases, e-commerce, and sensors for researchers to finally test their theories and improve their techniques.

Unlike traditional computing, which relies on explicit, step-by-step code to process data and produce output exactly as instructed, AI computing uses algorithms to train AI models to identify patterns in data, allowing them to perform more complex and variable tasks that the traditional approach cannot handle, e.g., recommending movies based on preferences and behaviors.
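
To make the contrast concrete, here is a minimal sketch in Python; the movie data and the scikit-learn model are purely illustrative, not from the presentation:

```python
from sklearn.tree import DecisionTreeClassifier

# Traditional computing: the programmer writes every rule explicitly.
def recommend_explicit(likes_action: bool) -> str:
    return "Action Movie" if likes_action else "Drama"

# AI computing: the rule is learned from examples of past behavior.
watch_history = [[9, 1], [8, 2], [1, 9], [2, 8]]  # hours of action vs. drama watched
enjoyed = ["Action Movie", "Action Movie", "Drama", "Drama"]
model = DecisionTreeClassifier().fit(watch_history, enjoyed)

print(recommend_explicit(True))    # -> Action Movie (hard-coded rule)
print(model.predict([[7, 3]]))     # -> ['Action Movie'] (learned pattern)
```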

II.  Machines learn by running algorithms over training data to identify patterns, giving AI models the ability to generalize to new data and solve a given problem.

I like to look at AI models as recipes, with AI engineers as the chefs who use algorithms to create the final dish: the model’s performance. The quality of this “dish” depends heavily on the algorithmic design and the quality of the ingredients (the data).

AI models are trained on large datasets to recognize patterns and relationships in the data. During training, the model processes the data through multiple iterations. Initially, a training dataset is used to learn the underlying patterns and relationships. Then AI engineers use a validation dataset to fine-tune the model’s performance, adjusting internal parameters, gathering more data, or changing the model structure, and a held-out testing dataset to confirm that the model generalizes well enough to reach a target accuracy level.
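
For readers who want to see this workflow in code, here is a minimal sketch using scikit-learn; the synthetic dataset and the regularization settings being tuned are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a large, labeled dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Split: 70% training, 15% validation, 15% testing.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.3, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Fine-tune on the validation set (here: the regularization strength C).
best_model, best_score = None, 0.0
for C in (0.01, 0.1, 1.0, 10.0):
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    score = model.score(X_val, y_val)
    if score > best_score:
        best_model, best_score = model, score

# The held-out test set estimates how well the model generalizes.
print(f"validation accuracy: {best_score:.2f}")
print(f"test accuracy:       {best_model.score(X_test, y_test):.2f}")
```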

The quality and diversity of the training data are critical to help models generalize effectively across a range of scenarios. After the models are deployed, performance needs to be monitored constantly, because real-world data can drift away from the training data and degrade performance.

To enable AI/ML applications, companies are increasingly investing in new data technologies such as data lakehouses, data mesh, graph databases, and vector databases. I will discuss these in more detail in a future article.

III.  Machine learning techniques continue to evolve to overcome limitations and improve efficiency.

Researchers have created various training paradigms to handle different types of data and overcome operational limitations. Supervised learning trains models with labeled data, e.g., images tagged as “dog” or “not dog,” and is suitable for classification, as in image recognition, and for regression, used in prediction and diagnosis.
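
As a minimal sketch of supervised learning (the classic iris flower dataset stands in here for the dog/not-dog example):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)           # measurements + human-provided labels
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict(X[:3]))                   # predicted species for new samples
```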

The unsupervised learning paradigm trains models on unlabeled data by finding hidden patterns or intrinsic structures in the data, using techniques such as clustering and association, which expands the scope of problems that AI can solve.
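
A minimal clustering sketch, with two synthetic groups of points and no labels at all:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two groups of points, with no labels attached to any of them.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # the model discovers the two groups on its own
```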

The semi-supervised paradigm combines a small amount of labeled data with a large amount of unlabeled data to reduce labeling costs and shorten training time.
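
One common semi-supervised technique is self-training (pseudo-labeling); the sketch below is an illustrative simplification, with the labeled/unlabeled split and the 0.9 confidence threshold as assumed parameters:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
labeled = np.zeros(len(y), dtype=bool)
labeled[:25] = True                          # pretend only 25 examples are labeled

# Train on the few labeled examples first.
model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])

# Assign confident pseudo-labels to unlabeled data, then retrain on both.
proba = model.predict_proba(X[~labeled])
confident = proba.max(axis=1) > 0.9
pseudo_X = X[~labeled][confident]
pseudo_y = model.predict(X[~labeled])[confident]
model = LogisticRegression(max_iter=1000).fit(
    np.vstack([X[labeled], pseudo_X]),
    np.concatenate([y[labeled], pseudo_y]),
)
```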

Reinforcement Learning (RL) trains models through trial and error, with feedback in the form of rewards or penalties. By reinforcing strategies that lead to wins, RL models such as AlphaGo by DeepMind defeated the world champion in the game of Go in 2016.
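
A minimal tabular Q-learning sketch, one of the simplest forms of RL, on a toy five-state corridor where the agent is rewarded only for reaching the rightmost state (all parameters are illustrative):

```python
import numpy as np

n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.1
rng = np.random.default_rng(0)

for _ in range(500):                # episodes of trial and error
    s = 0
    while s != n_states - 1:
        a = rng.integers(n_actions) if rng.random() < epsilon else Q[s].argmax()
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Reinforce actions that lead toward the reward.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))  # learned policy: move right at every non-terminal state
```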

Transfer Learning allows models to leverage knowledge learned from one task or domain, often through supervised learning, and adapt it to a different but related task or domain, enhancing performance in situations where there is less data for the new task. It has been used to fine-tune models for specific natural language processing tasks like sentiment analysis.
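
A minimal transfer-learning sketch with PyTorch/torchvision, shown here on a vision model rather than an NLP one; the 2-class head is an assumed new task:

```python
import torch.nn as nn
from torchvision import models

# Start from a ResNet pretrained on ImageNet (general visual knowledge).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained layers so their knowledge is kept as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a new head for the related task,
# then train only this head on the (smaller) task-specific dataset.
model.fc = nn.Linear(model.fc.in_features, 2)
```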

Recently, different learning paradigms have been combined to expand the capabilities of AI to solve even more complex problems.

IV.  Machine learning is deeply rooted in statistical methods and data science.

AI engineers select algorithms from three categories, each with unique strengths and limitations, depending on the nature of the problem and the available data, as outlined below.

Popular Algorithms by Category with Strengths and Limitations - summarized by Ichun

Classic statistical methods are simple to understand and execute with structured data but can only solve simple problems. Examples include linear regression (a baseline for numeric predictions), time series analysis (observing data points over time to analyze trends and patterns), and logistic regression (used in categorical classification).
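
A minimal sketch of two of these methods on toy data (the housing and churn numbers are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Linear regression: predict a number (e.g., price from size).
sizes = np.array([[50], [80], [120], [160]])       # square meters
prices = np.array([150, 240, 360, 480])            # thousands
reg = LinearRegression().fit(sizes, prices)
print(reg.predict([[100]]))                        # ~300

# Logistic regression: predict a category (e.g., churn yes/no).
X = np.array([[1], [2], [8], [9]])                 # support tickets filed
y = np.array([0, 0, 1, 1])                         # churned?
clf = LogisticRegression().fit(X, y)
print(clf.predict([[7]]))                          # likely churn
```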

Traditional machine learning, using techniques such as decision trees and optimization, can handle more complex problems, but it struggles with unstructured data and can overfit, i.e., become too specific to the training dataset and capture patterns that are not generalizable to other data.
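
A minimal sketch of overfitting: on the same noisy synthetic data, an unconstrained decision tree memorizes the training set yet scores worse on unseen data than a depth-limited one:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y adds label noise, so memorizing the training set does not generalize.
X, y = make_classification(n_samples=400, n_features=20, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (None, 3):             # None = grow until the data is memorized
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, tree.score(X_tr, y_tr), tree.score(X_te, y_te))
```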

Neural networks enable more complex problem-solving, especially with unstructured data, but require significant computing power and extensive data. This category is behind the powerful AI we see today.

Depending on the nature of the problem and the available data, the optimal solution often deploys multiple machine learning techniques from different categories.

V.  Neural Networks and Deep Neural Networks (DNN)

Most AI models today adopt some form of neural network architecture, inspired by the human brain. A neural network consists of layers of nodes, called “neurons,” each performing a simple computation on its input.

A neural network has an input layer, an output layer, and a few layers in between, called hidden layers, that process the input data and produce the output. Any model with three or more layers is called a deep neural network (DNN).

Hidden layers are interconnected layers of neurons that take the outputs of the previous layer and apply weights and bias factors; training adjusts these until the loss function, which measures the difference between predicted and actual results, is minimized to reach a target accuracy level. Each additional hidden layer enhances the model’s ability to identify more intricate patterns and representations in the data, enabling the model to solve more complex problems.
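
Here is a minimal deep-neural-network sketch in PyTorch; the two hidden layers, the toy target, and the training settings are all illustrative:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 16), nn.ReLU(),    # hidden layer 1 (weights + biases)
    nn.Linear(16, 16), nn.ReLU(),   # hidden layer 2
    nn.Linear(16, 1),               # output layer
)
loss_fn = nn.MSELoss()              # difference between predicted and actual
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

X = torch.randn(100, 4)             # toy inputs
y = X.sum(dim=1, keepdim=True)      # toy target the network must learn

for step in range(200):             # training iterations
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()                 # compute how to adjust weights and biases
    optimizer.step()
print(loss.item())                  # the loss shrinks as the model fits the data
```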

However, the decision-making processes inside complex neural networks are difficult to explain. This “black box” has been a key concern for regulators and businesses, especially in regulated industries, and drives ongoing research into explainable AI (XAI), which aims to make neural networks more interpretable.

VI.  Recurrent Neural Networks (RNN), Transformers, and Convolutional Neural Networks (CNN)

Traditional neural networks process each input independently, making them ineffective for tasks where input order matters, like language processing or time series prediction. Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) address this by incorporating memory: the network maintains a hidden state that evolves as it processes each input in a sequence, capturing the dependencies between different time steps.

Transformers improved on RNNs and LSTMs by processing entire sequences in parallel, making it far more efficient to understand context over long sequences, such as a lengthy paragraph of text, a series of time-stamped events in a financial market, or a long audio recording.

To handle spatial relationships, Convolutional Neural Networks (CNNs) were developed; they are used primarily for image recognition, object detection, and other tasks involving visual data.
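
A minimal sketch of these three architecture families at the layer level in PyTorch (shapes and sizes are arbitrary placeholders, not a full model):

```python
import torch
import torch.nn as nn

# RNN/LSTM: a hidden state carries memory across a sequence of 10 steps.
lstm = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)
out, (h, c) = lstm(torch.randn(1, 10, 8))

# Transformer: self-attention processes all 10 positions in parallel.
encoder = nn.TransformerEncoderLayer(d_model=8, nhead=2, batch_first=True)
out = encoder(torch.randn(1, 10, 8))

# CNN: convolution filters scan spatial neighborhoods of an image.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
out = conv(torch.randn(1, 3, 28, 28))
```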

As techniques continue to improve, models continue to evolve to handle even larger datasets and longer-range dependencies more efficiently.

Today’s AI application domains include:

  • Natural Language Processing (NLP): Chatbots, translation, content generation

  • Computer Vision: Image recognition, medical imaging, autonomous driving

  • Speech Recognition: Virtual assistants, transcription, language translation

  • Gaming and Simulation: agents that learn and adapt in complex environments

VII. Foundation Models and Generative AI in the cloud and on premises

Training powerful models requires substantial computational resources, large datasets, and specialized expertise; thus were born “foundation” models commercialized by big tech companies, typically with over 50 hidden layers and billions of parameters. These powerful models scale across multiple domains, such as processing text and images in the same model, to much larger datasets and more complex tasks. The most popular foundation model categories include:

  • Large Language Models (LLMs) can understand, generate, and process human language

  • Vision Models can analyze and interpret visual data from images and videos.

  • Speech Models can recognize speech and convert text to speech.

  • Action Models can understand and execute complex actions in dynamic environments, demonstrating a high degree of autonomy and adaptability.

“Generative AI” refers to highly versatile foundation models that can generate new content across different domains and applications based on the patterns and information they were trained on. GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers) are well-known transformer-based foundation models.

AI models are designed to generate responses even when uncertain, a behavior known as AI hallucination. Given the power of generative AI, companies must be acutely aware of this limitation. Beyond assessing the quality and diversity of training data and implementing post-processing checks like confidence scoring and cross-referencing with trusted sources, companies should embrace a “human-in-the-loop” approach, leveraging human judgment to mitigate risks. Establishing automated and human feedback loops, especially in high-stakes or sensitive scenarios, will reduce the risk of errors and improve the model’s accuracy over time.
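
A minimal sketch of the confidence-scoring, human-in-the-loop idea; generate_answer is a hypothetical stand-in for any model call that returns an answer plus a confidence score, and the 0.8 threshold is an assumed policy choice:

```python
def generate_answer(question: str) -> tuple[str, float]:
    # Hypothetical model call, stubbed here with a fixed answer and confidence.
    return "The contract renews on 2025-01-01.", 0.62

CONFIDENCE_THRESHOLD = 0.8

def answer_with_review(question: str) -> str:
    answer, confidence = generate_answer(question)
    if confidence < CONFIDENCE_THRESHOLD:
        # Low-confidence output is routed to a person before release.
        return f"[NEEDS HUMAN REVIEW] {answer}"
    return answer

print(answer_with_review("When does the contract renew?"))
```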

Given the extensive resources and expertise required, most companies will opt to integrate commercially available pre-trained models or fine-tune them with a smaller, task-specific dataset to create smaller models for more focused applications. While AI platforms are usually hosted on cloud services, we have started to see on-premises solutions emerge as highly regulated industries prioritize information security and regulatory compliance.

VIII.  Champion and Challenger models run in parallel for the same purpose to ensure effectiveness and continuous improvement.

AI models can degrade over time due to changes in data relevance and external factors such as new regulations and new trends. More mature MLOps (Machine Learning Operations) teams will build a Champion model and run Challenger models, which are new or alternative models, in parallel. Both the Champion and the Challengers are deployed in production, receive the same data, and are monitored with the same metrics, such as accuracy, response time, and data drift. If a Challenger consistently outperforms the Champion, it may be promoted to become the new Champion. This approach allows continuous improvement of the models, ensuring they remain effective as business conditions change.
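
A minimal champion/challenger sketch; the two scikit-learn models and the single accuracy snapshot stand in for what would, in production, be ongoing monitoring across many metrics:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_live, y_train, y_live = train_test_split(X, y, random_state=0)

champion = LogisticRegression(max_iter=1000).fit(X_train, y_train)
challenger = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Both models receive the same production data; track the same metric.
champ_acc = champion.score(X_live, y_live)
chall_acc = challenger.score(X_live, y_live)
print(f"champion: {champ_acc:.3f}  challenger: {chall_acc:.3f}")

# Promote only on consistent outperformance (one snapshot shown here).
if chall_acc > champ_acc:
    champion = challenger
```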

Final Notes

AI technology enables problem-solving through pattern recognition and massive complex calculations, providing probabilistic judgments that can enable humans to achieve tasks previously impossible. Machines excel at complex tasks like processing large amounts of data and identifying patterns, but struggle with intuitive tasks that humans perform easily, such as applying knowledge flexibly, understanding context, and using common sense. Therefore, the optimal approach combines human insight with AI’s computational power.

AI models rely heavily on the quality of their training data. “Garbage in, garbage out” remains a guiding principle. Companies with strong data governance, integration, and security are best positioned to harness AI's potential.

AI adoption isn’t just a technical challenge; it’s also an organizational and societal shift. AI’s impact on ways of working and jobs cannot be over-emphasized. For example, David Mataciunas, Chairman of the Board of the AI Association of Lithuania, talks frequently about how programming is fast moving to AI-assisted coding. Forward-looking companies will invest heavily in employee training and re-skilling, and in fostering a culture of adaptability and embracing change.

AI deployment is not a one-time exercise but requires ongoing monitoring, updates, and refinement. Non-tech executives need to grasp AI lifecycle management - model deployment, monitoring, retraining, decommissioning - to ensure solutions remain effective and relevant. Understanding AI's long-term implications, such as ongoing costs, maintenance, and iterative improvements, is crucial for strategic planning and resource management.

The potential of AI to cause harm – such as biased representations in training data, the “black box” in models’ decision-making, and security vulnerabilities – poses significant risks to businesses, as numerous studies have discussed. A growing global consensus on robust governance and ethical standards for responsible AI development and deployment has led to the introduction of key initiatives such as the EU’s AI Act, the OECD AI Principles, and the NIST AI Risk Management Framework.

Finally, AI’s massive data processing demands have accelerated the interest in quantum computing, which promises unprecedented speed, optimization, and energy efficiency. The race to scale quantum computing is now a major focus at the nation-state level.

Interested in the full presentation? Get in touch.

* Ichun Lai founded Propel Global Advisory LLC focusing on accelerating the thoughtful and responsible adoption of AI technology in financial services
