Book A.I. Consulting Appointments on Vital.ai

Lately we’ve been fielding a lot of requests for short consulting & advisory services in artificial intelligence from expert networks and other sources.

Topics often include Vector Databases, Large Language Models (LLMs), Agents, and Knowledge Graphs.

Requests come from a variety of sources: Private Equity and Hedge Funds looking to better understand the A.I. market, start-ups looking for advice or a reality check, and enterprises looking to validate A.I. strategy and implementation goals.

We wanted to reduce any friction in arranging such appointments.

You can now book an appointment with Vital.ai’s founder Marc Hadfield straight from the Vital.ai website: https://www.vital.ai/#consulting

More information about Marc Hadfield is available from: https://www.vital.ai/about.html#team

We also provide longer term consulting and advisory services, as well as A.I. Agent implementations utilizing the Vital Agent Ecosystem: https://www.vital.ai/agent-ecosystem.html

For longer term engagements, please continue to contact us via the information on our website here: https://www.vital.ai/about.html#contact

Hope to hear from you soon!

Reasoning, LLMs, Neuro-Symbolic AI, and Defeasible Logic (with Python Example)

I recently had the following interaction with ChatGPT (GPT-4).

I asked:

There is a standard 3 color traffic light that is displaying a color. The color is not green or blue. What color or colors could it be?

And got the response:

If a standard 3-color traffic light is displaying a color that is not green or blue, then it could only be displaying red. Standard traffic lights consist of red, yellow (amber), and green colors, and since blue is not a color in standard traffic lights, if it’s not displaying green, it must be red.

So, a wrong answer, and a somewhat nonsensical one: the model had the correct information (yellow and red) a few words earlier in the same sentence, and yet concluded with the wrong answer, red.

This response was somewhat the luck of the draw. I asked the same question later in the day and got the correct response of “yellow and red”. There are also prompting techniques such as “Chain of Thought” that lead LLMs to generate a response in a more stepwise fashion, producing more accurate and reliable results. However, for such a simple case we should not have to carefully construct a prompt or hope to get lucky.

If you forgive the LLM for being “tricked” by the prompt mentioning “blue”, then you are dismissing one of the main use cases of LLMs: “RAG” (retrieval augmented generation) applications. These applications query various data sources to collect potentially relevant information and then depend on the LLM to sort through the unstructured and messy results, picking out the relevant pieces to use as supporting information when generating a response. RAG applications could not work if the LLM were unable to separate out and ignore irrelevant facts, such as a traffic light not displaying blue, or pink, or purple.
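
As a rough sketch of the shape of a RAG step, consider the Python below, where retrieve() and llm() are hypothetical stand-ins rather than the API of any particular framework. The retrieved snippets, relevant or not, get packed into the prompt, and it is up to the LLM to sort them out:

# Minimal sketch of a RAG step; retrieve() and llm() are hypothetical stand-ins.
def retrieve(question: str) -> list[str]:
    # A real application would query search indexes, vector databases, etc.
    return [
        "Standard traffic lights have red, yellow, and green lamps.",
        "The light is not blue.",   # noisy, irrelevant snippet
        "The light is not green.",
    ]

def llm(prompt: str) -> str:
    # Stand-in for a call to an LLM API.
    raise NotImplementedError

def answer(question: str) -> str:
    snippets = retrieve(question)
    prompt = "Context:\n" + "\n".join(snippets) + f"\n\nQuestion: {question}\nAnswer:"
    # The LLM must pick out the relevant facts and ignore the noise.
    return llm(prompt)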

By contrast, using classic symbolic artificial intelligence, with a logic reasoning language like Prolog, we could define a “TrafficLight” and infer the possible colors in a couple of lines of code, with no ambiguity.

Of course symbolic artificial intelligence has its own limitations, including brittleness and an inability to scale well, which is why we’ve moved on to machine learning and generative models like LLMs.

But, we should not have to give up what was good about the symbolist approach to use the new developments of artificial intelligence.

There are efforts underway to combine the symbolist approach with the newer forms of artificial intelligence. There are various names for this effort, but a popular one is Neuro-Symbolic AI.

Let’s say we are creating an application to recommend movies. A symbolist approach might define a relationship:

EnjoysGenre(Person, Genre)

and use that relationship to define facts like:

EnjoysGenre(john, scifi)

and then a further relationship could be defined by composing relationships:

LikeMovie(Person, Movie) :- EnjoysGenre(Person, Genre), HasGenre(Movie, Genre)

with this new relation then able to “predict” that a person will enjoy a movie if that movie happens to be in a genre they like, such as:

LikeMovie(john, starwars) :- EnjoysGenre(john, scifi), HasGenre(starwars, scifi)

Neuro-Symbolic AI extends the symbolic model by learning to perform predictive tasks such as:

  • Predict instances of relations such as LikeMovie based on training with known examples. In the context of Knowledge Graphs this is known as Knowledge Graph Completion as it fills in a Knowledge Graph with predicted relationships based on existing relationships.
  • Assign weights to components of rules which would learn how much influence “Genre” should have in the relation LikeMovie compared to other components.
  • Generate new kinds of relations which could then factor into other relations, and so on, recursively. For instance, ReleaseYear or MovieCountryOfOrigin could be learned to be relations of interest and factor into relations such as LikeMovie. ForeignFilm could be learned to be the relation between MovieCountryOfOrigin and the logical NOT of PersonCountryOfOrigin and be included as a factor in LikeMovie (i.e. a foreign film to you is a film from any other country but your own country of origin). We could ask the model to come up with a relationship for DateNightMovies which it could learn to be a composition of the partners’ preferences and perhaps something more light-hearted, influenced by previous DateNights.

These tasks may use classic feature driven machine learning models and recommendation systems or may use newer techniques taking advantage of deep learning, embeddings, and transformer models. Some examples of the latter include Graph Neural Networks (see PyG), Probabilistic Soft Logic, and Logic Tensor Networks.
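
As a minimal illustration of the knowledge-graph-completion task above, here is a sketch of embedding-based link prediction for LikeMovie using plain PyTorch: a DistMult-style scorer with made-up entities and training triples. A real system would use a library like PyG and far more data.

import torch
import torch.nn as nn

# Toy entities and training triples for the LikeMovie relation (illustrative only).
entities = ["john", "mary", "starwars", "alien", "notebook"]
ent2id = {e: i for i, e in enumerate(entities)}
positives = [("john", "starwars"), ("john", "alien")]
negatives = [("john", "notebook")]

dim = 16
emb = nn.Embedding(len(entities), dim)
rel = nn.Parameter(torch.randn(dim))          # embedding for the LikeMovie relation
opt = torch.optim.Adam(list(emb.parameters()) + [rel], lr=0.05)

def score(person, movie):
    p = emb(torch.tensor(ent2id[person]))
    m = emb(torch.tensor(ent2id[movie]))
    return (p * rel * m).sum()                # DistMult-style triple score

for _ in range(200):
    opt.zero_grad()
    # Margin loss: push positive triples above +1 and negative triples below -1.
    loss = sum(torch.relu(1.0 - score(p, m)) for p, m in positives) + \
           sum(torch.relu(1.0 + score(p, m)) for p, m in negatives)
    loss.backward()
    opt.step()

# Higher score = more likely LikeMovie(john, movie), including for unseen pairs.
for movie in ["starwars", "alien", "notebook"]:
    print(movie, float(score("john", movie)))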

One aim of using Neuro-Symbolic AI, compared to pure machine learning, is to make the reasoning explainable. The output can include a trace of why the system thinks you should watch the movie “Miller’s Crossing”: the genre, the director, similarity to a movie you watched and liked recently, and so forth. Machine learning on its own is more of a black box, with little explanation possible.

Future LLMs may have Neuro-Symbolic AI modules as components, similar to how Mixture-of-Experts models combine multiple component models into one melded LLM.

Currently such Neuro-Symbolic models can be used in combination with an existing LLM, taking advantage of techniques such as “function calling”. In function calling, the LLM composes a request to an external resource (a “function”), and that function returns some information that helps the LLM complete its task. So, for example, if the LLM can generate a function call in the form of a query like:

LikeMovie(john, ?Movie)

then the Neuro-Symbolic AI model can take over and do the reasoning to generate ?Movie results for john, and the LLM can then use those results to complete its task. This is essentially just another “RAG” query to get contextual information for completing a task.
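
A hedged sketch of what that hand-off could look like in Python follows; the call parsing and the like_movie() lookup below are hypothetical stand-ins, not the function-calling API of any particular LLM library or the interface of a real reasoner:

import re

def like_movie(person):
    # Stand-in for a Neuro-Symbolic or symbolic reasoner answering
    # LikeMovie(person, ?Movie); here just a hard-coded lookup.
    known = {"john": ["starwars", "alien"]}
    return known.get(person, [])

def handle_function_call(call: str):
    # Expects calls of the form LikeMovie(john, ?Movie)
    match = re.match(r"LikeMovie\((\w+),\s*\?Movie\)", call)
    if not match:
        raise ValueError(f"Unrecognized call: {call}")
    return like_movie(match.group(1))

# The LLM generates the call text; the results go back into its context.
print(handle_function_call("LikeMovie(john, ?Movie)"))   # ['starwars', 'alien']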

If we used our LLM to generate logical statements from the prompt, something like:

traffic_light(green, false)

And then used function calling to “run” those logical statements within a logical reasoner (Neuro-Symbolic or purely symbolic), we could use the LLM for what it is good at and the reasoner for what it is good at to arrive at our answer.

One aspect of our simple Traffic Light question is that it rests on a finite enumerated list: green, yellow, and red. Our reasoning system must use a process of elimination. If we know that the traffic light is not green and not red, then reasoning can infer that it is yellow, even without that fact explicitly stated. This is easily accomplished in a symbolic system, but as with our example at the start, LLMs can struggle with this.
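
In a symbolic system (or even in plain Python) this elimination over a closed set of colors is trivial; the sketch below assumes nothing beyond the standard library:

# Process of elimination over the closed set of traffic-light colors.
COLORS = {"red", "yellow", "green"}

def possible_colors(excluded):
    # Colors mentioned in a prompt that are not real traffic-light colors
    # (e.g. "blue") simply drop out of the result.
    return COLORS - set(excluded)

print(possible_colors({"green", "blue"}))   # {'red', 'yellow'}
print(possible_colors({"green", "red"}))    # {'yellow'}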

One important feature of symbolic systems that I have not yet seen replicated in a Neuro-Symbolic context is Defeasible Reasoning. Defeasible Reasoning allows certain knowledge to “defeat” other knowledge as part of a reasoning process. This allows new knowledge to override old knowledge, more specific knowledge to override more general knowledge, and higher-ranked knowledge to override lower-ranked knowledge.

Defeasible Reasoning solves the problem of an inference system containing rules that generate conflicting conclusions. Consider a rule such as:

All Birds Fly

which classifies all instances of the Bird class into a CanFly class. Now consider adding a rule such as:

Penguins Can Not Fly

which classifies all instances of the Penguin class into the CanNotFly class.

Now we have Penguins that are classified as both CanFly (as Birds) and CanNotFly (as Penguins) creating a logical contradiction, which, for a logical inference system, is very bad. Having A and not A both be true simultaneously grinds everything to a halt.

Defeasible Reasoning solves this by having the more specific rule for Penguins defeat the more general rule for all Birds.
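
As a toy illustration only (a hand-rolled priority list, not a real defeasible reasoner like the Ergo engine described below), specificity can be modeled by checking the more specific rule first:

# Toy illustration of "more specific defeats more general".
rules = [
    ("Penguin", "CanNotFly"),   # more specific rule, checked first
    ("Bird", "CanFly"),         # more general rule
]

def classify(types):
    # The first (most specific) matching rule wins.
    for rule_type, conclusion in rules:
        if rule_type in types:
            return conclusion

print(classify({"Bird"}))              # CanFly
print(classify({"Bird", "Penguin"}))   # CanNotFly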

Another example is the so-called “Nixon Diamond” problem: by one path of reasoning, U.S. President Nixon was a pacifist as a Quaker (Society of Friends), while by another path of reasoning he was a non-pacifist based on his Republican policies during the Vietnam War. Defeasible Reasoning provides a tie-breaker between the pacifist and non-pacifist conclusions to avoid a logical contradiction when determining Nixon’s classification.

So in this case, based on a ranking of rules or on supporting evidence, the path through the Republican policies “defeats” the Quaker pacifism, causing Nixon to be classified as non-pacifist.

One inference engine that implements Defeasible Reasoning is the open-source Ergo Engine (https://github.com/ErgoAI/ErgoEngine). Ergo is based on frame logic, making it a cross-over between a logic language and an object-oriented language (with “frames” in place of objects). Besides defeasible reasoning, it has other advanced features, including a convenient way of expressing negative knowledge, as we’ll see in the example below.

Ergo has a Python interface, and an example using Python is in the repo:
https://github.com/vital-ai/vital-logic-python

:- use_argumentation_theory.
////////////////////////////////////////////////////
// defeasible reasoning example

// Class hierarchy: Human, Mortal, Immortal, and Undead are subclasses of Thing;
// MagicUser is a subclass of Human; Mortal, Immortal, and Undead are instances
// of the Mortality category.
Human::Thing.
Mortal::Thing.
Immortal::Thing.
Undead::Thing.
MagicUser::Human.
Mortality::AbstractThing.
Mortal:Mortality.
Immortal:Mortality.
Undead:Mortality.

// Default rules: humans are mortal and not immortal.
@{default} \neg ?P:Immortal :- ?P:Human.
@{default} ?P:Mortal :- ?P:Human.

// Magical rules: magic users are immortal and not mortal.
@{magical} ?X:Immortal :- ?X:MagicUser.
@{magical} \neg ?X:Mortal :- ?X:MagicUser.

// The magical rules defeat the default rules.
\overrides({magical},default).

// Instance Data
Socrates:Human.
Merlin:Human.
Merlin:MagicUser.

// Rules: pair each human with its Mortality classification.
mortality(?Human, ?Mortal) :- ?Human:Human,
    ?Human:?Mortal, ?Mortal:Mortality.

Above are some example rules from the “test_rules.ergo” file in the vital-logic-python project.

The classic example from Logic 101 is:

The inference from the premises “all men are mortal” and “Socrates is a man” to the conclusion “Socrates is mortal” is deductively valid.

https://en.wikipedia.org/wiki/Deductive_reasoning

The example Ergo rules above extend this Logic 101 classic by defining defeasible rules: the class Human is Mortal by default, but the class MagicUser is Immortal, with the magical rules overriding (defeating) the default ones. We define two instances of Human, Socrates and Merlin, with Merlin also being a MagicUser. The rule mortality(?Human, ?Mortality) lists out the humans and how they are classified as mortal or immortal, with the results being:

?Human = Socrates, ?Mortality = Mortal
?Human = Merlin, ?Mortality = Immortal

The rule:

\neg ?P:Immortal :- ?P:Human.

is an example of a negative rule, encoding negative information: Humans are not classified as Immortal unless some other rule can “defeat” this.

The result of the query mortality(?Human, ?Mortality) changes for Merlin when the fact:

Merlin:MagicUser.

is added into the database. This is an example of non-monotonic reasoning: the conclusion that Merlin is mortal is retracted, and a new inference that Merlin is immortal is added, when the fact is asserted. The inference engine must keep track of which conclusions to remove and which to add as facts and rules change. Being able to handle changing facts and conclusions as knowledge changes is a critical capability for an AI application.

The repo contains sample python code like:

    for row in pyergo_query('?C::Thing@logic, Merlin:?C@logic.'):
        print("row", row[0])

which runs a query that uses the reasoning rules to generate results and prints them out, so integrating Python and Ergo is pretty simple. The query above prints the classes assigned to Merlin that are also subclasses of Thing within the database called “logic”.

There is also some sample code for the Traffic Light case mentioned at the start represented as symbolic rules.

Given the Python interface, it is straightforward to combine Ergo queries with Python code for LLMs, using libraries such as LangChain to access models like OpenAI’s GPT-4 and Anthropic’s Claude. With the function-calling approach mentioned above, Python can be used to integrate symbolic reasoning with LLMs. If you are a developer, we hope you give it a try! We’ll have some examples of Neuro-Symbolic AI using PyG for Graph Neural Networks coming along too. These examples can be used with Agents in the Vital AI Agent Ecosystem and with Agents deployed on Chat.ai.

To learn more about the Agent Ecosystem, check out: https://www.vital.ai/agent-ecosystem.html

To learn more about deploying agents on Chat.ai check out: https://chat.ai/developers.html

If you are interested in Vital.ai helping your organization build and deploy agents, please contact us: https://www.vital.ai/about.html#contact

Running Whisper Speech-to-Text Model in the Browser

In Chat.ai, we’re looking to improve voice access to artificial intelligence. Converting speech to text is a critical component of interacting with people by voice. Once speech is converted to text, it can be fed into subsequent steps that understand the meaning of the text and then generate a response.

This article discusses running the Whisper Speech-to-Text model within the browser, including a demo application.

Applications of Artificial Intelligence (AI) make use of various kinds of models including Transformer models such as Large Language Models (LLMs) like GPT-4 or Speech-to-Text models like Whisper.

An important aspect of an AI application is how the models are deployed on servers and devices.

How models are deployed affects the flow of information and the latency — how quickly the AI application can respond. ChatGPT found great success in part by streaming incremental output back to users, giving the experience of activity and low latency while the model was completing its task.

In general, the closer a model can be deployed to the end user the better, as this reduces latency and improves responsiveness. The ideal deployment is on “edge” devices that users are directly interacting with, whether desktops, laptops, or mobile devices.

On such edge devices, web browsers such as Google Chrome and Apple Safari are the most common user interfaces.

So, running models within browsers is ideal for deployment. But browsers run on limited hardware, and there may be privacy and security constraints. Therefore, a balance must be struck between what can be deployed within the browser and what should run in the cloud, with more significant infrastructure and higher security and privacy standards. An application can be designed to have certain activity happen on the edge device and other activity happen in the cloud, in one unified and seamless user experience.

There is rapid and ongoing development of software libraries that support running models within browsers, and browsers are adopting standards such as WebGL and WebGPU to provide APIs to help optimize running the models.

One such library is Transformers.js ( https://github.com/xenova/transformers.js ) which added support for WebGPU in January 2024 and supports an interchange standard for models called ONNX (Open Neural Network Exchange).

The Whisper model comes in various sizes. The “Whisper Large” model has around 1.5 Billion parameters. Several service providers including OpenAI provide API access to the Whisper Large model in the cloud using significant server and GPU hardware which is necessary for models of that size. However, the “Whisper Tiny” model has around 39 Million parameters and is much more suitable for being deployed within a browser.

Fewer parameters means less accuracy and coverage, but that may be an acceptable trade-off, depending on the particular application. The application can use the edge-device model when it can and roll over to the larger model in the cloud as necessary. This is in part a cost consideration: it’s cheaper to use the edge device’s hardware when possible, with costs increasing with the amount of cloud infrastructure deployed to support the application.

The Transformers.js library can use the Whisper Tiny model and there is a great demo of it for Speech-to-Text here: https://huggingface.co/spaces/Xenova/whisper-web

This demo is a web application written with the React framework; it would be nice to have a standalone JavaScript library that can be dropped into any web application.

We created such a library here: https://github.com/vital-ai/vital-stt-js

The library is open source and needs some cleanup and improvement, but it is sufficient for some usage testing!

To test the capability of the model-plus-browser combination, we created a demo that pairs the Whisper model with a “wake word” to activate transcription. The application is available in the repo: https://github.com/chat-ai-app/chat-ai-assistant-demo

and the demo is deployed here: https://demo-voice.chat.ai/

The “wake word” is the phrase “Hey Haley”. “Haley” is the name of our AI Assistant.

The demo displays the text that was transcribed from the speaker. The text is not further processed to generate a response from the AI Assistant. We’re just testing the transcription part in this demo.

I’ll post a separate blog entry on developing the wake word model, which uses an open source library called OpenWakeWord for training: https://github.com/dscripka/openWakeWord

and an open-source JavaScript library for deployment: https://github.com/vital-ai/vital-wakeword-js

In some initial tests on laptops, transcribing a short phrase like “What’s the weather tomorrow in Brooklyn” takes about one second, but this should improve with testing of different configuration settings and enabling further optimizations, such as WebGPU, that better utilize the underlying hardware.

The “wake word” also needs additional training to make it more robust. It may take a few attempts to trigger the wake word, which sounds a “ding” when activated.

If you are a developer, you may wish to open up the JavaScript console to see some logging of activity.

When you open the demo application, it should request access to the microphone; slide the toggle to the right to have the demo listen for the wake word.

The wake word model loads after you slide the toggle, and the Whisper model loads when the first transcription is attempted, so the first few interactions may have some delays while these steps occur.

Please let us know any feedback on the demo in comments, and if you are a developer please consider contributing to the libraries linked above!

Search GPTs using AgentShop.ai

There was much excitement in the A.I. Community at the recent OpenAI Developer Day and the announcement of Custom GPTs (and much excitement at the drama that followed with the OpenAI board, but that’s another topic).

As part of our initiative with AgentShop.ai we began to collect Custom GPTs and index them.

The GPT Search Engine is now available publicly at: https://www.agentshop.ai

We plan to add many features including ratings and reviews and would love your feedback.

Here is a quick tour:

The GPT search can use a topic, keywords, or a combination of both.

The search results:

Clicking on “Agent Details” goes to a details page for that GPT.

We’ll be adding reviews, ratings, tagging, publisher pages, and similar features. Please provide feedback on what we should add!

Clicking on the “Agent” link goes straight to using the GPT on ChatGPT:

You can submit your GPT on the submit page: https://www.agentshop.ai/registeragent

A little information about the implementation:

The GPT information is stored as a Knowledge Graph in a Graph Database. The particular Graph Database used here is Virtuoso, which is open source and available from OpenLink Software ( https://github.com/openlink/virtuoso-opensource ).

The GPT information is also indexed using a Vector Database. The vector database used here is Weaviate, which is also open source ( https://weaviate.io/ ).

The infrastructure is managed on AWS, taking advantage of some of the newer GPU options to use with Weaviate.

The “topic” search uses Weaviate, the “keyword” search uses Virtuoso, and “both” uses both.
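
In rough Python pseudocode, the dispatch looks something like the sketch below; the vector_search() and keyword_search() helpers are hypothetical stand-ins for the Weaviate and Virtuoso queries, not the actual AgentShop.ai code:

# Hypothetical dispatch across the two search backends.
def vector_search(topic: str) -> list[str]:
    # Stand-in: would embed the topic and run a nearest-neighbor query in Weaviate.
    return ["gpt-a", "gpt-b"]

def keyword_search(keywords: str) -> list[str]:
    # Stand-in: would run a keyword query against the Virtuoso knowledge graph.
    return ["gpt-b", "gpt-c"]

def search_gpts(topic: str = "", keywords: str = "") -> list[str]:
    if topic and keywords:
        # "both": favor results that appear in both backends.
        vec, kw = vector_search(topic), keyword_search(keywords)
        overlap = [g for g in vec if g in kw]
        return overlap or vec + kw
    if topic:
        return vector_search(topic)
    if keywords:
        return keyword_search(keywords)
    return []

print(search_gpts(topic="coding assistants", keywords="python"))   # ['gpt-b']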

As we get additional signals from ratings and reviews, we’ll be able to tune search to include ranking metrics and tune the vector embeddings to best fit the content of GPTs.

AgentShop.ai will contain many kinds of A.I. Agents, not just OpenAI Custom GPTs, but we couldn’t resist putting this early release out there!

Also, we’ll be adding an AgentShop.ai GPT so you can search, use, rate, and review GPTs directly from within ChatGPT.  We’re hoping to get a few more features from OpenAI such as getting unique user identifiers so that we can associate ratings and reviews with a unique ChatGPT user.

Please provide feedback on AgentShop.ai in comments here or in the forum at: https://forum.chat.ai/

 

Speak with Chat GPT just like Amazon Alexa or Google Home

Here’s a quick video demo of using GPT-3 (text-davinci-003) with a voice interface.

I have found it quite interesting to experiment with the ChatGPT model since OpenAI released it recently.

I thought it would be quite fun to connect it up to a spoken interface, just like Amazon Alexa and Google Home AI Assistants.

I decided to go with the following approach: listen for a wake word, record audio until the speaker stops speaking, transcribe the audio to text, use GPT-3 to generate a text response, use Amazon Polly to generate speech, and then “play” the resulting sound in the browser.

Fortunately, the Haley.ai platform enables composing workflows that include models and other functionality. For transcribing audio, the Whisper model was selected. To use the current OpenAI API, the latest GPT-3 model (text-davinci-003) was used with a prompt similar to the ChatGPT prompt (since ChatGPT is not yet available via API). The Amazon Polly voice “Joanna” was selected; it is one of the “Neural” voices, which support a limited subset of SSML tags.

The models were composed together with the following workflow:

The screenshot shows the Haley Workflow editor. The 3 models are composed together with the result of the Polly model being sent back to the browser.
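
Conceptually, the server-side steps resemble the rough Python sketch below. This is only an approximation for illustration, not the Haley.ai implementation; it assumes the open-source whisper package, the legacy openai (pre-1.0) Completion API, and boto3 for Amazon Polly.

import boto3
import openai
import whisper

openai.api_key = "YOUR_OPENAI_KEY"       # placeholder
polly = boto3.client("polly")
stt_model = whisper.load_model("base")   # a small Whisper model for transcription

def respond_to_utterance(wav_path: str) -> bytes:
    # 1. Speech-to-text with Whisper.
    transcript = stt_model.transcribe(wav_path)["text"]

    # 2. Generate a reply with GPT-3; the ChatGPT-style prompt is trimmed for brevity.
    completion = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"User: {transcript}\nAssistant:",
        max_tokens=256,
    )
    reply = completion["choices"][0]["text"].strip()

    # 3. Text-to-speech with the neural "Joanna" voice on Amazon Polly.
    speech = polly.synthesize_speech(
        Text=reply, VoiceId="Joanna", Engine="neural", OutputFormat="mp3"
    )
    return speech["AudioStream"].read()   # MP3 bytes to send back to the browser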

Speaking of Polly, the prompt used with GPT-3 includes some examples of SSML, such as the “prosody” tag, which affects the Polly output, for instance when asking for a haiku.

Recent chat interactions are included in the prompt to give GPT-3 a degree of memory of the conversation history.

The Haley.ai platform takes care of messaging and running the workflow, as well as the embedded user interface displaying the chat messages.

Within the browser, we needed a wake word to start the voice recording, and a way to track voice activity so that we can stop recording and send the audio to Haley.ai for processing by the workflow.

Fortunately, some open-source projects do the heavy lifting for these tasks.

For wake word detection, I used: https://github.com/jaxcore/bumblebee-hotword

And to detect voice activity, including when speaking has stopped, I used: https://github.com/solyarisoftware/WeBAD

I’m hoping to make the voice detection and recording a bit more robust and then publicly release the result.