Navigating the Future of Data: A User’s Perspective on Palantir’s AI-Enhanced Foundry Platform
This week, the team and I attended training provided by Palantir on their Artificial Intelligence Platform (AIP) on Foundry. I'd like to share a bit of what we learned by walking you through a notional use-case, describing the different Foundry-specific capabilities in my own words. It's not concise (sorry!), but if you want to know a little more about this somewhat mysterious technology, here you go:
Before describing the use-case, the fundamental thing you need to understand about Foundry is the “Ontology.”
The Ontology consists of all of the derived data objects on Foundry, described in business terms. So, for example, two notable objects in the Ontology for the aviation safety domain are the Aircraft object and the Event object. The Aircraft object consists of fields particular to an aircraft, such as registration number, event history, maintenance history, certification date, engine type, etc. When you materialize a specific instance of the Aircraft object, it instantiates these properties from various disparate data sources. The Event object, similarly, may consist of various types of safety event reporting — service difficulty reports, mechanical interruption summary reports, Aviation Herald posts, etc.
Objects in the Ontology are related to each other in a graph. Objects may contain Actions — specific ways in which users can interact with the Objects. More on that later.
AIP is Foundry's integration with Large Language Models. By default it integrates with GPT-4, but other models can be swapped in. There are several ways for users to interact with AIP on Foundry:
AIP Assist is a chatbot backed by an instance of GPT-4 that (I think) has been fine-tuned on Foundry documentation or, at least, consults that documentation via Retrieval-Augmented Generation (RAG). AIP Assist essentially helps the user work with Foundry. You can ask it questions like, "How do I build apps on Foundry?" and it will, for example, point you to Slate or Workshop, the app-building tools native to Foundry, and provide helpful steps for using them.
Foundry has a low-code/no-code functionality for creating and executing data transformations called Pipeline Builder.
The user can engage AIP in Pipeline Builder by asking natural-language questions via a "Generate" button; AIP turns the request into programming/query code and executes it on the data. So, for example, you can tell the prompt, "Join the array in the column titled 'Airplanes' into a single string and put it in a new column", and it will do exactly that, displaying in a debugger the coding steps it took to execute the action.
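To make the generated transform concrete, here is a rough pure-Python equivalent of what that prompt asks for. This is only an illustrative sketch: the column names and rows are made up, and Pipeline Builder actually generates and runs its own transformation code under the hood.

```python
# Illustrative sketch of the transform generated from the prompt
# "Join the array in the column titled 'Airplanes' into a single
# string and put it in a new column". Data is notional.

rows = [
    {"Airplanes": ["B737", "A320", "E175"]},
    {"Airplanes": ["CRJ900"]},
]

def join_airplanes(row):
    # Concatenate the array into one comma-separated string,
    # stored in a new derived column.
    new_row = dict(row)
    new_row["Airplanes_Joined"] = ", ".join(row["Airplanes"])
    return new_row

transformed = [join_airplanes(r) for r in rows]
print(transformed[0]["Airplanes_Joined"])  # → B737, A320, E175
```

The useful part of the Foundry experience is that AIP writes and runs this kind of step for you, and the debugger shows you the equivalent logic it executed.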
AIP Logic lets the user apply Large Language Model intuition to data from the Ontology. For example, if you have safety event data as an object in the Ontology and the Federal Aviation Regulations (FARs) as another object, you can use AIP Logic to run a similarity score between an event and the FARs, see which FAR is applicable to the specific event, and see how strong the match is. Data in the Ontology can be easily vectorized on Foundry to support this functionality. Output from AIP Logic is type-safe: if you expect the output to be a string, you can declare it as such, then save the AIP Logic as a block and add it to automated data pipelines. The type safety helps ensure the integrity of the AIP Logic output downstream.
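Conceptually, the similarity scoring works like the toy sketch below. Everything here is a stand-in: the bag-of-words vectors are a crude substitute for real embeddings, and the FAR snippets are paraphrased for illustration, not quoted regulation text.

```python
import math
from collections import Counter

def vectorize(text):
    # Toy bag-of-words vector; Foundry's vectorization would use a
    # real embedding model, not word counts.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    # Cosine of the angle between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

event = "engine failure during climb after takeoff"
fars = {
    "FAR 33.75": "safety analysis for engine failure conditions",
    "FAR 91.103": "preflight action and weather briefing requirements",
}

scores = {far: cosine_similarity(vectorize(event), vectorize(text))
          for far, text in fars.items()}
best_match = max(scores, key=scores.get)
```

Running this ranks "FAR 33.75" above "FAR 91.103" for the engine-failure event, which is the same kind of "which regulation is most applicable, and by how much" answer AIP Logic gives over vectorized Ontology objects.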
Use-Case:
You are an analyst responsible for reviewing documents that are responsive to a Freedom of Information Act (FOIA) request. You must determine what data from the documents must be redacted before being provided to the requestor. You are not provided rules for redaction. Instead, you are provided labeled data that consists of aviation voice transmissions between pilots and Air Traffic Control in PDF files. Your mission is to train a model on the labeled data such that the model learns what types of things need to be redacted from such transmissions and then publish the model as an endpoint that can support this functionality being implemented in an automated fashion.
Note: we actually executed a very rudimentary example of this use-case during our training, using publicly available data we found on the internet. We created our own notional training data. Our results were certainly not production-ready, but in a few hours it was clear how this use-case could be executed on Foundry.
We uploaded our PDFs containing the voice transmissions to Pipeline Builder. Pipeline Builder has automated functionality for parsing PDFs via OCR into relational data. When we executed this, all of the text from the PDF was placed as an array of strings in a column, along with various metadata from the PDF in other columns.
Finding the array of strings difficult to work with, we engaged AIP and told it via prompt to "Combine all of the strings in the column containing the text from the PDF into a single string and put the derived data in a new column". AIP executed this.
We then told the AIP prompt in Pipeline Builder, "The new column of derived data contains several lines of voice transmissions. Each new line of transmission begins with something similar to the format, 'PILOT:' or 'ATC:'. Extract all of the new lines and place them in two new columns, as appropriate, as 'Pilot Transmissions' or 'ATC Transmissions'." AIP executed this.
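That extraction step amounts to a speaker-label split. A rough pure-Python equivalent looks like this, with a made-up transcript and the same assumed "PILOT:" / "ATC:" label format from the prompt:

```python
import re

# Illustrative speaker-label split; the transcript text is notional.
transcript = (
    "PILOT: request descent to flight level 240\n"
    "ATC: descend and maintain flight level 240\n"
    "PILOT: descending flight level 240"
)

# Capture everything after each speaker label, one line at a time.
pilot_transmissions = re.findall(r"^PILOT:\s*(.+)$", transcript,
                                 flags=re.MULTILINE)
atc_transmissions = re.findall(r"^ATC:\s*(.+)$", transcript,
                               flags=re.MULTILINE)
```

AIP generated and ran the equivalent of this for us from the plain-English prompt, producing the two new columns without us writing the regex ourselves.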
Our data was now prepared via Pipeline Builder, without writing any code. We saved the derived dataset as an object in the Ontology; let's call it 'Voice Transmissions'. We then used one of the various code environments on Foundry, whether Notebooks or Code Repositories, and invoked functions provided by Foundry to easily vectorize the data in the Voice Transmissions object. We did the same for the notional labeled, redacted voice-transmission data we uploaded to Foundry.
Once vectorized, we opened AIP Logic, where we could engage the Large Language Model. As inputs, we selected our two objects from the Ontology: Voice Transmissions and Redacted Training Data. We walked through the AIP Logic GUI and entered a prompt like, "Based on the patterns of redaction used in the Redacted Training Data object, appropriately redact the data in the Voice Transmissions data". When you are working through AIP Logic in development, it asks you to select a single instance of the Voice Transmissions object from a dropdown. In production it would run through voice transmissions in a streaming manner, but for testing purposes you give it a single example to work through. To get to the desired result, the user inevitably has to iterate through some prompt engineering, but ultimately we found AIP Logic capable of doing what we asked.
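The configured logic block boils down to a typed function contract. The sketch below is entirely hypothetical: `call_llm` is a stand-in for the configured model call (AIP Logic's real interface is a GUI, not Python), and the canned response just illustrates the shape of the output. The point is the declared-type check at the end, which is what the block's type safety enforces before output enters an automated pipeline.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in: a real deployment would invoke the
    # configured language model with this prompt.
    return "PILOT: request descent, tail number [REDACTED]"

def redact_transmission(transmission: str) -> str:
    prompt = (
        "Based on the patterns of redaction used in the Redacted "
        f"Training Data object, redact this transmission: {transmission}"
    )
    result = call_llm(prompt)
    # Type safety: reject anything that isn't the declared output type
    # before it flows into the automated pipeline.
    if not isinstance(result, str):
        raise TypeError("AIP Logic block declared a string output")
    return result

redacted = redact_transmission("PILOT: request descent, "
                               "tail number november one two three")
```

In development you iterate on the prompt against a single selected instance; in production the same block would run over the stream of incoming transmissions.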
That said, in the real world we likely wouldn't use AIP Logic for this use-case. We'd prefer a different type of transformer, more appropriate to the task, that we could train on the vectorized redacted training data. Foundry provides a functionality called Modeling Objectives that supports this. Users can bring in models (for example, from Hugging Face, or custom-designed) from their own computers, from online sources, or from containers. Users can select computing resources (i.e., GPUs) and train the models via Modeling Objectives, a capability similar to AWS SageMaker. Users can train several models via Modeling Objectives, and the GUI provides a chart for comparing accuracy scores. Modeling Objectives also supports deploying models to production and standing up endpoints through which the models can be called in production.
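The model-comparison part of a Modeling Objective can be sketched in a few lines. Everything below is hypothetical: the "models" are crude heuristics standing in for trained transformers, and the held-out labeled transmissions are invented. The shape of the exercise, scoring several candidates against the same held-out labels and ranking them, is what the Modeling Objectives chart visualizes.

```python
import re

# Notional held-out labeled data: (raw transmission, expected redaction).
held_out = [
    ("contact approach on one two five point seven",
     "contact approach on [REDACTED]"),
    ("squawk four five two one", "squawk [REDACTED]"),
]

def baseline_model(text):
    # Stand-in candidate: redacts nothing.
    return text

DIGIT_RUN = re.compile(
    r"\b(?:(?:zero|one|two|three|four|five|six|seven|eight|nine|point)\b\s*)+")

def heuristic_model(text):
    # Stand-in candidate: replace each run of spoken digits with a
    # single redaction marker. A real objective would train a model.
    return DIGIT_RUN.sub("[REDACTED]", text).strip()

def accuracy(model, examples):
    # Exact-match accuracy against the held-out labels.
    return sum(model(x) == y for x, y in examples) / len(examples)

scores = {name: accuracy(model, held_out)
          for name, model in [("baseline", baseline_model),
                              ("heuristic", heuristic_model)]}
```

Modeling Objectives does this bookkeeping for you across real trained models and then handles promoting the winner to a production endpoint.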
Going way back to the beginning of this post, I want to reiterate the magic of Actions in the Ontology. Users can create applications on Foundry that can be shared with other users for engaging with the data. Let's say a user creates a table in an application that displays all of the events from the Event object. This is easy to do via Foundry's Workshop tool in a no-code way. Because each Object in the Ontology already has user-defined Actions associated with it, something really magical is unlocked in Workshop. Using drag-and-drop tools, the user can add a form that allows application users to add events to the events table. Because the Action is already defined, the form is automatically designed: Workshop knows which fields the user needs to complete to add an event, which are required, which are optional, the validation rules, etc. Actions abstract away all of this development work; the form is just automatically generated from the knowledge of the Action. As a former web developer myself, I know just how much tedious time and effort this saves. It's the little things that add up to make Foundry an awesome user experience, in my opinion.
This is just scratching the surface. This is what I learned in a day of training and a half-day of hands-on experience with AIP. I'm excited to dig further into it and learn more about this powerful capability. What most thrills me is that none of this required any coding whatsoever (except vectorizing the data), so the barrier to entry between subject-matter experts and data science is significantly lowered. The type safety in AIP Logic is really useful because you can use prompts to get responses from the Large Language Model in a desired format and, in an automated way, integrate those outputs directly into applications and visualizations that can also be built on Foundry without code.
Much of this functionality can be achieved with various tools on AWS, but if you've ever worked with AWS in GovCloud, you might appreciate having it all stitched together in a user-friendly way that orchestrates the governance without restricting the functionality.
- Steve B (https://twitter.com/devbogoodski)