As AI models become more capable, companies are looking for ways to balance performance, privacy, and the rising cost of compute. While cloud-based models offer greater processing power, they require data to be sent to remote servers. On-device AI can keep information local, but is often constrained by hardware limitations. Determining which workloads should run locally and which should be handled in the cloud has emerged as what the industry increasingly describes as an “orchestration problem.”
What is an orchestration problem?
An orchestration problem is the challenge of deciding which AI model should handle which part of a task, where it should run, and when. To take an example of the kind Perplexity’s system targets, imagine asking an AI to analyse your bank statement and create a financial summary.
Some parts of the task involve sensitive personal data that should ideally stay on your laptop, while other parts may require the reasoning power of a larger cloud-based AI model. The orchestration problem is figuring out how to split the work between the local and cloud models efficiently.
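To make the bank-statement example concrete, the split can be sketched as a small planning step. This is only an illustration, not Perplexity's implementation: the `Subtask` fields, the subtask names, and the rule "personal data stays local, heavy reasoning goes to the cloud" are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    name: str
    touches_personal_data: bool   # hypothetical flag, set by whoever breaks up the job
    needs_heavy_reasoning: bool   # hypothetical flag, same caveat


def assign(subtask: Subtask) -> str:
    """Return 'local' or 'cloud' for one subtask (illustrative policy only)."""
    if subtask.touches_personal_data:
        return "local"            # privacy takes priority over raw capability
    if subtask.needs_heavy_reasoning:
        return "cloud"            # no personal data involved, so use the larger model
    return "local"                # default to the device when either would do


def plan(subtasks: list[Subtask]) -> dict[str, str]:
    """Map each subtask name to the place it should run."""
    return {s.name: assign(s) for s in subtasks}


# Hypothetical breakdown of the "summarise my bank statement" request.
bank_statement_job = [
    Subtask("extract_transactions", touches_personal_data=True, needs_heavy_reasoning=False),
    Subtask("categorise_spending", touches_personal_data=True, needs_heavy_reasoning=False),
    Subtask("draft_budgeting_advice", touches_personal_data=False, needs_heavy_reasoning=True),
]

print(plan(bank_statement_job))
```

In this sketch the statement itself never leaves the laptop, and only the generic advice-writing step is sent to a cloud model. A real orchestrator would also have to decide what information, if any, that cloud step is allowed to see.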
What is hybrid agentic inference?
Perplexity’s hybrid agentic inference system is designed to automatically determine where AI tasks should be processed. A compact model running on a user’s device handles sensitive information and decides whether certain data should remain local, while more demanding tasks can be routed to powerful AI models in the cloud.
The company said the approach is particularly useful for tasks involving personal information such as financial records, health data, and private documents. Rather than requiring users to choose manually between local and cloud processing, the system makes that decision automatically for each request.
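The per-request decision described above can be pictured as a small router that inspects each request before anything leaves the device. The sketch below is a rough stand-in, not how Perplexity's on-device model works: it swaps the compact model for a few regular expressions, and the patterns, the `route` function, and the word-count threshold are all hypothetical.

```python
import re

# Hypothetical markers of sensitive content. A real system would use a
# trained on-device model rather than a handful of patterns.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),  # 16-digit, card-like number
    re.compile(r"\b(diagnosis|prescription|account number|salary)\b", re.IGNORECASE),
]


def contains_sensitive(text: str) -> bool:
    """True if any assumed sensitive pattern appears in the text."""
    return any(p.search(text) for p in SENSITIVE_PATTERNS)


def route(request: str, max_local_words: int = 200) -> str:
    """Pick 'local' or 'cloud' for a single request.

    Request length is a crude, assumed proxy for how demanding a task is.
    """
    if contains_sensitive(request):
        return "local"    # sensitive data stays on the device
    if len(request.split()) > max_local_words:
        return "cloud"    # long, non-sensitive request goes to the larger model
    return "local"        # short, non-sensitive request is handled on-device


print(route("My account number is 1234 5678 9012 3456, summarise my spending"))
print(route("What is the capital of France?"))
```

The point of the sketch is the ordering of the checks: the privacy test runs first, so a demanding request that contains personal data still stays on the device.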
Why Perplexity is pushing local AI
The announcement comes as AI companies increasingly explore running models directly on consumer devices. Improvements in processors, graphics chips, and dedicated AI hardware have made it possible to perform a growing number of AI tasks locally rather than relying entirely on cloud infrastructure.
Perplexity argues that keeping more workloads on-device can improve privacy and reduce the amount of computing power required from remote servers. The company stated that its hybrid approach allows local and cloud models to work together, with each handling the tasks best suited to its capabilities.
The company said, “People would rather own a data centre in their laptop than build one they don’t control.” Its argument is that modern PCs are becoming powerful enough to handle a growing share of AI workloads locally. In Perplexity’s view, this gives users greater control over their data, reduces the need to send sensitive information to remote servers, and lessens dependence on large centralised data centres operated by technology companies.
Partnership and support for other hardware
Perplexity unveiled the technology alongside Intel and said the system is designed to work across multiple hardware platforms. The company also highlighted support for NVIDIA’s RTX Spark platform, adding that its orchestration layer is model-agnostic and can operate across different AI chips and local computing environments.
How it compares with rival approaches
Perplexity’s announcement follows a wider industry push toward hybrid AI systems. Apple uses a combination of on-device processing and its Private Cloud Compute infrastructure through Apple Intelligence, while Google offers Gemini Nano for local AI tasks alongside larger cloud-based Gemini models. Microsoft is also expanding on-device AI capabilities with its new Aion model family.
When will it be available?
According to Perplexity, a personal computer with local inference support will begin rolling out in July. The company has not yet shared details about hardware requirements, supported devices, or whether the feature will be available to all users at launch.