This article is a write-up of my experience hosting the Qwen 2.5 0.5B model on a Raspberry Pi 2 using llama.cpp. Qwen 2.5 0.5B is a small language model (SLM) with 0.5B parameters, so a small development board like the Raspberry Pi can hold it. The RPi 2 Model B comes with a 900 MHz CPU and only 1 GB of memory. To be honest, setting up the project might take 1-2 hours, and prompt execution runs at only about 1-2 tokens per second, so you need to be patient. Let us begin! Step 0: Pick up the Raspberry Pi 2 from the attic.…
-
-
Over the weekend, I was trying to do a learning project using Microsoft’s latest model, Fara-7B, which is a “Computer Use Agent”. Fara-7B is a small, efficient, open-weight AI model (with 7 billion parameters) designed to act as a Computer Use Agent (CUA). It performs tasks on a computer by visually understanding the screen (via screenshots) and using mouse and keyboard actions (clicks, typing, scrolling) to automate web tasks like booking travel, shopping, or filling forms. It offers speed, privacy (it runs locally on devices), and lower cost compared to larger models. Currently the model is still in experimental mode, and…
-
-
Takeaways Read the Whitepaper here: https://www.linkedin.com/posts/ninethsense_the-deterministic-ai-agent-a-dual-brain-activity-7402527472975568896-xW1k?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAEqPm8Bief48CxwsnTrzIyprD5rdLx_zjU
-
How should a technical leader respond when a customer asks to add AI to an existing application? The answer requires structure and clear thinking.
1. First, Clarify the Actual Problem: Never assume that AI is the right solution. I would start by understanding the business objective. Many requests framed as AI needs turn out to be workflow issues, reporting gaps, or rule-based automation opportunities. Accurate problem definition prevents unnecessary complexity.
2. Evaluate Data/App Readiness: AI depends completely on data quality. Assess what data exists, how clean it is, and whether privacy or compliance concerns limit its use. If data foundations are…
-
The term stochastic parrot was introduced in a 2021 paper by Bender, Gebru, and colleagues (ref: Wikipedia). It highlights a fundamental limitation of large language models. These systems generate text by predicting the next token based on statistical patterns. They do not possess grounded understanding of the world. This can lead to convincing output that is incorrect, biased, or superficial. What the metaphor captures is simple: the probabilistic, statistically driven nature of these models. Parrot evokes an entity that mimics language without real understanding. The critique is not about style. It is about reliability. When a model draws from vast…
-
Large Language Models are probabilistic. They predict the next most likely word. When you ask them to “critique,” you populate their context window with high-quality reasoning and negative constraints (e.g. what not to do). The final generation is then statistically more likely to follow that higher standard because the logic is now part of the immediate conversation history. Try this:
Draft: Ask for your content as usual. “Write a cold email to a potential client about our new web design services.”
Critique: Don't just ask for a better version. Ask the AI to analyze its draft against specific criteria.…
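The draft and critique steps can be sketched as a small loop. This is a minimal sketch and not the author's implementation: `complete` is a hypothetical callable standing in for whatever LLM client you use, the prompt wording is illustrative, and the final "revise" call is an assumption about how the excerpt's "final generation" is produced.

```python
from typing import Callable, List


def draft_critique_revise(
    complete: Callable[[str], str],  # hypothetical: takes a prompt, returns model text
    task: str,
    criteria: List[str],
) -> str:
    """Run a draft -> critique -> revise sequence and return the revised text."""
    # Step 1 (Draft): ask for the content as usual.
    draft = complete(task)

    # Step 2 (Critique): ask for an analysis against specific criteria,
    # not simply for "a better version".
    criteria_text = "\n".join(f"- {c}" for c in criteria)
    critique = complete(
        "Critique the draft below against these criteria:\n"
        f"{criteria_text}\n\nDraft:\n{draft}"
    )

    # Step 3 (assumed): regenerate with the draft and critique both in the
    # prompt, so the critique's reasoning is part of the model's context.
    return complete(
        f"Task: {task}\n\nDraft:\n{draft}\n\nCritique:\n{critique}\n\n"
        "Rewrite the draft so that it addresses every point in the critique."
    )


if __name__ == "__main__":
    # Stub model for demonstration only; replace with a real client call.
    def echo_model(prompt: str) -> str:
        return f"[model reply to {len(prompt)} chars of prompt]"

    print(draft_critique_revise(
        echo_model,
        "Write a cold email to a potential client about our new web design services.",
        ["clear call to action", "no jargon"],
    ))
```

Passing the draft and critique back in explicitly keeps the sketch independent of any particular chat API; with a stateful chat client the same effect comes from the conversation history.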
-
This snippet shows sample code that uses an Azure OpenAI endpoint to execute an LLM call.

# pip install agent-framework python-dotenv
import asyncio
import os

from dotenv import load_dotenv
from agent_framework.azure import AzureOpenAIChatClient

# Load environment variables from .env file
load_dotenv()

api_key = os.getenv("AZURE_OPENAI_API_KEY")
deployment_name = os.getenv("AZURE_OPENAI_DEPLOYMENT")
endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
api_version = os.getenv("AZURE_OPENAI_API_VERSION")

agent = AzureOpenAIChatClient(
    endpoint=endpoint,
    api_key=api_key,
    deployment_name=deployment_name,
    api_version=api_version,
).create_agent(
    instructions="You are a poet",
    name="Poet",
)

async def main():
    result = await agent.run("Write a two liner poem on nature")
    print(result.text)

asyncio.run(main())

.env file sample

AZURE_OPENAI_API_KEY={paste your api key}
AZURE_OPENAI_ENDPOINT=https://{your endpoint}.openai.azure.com/
AZURE_OPENAI_DEPLOYMENT=o4-mini
AZURE_OPENAI_API_VERSION=2024-12-01-preview
-
Most of us are in a transition phase from AI prototypes to production systems. Many frameworks that appeared impressive on demo servers have failed badly in real production environments. It is important to consider all architectural pillars and aspects during the design stage. Delaying these decisions only adds time and cost later. Agentic AI systems consume tokens, and token usage translates directly to monetary cost. This course offers a clear explanation of AI/agent caching techniques and shows how to evaluate the effectiveness of different caching strategies. Attend the course here: https://www.deeplearning.ai/short-courses/semantic-caching-for-ai-agents/
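To make the caching idea concrete, here is a minimal sketch of a similarity-based cache with a hit-rate counter. It is not taken from the course: the class name `SemanticCache`, the 0.8 threshold, and the bag-of-words cosine similarity are all assumptions for illustration. A real semantic cache would normally compare embedding vectors from a model; word counts are used here only to keep the example self-contained.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def _vectorize(text: str) -> Counter:
    # Assumption: lowercase word counts stand in for a real embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical cache: reuse a stored response for similar-enough prompts."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []
        self.hits = 0
        self.misses = 0

    def get(self, prompt: str) -> Optional[str]:
        vec = _vectorize(prompt)
        best_score, best_response = 0.0, None
        for stored_vec, response in self.entries:
            score = _cosine(vec, stored_vec)
            if score > best_score:
                best_score, best_response = score, response
        if best_response is not None and best_score >= self.threshold:
            self.hits += 1
            return best_response
        self.misses += 1
        return None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((_vectorize(prompt), response))

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    cache = SemanticCache()
    cache.put("What is the capital of France?", "Paris")
    print(cache.get("what is the capital of france"))  # reuses the stored answer
    print(cache.get("Explain vector databases"))       # miss: no similar entry
    print(cache.hit_rate())
```

Every hit is one LLM call (and its tokens) avoided, so the hit rate gives a rough first measure of how effective a caching strategy is. The threshold trades savings against the risk of returning an answer to a prompt that only looks similar.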
-
The source code below creates a basic plugin (testplugin.cs) and shows how it is called from a prompt in Program.cs. This example uses a weather lookup, such as “What is the weather in London on 18th June 2024?”, and it always returns a hard-coded value of “29” degrees Celsius. You can modify the function to do complex logic. testplugin.cs This is a very basic plugin, in which I purposefully did not include any logic. Comments are added inline to explain what each line/function does. Program.cs appsettings.json I used this file to avoid hardcoding sensitive information in the source code…