Grafana Loki is an open-source log aggregation system that indexes labels instead of full log text, which is the single design choice that makes it cheaper to run than Splunk or Datadog. It works, it scales, and it is free to self-host. The harder question for a managed services provider is whether running it across a fleet of client environments earns back the operational time it costs.
This review answers that, not whether Loki is a competent logging tool. It is. We look at multi-tenancy, the AGPLv3 license, the real cost math, and where Loki belongs in an MSP stack.
TL;DR
| Question | Short answer |
|---|---|
| What is it? | Open-source, label-indexed log aggregation - "like Prometheus, but for logs" |
| What does it cost? | Free self-hosted (AGPLv3); Grafana Cloud from $0.45/GB ingested, 50 GB free |
| Multi-tenant? | Yes, native tenant isolation, but you configure and run it |
| Who it fits | MSPs with Kubernetes or Grafana skills who want to cap logging spend |
| Who should skip | Teams wanting turnkey logging with no infrastructure to babysit |
What Grafana Loki Is and How It Works
Loki collects logs from your systems, stores them cheaply, and lets you query them with a language called LogQL. Grafana Labs describes it as "like Prometheus, but for logs," and that comparison is the whole point. Prometheus changed metrics by being simple and cheap. Loki applies the same idea to log data.
The trick is what Loki does not do. Splunk and Elasticsearch index the full content of every log line, which is what makes search fast and storage expensive. Loki indexes only a small set of labels - things like the host, the job, or the environment - and keeps the log body compressed in cheap object storage like S3 or MinIO. You search by label first, then filter through the matching streams. Less index, less cost, slightly more thinking about how you query.
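To make the label-first idea concrete, here is a toy sketch in Python. It is not how Loki is implemented internally; the class name, labels, and log lines are all made up for illustration. It only shows the two-step shape described above: match on a small label index first, then scan the bodies of the matching streams.

```python
from collections import defaultdict


class TinyLabelIndex:
    """Toy model of label-first log storage (illustrative, not Loki's code)."""

    def __init__(self):
        # Stream key (sorted label pairs) -> raw, unindexed log lines.
        self.streams = defaultdict(list)

    def push(self, labels, line):
        key = tuple(sorted(labels.items()))
        self.streams[key].append(line)

    def query(self, selector, contains=""):
        hits = []
        for key, lines in self.streams.items():
            labels = dict(key)
            # Step 1: cheap match against the small set of stream labels.
            if all(labels.get(k) == v for k, v in selector.items()):
                # Step 2: brute-force scan of the matching streams only.
                hits.extend(line for line in lines if contains in line)
        return hits


idx = TinyLabelIndex()
idx.push({"job": "nginx", "env": "prod"}, "GET /health 200")
idx.push({"job": "nginx", "env": "prod"}, "GET /login 500 error")
idx.push({"job": "postgres", "env": "prod"}, "checkpoint error")

print(idx.query({"job": "nginx"}, contains="error"))
# ['GET /login 500 error']
```

The postgres line also contains "error," but it never gets scanned because its stream fails the label match. That is the cost saving, and also why a query with no useful label filter ends up scanning everything.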
A Loki deployment has a few moving parts. An agent (Grafana Alloy, or the older Promtail) ships logs off each machine. A distributor receives them, an ingester batches and writes them to object storage, and a querier reads them back. Grafana itself is the dashboard you already know, sitting on top to visualize everything. Together with Tempo (traces) and Mimir (metrics), Loki forms what Grafana calls the LGTM stack.
Recent versions have closed real gaps. Loki 3.0 added structured metadata, so you can attach key-value pairs to a log line without spawning a new stream. Native OpenTelemetry support landed, which matters if your clients are standardizing on OTel. Loki 3.7, released in late 2025, kept the pace going and moved the Helm chart to a separate community-managed repository. This is an active project with one of the larger open-source observability communities behind it.
Querying Logs With LogQL
LogQL is where the label-first model shows its hand. A query starts by selecting a log stream with labels, then layers on filters, parsers, and aggregations. Something like {job="nginx", client="acme"} |= "error" | json | status >= 500 reads as: pull the nginx stream labeled for the Acme client, keep lines containing "error," parse the JSON, and surface responses with a status of 500 or higher. If you have written PromQL, the shape feels familiar, which is part of why teams already on Grafana adopt Loki without much retraining.
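If you want to run that query outside Grafana, Loki exposes it over HTTP at /loki/api/v1/query_range. The sketch below only builds the request URL and sends nothing. The hostname and the time range are placeholders, and build_query_range_url is a helper invented for this example.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# The example query from the paragraph above.
LOGQL = '{job="nginx", client="acme"} |= "error" | json | status >= 500'


def build_query_range_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a range-query URL; start and end are Unix times in nanoseconds."""
    params = {
        "query": logql,
        "start": str(start_ns),
        "end": str(end_ns),
        "limit": str(limit),
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


# Placeholder host and a one-hour window, both assumptions for illustration.
url = build_query_range_url(
    "http://loki.example.internal:3100",
    LOGQL,
    start_ns=1_700_000_000 * 10**9,
    end_ns=1_700_003_600 * 10**9,
)
print(url)

# Round-trip check: the encoded query decodes back to the original LogQL.
assert parse_qs(urlparse(url).query)["query"][0] == LOGQL
```

In a multi-tenant setup the same request would also need the tenant header covered in the next section.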
The honest limitation is search speed on unindexed text. Because Loki does not index log bodies, a broad query across a wide time range with no useful label filter can crawl. Loki 3.0 added Bloom filters, still experimental, to speed up needle-in-a-haystack lookups for specific strings like an error code or a request ID, and the structured-metadata feature gives you more to filter on without exploding stream counts. In practice the fix is discipline: label your streams well, and your queries stay fast. Lazy labeling is how MSPs end up with slow dashboards and blame the tool.
How Loki Handles Multi-Tenancy Across Clients
This is the section the generic Loki tutorials skip, and it is the one that decides whether the tool fits a managed services model.
Loki is multi-tenant by design. Every request can carry an X-Scope-OrgID header that scopes the data to a single tenant. One Loki cluster can hold logs for dozens of clients, each isolated from the others, each with its own retention window and ingestion limits. For an MSP, that maps cleanly onto reality: one platform, many clients, no data bleed between them.
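As a rough sketch of what that header looks like in practice, the snippet below assembles a push request for one tenant without sending it. The tenant ID, labels, and log line are invented, build_push_request is a hypothetical helper, and the sketch assumes multi-tenancy is switched on in the Loki config (auth_enabled: true).

```python
import json
import time


def build_push_request(tenant_id, labels, lines, now_ns=None):
    """Return (path, headers, body) for a Loki push scoped to one tenant."""
    ts = str(now_ns if now_ns is not None else time.time_ns())
    body = {
        "streams": [
            {
                "stream": labels,  # stream labels
                "values": [[ts, line] for line in lines],  # [ns timestamp, line]
            }
        ]
    }
    headers = {
        "Content-Type": "application/json",
        "X-Scope-OrgID": tenant_id,  # scopes this write to a single tenant
    }
    return "/loki/api/v1/push", headers, json.dumps(body)


# Hypothetical tenant and labels, for illustration only.
path, headers, body = build_push_request(
    "acme", {"job": "nginx", "env": "prod"}, ["GET /login 500 error"], now_ns=1
)
print(path, headers["X-Scope-OrgID"])
```

The same payload sent with a different X-Scope-OrgID value lands in a different tenant, which is all the isolation model amounts to at the API level. In front of real clients you would normally set that header in a gateway or the agent config rather than trust each sender to supply it.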
The catch is that "supports multi-tenancy" and "multi-tenancy that runs itself" are different things. You set the tenant boundaries. You configure per-tenant rate limits so one noisy client cannot starve the others. You decide retention per tenant, which is where compliance lives: a healthcare client on a 12-month window, a retail client on 30 days. Loki gives you the knobs. It does not give you a billing-ready, client-facing portal out of the box.
That is the trade. You get genuine isolation and granular control for free, in exchange for owning the configuration. If your team is comfortable in YAML and Kubernetes, this is a fair deal. If it is not, the multi-tenancy that looks like a feature on paper becomes a project.
There is also a quieter benefit around client churn. Because each tenant is logically separated, offboarding a client is a scoped delete rather than an archaeology project across shared indexes, and onboarding a new one is a new tenant ID plus a retention policy. That clean separation is hard to retrofit onto a single-tenant tool you stretched to cover many clients, which is the trap MSPs fall into when they start with a homelab-grade setup and grow into a real book of business.
Deployment Options: Monolithic, Simple Scalable, Microservices
Loki ships in three deployment shapes, and picking the wrong one is the most common way MSPs get burned.
- Monolithic. All components in one binary or container. Simple to stand up, fine up to roughly 20 GB of logs per day. Good for a single client or a lab.
- Simple Scalable. Splits read, write, and backend paths so each scales on its own. The right default for most multi-client MSP deployments.
- Microservices. Every component runs as its own service. Maximum control and scale, meant for multi-terabyte-per-day volumes and dedicated platform teams.
Most MSPs land on Simple Scalable running in Kubernetes via the Helm chart. Docker works for smaller setups. The thing to plan for is object storage, since Loki leans on S3-compatible storage, so budget for that and for someone who understands how chunk and index retention behave under load. Get the mode right early, because migrating between them later is real work.
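The sizing guidance above can be written down as a simple first-pass check. The 20 GB per day figure comes from the list; the 2,000 GB per day cutoff for microservices is an assumption standing in for "multi-terabyte," so adjust it to your own comfort level. The function name is made up for this sketch.

```python
def suggest_deployment_mode(gb_per_day, microservices_cutoff_gb=2000):
    """First-pass mode suggestion from daily log volume.

    The 20 GB/day threshold follows the guidance above; the microservices
    cutoff is an assumed stand-in for "multi-terabyte-per-day".
    """
    if gb_per_day <= 20:
        return "monolithic"
    if gb_per_day < microservices_cutoff_gb:
        return "simple-scalable"
    return "microservices"


for volume in (5, 200, 5000):
    print(volume, "GB/day ->", suggest_deployment_mode(volume))
```

Volume is only one input. Team size and whether a dedicated platform team exists matter as much, so treat the output as a starting point rather than a verdict.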
Alerting, Retention, and Day-Two Operations
Logs are only half the job. Loki ships a ruler component that evaluates alerting and recording rules against incoming log data, then hands firing alerts to Prometheus Alertmanager for routing. That means you can alert on log patterns such as a spike in 500s, a specific auth-failure string, or a client backup job that quietly stopped writing logs, and route those alerts into the same on-call flow you already use for metrics. For an MSP running a NOC, that consolidation matters: one alerting pipeline for logs and metrics instead of two that drift apart.
Retention is set globally and overridden per tenant, which is the compliance lever. A client under HIPAA might need 12 months of retention while a small retail account is fine at 30 days, and Loki's compactor enforces those windows and deletes expired chunks from object storage automatically. The piece that gets forgotten is that retention drives storage cost directly, so a longer window for one client should map to how you bill them. Day two is where the operational reality lands: you monitor the monitor, watch ingester memory, keep an eye on object-storage growth, and plan capacity as client log volume climbs. Loki rewards teams that treat it as infrastructure, not as a set-and-forget appliance.
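To see how retention feeds storage cost, here is a back-of-the-envelope sketch of the global-default-plus-override model. The tenant names, the daily volumes, and the 10:1 compression ratio are all assumptions for illustration; measure your own ratio before billing anyone on it.

```python
DEFAULT_RETENTION_DAYS = 30  # global default

# Per-tenant overrides; tenant names are hypothetical.
TENANT_RETENTION_DAYS = {
    "clinic-a": 365,  # healthcare client on a 12-month window
}


def retention_days(tenant):
    """Per-tenant override if present, otherwise the global default."""
    return TENANT_RETENTION_DAYS.get(tenant, DEFAULT_RETENTION_DAYS)


def steady_state_storage_gb(tenant, gb_per_day, compression_ratio=10.0):
    """Rough object-storage footprint once the retention window has filled.

    compression_ratio is an assumed raw-to-stored ratio, not a measured one.
    """
    return gb_per_day * retention_days(tenant) / compression_ratio


for tenant in ("clinic-a", "shop-b"):
    print(tenant, retention_days(tenant), steady_state_storage_gb(tenant, 5))
# clinic-a 365 182.5
# shop-b 30 15.0
```

Two clients sending the same 5 GB a day end up roughly 12x apart in stored volume, which is the gap your pricing should reflect.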
What Loki Costs and Where the Bill Hides
The license fee is zero. Self-hosted Loki is free under AGPLv3, full stop. That headline is true, and it is also where the total gets underestimated.
If you run Grafana Cloud instead of self-hosting, logs start at $0.45 per GB ingested ($0.40 to write, $0.05 to process), with 50 GB free each month, per Grafana's published pricing. Retention beyond the 30-day default runs about $0.10 per GB for each extra 30 days.
Compared with Splunk's ingest-based licensing, which has driven more than one MSP to look for the exit, that is cheap. The Reddit r/homelab crowd that argues Splunk versus ELK versus Graylog versus Loki keeps landing on Loki for exactly this reason: the cost curve stays far flatter as volume grows.
Self-hosting moves the cost from a license line to a labor line. Someone patches Loki. Someone scales the ingesters when a client doubles their log volume overnight. Someone gets paged when object storage fills up. None of that shows up on an invoice, which is precisely why it gets missed in the build-versus-buy math. A rough rule: if you do not already run Kubernetes for clients, the self-hosted savings can evaporate into engineering hours. If you do, Loki rides along on infrastructure you already maintain.
Put rough numbers on it. A mid-size MSP pushing 200 GB of logs a day across its clients blows past most managed free tiers fast, and at $0.45 per GB ingested that volume turns into real monthly money on Grafana Cloud. Self-hosted, the same volume costs object storage plus the compute to run Simple Scalable, often a fraction of the managed bill, provided you already have the Kubernetes footprint and the person-hours to keep it healthy. The break-even is less about the data and more about whether the ops capacity already exists. That is the calculation to run before the free license talks you into anything.
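The managed side of that comparison is easy to work out from the published rates quoted earlier. This sketch assumes a 30-day month, applies the 50 GB free allowance once per month, and leaves out extended-retention charges. The self-hosted side is not modeled because it depends on your own storage, compute, and labor costs.

```python
RATE_PER_GB = 0.45      # USD per GB ingested, as quoted above
FREE_GB_PER_MONTH = 50  # monthly free allowance, as quoted above


def grafana_cloud_monthly_ingest_cost(gb_per_day, days=30):
    """Estimated monthly ingest bill in USD, excluding extra retention."""
    ingested_gb = gb_per_day * days
    billable_gb = max(0, ingested_gb - FREE_GB_PER_MONTH)
    return round(billable_gb * RATE_PER_GB, 2)


print(grafana_cloud_monthly_ingest_cost(200))
# 2677.5
```

At 200 GB a day that is 6,000 GB a month, 5,950 GB of it billable, or about $2,677 a month before any retention beyond the default. That is the figure to set against your object storage, compute, and engineering hours.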
For a wider look at how monitoring tools price the same way, cheap to start and expensive to operate, the PRTG Network Monitor review walks through the same hidden-cost pattern with sensor-based licensing.
The AGPLv3 Licensing Question for MSPs
Grafana Labs relicensed Grafana, Loki, and Tempo from Apache 2.0 to AGPLv3 back in April 2021, and that license still governs Loki in 2026. For most MSPs running stock Loki, this changes nothing. You can deploy it, operate it for clients, and charge for the service.
The clause that matters is the network-use trigger. AGPLv3 says that if you modify the software and let users interact with your modified version over a network, you have to make that modified source available to them. Running unmodified Loki as a backend service does not trip this. Forking Loki, changing its code, and exposing the changed version to clients as a hosted product could. Few MSPs ever touch the source, so they never hit the clause, but if your differentiation plan involves customizing Loki internals and reselling that, loop in someone who reads licenses for a living before you build on it.
This is worth understanding rather than fearing. AGPLv3 is a strong copyleft license, not a landmine, and Grafana Loki remains genuinely open source under it. The point is to know the boundary before you cross it, not to avoid the tool.
Grafana Loki vs Splunk vs ELK vs Datadog
Loki does not compete with everything that calls itself observability. It is a logging tool. Prometheus, which people sometimes name in the same breath, handles metrics, not logs, so the two are siblings in the same stack rather than alternatives. The real comparison set is Splunk, the ELK (Elasticsearch) stack, and Datadog.
| Tool | Pricing model | Self-host | License | Index approach | Best fit |
|---|---|---|---|---|---|
| Grafana Loki | Free self-hosted; Cloud from $0.45/GB | Yes | AGPLv3 (open source) | Labels only, cheap | Cost-conscious teams with K8s skills |
| Splunk | Ingest/workload-based, premium | Yes | Proprietary | Full-text, fast, costly | Enterprises with big budgets |
| ELK / Elasticsearch | Free OSS core; paid tiers | Yes | AGPLv3 / SSPL / Elastic License | Full-text, heavy to run | Teams wanting full search control |
| Datadog | Per-GB plus per-host | No | Proprietary (SaaS) | Full-text, managed | Teams paying to skip ops entirely |
The pattern is clear. Loki trades fast full-text search and turnkey convenience for low cost and operational control. Splunk and Datadog trade money for not having to think about infrastructure. ELK sits in the middle and asks for the most hands-on tuning of the group. Grafana Labs was named a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms, which tells you the broader Grafana stack is enterprise-credible. Loki is the logging piece of that story, not a weekend experiment.
For MSPs already comparing infrastructure monitoring options, the Nagios alternatives guide covers the metrics-and-uptime side of the same decision that Loki handles for logs.
Who Grafana Loki Fits and Who Should Skip It
Here is the call, by situation.
- Fits: MSPs already running Kubernetes for clients, teams with Grafana or Prometheus experience, and shops watching logging spend grow faster than revenue.
- Maybe: MSPs willing to put Loki on Grafana Cloud to skip the ops burden and treat logging as a managed line item rather than a self-hosted build.
- Skip: Teams that want logging to be a checkbox, with no YAML, no object storage to manage, and a vendor to call at 2 a.m.
The deciding factor is not the tool. Loki is good. The factor is whether log aggregation is something your team wants to operate or something you want operated for you.
This is also where stack strategy comes in. Bolting Loki onto a pile of separate RMM, PSA, and security tools adds one more thing to run. The other direction is consolidation: fewer vendors, one platform, less glue code. OpenFrame, the AI-native all-in-one MSP and IT platform from the team behind OpenMSP, takes that path, with native PSA, RMM, and automation in one place, priced to avoid the lock-in that pushes MSPs toward open-source escapes like Loki in the first place. It is not the only way to run an MSP, and Loki may still earn a spot for client-facing log retention. But if the reason you are eyeing self-hosted logging is vendor cost, the consolidation question is worth asking first. The network management software roundup lays out how the monitoring layer fits into that larger stack decision.
Run Loki if you want control of your logs and your bill, and you have the team to back it. Buy managed if your techs' hours are worth more than the license you would save. The worst move is standing up Loki because it is free and discovering the price later.
Marketing Manager
Ohayo! I'm Kristina, and I'm doing good things with content, SEO, social, and community at Flamingo. Before IT, I worked as a correspondent for Ukraine's Public Broadcasting Company and have a Master's in journalism.
