You have deployed your app or service to a Kubernetes cluster with multiple instances, and you want to quickly search its logs by keyword. Because the service runs on several pods, you need a way to gather the relevant logs from all instances at once instead of going pod by pod.

Fake "Solutions"

A quick search might turn up answers like:

kubectl logs service/your-service-name | grep "xxx"

But this command is not what you want, because it searches the logs of only one of the pods behind the service. So an empty result doesn't mean there were no relevant logs.

Or someone might encourage you to run the following:

kubectl logs -l app=your-app-name | grep "xxx"

This might or might not work. It depends on how the labels are defined in the deployment file: the selector matches only pods that carry exactly the label app=your-app-name. If the label value differs across instances, for example because a unique suffix is appended to the common readable name for each pod, this command will never return any logs.
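Before searching logs, it can help to check whether a selector matches any pods at all. The following is a minimal sketch, assuming a POSIX shell and a working kubectl context; count_pods is a hypothetical helper name used for illustration, not a kubectl feature.

```shell
# count_pods SELECTOR
# Prints how many pods in the current namespace match the label selector.
# count_pods is a hypothetical helper, not part of kubectl.
count_pods() {
  # --no-headers drops the column header so every output line is one pod.
  # stderr is discarded so the "No resources found" notice is not shown;
  # note that this also hides real errors.
  kubectl get pods --selector "$1" --no-headers 2>/dev/null | wc -l | tr -d ' '
}
```

If count_pods app=your-app-name prints 0, the selector matches nothing and the logs command with the same selector has nothing to read from.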

Better way

If your deployment file gives each instance a unique name, you first need to find out which labels are the same across instances. For example, pick any pod and run:

kubectl get pod/your-pod -o template --template='{{.metadata.labels}}'

This command prints something like:

map[branch:develop namespace:jeff-tian pod-template-hash:54ff684c58 role:cool-app run:cool-app update-timestamp:1608695126059]
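Looking at a single pod shows its labels, but not which of them are shared. One way to compare several pods at once is sketched below, assuming a POSIX shell with awk; common_labels is a hypothetical helper name, and the pod name prefix passed to it is an assumption about how your pods are named.

```shell
# common_labels [POD_NAME_PREFIX]
# Prints the label key=value pairs carried by every pod whose name starts
# with the given prefix (all pods in the namespace if no prefix is given).
# common_labels is a hypothetical helper, not part of kubectl.
common_labels() {
  # --show-labels appends a comma-separated LABELS column as the last field.
  kubectl get pods --no-headers --show-labels \
    | awk -v prefix="$1" '
        substr($1, 1, length(prefix)) == prefix {
          n++                                  # number of pods considered
          split($NF, kv, ",")                  # labels of this pod
          for (i in kv) count[kv[i]]++         # how many pods carry each pair
        }
        END { for (l in count) if (count[l] == n) print l }' \
    | sort
}
```

For example, common_labels cool-app would list the pairs that every pod named cool-app... has in common. A label such as pod-template-hash can show up as shared too, but it changes with each rollout, so a stable label is the safer choice for a selector.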

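In the example output, the role label is the one that stays fixed across instances, so we can use it as the selector. Note that when a selector is given, kubectl logs returns only the last 10 lines per pod by default, so pass --tail=-1 to search the full logs:

kubectl logs --selector role=cool-app --tail=-1 | grep "xxx"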
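If you run this kind of search often, it can be wrapped in a small shell function. This is a sketch under assumptions, not a definitive tool: search_logs is a hypothetical name, and the extra flags (--prefix, --all-containers) are optional conveniences rather than something the search requires.

```shell
# search_logs SELECTOR KEYWORD
# Greps the logs of every pod matching the label selector for a keyword.
# search_logs is a hypothetical helper, not part of kubectl.
search_logs() {
  # --tail=-1         read the whole log (with a selector the default is 10 lines per pod)
  # --prefix          prepend the source pod and container name to each line
  # --all-containers  include every container in each pod
  kubectl logs --selector "$1" --tail=-1 --prefix --all-containers \
    | grep -F -- "$2"   # -F treats the keyword as a fixed string, not a regex
}
```

Usage would look like search_logs role=cool-app "xxx". With --prefix, each matching line shows which pod it came from, which helps when only one instance is misbehaving.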

Even better way

Send the logs to Elasticsearch and query them through Kibana. This requires deploying and integrating the ELK stack. If you send the logs in real time (for example, using pinojs/pino-elasticsearch), there is no log latency, and only a few changes to the application code are needed. If you don't want to change the code, you can use a log shipper such as Filebeat to collect the logs, but then you need to deal with log latency.

Compared with getting the logs from the command line, the ELK approach is obviously more powerful but takes more work, which deserves a dedicated post in the future.

This post is also available on DEV.