
README

Parse and slice Hadoop logs

YARN Resource Manager (RM)


Dataset

from khadoop.yarn import logrm

Parse every file that looks like a standard Resource Manager log with the default file name.

logrm.FILEPATTERN is a Unix-style glob pattern used to find these files.
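To illustrate how a Unix-style wildcard pattern selects log files, here is a small sketch using the standard fnmatch module, which applies wildcard rules similar to those Path.glob uses for a single path component. The pattern is a hypothetical stand-in for logrm.FILEPATTERN, modelled on Hadoop's usual yarn-<user>-resourcemanager-<host>.log naming; the real constant may differ, and the file names are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern: the real value of logrm.FILEPATTERN may differ.
RM_PATTERN = "yarn-*-resourcemanager-*.log*"


def select_rm_logs(names, pattern=RM_PATTERN):
    """Return the file names matching a Unix-style wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Illustrative file names, not taken from a real cluster.
files = [
    "yarn-yarn-resourcemanager-master01.log",
    "yarn-yarn-resourcemanager-master01.log.1",  # rotated log, still matched by the trailing *
    "yarn-yarn-nodemanager-worker03.log",
    "hiveserver2.log",
]
print(select_rm_logs(files))
```

The trailing `*` is what lets rotated files such as `.log.1` be picked up alongside the current log.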

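import os
from pathlib import Path
LOGFOLDER = Path(os.environ["YARNLOG"])  # assumption: RM log folder taken from the YARNLOG variable
parsed = []
for filelog in LOGFOLDER.glob(logrm.FILEPATTERN):
    print(filelog)
    with filelog.open() as fh:  # close each file once it has been parsed
        parsed += logrm.process(fh)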

logrm.process parses each line and returns a list of dicts containing the relevant information.
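As a rough illustration of the per-line parsing involved, here is a minimal sketch that recognises RMAppImpl state-change lines like the ones quoted further down in this README. The regex and the parse_state_change helper are hypothetical and are not khadoop's actual implementation.

```python
import re

# Illustrative regex for RMAppImpl state-change lines (not khadoop's own code).
STATE_CHANGE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+INFO\s+rmapp\.RMAppImpl"
    r".*? - (?P<app>application_\d+_\d+) State change from (?P<src>\w+) to (?P<dst>\w+)"
)


def parse_state_change(line):
    """Return a dict for an application state-change line, or None for any other line."""
    match = STATE_CHANGE.search(line)
    if match is None:
        return None
    return {
        "ts": match["ts"],
        "id_application": match["app"],
        "from": match["src"],
        "to": match["dst"],
    }


line = ("2020-08-06 14:59:59,119 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(779)) - "
        "application_1596547077642_6854 State change from ACCEPTED to RUNNING")
print(parse_state_change(line))
```

Returning None for unrelated lines lets a caller feed the whole file through the function and simply skip the misses.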

Each dict looks like this:

 {
   'accepted_to_running': 6,  # seconds from ACCEPTED to RUNNING
   'id_application': 'application_1596547077642_6854',
   'accept_to_running_ts': '2020-08-06 14:59:59,119',  # timestamp of the 'from ACCEPTED to RUNNING' log line
 }
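Because each record carries accept_to_running_ts, the parsed list can be sliced by time. A minimal sketch, assuming the timestamps keep the YYYY-MM-DD HH:MM:SS,mmm format shown above; slice_by_time is a hypothetical helper, not part of khadoop.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # e.g. '2020-08-06 14:59:59,119'


def slice_by_time(records, start, end):
    """Keep records whose accept_to_running_ts falls in the half-open window [start, end)."""
    return [
        rec for rec in records
        if start <= datetime.strptime(rec["accept_to_running_ts"], TS_FORMAT) < end
    ]


# The sample record from this README.
records = [{
    "accepted_to_running": 6,
    "id_application": "application_1596547077642_6854",
    "accept_to_running_ts": "2020-08-06 14:59:59,119",
}]
window = slice_by_time(records, datetime(2020, 8, 6, 14), datetime(2020, 8, 6, 15))
print([rec["id_application"] for rec in window])
```

A half-open window means consecutive slices (14:00 to 15:00, 15:00 to 16:00) never count the same record twice.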

Here, accepted_to_running is the number of seconds between these two timestamps in the aggregated YARN RM log:

2020-08-06 14:59:52,756 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(779)) - application_1596547077642_6854 State change from SUBMITTED to ACCEPTED
...
2020-08-06 14:59:59,119 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(779)) - application_1596547077642_6854 State change from ACCEPTED to RUNNING
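The arithmetic behind the sample value can be checked with the standard library: 14:59:59,119 minus 14:59:52,756 is 6.363 s. The value 6 in the sample record suggests whole seconds with the fraction truncated; that rounding rule is inferred from this single example, not stated by the project, and seconds_between is a hypothetical helper.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # RM log timestamps put a comma before the milliseconds


def seconds_between(accepted_ts, running_ts):
    """Whole seconds from ACCEPTED to RUNNING (assumption: fractional part truncated)."""
    delta = datetime.strptime(running_ts, TS_FORMAT) - datetime.strptime(accepted_ts, TS_FORMAT)
    return int(delta.total_seconds())


print(seconds_between("2020-08-06 14:59:52,756", "2020-08-06 14:59:59,119"))  # 6
```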


Dev setup

Environment variables:

HIVESERVER_TEST=  # raw HiveServer log file
YARNLOG=          # folder with RM logs
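One way a local test setup might read these variables, sketched with the standard library. The dev_paths helper and the choice to return None when a variable is unset or empty are assumptions, not part of the project.

```python
import os
from pathlib import Path


def dev_paths(environ=os.environ):
    """Read the dev variables; None means the variable is unset or empty."""
    hive = environ.get("HIVESERVER_TEST")
    yarn = environ.get("YARNLOG")
    return {
        "hiveserver_log": Path(hive) if hive else None,  # raw HiveServer log file
        "yarn_log_dir": Path(yarn) if yarn else None,    # folder with RM logs
    }


# Passing a plain dict instead of os.environ keeps the example independent of the shell.
paths = dev_paths({"YARNLOG": "/var/log/hadoop-yarn"})
print(paths["yarn_log_dir"], paths["hiveserver_log"])
```

Accepting the environment as a parameter makes it easy to exercise the helper in tests without modifying the real process environment.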


Download files


Source distribution: khadoop-1.4.0.tar.gz (9.1 kB)

Built distribution: khadoop-1.4.0-py3-none-any.whl (10.0 kB, Python 3)
