Data/Data Engineering (6 posts)

[Udemy] Data Engineering 101: The Beginner's Guide - Data Pipeline architecture(2) ~ Trend

ML stack. AI / ML / DL. AI: machines with human-like intelligence. Programming: using software (= many functions); with AI, we don't build the functions ourselves but give the answers (labels, expected output). Training → inference; training needs a lot of data and compute. ML: a field within AI that learns from data rather than from hand-written rules; structured, tabular data; small data, less compute required; downstream consumer of the data engi..
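To make "give the answers instead of writing the function" concrete, here is a minimal sketch in plain Python. The Celsius-to-Fahrenheit task, the function names, and the use of ordinary least squares as the "learning" step are illustrative assumptions, not anything from the course: one function is hand-written, the other is recovered from labeled examples (training) and then applied to new input (inference).

```python
def handwritten_rule(celsius: float) -> float:
    """Traditional programming: a human writes the function explicitly."""
    return celsius * 9 / 5 + 32


def train(examples: list[tuple[float, float]]) -> tuple[float, float]:
    """Training: fit slope and intercept by ordinary least squares from labeled pairs."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    var = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: tuple[float, float], x: float) -> float:
    """Inference: apply the learned parameters to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Labeled data: inputs paired with their expected outputs (the "answers").
labeled = [(c, handwritten_rule(c)) for c in (-10.0, 0.0, 25.0, 100.0)]
model = train(labeled)
print(model)
print(predict(model, 37.0))
```

The learned slope and intercept come out (up to floating-point rounding) as 1.8 and 32, i.e. the same rule a human would have written, but derived from data.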

[Udemy] Data Engineering 101: The Beginner's Guide - Data Pipeline architecture(1)

Data architecture. What is good data architecture? Performance: using computing and storage resources efficiently; a trade-off between performance and complexity. Scalability: data volumes fluctuate; an upstream system failure → increasing data volumes; scaling up/down should be automatic, and scaling down can save a lot of money. Reliability: an available system that avoids failure. Automate as much as possible → reduce huma..
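The point about automatic scale-up/down can be illustrated with a toy scaling rule. This is a minimal sketch under stated assumptions: `desired_workers`, the per-worker capacity, and the min/max bounds are hypothetical numbers chosen for illustration, not a real autoscaler's API.

```python
import math


def desired_workers(records_per_hour: int,
                    capacity_per_worker: int = 1_000_000,  # hypothetical throughput per worker
                    min_workers: int = 1,
                    max_workers: int = 20) -> int:
    """Size the worker pool to the current data volume, clamped to [min, max]."""
    needed = math.ceil(records_per_hour / capacity_per_worker)
    return max(min_workers, min(max_workers, needed))


# Hypothetical fluctuating hourly volumes (e.g. a spike after an upstream failure).
hourly_volumes = [200_000, 5_500_000, 80_000_000, 300_000]
plan = [desired_workers(v) for v in hourly_volumes]
print(plan)  # [1, 6, 20, 1]

# Worker-hours used: autoscaled vs. always provisioned for the peak.
print(sum(plan), max(plan) * len(plan))  # 28 80
```

Scaling down after the spike is where the savings come from: the autoscaled plan uses 28 worker-hours, while provisioning for the peak all the time would use 80.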

[Udemy] Data Engineering 101: The Beginner's Guide - Undercurrents

DataOps: DevOps for data. DevOps: deploy software in a more iterative & robust manner; build and manage cloud infra; observability of cloud infra; build automated CI (Continuous Integration)/CD (Continuous Deployment) pipelines. DataOps: make data product deployments more iterative and robust; build and manage cloud infra for data tools; observability of data systems (incident reporting and notifications of problems); automat..
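The "observability of data systems" idea can be sketched as an automated check that runs on each batch and raises notifications when something looks wrong. This is a minimal sketch: `check_batch`, `notify`, the `amount` column, and the thresholds are all hypothetical, and `notify` just prints where a real setup would send email, chat, or pager alerts.

```python
def check_batch(rows: list[dict], min_rows: int = 1,
                max_null_rate: float = 0.1, column: str = "amount") -> list[str]:
    """Return a list of human-readable problems found in one batch of records."""
    problems = []
    if len(rows) < min_rows:
        problems.append(f"row count {len(rows)} below minimum {min_rows}")
    if rows:
        nulls = sum(1 for r in rows if r.get(column) is None)
        rate = nulls / len(rows)
        if rate > max_null_rate:
            problems.append(
                f"null rate for {column!r} is {rate:.0%}, above {max_null_rate:.0%}"
            )
    return problems


def notify(problems: list[str]) -> None:
    """Hypothetical stand-in for an alerting channel."""
    for p in problems:
        print("ALERT:", p)


batch = [{"amount": 1.0}, {"amount": None}, {"amount": 3.0}, {"amount": None}]
notify(check_batch(batch))  # ALERT: null rate for 'amount' is 50%, above 10%
notify(check_batch([]))     # ALERT: row count 0 below minimum 1
```

Running checks like this automatically (for example as a step in a CI/CD or scheduled pipeline) is what turns "notifications of problems" into something that does not depend on a human noticing bad data.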

[Udemy] Data Engineering 101: The Beginner's Guide - End-to-end data pipeline in-depth(2)

Ingestion. Ingestion = moving or ingesting data. Frequency: batch vs streaming. Batch: slower, e.g. daily or hourly. Streaming: faster, seconds to sub-seconds, real-time. Micro-batch: a combination of batch and streaming. Batch ingestion: convenient, less latency-sensitive, more forgiving. Types: ETL: Extract → Transform → Load; traditional data warehouse: clean → put into the DW. Why does ETL need cleaning? The DW is expensive! Most common. ELT: Extr..
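The ETL flow above (clean first, then load into the expensive warehouse) can be contrasted with ELT (Extract → Load → Transform), where raw data is loaded first and transformed inside the warehouse. This is a minimal sketch under assumptions: Python's built-in `sqlite3` in-memory database stands in for the data warehouse, and the `RAW` records and table names are hypothetical.

```python
import sqlite3

# Hypothetical raw records from a source system.
RAW = [
    {"order_id": 1, "amount": "19.99", "country": "us"},
    {"order_id": 2, "amount": None, "country": "KR"},
    {"order_id": 3, "amount": "5.00", "country": "kr"},
]


def extract() -> list[dict]:
    return list(RAW)


def transform(rows: list[dict]) -> list[tuple]:
    """Clean before loading: drop rows missing amount, cast types, normalize country codes."""
    return [
        (r["order_id"], float(r["amount"]), r["country"].upper())
        for r in rows
        if r["amount"] is not None
    ]


def etl(conn: sqlite3.Connection) -> None:
    # Extract -> Transform (in Python) -> Load only clean rows into the warehouse.
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", transform(extract()))


def elt(conn: sqlite3.Connection) -> None:
    # Extract -> Load raw rows as-is into a staging table -> Transform with SQL inside the warehouse.
    conn.execute("CREATE TABLE raw_orders (order_id INTEGER, amount TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (:order_id, :amount, :country)", extract()
    )
    conn.execute(
        """CREATE TABLE orders AS
           SELECT order_id, CAST(amount AS REAL) AS amount, UPPER(country) AS country
           FROM raw_orders WHERE amount IS NOT NULL"""
    )


def fetch(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute(
        "SELECT order_id, amount, country FROM orders ORDER BY order_id"
    ).fetchall()


etl_conn = sqlite3.connect(":memory:")
etl(etl_conn)
elt_conn = sqlite3.connect(":memory:")
elt(elt_conn)
print(fetch(etl_conn))
print(fetch(elt_conn))
```

Both paths end with the same clean `orders` table; the difference is where the cleaning happens and whether the raw data is kept in the warehouse.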

[Udemy] Data Engineering 101: The Beginner's Guide - End-to-end data pipeline in-depth(1)

Generation of source data. Structured / unstructured: differences in storage, search.. Structured data: tabular, 2-dimensional (rows and columns); use SQL; BI, classical ML. Unstructured data: files; use Deep Learning (Neural Networks). Database: if you choose the wrong database, you suffer from poor performance. RDBMS: relational; transactional data, tabular format; relations between tables; inflexible, strict, normalized; single mac..
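The RDBMS traits listed above (tabular data, relations between tables, a strict and normalized schema, queried with SQL) can be seen in a tiny example. This is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in relational database; the `customers`/`orders` tables and their data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Normalized schema: customer names live in one table, orders reference them by key.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 30.0), (11, 1, 12.5), (12, 2, 7.0)])

# SQL join across the relation between the two tables.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(totals)  # [('Alice', 42.5), ('Bob', 7.0)]

# Strict schema: an order pointing at a non-existent customer is rejected.
rejected = False
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 1.0)")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)
```

The rejected insert is the "inflexible, strict" side of the trade-off: the database protects the relation between tables, at the cost of requiring every row to fit the predefined schema.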

[Udemy] Data Engineering 101: The Beginner's Guide - Intro

I'm already six months into my job. Even if I don't get to do data engineering myself, I felt I should at least know what data engineering is, what data engineers do, and what they consider important. So I started taking the Data Engineering 101 course on Udemy. This post doubles as a review! Data Engineering. Why is Data Engineering important? Big data requires efficient data handling. Data Engineer: within the data team, a bridge between data producers and data consumers. Data producers: software engineers and DevOps engineers ..
