Computer Architecture
2024 Spring
Final Project Part 2

Overview
Tutorial
● Gem5 Introduction
● Environment Setup
Projects
● Part 1 (5%)
○ Write a C++ program to analyze the specification of the L1 data cache.
● Part 2 (5%)
○ Given the hardware specifications, try to get the best performance for a more complicated program.
Project 2 – Description
In this project, we will use a two-level cache computer system. Your task is to write a ViT (Vision Transformer) in C++ and optimize it. You can see more details of the system specification below.
System Specifications
● ISA: X86
● CPU: TimingSimpleCPU (no pipeline; the CPU stalls on every memory request)
● Caches
○ The L1 I cache and the L1 D cache connect to the same L2 cache
● Memory size: 8192MB

            I cache size  I cache associativity  D cache size  D cache associativity  Policy  Block size
  L1 cache  16KB          8                      16KB          4                      LRU     **B
  L2 cache  –             –                      1MB           16                     LRU     **B

ViT (Vision Transformer) – Transformer Overview
● A basic transformer block consists of
○ Layer Normalization
○ MultiHead Self-Attention (MHSA)
○ Feed Forward Network (FFN)
○ Residual connection (Add)
● You only need to focus on how to implement the functions in the red box
● If you only want to complete the project instead of understanding the full ViT algorithm, you can skip the sections marked in red

ViT (Vision Transformer) – Image Pre-processing
● Normalize, resize to (300,300,3), and center crop to (224,224,3)

ViT (Vision Transformer) – Patch Encoder
● In this project, we use Conv2D as the Patch Encoder, with kernel_size = (16,16), stride = (16,16), and output_channel = 768
● (224,224,3) -> (14,14,16*16*3) -> (196,768)
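A Conv2D with a 16x16 kernel and stride 16 is a rearrangement of non-overlapping patches followed by one linear map. A minimal sketch of the rearrangement step, assuming a row-major HWC image layout; the names and layout are illustrative, not the project's required interface:

// Illustrative sketch: flatten a row-major HWC image [224*224*3] into
// 196 patches of 16*16*3 = 768 values each, i.e. a [196*768] buffer.
// The patch encoder is this rearrangement followed by a matmul with the
// Conv2D kernel weights.
void extract_patches(const float *img /* [224*224*3] */,
                     float *patches   /* [196*768]   */) {
    const int H = 224, W = 224, C = 3, P = 16;  // image and patch sizes
    const int PW = W / P;                       // 14 patches per row
    for (int py = 0; py < H / P; ++py)          // patch row
        for (int px = 0; px < PW; ++px)         // patch column
            for (int y = 0; y < P; ++y)         // pixel row inside patch
                for (int x = 0; x < P; ++x)     // pixel column inside patch
                    for (int c = 0; c < C; ++c) {
                        int src = ((py * P + y) * W + (px * P + x)) * C + c;
                        int dst = (py * PW + px) * (P * P * C)
                                + (y * P + x) * C + c;
                        patches[dst] = img[src];
                    }
}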
ViT (Vision Transformer) – Class Token
● Now we have 196 tokens, and each token has 768 features
● In order to record global information, we need to concatenate one learnable class token with the 196 tokens
● (196,768) -> (197,768)
ViT (Vision Transformer) – Position Embedding
● Add the learnable position information to the patch embedding
● (197,768) + position_embedding (197,768) -> (197,768)
ViT (Vision Transformer) – Layer Normalization
T: # of tokens; C: embedded dimension
● Normalize each token (i.e., over its C features)
● You need to normalize with the formula (see the sketch below)
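A minimal per-token sketch, assuming the standard LayerNorm formula y = (x - mean) / sqrt(var + eps) * gamma + beta with learnable gamma and beta; the project's exact formula and function signature are defined by layer/layernorm.cpp and layernorm_tb, so treat the names here as illustrative:

#include <cmath>

// Normalize each token's C features independently over the flat [T*C]
// layout described later in the deck.
void layernorm(const float *in, float *out, const float *gamma,
               const float *beta, int T, int C, float eps = 1e-5f) {
    for (int t = 0; t < T; ++t) {
        const float *x = in + t * C;
        float *y = out + t * C;
        float mean = 0.0f, var = 0.0f;
        for (int c = 0; c < C; ++c) mean += x[c];
        mean /= C;
        for (int c = 0; c < C; ++c) var += (x[c] - mean) * (x[c] - mean);
        var /= C;
        float inv_std = 1.0f / std::sqrt(var + eps);
        for (int c = 0; c < C; ++c)
            y[c] = (x[c] - mean) * inv_std * gamma[c] + beta[c];
    }
}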
ViT (Vision Transformer) – MultiHead Self Attention (1)
● Wk, Wq, Wv ∈ R^(C×C)
● bq, bk, bv ∈ R^C
● Wo ∈ R^(C×C)
● bo ∈ R^C
[Figure: input X -> Input Linear Projection (Wk, Wq, Wv; bq, bk, bv) -> split into heads -> Attention -> merge heads -> Output Linear Projection (Wo; bo) -> output Y]
ViT (Vision Transformer) – MultiHead Self Attention (2)
T: # of tokens; C: embedded dimension; H: hidden dimension; NH: # of heads; C = H * NH
Linear Projection and split into heads
● Linear Projection: Q = X Wq^T + bq, K = X Wk^T + bk, V = X Wv^T + bv
● Get Q, K, V ∈ R^(T×(NH*H)) after the input linear projection
● Split Q, K, V into Q1, Q2, Q3, ..., QNH; K1, K2, K3, ..., KNH; V1, V2, V3, ..., VNH ∈ R^(T×H)
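A minimal sketch of one such projection, e.g. Q = X Wq^T + bq, over the flat [T*C] layout described later in the deck. Storing W row-major as [OUT*C] makes row o the weights of output feature o, which is exactly the W^T form in the formula. The function name and signature are illustrative assumptions:

// Y = X * W^T + b, with X as [T*C], W as [OUT*C], Y as [T*OUT].
void linear(const float *X, const float *W, const float *b, float *Y,
            int T, int C, int OUT) {
    for (int t = 0; t < T; ++t)
        for (int o = 0; o < OUT; ++o) {
            float acc = b[o];                       // start from the bias
            for (int c = 0; c < C; ++c)
                acc += X[t * C + c] * W[o * C + c]; // dot(X row, W row)
            Y[t * OUT + o] = acc;
        }
}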
ViT (Vision Transformer) – MultiHead Self Attention (3)
● For each head i, compute Si = Qi Ki^T / sqrt(H) ∈ R^(T×T)
● Pi = Softmax(Si) ∈ R^(T×T); Softmax is a row-wise function
● Oi = Pi Vi ∈ R^(T×H)
[Figure: Qi, Ki -> Matrix Multiplication and scale -> Si -> Softmax -> Pi -> Matrix Multiplication with Vi -> Oi]
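A minimal single-head sketch of these three steps, with Qh, Kh, Vh stored as [T*H]. Buffer handling and names are illustrative, not the project's required interface; subtracting the row maximum before exp is a standard numerical-stability trick:

#include <algorithm>
#include <cmath>
#include <vector>

void attention_head(const float *Qh, const float *Kh, const float *Vh,
                    float *Oh, int T, int H) {
    std::vector<float> S(T * T);
    const float scale = 1.0f / std::sqrt((float)H);
    for (int i = 0; i < T; ++i)              // S = Q K^T / sqrt(H)
        for (int j = 0; j < T; ++j) {
            float acc = 0.0f;
            for (int h = 0; h < H; ++h)
                acc += Qh[i * H + h] * Kh[j * H + h];
            S[i * T + j] = acc * scale;
        }
    for (int i = 0; i < T; ++i) {            // P = row-wise softmax(S)
        float mx = S[i * T];
        for (int j = 1; j < T; ++j) mx = std::max(mx, S[i * T + j]);
        float sum = 0.0f;
        for (int j = 0; j < T; ++j) {
            S[i * T + j] = std::exp(S[i * T + j] - mx);
            sum += S[i * T + j];
        }
        for (int j = 0; j < T; ++j) S[i * T + j] /= sum;
    }
    for (int i = 0; i < T; ++i)              // O = P V
        for (int h = 0; h < H; ++h) {
            float acc = 0.0f;
            for (int j = 0; j < T; ++j)
                acc += S[i * T + j] * Vh[j * H + h];
            Oh[i * H + h] = acc;
        }
}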
ViT (Vision Transformer) – MultiHead Self Attention (4)
T: # of tokens; C: embedded dimension; H: hidden dimension; NH: # of heads
merge heads and Linear Projection
● Oi ∈ R^(T×H), O = [O1, O2, ..., ONH] ∈ R^(T×C)
● Linear Projection: output = O Wo^T + bo
ViT (Vision Transformer) – Feed Forward Network
T: # of tokens; C: embedded dimension; OC: hidden dimension
[Figure: Input (T×C) -> Input Linear Projection -> (T×OC) -> GeLU -> Output Linear Projection -> output (T×C)]

ViT (Vision Transformer) – GeLU
[Figure: the GeLU activation function]
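The deck gives GeLU only as a figure. A hedged sketch using the common tanh approximation; whether the testbench expects this approximation or the exact erf form, GELU(x) = x * Phi(x), should be checked against gelu_tb:

#include <cmath>

// Tanh approximation: GELU(x) ~= 0.5 x (1 + tanh(sqrt(2/pi) (x + 0.044715 x^3)))
inline float gelu(float x) {
    const float k = 0.7978845608f;  // sqrt(2 / pi)
    return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
}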
ViT (Vision Transformer) – Classifier
● Contains a Linear layer to transform the 768 features into 200 classes
○ (197,768) -> (197,200)
● Only the first token (the class token) is used for the prediction
○ (197,200) -> (1,200)
ViT (Vision Transformer) – Work Flow
[Figure: overall flow: Pre-processing -> Embedder -> Transformer x12 -> Classifier -> Argmax -> prediction ("Black Footed Albatross"). Each transformer block is layernorm -> MHSA (matmul -> attention -> matmul) -> residual add -> layernorm -> FFN (matmul -> gelu -> matmul) -> residual add. Load_weight runs first; m5_dump_init and m5_dump_stat bracket the measured layernorm -> MHSA -> residual region (see the evaluation step below). Each stage has a testbench:]
$ make gelu_tb
$ make matmul_tb
$ make layernorm_tb
$ make MHSA_tb
$ make feedforward_tb
$ make transformer_tb
$ run_all.sh
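Because TimingSimpleCPU stalls on every memory request, the graded latency is largely a cache-behavior exercise, and the measured region is dominated by matmuls. One common direction, sketched here under assumptions and not the required solution, is to tile the matmul loops so the tiles being reused stay resident in the 16KB L1 D cache; bias handling is omitted for brevity, and the tile size must be tuned against the (hidden) block size and associativity:

// Loop-tiled Y = X * W^T with X as [T*C], W as [OUT*C], Y as [T*OUT].
const int BS = 32;  // illustrative tile size -- tune against stats.txt

void matmul_tiled(const float *X, const float *W, float *Y,
                  int T, int C, int OUT) {
    for (int i = 0; i < T * OUT; ++i) Y[i] = 0.0f;
    for (int t0 = 0; t0 < T; t0 += BS)
        for (int o0 = 0; o0 < OUT; o0 += BS)
            for (int c0 = 0; c0 < C; c0 += BS)
                // Within a tile all three operands get cache reuse.
                for (int t = t0; t < t0 + BS && t < T; ++t)
                    for (int o = o0; o < o0 + BS && o < OUT; ++o) {
                        float acc = Y[t * OUT + o];
                        for (int c = c0; c < c0 + BS && c < C; ++c)
                            acc += X[t * C + c] * W[o * C + c];
                        Y[t * OUT + o] = acc;
                    }
}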
ViT (Vision Transformer) – Shape of array
● layernorm input/output: [T*C], laid out as token 1 | token 2 | ... | token T, each of length C
● MHSA input/output/o: [T*C]
● MHSA qkv: [T*3*C], laid out as q token 1 | k token 1 | v token 1 | ... | q token T | k token T | v token T, each of length C
● feedforward input/output: [T*C]
● feedforward gelu: [T*OC], laid out as token 1 | token 2 | ... | token T, each of length OC
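A minimal set of index helpers matching these layouts (illustrative names; the interleaved per-token q/k/v order follows the qkv row above):

// Flat-array indexing for the layouts above.
inline int idx_tc(int t, int c, int C) { return t * C + c; }           // [T*C]
inline int idx_q(int t, int c, int C)  { return (t * 3 + 0) * C + c; } // q of token t in qkv
inline int idx_k(int t, int c, int C)  { return (t * 3 + 1) * C + c; } // k of token t in qkv
inline int idx_v(int t, int c, int C)  { return (t * 3 + 2) * C + c; } // v of token t in qkv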
Common problem
● Segmentation fault
○ Ensure that you are not accessing a nonexistent memory address
○ Enter the command $ ulimit -s unlimited (large local arrays can overflow the default stack size limit)
All you have to do is
● Download the TA's Gem5 image
○ docker pull yenzu/ca_final_part2:2024
● Write the C++ layers in the ./layer folder, with an understanding of the algorithm
○ make clean
○ make <layer>_tb
○ ./<layer>_tb
All you have to do is
● Ensure the ViT successfully classifies the bird
○ python3 embedder.py --image_path images/Black_Footed_Albatross_0001_796111.jpg --embedder_path weights/embedder.pth --output_path embedded_image.bin
○ g++ -static main.cpp layer/*.cpp -o process
○ ./process
○ python3 run_model.py --input_path result.bin --output_path torch_pred.bin --model_path weights/model.pth
○ python3 classifier.py --prediction_path torch_pred.bin --classifier_path weights/classifier.pth
○ After running the above commands, you will get the top-5 prediction.
● Evaluate the performance of part of the ViT, namely layernorm + MHSA + residual
○ The simulation needs about 3.5 hours to finish
○ Check stats.txt
Grading Policy
● (50%) Verification
○ (10%) matmul_tb
○ (10%) layernorm_tb
○ (10%) gelu_tb
○ (10%) MHSA_tb
○ (10%) transformer_tb
● (50%) Performance
○ score = max(sigmoid((27.74 - student_latency) / student_latency) * 70, 50)
● You will get 0 performance points if your design is not verified.
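To read the performance formula (assuming the standard logistic sigmoid 1/(1 + e^-x) and that latency is in the same units as the 27.74 reference): a latency of exactly 27.74 gives sigmoid(0) * 70 = 35, so the 50-point floor applies; the sigmoid term only beats the floor once sigmoid(r) ≥ 5/7, i.e. r = (27.74 - latency)/latency ≥ ln 2.5 ≈ 0.92, which means a latency below roughly 14.5.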
Submission
● Please submit your code on E3 before 23:59 on June 20, 2024.
● Late submission is not allowed.
● Plagiarism is forbidden, otherwise you will get 0 points!!!
● Format
○ Code: please put your code in a folder named FP2_team<ID>_code and compress it into a zip file.
FP2_team<ID>_code folder
● You should include the following files
○ matmul.cpp
○ layernorm.cpp
○ gelu.cpp
○ attention.cpp
○ residual.cpp
