CS 259 assignment help, Java/C++ program design help - 合肥網


Date: 2024-10-12 Source: 合肥網 hfw.cc Author: hfw.cc

Fall 2024
CS 259 Lab 1
Accelerating Convolutional Neural Network (CNN) on FPGAs using
Merlin Compiler
Due October 9 11:59pm
Description
Your task is to accelerate the computation of two layers in a convolutional neural network
(CNN) using a high-level synthesis (HLS) tool on an FPGA. We encourage you to start with
using the Merlin Compiler. For an input image with 228 × 228 pixels and 256 channels, you
are going to calculate the tensor after going through a 2D convolution layer and a 2D max
pooling layer. The convolution layer has 256 filters of shape 256 × 5 × 5, uses the ReLU
activation relu(x) = max{x, 0} with a bias value for each output channel. The 2D maxpooling
layer operates on 2 × 2 non-overlapping windows. You will need to implement this
function using HLS:
void CnnKernel(const float* input, const float* weight, const float* bias, float* output)
where input is the input image of size [256][228][228], weight stores the weights of the
convolution filters of size [256][256][5][5], bias stores the per-channel offsets of size [256]
that are added to the output channels, and output is the buffer you must fill with the result of
maxpool(relu(conv2d(input, weight) + bias)), as defined above. The output size is
[256][112][112]: the 5 × 5 convolution without padding reduces 228 to 224, and the 2 × 2
pooling halves that to 112.
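The definition above can be sketched as a plain, unoptimized C++ reference. This is only an illustration under stated assumptions, not the starter kit's code: the name CnnReference and the parameter kInImSize are hypothetical, the tensors are assumed to be flattened in row-major order, and the convolution is assumed to use stride 1, no padding, and no kernel flipping (the usual CNN convention). Sizes are template parameters so that the sketch can be checked on tiny inputs; the lab corresponds to CnnReference<256, 228, 5>.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Reference computation of maxpool(relu(conv2d(input, weight) + bias)).
// input:  [kNum][kInImSize][kInImSize], weight: [kNum][kNum][kKernel][kKernel],
// bias:   [kNum], result: [kNum][kOutImSize][kOutImSize], all row-major.
template <int kNum, int kInImSize, int kKernel>
std::vector<float> CnnReference(const std::vector<float>& input,
                                const std::vector<float>& weight,
                                const std::vector<float>& bias) {
  constexpr int kImSize = kInImSize - kKernel + 1;  // conv output size (224 in the lab)
  constexpr int kOutImSize = kImSize / 2;           // after 2x2 pooling (112 in the lab)
  std::vector<float> output(static_cast<std::size_t>(kNum) * kOutImSize * kOutImSize);
  for (int i = 0; i < kNum; ++i) {            // output channel
    for (int h = 0; h < kOutImSize; ++h) {    // pooled row
      for (int w = 0; w < kOutImSize; ++w) {  // pooled column
        // Starting the max at 0 folds relu into the pooling:
        // max over the window of relu(x) equals max(0, max over the window of x).
        float best = 0.0f;
        for (int dh = 0; dh < 2; ++dh) {
          for (int dw = 0; dw < 2; ++dw) {
            float acc = bias[i];
            for (int j = 0; j < kNum; ++j) {  // input channel
              for (int p = 0; p < kKernel; ++p) {
                for (int q = 0; q < kKernel; ++q) {
                  const std::size_t w_idx =
                      ((static_cast<std::size_t>(i) * kNum + j) * kKernel + p) * kKernel + q;
                  const std::size_t in_idx =
                      (static_cast<std::size_t>(j) * kInImSize + (2 * h + dh + p)) * kInImSize +
                      (2 * w + dw + q);
                  acc += weight[w_idx] * input[in_idx];
                }
              }
            }
            best = std::max(best, acc);
          }
        }
        output[(static_cast<std::size_t>(i) * kOutImSize + h) * kOutImSize + w] = best;
      }
    }
  }
  return output;
}
```

With one channel, a 4 × 4 input of ones, a 3 × 3 filter of ones, and a bias of 0.5, every convolution output is 9.5, so the single pooled value is 9.5; a bias of -20 makes every convolution output negative, so the result is 0.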
How-To
FPGA accelerator compilation typically involves three (3) stages: high-level synthesis (HLS),
bitstream generation, and onboard execution. The last two stages can take days to
complete. Therefore, in this lab, we only focus on the first stage: HLS. Your performance will
only be assessed using the estimation in the HLS reports, which is usually accurate.
However, you are welcome to try out the last two steps if you are interested.

Connecting to the Server: Method 1
In this method, you won’t be able to run Merlin directly from your /home directory, so you’ll need
to copy files back and forth.
1. Connect to the server (VPN may be required). You can find VPN details here:
https://www.it.ucla.edu/it-support-center/services/virtual-private-network-vpn-clients
ssh <username>@brimstone.cs.ucla.edu

2. Start the Docker container and share your home with -v:

docker run -v /d0/class/:/home -it vitis2021 /bin/bash

3. Source Vitis, navigate to the desired directory and clone the repository:

source /tools/Xilinx/Vitis_HLS/2021.1/settings64.sh
cd /opt
git clone https://github.com/UCLA-VAST/cs-259-f24.git
cd cs-259-f24/lab1

4. Copy the necessary file to your home directory:

cp /opt/cs-259-f24/lab1/cnn-krnl.cpp /home/<username>
Connecting to the Server: Method 2
In this method, you can run Merlin directly from your /home directory, but make sure to export your
home directory.

1. Connect to the server (VPN may be required). You can find VPN details here:
https://www.it.ucla.edu/it-support-center/services/virtual-private-network-vpn-clients

ssh <username>@brimstone.cs.ucla.edu

2. Start the Docker container and share your home with -v:

docker run --user $(id -u):100 -v /d0/class/:/home -it vitis2021 /bin/bash

3. Export your home directory:

export HOME=/home/<username>

4. Source Vitis, navigate to your home directory and clone the repository:

source /tools/Xilinx/Vitis_HLS/2021.1/settings64.sh
cd /home/<username>
git clone https://github.com/UCLA-VAST/cs-259-f24.git
cd cs-259-f24/lab1
Build and Run Baseline with Software Simulation
We have prepared the starter kit for you. Please run: make
This command will perform a software simulation of the provided starter FPGA HLS kernel. It
should show “PASS”. You need to use the FPGA Developer AMI in this lab unless you are using
a computer with Xilinx Vitis HLS installed. Even so, we suggest that you develop your code
and run software simulation locally to test correctness, and move to AWS once you
enter the tuning stage.
Understand Merlin’s Automatic Optimizations
Before modifying the kernel and adding pragmas, synthesize the CNN kernel with Merlin and
describe in your report the automatic optimizations made by Merlin and how this reduces
latency.
Modify the HLS CNN Kernel
If you have successfully built and run the baseline HLS CNN kernel, you can now optimize
the code to design your CNN kernel. Your task is to implement a fast, parallel version of the
CNN kernel on FPGA. Start with the provided starter kit and edit cnn-krnl.cpp
for this task. When editing, please use the given types input_t, weight_t, bias_t,
and output_t for the corresponding data, and compute_t for your intermediate values.
You can use them as if they were float numbers.
Parallelism should be exploited by using Merlin pragmas and tiling. You are encouraged to
focus on Merlin pragmas (#pragma ACCEL parallel, #pragma ACCEL pipeline and #pragma
ACCEL tile). You can also modify the code explicitly (tiling, loop permutation, …), but make sure
the modified code still produces correct results.
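A minimal sketch of the two approaches on a toy loop, assuming the Merlin pragma spellings #pragma ACCEL pipeline, #pragma ACCEL parallel factor=N and #pragma ACCEL tile factor=N placed directly before the loop they apply to (please check the Merlin documentation for the exact options). The function names and the sizes kLen and kTile are hypothetical and unrelated to the starter kit. A regular C++ compiler ignores the unknown pragmas, possibly with a warning, so both versions can be compared in software.

```cpp
constexpr int kLen = 64;  // hypothetical problem size
constexpr int kTile = 8;  // hypothetical tile size
static_assert(kLen % kTile == 0, "tile size must divide the trip count");

// Manual tiling: the loop over i is split into an outer loop over tiles and
// an inner loop over the elements of one tile. The inner loop is a candidate
// for parallelization, the outer loop for pipelining.
void ScaleTiled(const float in[kLen], float out[kLen], float a, float b) {
#pragma ACCEL pipeline
  for (int io = 0; io < kLen / kTile; ++io) {
#pragma ACCEL parallel factor=8
    for (int ii = 0; ii < kTile; ++ii) {  // factor above equals kTile
      const int i = io * kTile + ii;
      out[i] = a * in[i] + b;
    }
  }
}

// Pragma-only variant: the loop is left untouched and Merlin is asked to
// tile it instead.
void ScaleWithTilePragma(const float in[kLen], float out[kLen], float a, float b) {
#pragma ACCEL tile factor=8
  for (int i = 0; i < kLen; ++i) {
    out[i] = a * in[i] + b;
  }
}
```

Both functions compute out[i] = a * in[i] + b for every i, so a software run should give identical results; which variant synthesizes to the faster design has to be determined from the estimation report.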
In the starter kit, we simply wrap a sequential CNN code with #pragma ACCEL kernel, and
Merlin automatically performs data caching, memory coalescing, pipelining and
parallelization, which yields about 10 GFLOPS.
Although the skeleton kernel is provided, you are also free to create your own by removing
the header file inclusion of “lib/cnn-krnl.h” and implementing the basic kernel from scratch.
However, this requires specific expertise in Xilinx FPGA architecture and is not
recommended for this course.
Test Your HLS CNN Kernel with Software Simulation
To perform software emulation of your FPGA implementation of CNN kernel:
make
If you see something similar to the following message, your implementation is incorrect.
Found 21201** errors
FAIL
Since the software simulation step uses the CPU to emulate the hardware behavior, it only
serves as a correctness test, and its execution time doesn’t reflect that of actual hardware. Your
estimated execution time should be retrieved using the command below:
make estimate
This command will print out the estimated latency and resource usage of your kernel:
+---------------------------+------------------------+----------+----------+---------+--------+-------+------+
|Kernel                     |Cycles                  |LUT       |FF        |BRAM     |DSP     |URAM   |Detail|
+---------------------------+------------------------+----------+----------+---------+--------+-------+------+
|CnnKernel (cnn-krnl.cpp:12)|4179564052 (16718.256ms)|49558 (4%)|49381 (2%)|810 (18%)|202 (2%)|25 (2%)|-     |
+---------------------------+------------------------+----------+----------+---------+--------+-------+------+
The time in parentheses in the Cycles column (16718.256 ms above) is the estimated execution
time of your FPGA kernel. You can compute the performance as
“kNum*kNum*kImSize*kImSize*kKernel*kKernel*2/latency”, that is,
164.4/latency (with latency in seconds), to get the performance in GFLOPS.
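The arithmetic can be written out as a short sketch. The constant values are assumptions inferred from the handout rather than taken from the starter kit: kNum = 256 channels, kKernel = 5, and kImSize = 224 (the convolution output size, 228 - 5 + 1), which together reproduce the 164.4 figure. The function name Gflops is hypothetical.

```cpp
// Assumed constants; they reproduce the 164.4 GFLOP total quoted in the handout.
constexpr long long kNum = 256;     // input and output channels
constexpr long long kImSize = 224;  // convolution output height and width
constexpr long long kKernel = 5;    // filter height and width

// One multiply and one add per filter tap, hence the factor of 2.
constexpr long long kTotalFlops =
    kNum * kNum * kImSize * kImSize * kKernel * kKernel * 2;

// Converts an estimated latency in seconds into GFLOPS.
constexpr double Gflops(double latency_seconds) {
  return static_cast<double>(kTotalFlops) / latency_seconds / 1e9;
}
```

For the baseline report above, Gflops(16.718256) evaluates to roughly 9.83, which matches the "about 10 GFLOPS" of the starter kit.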
IMPORTANT: Please make sure that all your loops have fixed loop bounds. If any of the loop
bounds are variable, a performance estimation will not be shown and you will receive no
performance grade.
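As an illustration of the fixed-bound requirement, the sketch below contrasts a loop whose trip count depends on a runtime argument with one bounded by a compile-time constant. The names SumVariable, SumFixed and kMaxLen are hypothetical; in this lab all sizes are constants, so the guard on n is only needed when a loop must cover fewer elements than its fixed bound.

```cpp
constexpr int kMaxLen = 16;  // hypothetical compile-time upper bound

// Variable bound: the trip count depends on the runtime value n, so an HLS
// tool generally cannot report a latency for this loop.
float SumVariable(const float* data, int n) {
  float sum = 0.0f;
  for (int i = 0; i < n; ++i) {
    sum += data[i];
  }
  return sum;
}

// Fixed bound: the loop always runs kMaxLen iterations and a guard skips the
// unused ones, so the trip count is known at compile time.
float SumFixed(const float data[kMaxLen], int n) {
  float sum = 0.0f;
  for (int i = 0; i < kMaxLen; ++i) {
    if (i < n) {
      sum += data[i];
    }
  }
  return sum;
}
```

Both functions return the same value for 0 <= n <= kMaxLen; only the second has a loop bound that is visible at compile time.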
IMPORTANT: The “make estimate” command should finish within 30 minutes, or within two hours
with highly complex optimizations. We recommend halting the estimation with
Ctrl-C when it exceeds 30 minutes, except for your last step (after you reach ~100
GOPS). An estimation that takes more than 12 hours will result in zero for the performance score.
As your kernel design becomes more complex, the software simulation and the estimation
will take significantly longer.
IMPORTANT: As you apply more optimizations, your resource usage will also increase.
Ideally, you should keep applying optimization until your kernel occupies about 80% of these
resources. The remaining 20% should be reserved for the interfaces (DRAM/PCI-e controller)
and the downstream flows. Please make sure that resource utilization is less than 80% for all
FPGA resources. If any of the resources are over this limit, you will receive no performance
grade.
IMPORTANT: You can check the HLS report by opening merlin.rpt with a text editor. This
file is generated by the command make estimate. You must submit this file with your
final submission. Do not modify this file; all reports will be verified
after the submission deadline. Any modification to this file in your submission constitutes academic
misconduct and will be reported.
Advanced Tips for HLS
Kernel Profiling: If you want to “profile” your kernel, you can open merlin.rpt with a text
editor and scroll down to Performance Estimate. You can see the trip count, accumulated
cycles and cycles per call, as well as pipeline initiation interval and parallel factor for each
loop in the table. For resource usage, you can go to Resource Estimate. No loop level
information is available, though. If you want to check the resource usage of a code region,
you can wrap it with a function then run again.
Kernel after transformation: If you want to see the kernel after being transformed by Merlin,
you can look for that in .merlin_prj/run/implement/exec/hls/kernel.
Annotation for Profiling: If you find the loops in your report hard to read, you can name the
loops you are interested in using a goto label. For example:
this_loop: for (int i = 0; i < n; i++);
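A slightly larger sketch of loop labeling on a nested loop, assuming the report lists loops by their label names as described above. The labels row_loop and col_loop, the function SumAll, and the sizes are hypothetical.

```cpp
constexpr int kRows = 4;  // hypothetical sizes
constexpr int kCols = 8;

// Sums all elements of a small 2D array. Each loop carries a label so that
// it can be identified by name in the report.
float SumAll(const float a[kRows][kCols]) {
  float sum = 0.0f;
row_loop:
  for (int r = 0; r < kRows; ++r) {
  col_loop:
    for (int c = 0; c < kCols; ++c) {
      sum += a[r][c];
    }
  }
  return sum;
}
```

The labels do not change the behavior of the code; a regular compiler may only warn that they are unused.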
Debugging Pipelining: If you are not sure about why you cannot achieve a specific initiation
interval as you expected, you can open the file below and read the logs. HLS usually gives out
a reason.
.merlin_prj/run/implement/exec/hls/_x/logs/CnnKernel/CnnKernel/vitis_hls.log
Long Synthesis Time In Pipelining: You will experience long HLS synthesis time (for
generating the estimation) if you pipeline a loop with a large loop body. Note also
that all loops inside a pipelined loop are unrolled, so the loop body may become large automatically.
In this case, you may want to exchange the order of pipelining and unrolling and see if the time
improves.
Use Functions for Shorter Synthesis Time: If you experience long synthesis time, you may try
wrapping some loops into a function and specifying #pragma HLS inline off inside the
function body. However, this may lead to inaccurate dependency analysis or memory port
analysis and sometimes cause lower performance. A workaround may or may not exist.
For example, if you access A[k + i][j] inside the function, passing A + k to
the function and accessing A’[i][j] can allow HLS to understand the array partitioning
better than passing A. You will need to experiment.
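The A + k idea can be sketched as follows. All names (WindowSum, SumFromRow, kHeight, kWidth, kWindow) are hypothetical, and whether this actually helps the HLS analysis depends on the design, so treat it as a starting point for experiments rather than a guaranteed improvement.

```cpp
constexpr int kHeight = 8;  // hypothetical array height
constexpr int kWidth = 4;   // hypothetical array width
constexpr int kWindow = 2;  // rows processed per call

// Sums kWindow rows starting at the row that `a` points to. The caller has
// already applied the row offset, so the body indexes a[i][j] (the A'[i][j]
// of the text) instead of A[k + i][j].
float WindowSum(const float (*a)[kWidth]) {
#pragma HLS inline off
  float sum = 0.0f;
  for (int i = 0; i < kWindow; ++i) {
    for (int j = 0; j < kWidth; ++j) {
      sum += a[i][j];
    }
  }
  return sum;
}

// Passes A + k rather than A together with k.
float SumFromRow(const float A[kHeight][kWidth], int k) {
  return WindowSum(A + k);
}
```

If row r of A holds the value r in every column, SumFromRow(A, 3) adds rows 3 and 4 and returns (3 + 4) * 4 = 28.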
General Tips
● When you develop on AWS, to resume a session in case you lose your connection, you
can run screen after login. You can recover your session with screen -DRR. You should
stop your AWS instance if you are going to come back and resume your work in a few
hours or days. Your data will be preserved but you will be charged for the EBS storage
for $0.10 per GB per month (with default settings). You should terminate your instance
if you are not going to come back and resume your work. Data on the instance will be
lost.
● We recommend using private repositories provided by GitHub to back up your
code. Never put your code in a public repo, to avoid potential plagiarism. To check in
your code to a private GitHub repo, create a repo first.
git branch -m upstream
git checkout -b main # skip these two lines if you are reusing the folder in Lab 1
... # your modifications
git add cnn-krnl.cpp merlin.rpt
git commit -m "lab1: first version" # change commit message accordingly
# please replace the URL with your own URL
git remote add origin git@github.com:YourGitHubUserName/your-repo-name.git
git push -u origin main
● We recommend that you git add and git commit often so that you can keep track of
the history and revert whenever necessary.
● Make sure your code produces correct results!
(Optional) Modify the HLS CNN Kernel using Vitis Pragmas
You are encouraged to use mainly Merlin pragmas. If needed, you can use Vitis pragmas for
finer-grained control and optimization. The list of Vitis pragmas can be found in the Vitis HLS
documentation. You can simply write Vitis pragmas and Merlin pragmas in the same file (cnn-krnl.cpp), but note
that, to apply an HLS pragma to a loop, you need to put the pragma inside the loop body
instead of before it.
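The placement difference can be sketched on two toy loops. The function names and kCount are hypothetical; #pragma HLS pipeline II=1 is a standard Vitis HLS pragma, while the Merlin spelling #pragma ACCEL pipeline follows the pragma names mentioned earlier in this handout.

```cpp
constexpr int kCount = 32;  // hypothetical trip count

// Vitis HLS pragma: placed inside the body of the loop it applies to.
void AddOne(const float in[kCount], float out[kCount]) {
  for (int i = 0; i < kCount; ++i) {
#pragma HLS pipeline II=1
    out[i] = in[i] + 1.0f;
  }
}

// Merlin pragma: placed directly before the loop it applies to.
void AddTwo(const float in[kCount], float out[kCount]) {
#pragma ACCEL pipeline
  for (int i = 0; i < kCount; ++i) {
    out[i] = in[i] + 2.0f;
  }
}
```

A regular C++ compiler ignores both pragmas, so the software behavior is unchanged; only the HLS flow interprets them.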
Submission
You need to report the estimated performance results of your FPGA-based implementation on
a Xilinx UltraScale+ VU9P FPGA (the FPGA we are using, specified in the makefile). Please
express your performance in GFLOPS and the speedup compared with the starter-kit version.
Your report should also include:
● Please run the input C file through the Merlin Compiler, identify the code
transformation and HLS pragmas that Merlin added, and discuss why.
● Please explain the parallelization and optimization strategies you have applied for
each loop in the CNN program (convolution, max pooling, etc) in this lab. Include the
pragmas (if any) or code segments you have added to achieve your strategy.
● Please incrementally evaluate each parallelization/optimization that you have applied
and explain why it improves the performance.
● Please report the FPGA resource usage (LUT/FF/DSP/BRAM), in terms of resource
count and percentage of the total. Which resource has been used most, in terms of
percentage?
● Optional: The challenges you faced, and how you overcame them.
● (Bonus +5pts): Analyze your code and check if the DSP/BRAM resource usage
matches your expectation. Only the adders, multipliers, and size of arrays need to be
considered. Please attach related code segments to your report and show how you
computed the expected number. Provide a discussion on possible reasons if they
differ significantly.
You also need to submit your optimized kernel code. Do not modify code in the lib directory.
Please submit on Gradescope. Your final submission must contain exactly these files,
submitted individually:
├ cnn-krnl.cpp
├ merlin.rpt
└ lab**report.pdf
File lab**report.pdf must be in PDF format.
Grading Policy
Your submission will only be graded if it complies with the formatting requirements.
Missing reports/code or compilation errors will result in 0 for the corresponding
category(ies).
Correctness (40%)
Please check the correctness using the command “make”.
Performance (40%)
Your performance will be evaluated based on the estimation report generated using the
command “make estimate”. The performance point will be added only if you have the
correct result, so please prioritize the correctness over performance. Your performance will
be evaluated based on the ranges of throughput (GOPS). Ranges A+ and A++ will be defined
after all the submissions are graded:
● Range A++, better than Range A+ performance: 40 points + 20 points (bonus)
● Range A+, better than Range A performance: 40 points + 10 points (bonus)
● Range A GFLOPS [200, 280]: 40 points
● Range B GFLOPS [120, 200): 30 points
● Range C GFLOPS [60, 120): 20 points
● Range D GFLOPS [30, 60): 10 points
● Range F GFLOPS [0, 30): 0 points

Report (20%)
Points may be deducted if your report misses any of the sections described above.
Academic Integrity
All work is to be done individually, and any sources of help are to be explicitly cited. You must
not modify the HLS report merlin.rpt in your submission. Any instance of academic
dishonesty will be promptly reported to the Office of the Dean of Students. Academic
dishonesty includes, but is not limited to, cheating, fabrication, plagiarism, copying code from
other students or from the internet, modifying the software-generated report, or facilitating
academic misconduct. We’ll use automated software to identify similar sections between
different student programming assignments, against previous students’ code, or against
Internet sources. We’ll run HLS on all submissions and compare the reproduced HLS
report with the submitted report. Students are not allowed to post the lab solutions on public
websites (including GitHub). Please note that any version of your submission must be your
own work and will be compared with sources for plagiarism detection.
Late policy: Late submissions will be accepted for 24 hours after the deadline with a 10% penalty. No late
submission will be accepted after that (you lose all points after the late submission window).
