<em id="rw4ev"></em>

      <tr id="rw4ev"></tr>

      <nav id="rw4ev"></nav>
      <strike id="rw4ev"><pre id="rw4ev"></pre></strike>
      合肥生活安徽新聞合肥交通合肥房產生活服務合肥教育合肥招聘合肥旅游文化藝術合肥美食合肥地圖合肥社保合肥醫院企業服務合肥法律

      代寫COMP26120、代做C++, Java/Python編程

      時間:2023-12-08  來源:合肥網hfw.cc  作者:hfw.cc 我要糾錯



      Academic Session: 202**4
      Lab Exercise 3: Spellchecking (Better Trade-offs)
      Duration: 4 weeks
      You should do all your work in the lab3 directory of the COMP26120 2023 repository - see Blackboard
      for further details. You will need to make use of the existing code in the branch as a starting point.
      Important: You submit this lab via a quiz on Blackboard. This will:
      1. Ask you some questions about your implementation, including the hash of the commit you want
      us to mark (see below for details of this).
      2. Ask you to upload a zip file of the python, c or java folder you have been working in.
      3. Ask you to upload a PDF report of the experiments you will run in Part C of this lab.
      4. Ask you for a statement on whether and (if relevant) how you used generative AI tools in your
      work. If you used generative AI you also have to upload a file containing transcripts of your
      usage.
      You can save your answers and submit later so we recommend filling in the questions for each part as
      you complete it, rather than entering everything at once at the end.
      • You MUST fill in and submit the quiz (at least some of it) to get marked.
      • Reports MUST be uploaded to Blackboard if you want them marked.
      • Code you want marked MUST be in the commit hash you give us in the Blackboard
      quiz.
      We will not mark anything pushed to GitLab if we don’t have the commit hash we can
      identify from Blackboard. Submission time is recorded as the time the Blackboard quiz
      is submitted, not the time you last pushed to GitLab or the last time you saved the quiz.
      Code Submission Details
      You have the choice to complete the lab in C, Java or Python. Program stubs for this exercise exist
      for each language. Only one language solution will be marked.
      Because people have had a number of issues with GitLab over the years we take a multiple redundancy approach to submission of code. This involves both pushing a commit to GitLab and uploading
      a zip of your code to Blackboard. By preference we will mark the code you submitted to GitLab but
      1
      Figure 1: Identifying the hash of your most recent commit in GitLab
      Figure 2: Identifying the hashes of previous commits in GitLab
      if we can’t find it or it doesn’t check out properly then we will look in the zip file on Blackboard.
      Please do both to maximise the chance that one of them will work.
      When you submit the assignment through Blackboard you will be asked for the hash of the commit
      you want marked. This is to make sure the TAs can identify exactly which GitLab commit you want
      marked.
      You can find the hash of your most recent commit by looking in your repository on GitLab as
      shown in figure 1.
      You can also find the hash for a previous commit by clicking on the “commits” link and then
      identifying the commit you are interested in. This is shown in figure 2.
      Note that while the full hash for commits are quite long, we only need the first 8 characters (as
      shown in the screenshots) to identify for marking.
      For this assignment, you may use built-in libraries in your chosen language but bear in
      mind that we ask you to implement hash sets - simply using the Java HashSet class (for
      instance) will not get you the marks for implementation.
      Reminder: It is bad practice to include automatically generated files in source control (e.g. your git
      repositories). This applies to object files (C), class files (Java), and compiled bytecode files (Python).
      2
      It’s not fatal if you do this but it can cause issues during marking so try not to.
      While it is fine to discuss this coursework with your friends and compare notes, the work submitted
      should be your own. In particular this means you should not have copied any of the source code,
      or the report. We will be using the turnitin tool to compare reports for similarities.
      Generative AI You may use generative AI for this assignment. If you choose to do so you should
      read the guidance notes and watch the guidance video in the Blackboard folder for the lab. At the
      start of the quiz you are asked to make a statement about your use of generative AI. If you leave
      this blank it will be understood as a declaration that you have not used generative AI
      for the coursework. The statement should make the following things clear:
      1. Which generative AI tool you used.
      2. What parts of the coursework you used it for: if you used it for the code was it for all the code
      or only some of it? If just some, which bits? If you used it to help you answer quiz questions,
      which quiz questions? if you used it to help with the report, which parts of the report?
      3. What steps you took to ensure that the generative AI tool had provided correct information.
      4. What evidence there is either in the statement, or in your coursework submission, that demonstrates you have met the coursework learning outcomes.
      If you used generative AI you should also upload a zip file containing the transcripts of your usage in
      question 2 of the Blackboard quiz.
      3
      Learning Objectives
      By the end of this lab you will be able to:
      • Explain the role of the hash function in the implementation of hash tables and describe at least
      one hash collision strategy.
      • Explain the basic structure of a binary search tree and the basic operations on this structure.
      • Produce C, Java, or Python code to implement the above concepts.
      • Evaluate experimentally the behaviour of hash sets and binary search trees.
      Introduction
      The aim of this exercise is to use the context of a simple spell-checking program to explore the hash
      table and binary search tree data structures.
      This exercise builds on work in Labs 1 and 2. If you have not done these labs we recommend you
      take a look at them and their model solutions in order to familiarise yourself with the spell-checking
      program and our expectations for creating and writing up experiments.
      Getting the Support Code
      You should be able to see a lab3 folder in your Gitlab that contains all the support files for this lab.
      You can find instructions for accessing your gitlab on Blackboard.
      Data Structures
      The spell-checking program stores a dictionary of words using a Set datatype. There are two data
      structures used to implement this Set datatype in this lab. In Lab 1 we introduced this datatype
      but we repeat that information here. You may also need to look online (e.g. the Wikipedia page) to
      complete your knowledge.
      Set. This is an abstract datatype that is used to store the dictionary of words in our spell-checking
      program. The operations required for this application are:
      • find whether a given value (i.e. string, alphabetic word) appears in the set.
      • insert a given value in the set. Note that there is no notion of multiplicity in sets - a value
      either appears or it does not. Therefore, if insert is called with a duplicate value (i.e. it is
      already in the set) it can be ignored.
      There would usually also be a remove function but this is not required for this application. We also
      include two further utility functions, which we ask you to implement, that are useful for printing what
      is happening :
      • print set to list the contents of a Set.
      • print stats to output statistical information to show how well your code is working.
      You should have already used a dynamic array to implement this dataype in Lab 1.
      In this exercise we use hash tables and binary search trees to implement the Set datatype. These
      have been introduced in lectures and we describe these briefly below. You may want to look at the
      recommended textbooks or online to complete your knowledge.
      4
      Hash Table. The underlying memory structure for a hash table is an array. A hash function is
      then used to map from a large ‘key’ domain to the (much smaller) set of indices into the array, where
      a value can be stored. When two values in the input domain map to the same index in the array this
      is called a collision and there are multiple ways to resolve these collisions. To use a hash table to
      represent a set we make the key and value the same - the result is usually called a hash set.
      Binary Search Tree. You can think of a tree as a linked list where every node has two ‘children’.
      This allows us to differentiate between what happens in each sub-tree. In a binary search tree the
      general idea is that the ‘left’ sub-tree holds values smaller than that found in the current node, and
      the ‘right’ sub-tree holds larger values.
      Lab 3: Better Storage
      In Lab 1 we achieved a faster find function by first sorting the input. In this part we explore two data
      structures that organise the data in a way that should make it faster to perform the find operation.
      Part a: Hash Table Lookup
      Time Budget: Students generally find Part a of this lab more challenging than Part b. If you have
      not attempted Lab 1, then you should budget around 1 hour for familiarising yourself with the code
      stubs we provide, the spell-checking program, command line flags, test harness and set data-type
      implementation. You might want to look at the sample answers to Lab 1 to assist with this. You
      should then budget 2-3 hours for the implementation itself. If you have spent more than 4 hours
      on Part a, we recommend skipping ahead to Part b and only returning to Part a if you have time
      remaining.
      So far our solution to a fast find function has been sorting the input and in Part b we will look at
      storing it in a sorted order. In this part you will take a different approach that relies on a hash
      function to distribute the values to store into a fixed size array.
      In this part you need to edit the program stub in your chosen language for hash sets. You should:
      1. Complete the insert function for hash sets. What should you do if the value to be inserted
      already exists? Hint: this is representing a set.
      2. Complete the find function. Importantly, it should follow the same logic as insertion.
      3. Implement the print set function – This should print out a sensible representation of the hash
      set when called with the −vvv flag.
      4. Implement the print stats function – This should print out a sensible set of statistics. This
      should include the total number of collisions (including those incurred when re-inserting after a
      rehash), total number of rehashes and the average number of collisions per access. print stats is
      automatically called by the spell-checking program on completion. We will look at this output
      so do not change or disable it. This will require some extra fields in the class (Java, Python) or
      node struct (C).
      You can find a discussion of how to test your code in the section of these instructions on Testing (after
      Part b).
      The hash-value(s) needed for inserting a string into your hash-table should be derived from the
      string. For example, you can consider a simple summation key based on ASCII values of the individual characters, or a more sophisticated polynomial hash code, in which different letter positions are
      associated with different weights. (Warning: if your algorithm is too simple, you may find
      that your program is very slow when using a realistic dictionary and causes issues when
      5
      you perform your experimental evaluation.)
      We want you to implement an open addressing hashing strategy so that collisions are dealt
      with by using a collision resolution function. The simplest of these is linear probing, the most
      straightforward strategy but you could also choose to implement quadratic probing or double hashing. To understand what is going on you should add code to count the number of collisions
      so you can calculate the average per access in print stats. You are welcome to implement whatever
      open addressing strategy you like.
      The program stubs allow you to implement several collision strategies and hash functions which
      can be accessed by modes already given in the stubs. The implementation you want us to mark
      should run in mode 0. You may find this useful if you are attempting the extension exercise in the
      report. More details of these are contained in the language specific instructions. Do not change these
      since they are used in our testing processes. Write your code so that you can use the −m parameter,
      which sets the mode.
      An issue with open addressing is what to do when it is not possible to insert a value. To make things
      simple to begin with you can simply fail in such cases. Your code should keep the num entries field
      of the table up to date. Your insert function should check for a full table, and exit the program
      cleanly should this occur. Once this is working you should increase (double?) the hash table size
      when the table is getting full and then rehash into a larger table. Implement this resize and
      rehash functionality. When the program is called using the −vvv flag it should print out
      a message when it rehashes and then call the print set function to show the new state of
      the hash set once rehashing completes. In C, beware of memory leaks!1
      Mod of negative numbers:
      Most of you are likely to use the mod operator in your hash functions by doing something
      like
      h = x % size
      to try to get a value of h that is between 0 and size - 1.
      If x is negative then the behaviour of this operator varies by language: in particular in C and Java it will return a negative number. See the discussion at
      https://torstencurdt.com/tech/posts/modulo-of-negative-numbers/
      You might think that x cannot be negative. However, if you are adding or multiplying large
      numbers together (e.g., when hashing a string) then you are likely to get integer overflow.
      In hashing, we don’t care about such overflows as it is deterministic and just adds to the
      ‘chaos’ a bit but Java, for instance, may throw ArrayIndexOutOfBoundsException if a
      negative number is passed to the hash data structure, if you have not accounted for this
      possibility in your implementation.
      In C you could use an unsigned integer to make sure that you only have positive numbers.
      Once you have completed this implementation you should look at the Blackboard submission form
      where you will be asked the following questions about Part a.
      1. What did you implement as your hash function? How does it work? How “good” is it? (No
      more than 100 words).
      1One can also get memory leaks in memory managed languages but these are more semantic e.g. preserving a
      reference to something you will never use again.
      6
      2. How do find and insert for hash sets work? How have you implemented them? You do not need
      to discuss the details of your collision resolution strategy here (this is the next question) but
      should cover the basics of how a word is inserted into the dictionary and how a word is found
      (or not) in the dictionary. (No more than 100 words, not counting any code snippets included).
      3. What is your collision resolution strategy? How does it work? How have you implemented
      it? Illustrate your explanation with a sample of your code - this may be tided up a bit for
      presentation but should clearly be the same as the code you have submitted (No more than 200
      words, not counting the code snippet).
      4. Did you implement rehashing? If so briefly explain your implementation including explaining
      when you choose to rehash. (No more than 200 words, not counting any code snippet included).
      Part b: Binary Search Tree Lookup
      Time Budget: This part of the lab should take you about 2 hours to complete. If it is taking you
      more than 3 hours and you have completed a basic hash set implementation for Part a you might
      want to skip forward and do the first section of Part c (which only needs the hash set implementation)
      before returning to this. If you have not completed any of Part a it is worth persevering with this to
      get it working before moving on to Part c where you should focus on those sections relating to binary
      trees.
      In this part you need to edit the program stub in your chosen language for binary search trees.
      You should:
      1. Complete the insert function by comparing the value to insert with the existing value.
      2. Complete the find function. Importantly, it should follow the same logic as insertion.
      3. Implement the print set function – This should print out a sensible representation of the hash
      set when called with the −vvv flag.
      4. Implement the print stats function – This should print out a sensible set of statistics, including
      the height of the tree and average number of comparisons per insert and find. print stats is
      automatically called by the spell-checking program on completion. We will look at this output
      so do not change or disable it.
      Once you have completed this implementation you should look at the Blackboard submission form
      where you will be asked the following question about Part b.
      1. How do find and insert for binary trees work? How have you implemented them? Include your
      code for insert and find and relate your explanation to this code. (No more than 200 words, not
      including the code snippet(s))
      In this lab exercise we will stop there with trees. However, it is worth noting that this implementation
      is not optimal. What will happen if the input dictionary is already sorted? In the next lab we will be
      exploring self-balancing trees.
      Testing
      There are instructions in the individual language sections explaining how to test your algorithm on a
      single example and you should try that first. Once that is done you can test them on some test suites.
      We have provided a test script, test.py in the top level directory of the COMP26120 2023 repository and some data in the data directory. test.py takes up to five arguments: the first is the language
      you are using; the second is the test suite (simple, large and collision tests are provided but
      you can create more); the third is the data structure (bstree or hashset) you are testing; the fourth
      7
      (optional, defaults to 0) is the mode you want to test; and the fifth (optional) is the initial hash set
      size (only applicable for the hashset).
      For example, to run the simple tests for mode 2, your language is Java and you want to test your
      binary tree implementation, you should run
      ./python3 test.py java simple bstree 2
      You might want to edit the script to do different things but we will use the one provided when marking
      your code so make sure it runs with this. We have also provided a large test (replace simple by
      large above) which will take a lot longer to run (see help box below). You will need to unzip the
      files by running unzip henry.zip in the data/large directory. This should be particularly useful to
      check if you have implemented rehashing correctly.
      Finally we have provided some collision test tests. These are dictionaries containing exactly
      five items. If you run them with an initial hash set size of 5 it is highly likely that at least one insert
      will result in a collision and this will allow you to check if your collision resolution mechanism is
      working correctly. These can also be used to test if your rehashing implementation is working.
      For example, to run the collision tests for mode 0 with python as the language, you should run
      ./python3 test.py python collision_tests hashset 0 5
      This will run your hash set implementation on these tests with an initial hash set size of 5.
      In general, to be sure that you collision resolution strategy is working, we would recommend adding
      a message that gets printed out when run in verbose mode (e.g., with the -vv flag) when a collision
      occurs, possibly including printing out the set after the element is inserted, and then running tests
      manually using the instructions in the language specific sections to make sure at least one of your
      tests is encountering a collision and it is handled correctly when it does.
      Failing Tests:
      If the tests are failing there are two possible explanations. Either your implementation is
      wrong or the test script is being more picky than you thought.
      For the first reason, make sure you understand what you expect to happen. Create a
      small dictionary and input file yourself and test using these or use one of the individual
      simple tests. There are nine of these and you will find them organised in the data/simple
      directory where each sub-directory contains an example dictionary, input file and the
      answer expected by the test. Most of these are small and easy to understand. Remember
      that the program should print out all of the words that are in the input file but not in the
      dictionary. Make sure that your code is connected up properly in find so that it returns
      true/false correctly.
      For the second reason, make sure that you don’t print anything extra at verbosity level 0. If
      you look at test.py you’ll see that what it’s doing is comparing the output of your program
      between “Spellchecking:” and “Usage statistics:” against some expected output. It runs the
      program at verbosity level 0 and expects that only spelling errors are output in-between
      those two points.
      As part of marking, we will run your implementations on the simple and collision tests test
      suites and may also run them on the large test suite if we feel additional testing is required.
      Part c: Making a Comparison
      Time Budget: If you have not attempted Lab 2 then you should budget 1 hour to familiarise
      yourself with the concept of running experiments, generating data and so on. You should then expect
      8
      this part of the lab should take you 2-3 hours to complete. However, actually generating data and
      running experiments is likely to take more time than that, but you can leave your computer running
      and do something else while this is happening – particularly if you write scripts to automate data
      generation and run experiments. You should plan your time with this in mind. We recommend writing
      up the experiments as you go so you have something to submit if you run out of time.
      You should now perform an experimental evaluation like the one you did for Lab 2. We want to
      address the question.
      Under what conditions are your different implementations of the dictionary data structure
      preferable?
      Note that for hash sets the initial size of the data structure will potentially play a role. You may
      want to design your experiment so this is not a concern. You may also find it useful to correlate your
      findings with statistics such as the number of collisions in the hash table.
      We recommend you read, if you have not already done so, the Lab 2 instructions on how to generate
      inputs for your experiments. You may also (if possible) reuse the inputs you generated as part of that
      lab as part of your evaluation here in order to save time. Note that generation of inputs is likely to
      take some time (hours for large dictionaries and queries), so you should plan your work so you can
      set a generation script running and then leave it while you do something else.
      You should perform two experiments for your evaluation and then draw a conclusion that answers
      the question above. You can then devise and run a third optional (set of) experiments as an extension
      exercise.
      The two experiments you should perform should answer the questions:
      1. Does your implementation of insert for hash sets work as expected in the average case in terms
      of complexity.
      2. Does your implementation of insert for binary trees work as expected in the average case in
      terms of complexity.
      Important: In your hypothesis, experimental design and results you should be very careful to
      consider whether you are discussing or measuring a single insertion, or several insertions and you
      should be consistent around this throughout. Of course, you can measure several insertions and from
      that calculate the cost of a single insertion, but you should be clear about this.
      Your presentation of each experiment should have four sections – these are likely to be very similar
      for each experiment:
      Hypothesis You should state, as a hypothesis, what you expect the performance to be in big O
      notation and then write a short paragraph explaining why you believe this to be the case, based
      on the theoretical complexities of insert for the data structure. Be careful about how you present
      big O notation. In particular O is not a function so it doesn’t make sense to perform arithmetic
      with it such as
      O(n)
      n
      or
      O(n) × n
      (This section should be about 200 words and take up no more than half a page in your report).
      Experimental Design You should describe how you designed the experiment. This should include:
      • Your intended independent variable – this is the thing you are varying.
      • Your intended dependent variable – this is the thing you are measuring whose value your
      hypothesis says depends upon the value of the independent variable.
      9
      • Anything that you could vary but you are not – for instance to avoid confusing your results
      with other variables that might also influence performance.
      • What input data you generated and how many input files you used for each value of your
      independent variable.
      • What program you ran on what inputs. How many times you ran it on each input.
      • What you measured and how.
      You should also mention anything else you think is relevant that will help your marker judge
      how well you designed your experiment to test your hypothesis. (This section should be about
      500 words and take up no more than a page in your report – not counting any scripts included
      as appendices).
      Results You should present your results, ideally as a graph showing a best fit line. If you do any
      processing on results, such as generating best fit lines, computing averages, etc., then these
      should be described. Raw data should be presented in an appendix or included with your code
      submission. You should state clearly whether your hypothesis was confirmed or refuted (or it is
      equivocal or difficult to tell). If your hypothesis was refuted you should discuss why that might
      be (e.g., incorrect implementation of insert, some problem with the experimental design) and
      sketch how you might investigate further. (This section should be about 200 words and take up
      no more than half a page in your report – not counting graphs and tables).
      Note that it is more important to be able to clearly explain your experimental design (justifying
      your decisions) than it is to have excellent results. The answers to the wrong question are useless.
      Top marks can be obtained for Part c for a well-designed experiment that has disappointing results
      (either disproving a hypothesis or just being unclear in what they show), so long as the conclusions
      you draw make it clear you are aware of the issues with the results.
      Disappointing/Surprising Results:
      While you can get full marks for results that disprove your hypothesis, these often are
      indicative of issues with the implementation or, if you are using them, with any scripts
      written to automate the experiments. Therefore it is alway worthwhile double-checking these
      things since problems with these will effect your marks. The commonest implementation
      issue we see here is that find and/or insert are implemented in a way that returns the correct
      answer but does more work than necessary. The commonest issue with automation scripts
      is that they are not running the program at all, trying to run it on non-existent or empty
      input files, or causing it to crash somehow. If your results seem to show that your program
      takes the same amount of time on all inputs (particularly if that time is very short) then
      it is worth double-checking that your automation script is actually running the program
      correctly.
      Once you have written up your experiments you should write two further sections.
      Conclusion Use your results to address the initial question: when should you use your hash set
      implementation and when should you use your binary tree implementation? You do not need to
      validate this experimentally since for some implementations this is likely to take a lot of time,
      even with a good implementation on a fast machine. Theoretically, there is a crossover point
      where one becomes better than the other that you should be able to calculate from your existing
      results so even if you can’t run large enough experiments to find this point we expect to see the
      answer you have calculated.
      Data Statement It is good practice to inform readers where they can find your data and code in
      order to check your results and re-run experiments for themselves. This should appear in a Data
      statement.
      10
      We’ve asked you not to include large input files in your gitlab, but you should include any scripts
      you created - for instance to generate input files or run experiments, and the“raw” output data
      - e.g., a table (e.g., as a .csv file or spreadsheet) of independent variable values matched with
      dependent variable values. You may also include this as an appendix in the report.
      The data statement should clearly state where all the input data (if included), scripts and output
      data can be found.
      Hints
      • You should be able to do all of the analysis you need to do for this whole lab using a simple
      spreadsheet application. However the unix utility gnuplot and the python matplotlib library
      also both allow you to create plots from data and fit lines to those plots.
      • As the numbers are all small you might want to count n in lots of 10k e.g. for an input of 10,000
      say it’s n = 1, for 20,000 say it’s n = 2 etceteras. Alternatively, you may wish to measure time
      in milliseconds rather than seconds. This is just to make numbers look more sensible.
      • You can use the UNIX time utility to measure running time. This gives you times for real,
      user and sys. The most accurate way to judge the running time of your algorithm is to take
      the sum or user and sys (real includes the time when other resources might be using your
      CPU and can’t account for computation time across multiple cores).
      • You may want to plot/calculate average time per insert, rather than plotting the total time
      taken to build the dictionary and query it for words. When doing this (particularly for Java) be
      aware that there may be a fixed “overhead” time that is independent of the size of dictionary
      or query – if this is the case and you are calculating averages you may see that the average time
      appears to get better as the dictionary size increases not worse as you would expect. If this is
      happening, you may want to start by plotting a graph of total time and determining where this
      crosses the origin, in order to give you a guess at overhead, and then deduct this start up value
      from the total time before calculating the average. If you do this, make sure to document it in
      your report.
      • If you are convinced your implementation is correct but your results don’t seem to be as expected,
      particularly if you are looking at an overhead that is making time per insert hard to calculate,
      you might want to investigate correlations between other factors – e.g., the relationship between
      dictionary size and tree height for binary trees to see if that can illuminate what is happening.
      Report Extension. As an extension activity you can include a third (set of) experiment(s) in your
      report. This experiment can be to explore any aspect of the practical complexity of your implementation of hash sets or binary trees that you wish – good things to look at include: query time; the other
      collision resolution strategies for hash sets (if you have implemented them); other hash functions;
      and the effect of rehashing on the performance of hash sets, but there are many opportunities here.
      Note that you can get **% of the marks for this coursework without attempting the extension. This
      is for students who wish to stretch themselves and should not be attempted unless the rest of the
      coursework has been completed in good time.
      Instructions for C Solutions
      If you intend to provide a solution in C you should work in the lab3/c directory of the COMP26120 2023
      repository. All program stubs and other support files for C programs can be found in this directory.
      The completed programs will consist of several “.c” and “.h” files, combined together using make.
      You are given a number of complete and incomplete components to start with:
      11
      • global.h and global.c - define some global variables and functions, including the verbosity level.
      • speller.c - the driver for each spell-checking program, which:
      1. reads strings from a dictionary file and inserts them in your data-structure
      2. reads strings from a second text file and finds them in your data-structure
      - if a string is not found then it is assumed to be a spelling mistake and is reported to the
      user, together with the line number on which it occurred
      3. processes input flags to set the dictionary, query file, size, mode and verbosity level to be
      used.
      You should not need to change speller.c – if you do, make sure that the input flags for running
      the program continue to work as specified below.
      • set.h - defines the generic interface for the data-structure
      • hashset.h and hashset.c - includes a partial implementation for hash sets that you need to
      complete in Part a.
      • bstree.h and bstree.c - includes a partial implementation for binary search trees that you need
      to complete in Part b.
      Note: The code in speller.c that reads words treats any non-alphabetic character as a
      separator, so e.g. “non-alphabetic” would be read as the two words “non” and “alphabetic”.
      This is intended to extract words from any text for checking (is that “-” a hyphen or a
      subtraction?) so we must also do it for the dictionary to be consistent. This means that
      your code has to be able to deal with duplicates i.e. recognise and ignore them. For example,
      on my PC /usr/share/dict/words is intended to contain **9,829 words (1 per line) but is
      read as 526,065 words of which 418,666 are unique and 107,399 are duplicates.
      Running your code
      To test your implementation, you will need a sample dictionary and a sample text file with some text
      that contains the words from the sample dictionary (and perhaps also some spelling mistakes). You
      are given several such files in the data directory of the COMP26120 2023 repository, and you will
      probably need to create some more to help debug your code. These files will need to be copied to
      the directory where you are working or you will need to set your PATH so that your program can be
      executed from anywhere.
      You should also test your program using a larger dictionary. One example is the Linux dictionary
      that can be found in /usr/share/dict/words.
      Compile and link your code using make hashset (for part a), or make bstree (for part b). These will
      create executables called speller hashset and speller bstree respectively.
      When you run your spell-checker program, you can:
      • use the −d flag to specify a dictionary.
      • (for part a) use the −s flag to specify a hash table size: try a prime number greater than twice
      the size of the dictionary (e.g. 1,000,003 for /usr/share/dict/words). Do not hard code the
      dictionary size into your code. You should let it be set by the size argument to the
      initialize set function.
      12
      • use the −m flag to specify a particular mode of operation for your code (e.g. to use a particular
      hashing algorithm). You can access the mode setting by using the corresponding mode variable
      in the code you write. Note that the modes you should use have already been specified in
      hashset.h. You should not need to use this unless you want to investigate additional hash
      functions or addressing strategy.
      • use the −v flag to turn on diagnostic printing, or −vv for more printing (or −vvv etc. - the
      more ”v”s, the higher the value of the corresponding variable verbose). We suggest using this
      verbose value to control your own debugging output. You can access the verbosity level as the
      global verbose. This is 0 by default and will be set to 1, 2 or 3 by use of the −v, −vv and −vvv
      flags.
      So the general form of a call to your program should be
      speller hashset -d <dictionary> -s <size> -m <mode> <input-file>
      For instance to run your hash set program on the third of the simple test cases, in mode 1, with a
      medium level of verbosity, you would call:
      speller hashset -d ../data/simple/3/dict -m 1 -vv ../data/simple/3/infile
      NOTE IF CUTTING AND PASTING THIS that the underscores often get lost when you cut and
      paste.
      Instructions for Java Solutions
      If you intend to provide a solution in Java you should work in the lab3/java directory of the
      COMP26120 2023 repository. The completed programs will form a Java package called comp26120.
      You can find this as a sub-directory of the java directory. All program stubs and other support files
      for Java programs can be found in this directory but you will need to copy in either your solution or
      the model solutions for Lab 1 into this directory – this should be the file darray.java.
      You are given a number of complete and incomplete components to start with:
      • speller config.java - defines configuration options for the program, including the verbosity level.
      • speller.java - the driver for each spell-checking program, which:
      1. reads strings from a dictionary file and inserts them in your data-structure
      2. reads strings from a second text file and finds them in your data-structure
      - if a string is not found then it is assumed to be a spelling mistake and is reported to the
      user, together with the line number on which it occurred
      3. processes input flags to set the dictionary, query file, size, mode and verbosity level to be
      used.
      You should not need to change speller.java – if you do, make sure that the input flags for running
      the program continue to work as specified below.
      • set.java - defines the generic interface for the data-structure
      • set factory.java - defines a factory class to return the appropriate data structure to the program.
      • hashset.java -includes a partial implementation for hash sets that you need to complete in Part
      a.
      • speller hashset.java - sub-classes speller for use with hashset.
      13
      • bstree.java - includes a partial implementation for binary search trees that you need to complete
      in Part b. You will need to edit this.
      • speller bstree.java - sub-classes speller for use with bstree.
      Note: The code in speller.java that reads words treats any non-alphabetic character as a
      separator, so e.g. “non-alphabetic” would be read as the two words “non” and “alphabetic”.
      This is intended to extract words from any text for checking (is that “-” a hyphen or a
      subtraction?) so we must also do it for the dictionary to be consistent. This means that
      your code has to be able to deal with duplicates i.e. recognise and ignore them. For example,
      on my PC /usr/share/dict/words is intended to contain **9,829 words (1 per line) but is
      read as 526,065 words of which 418,666 are unique and 107,399 are duplicates.
      Running your code
      To test your implementation, you will need a sample dictionary and a sample text file with some text
      that contains the words from the sample dictionary (and perhaps also some spelling mistakes). You
      are given several such files in the data directory of the COMP26120 2023 repository, and you will
      probably need to create some more to help debug your code. These files will need to be copied to
      the java directory or you will need to set your CLASSP AT H so that your program can be executed
      from anywhere.
      You should also test your program using a larger dictionary. One example is the Linux dictionary
      that can be found in /usr/share/dict/words.
      Compile your code using javac *.java. This will create an executable called speller bstree.class
      (for Part a) and speller hashset.class (for Part a). To run your program you should call java
      comp26120.speller bstree sample-file and java comp26120.speller hashset sample-file respectively. Note that you will either need to set your CLASSP AT H to the java directory of the
      COMP26120 2023 repository or call java comp26120.speller bstree sample-file (resp. java
      comp26120.speller hashset sample-file) in that directory.
      When you run your spell-checker program, you can:
      • use the −d flag to specify a dictionary.
      • (for part a) use the −s flag to specify a hash table size: try a prime number greater than twice
      the size of the dictionary (e.g. 1,000,003 for /usr/share/dict/words). Do not hard code the
      dictionary size into your code. You should let it be set by config.init size field of
      the config object supplied as an argument to the hashset constructor.
      • use the −m flag to specify a particular mode of operation for your code (e.g. to use a particular
      sorting or hashing algorithm). You can access the mode setting by using the corresponding mode
      variable in the code you write. Note that the modes you should use have already been specified
      in hashset.java. You should not need to use this unless you want to investigate additional hash
      functions or addressing strategy.
      • use the −v flag to turn on diagnostic printing, or −vv for more printing (or −vvv etc. - the
      more ”v”s, the higher the value of the corresponding variable verbose). We suggest using this
      verbose value to control your own debugging output. We suggest using this verbose value to
      control your own debugging output. You can access the verbosity level as the verbose field of
      the hashset object. This is 0 by default and will be set to 1, 2 or 3 by use of the −v, −vv and
      −vvv flags.
      14
      So the general form of a call to your program should be
      java comp26120.speller hashset -d <dictionary> -s <size> -m <mode> -v <input-file>
      For instance to run your hash set program on the third of the simple test cases, in mode 1, with a
      medium level of verbosity, you would call:
      java comp26120.speller hashset -d ../data/simple/3/dict -m 1 -vv ../data/simple/3/infile
      NOTE IF CUTTING AND PASTING THIS that the underscores often get lost when you cut and
      paste.
      Instructions for Python Solutions
      If you intend to provide a solution in Python you should work in the lab3/python directory of the
      COMP26120 2023 repository. All program stubs and other support files for Python programs can be
      found in this directory but you will need to copy in either your solution or the model solutions for
      Lab 1 into this directory – this should be the file darray.py. You will need to use Python 3.
      You are given a number of complete and incomplete components to start with:
      • config.py - defines configuration options for the program, including the verbosity level.
      • speller.py - the driver for each spell-checking program, which:
      1. reads strings from a dictionary file and inserts them in your data-structure
      2. reads strings from a second text file and finds them in your data-structure
      - if a string is not found then it is assumed to be a spelling mistake and is reported to the
      user, together with the line number on which it occurred
      You should not need to change speller.py – if you do, make sure that the input flags for running
      the program continue to work as specified below.
      • set factory.py - defines a factory class to return the appropriate data structure to the program.
      • hashset.py - includes a partial implementation for binary search trees that you need to complete
      in Part a. You will need to edit this.
      • speller hashset.py - front end for use with hashset which then calls the functionality in speller.py.
      • bstree.py - includes a partial implementation for binary search trees that you need to complete
      in Part b. You will need to edit this.
      • speller bstree.py - front end for use with bstree which then calls the functionality in speller.py.
      Note: The code in speller.py that reads words treats any non-alphabetic character as a
      separator, so e.g. “non-alphabetic” would be read as the two words “non” and “alphabetic”.
      This is intended to extract words from any text for checking (is that “-” a hyphen or a
      subtraction?) so we must also do it for the dictionary to be consistent. This means that
      your code has to be able to deal with duplicates i.e. recognise and ignore them. For example,
      on my PC /usr/share/dict/words is intended to contain **9,829 words (1 per line) but is
      read as 526,065 words of which 418,666 are unique and 107,399 are duplicates.
      15
      Running your code
      Important: To run the provided program stubs in python2.7 you will need to install the enum34
      package. You can do this from the UNIX command line. We advise that you use Python 3 instead.
      To test your implementation, you will need a sample dictionary and a sample text file with some
      text that contains the words from the sample dictionary (and perhaps also some spelling mistakes).
      You are given several such files in the data directory of the COMP26120 2023 repository, and you
      will probably need to create some more to help debug your code. These files will need to be copied
      to the python directory or you will need to set your P Y T HONP AT H so that your program can be
      executed from anywhere.
      You should also test your program using a larger dictionary. One example is the Linux dictionary
      that can be found in /usr/share/dict/words.
      To run your program you should call python3 speller hashset.py sample-file (where sample −
      f ile is a sample input file for spell checking) for Part a and python3 speller bstree.py sample-file
      for Part b. Note that you will either need to set your P Y T HONP AT H to the python directory of
      the COMP26120 2023 repository or call python3 speller hashset.py sample-file (resp. python3
      speller bstree sample-file) in that directory.
      When you run your spell-checker program, you can:
      • use the −d flag to specify a dictionary.
      • (for part a) use the −s flag to specify a hash table size: try a prime number greater than twice
      the size of the dictionary (e.g. 1,000,003 for /usr/share/dict/words). Do not hard code the
      dictionary size into your code. You should use self.hash table size which is created
      from the configuration options when the hash set object is created by init .
      • use the −m flag to specify a particular mode of operation for your code (e.g. to use a particular
      sorting or hashing algorithm). You can access the mode setting by using the corresponding mode
      variable in the code you write. Note that the modes you should use have already been specified
      in hashset.py. You should not need to use this unless you want to investigate additional hash
      functions or addressing strategy.
      • use the −v flag to turn on diagnostic printing, or −vv for more printing (or −vvv etc. - the
      more ”v”s, the higher the value of the corresponding variable verbose). We suggest using this
      verbose value to control your own debugging output. You can access the verbosity level as the
      self.verbose field of the hashset object. This is 0 by default and will be set to 1, 2 or 3 by use
      of the −v, −vv and −vvv flags.
      So the general form of a call to your program should be
      python3 speller hashset.py -d <dictionary> -s <size> -m <mode> <input-file>
      For instance to run your hash set program on the third of the simple test cases, in mode 1, with a
      medium level of verbosity, you would call:
      python3 speller hashset.py -d ../data/simple/3/dict -m 1 -vv ../data/simple/3/infile
      NOTE IF CUTTING AND PASTING THIS that the underscores often get lost when you cut and
      paste.
      Marking Rubric
      This coursework is worth 20% of your final mark for COMP26120. This means each mark in this
      mark scheme is worth 0.5% of your final mark for the module.
      16
      In the rubric below, the marks that an item is worth are slightly approximate due to the way
      Blackboard allocates percentages across sections. In general unsatisfactory performance will get you
      10% of the available marks for a section, satisfactory will get you 50% and excellent will get you
      100%. This can vary for items worth more than 1 mark because several criteria may be involved each
      of which can be marked unsatisfactory, satisfactory or excellent.
      Note that on Blackboard, most of the execution marks are given for the Code Upload question
      since the TA will run a bunch of tests on your code in order to determine how well it executes – we
      separate out these marks here into the sections related to specific parts of the implementation.
      If you have used generative AI and have produced excellent quality answers in any section but we
      are not convinced by your statement that you have met the learning outcomes then your mark will
      be capped at satisfactory. Depending upon how many people are affected by this we may then hold
      vivas to see if the mark should be raised.
      請加QQ:99515681 或郵箱:99515681@qq.com   WX:codehelp

      掃一掃在手機打開當前頁
    1. 上一篇:MA2552編程代寫、代做MATLAB程序
    2. 下一篇:CSC345編程代寫、代做Python語言程序
    3. 無相關信息
      合肥生活資訊

      合肥圖文信息
      挖掘機濾芯提升發動機性能
      挖掘機濾芯提升發動機性能
      戴納斯帝壁掛爐全國售后服務電話24小時官網400(全國服務熱線)
      戴納斯帝壁掛爐全國售后服務電話24小時官網
      菲斯曼壁掛爐全國統一400售后維修服務電話24小時服務熱線
      菲斯曼壁掛爐全國統一400售后維修服務電話2
      美的熱水器售后服務技術咨詢電話全國24小時客服熱線
      美的熱水器售后服務技術咨詢電話全國24小時
      海信羅馬假日洗衣機亮相AWE  復古美學與現代科技完美結合
      海信羅馬假日洗衣機亮相AWE 復古美學與現代
      合肥機場巴士4號線
      合肥機場巴士4號線
      合肥機場巴士3號線
      合肥機場巴士3號線
      合肥機場巴士2號線
      合肥機場巴士2號線
    4. 幣安app官網下載 短信驗證碼 丁香花影院

      關于我們 | 打賞支持 | 廣告服務 | 聯系我們 | 網站地圖 | 免責聲明 | 幫助中心 | 友情鏈接 |

      Copyright © 2024 hfw.cc Inc. All Rights Reserved. 合肥網 版權所有
      ICP備06013414號-3 公安備 42010502001045

      成人久久18免费网站入口