How I reverse-engineered the Hemingway Editor - a popular writing app - and built my own from a beach in Thailand

I've been using the Hemingway App to try to improve my posts. At the same time, I've been looking for ideas for small projects. I came up with the idea of integrating a Hemingway-style editor into a markdown editor. So I needed to find out how Hemingway worked!

Getting the Logic

I had no idea how the app worked when I first started. It could have sent the text to a server to calculate the complexity of the writing, but I expected it to be calculated client-side.

Opening the developer tools in Chrome (Control + Shift + I or F12 on Windows/Linux, Command + Option + I on Mac) and navigating to Sources provided the answers. There I found the file I was looking for: hemingway3-web.js.

This code is minified, which is a pain to read and understand. To solve this, I copied the file into VS Code and formatted the document (Control + Shift + I in VS Code). This turns a 3-line file into a 4859-line file with everything formatted nicely.

Exploring the Code

I started looking through the file for anything I could make sense of. The start of the file contained immediately invoked function expressions. I had little idea of what was happening.

!function(e) { function t(r) { if (n[r]) return n[r].exports; var o = n[r] = { exports: {}, id: r, loaded: !1 }; ...

This continued for about 200 lines before I decided that I was probably reading the code that makes the page run (React?). I started skimming through the rest of the code until I found something I could understand. (I missed quite a lot that I would later find by following function calls and looking at the function definitions.)

The first bit of code that I understood was all the way down at line 3496!

    getTokens: function(e) {
        var t = this.getAdverbs(e),
            n = this.getQualifiers(e),
            r = this.getPassiveVoices(e),
            o = this.getComplexWords(e);
        return [].concat(t, n, r, o).sort(function(e, t) {
            return e.startIndex - t.startIndex
        })
    }

And amazingly, all of these functions were defined right below. Now I knew how the app defined adverbs, qualifiers, passive voice and complex words. Some of them are very simple. The app checks each word against lists of qualifiers, complex words and passive voice phrases. this.getAdverbs filters words based on whether they end in 'ly' and then checks whether each one is on the list of non-adverb words ending in 'ly'.

The next bit of useful code was the implementation of highlighting words or sentences. In this code there is a line:

e.highlight.hardSentences += h

'hardSentences' was something I could understand, something with meaning. I then searched the file for hardSentences and got 13 matches. This led to a line that calculated the readability stats:

n.stats.readability === i.default.readability.hard && (e.hardSentences += 1), n.stats.readability === i.default.readability.veryHard && (e.veryHardSentences += 1)

Now I knew that there was a readability parameter in both stats and i.default. Searching the file, I got 40 matches. One of those matches was a getReadabilityStyle function, where they grade your writing.

There are three levels: normal, hard and very hard.

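    t = e.words;
    n = e.readingLevel;
    return t < 14
        ? i.default.readability.normal
        : n >= 10 && n < 14
            ? i.default.readability.hard
            : n >= 14
                ? i.default.readability.veryHard
                : i.default.readability.normal;

A sentence with fewer than 14 words is "normal". A sentence of 14 words or more is "hard" if its reading level is at least 10 but below 14, and "very hard" if its reading level is 14 or more.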

Now to find out how to calculate the reading level.

I spent a while here trying to find any notion of how the reading level is calculated. I found it 4 lines above the getReadabilityStyle function.

    // e = letters in paragraph
    // t = words in paragraph
    // n = sentences in paragraph
    getReadingLevel: function(e, t, n) {
        if (0 === t || 0 === n) return 0;
        var r = Math.round(4.71 * (e / t) + 0.5 * (t / n) - 21.43);
        return r <= 0 ? 0 : r;
    }

This means that your score is 4.71 * average word length + 0.5 * average sentence length - 21.43. That's it. That is how Hemingway grades each of your sentences.
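To make the formula concrete, here is a minimal sketch of the same calculation as a standalone function. The name `readingLevel` and the sample numbers are mine, chosen for illustration; only the constants and the zero checks come from the Hemingway source shown above. The formula has the same shape as the Automated Readability Index.

```javascript
// Sketch of the reading-level formula; the function name is illustrative.
function readingLevel(letters, words, sentences) {
  // Avoid dividing by zero for empty input.
  if (words === 0 || sentences === 0) return 0;
  const level = Math.round(
    4.71 * (letters / words) + 0.5 * (words / sentences) - 21.43
  );
  // Negative scores are clamped to 0, as in the original function.
  return level <= 0 ? 0 : level;
}

// A single 14-word sentence containing 60 letters:
// 4.71 * (60 / 14) + 0.5 * (14 / 1) - 21.43 ≈ 5.76, which rounds to 6.
console.log(readingLevel(60, 14, 1)); // 6
```

Short words and short sentences both pull the score down, which is why a long sentence made of short words can still grade as easy.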

Other interesting things I found

  • The highlight commentary (the information about your writing on the right-hand side) is one big switch statement. Ternary statements are used to change the response based on how well you've written.
  • The grading goes up to 16 before it's classed as "Post-Graduate" level.

What I'm going to do with this

I'm planning to make a basic website and apply what I've learned from deconstructing the Hemingway app. Nothing fancy, more an exercise in implementing some logic. I've built a Markdown previewer before, so I might also try to create a writing application with the highlighting and scoring system.

Creating My Own Hemingway App

Having figured out how the Hemingway app works, I decided to implement what I had learned to make a much simplified version.

I wanted to make sure that I kept it basic, focusing on the logic more than the styling. I chose to go with a simple text entry box.

Challenges

1. How to ensure performance. Rescanning the whole document on every key press could be very computationally expensive. This could result in UX blocking, which is obviously not what we want.

2. How to split the text into paragraphs, sentences and words for highlighting.

Possible Solutions

  • Only rescan the paragraphs that change. Do this by counting the number of paragraphs and comparing that to the document before the change. Use this to find the paragraph that has changed, or the new paragraph, and scan only that one.
  • Have a button to scan the document. This massively reduces the number of calls to the scanning function.

2. Use what I learnt from Hemingway: every paragraph is a <p>, and any sentences or words that need highlighting are wrapped in an internal <span> with the necessary class.
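The "only rescan what changed" option could be sketched roughly as follows. This is not something taken from Hemingway or from my MVP; the helper name `changedParagraphRange` is hypothetical, it assumes paragraphs are separated by "\n", and it compares paragraph contents from both ends of the document rather than only counting paragraphs.

```javascript
// Hypothetical helper: find which paragraphs of the new text need rescanning.
function changedParagraphRange(previousText, currentText) {
  const before = previousText.split("\n");
  const after = currentText.split("\n");

  // Skip paragraphs that are identical at the start of both versions.
  let start = 0;
  while (
    start < before.length &&
    start < after.length &&
    before[start] === after[start]
  ) {
    start++;
  }

  // Skip paragraphs that are identical at the end of both versions.
  let endBefore = before.length - 1;
  let endAfter = after.length - 1;
  while (
    endBefore >= start &&
    endAfter >= start &&
    before[endBefore] === after[endAfter]
  ) {
    endBefore--;
    endAfter--;
  }

  // Paragraphs start..end (inclusive) of the new text have changed.
  // An empty range (end < start) means nothing needs rescanning.
  return { start, end: endAfter };
}

console.log(changedParagraphRange("a\nb\nc", "a\nB\nc")); // { start: 1, end: 1 }
```

Only the paragraphs inside the returned range would need to go through the scanner again; everything outside it can keep its existing highlighting.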

Building the App

Recently I've read a lot of articles about building a Minimum Viable Product (MVP), so I decided that I would run this little project the same way. This meant keeping everything simple. I decided to go with an input box, a button to scan and an output area.

This was all very easy to set up in my index.html file.

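    <title>Fake Hemingway</title>
    <h1>Fake Hemingway</h1>
    <textarea id="text-area"></textarea>
    <button onclick="format()">Test Me</button>
    <div id="output"></div>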

Now to start on the interesting part: getting the JavaScript working.

The first thing to do was to render the text from the text box into the output area. This involves finding the input text and setting the output's innerHTML to that text.

    function format() {
        let inputArea = document.getElementById("text-area");
        let text = inputArea.value;
        let outputArea = document.getElementById("output");
        outputArea.innerHTML = text;
    }

Next is getting the text split into paragraphs. This is accomplished by splitting the text on '\n' and putting each piece into a <p> tag. To do this we can map over the array of paragraphs, wrapping each one in <p> tags. Template strings make this very easy.

    let paragraphs = text.split("\n");
    let inParagraphs = paragraphs.map(paragraph => `<p>${paragraph}</p>`);
    outputArea.innerHTML = inParagraphs.join(" ");

Whilst I was working through that, I was becoming annoyed at having to copy and paste the test text into the text box. To solve this, I implemented an Immediately Invoked Function Expression (IIFE) to populate the text box when the web page renders.

    (function start() {
        let inputArea = document.getElementById("text-area");
        let text = `The app highlights lengthy, …. compose something new.`;
        inputArea.value = text;
    })();

Now the text box was pre-populated with the test text whenever you load or refresh the web page. Much simpler.

Highlighting

Now that I was rendering the text well and I was testing on a consistent text, I had to work on the highlighting. The first type of highlighting I decided to tackle was the hard and very hard sentence highlighting.

The first stage of this is to loop over every paragraph and split them into an array of sentences. I did this using a `split()` function, splitting on every full stop with a space after it.

    let sentences = paragraph.split(". ");

From Hemingway I knew that I needed to calculate the number of words and the level of each sentence. The level of a sentence depends on the average word length and the average number of words per sentence. Here is how I calculated the number of words and the number of letters in a sentence.

    let words = sentence.split(" ").length;
    let letters = sentence.split(" ").join("").length;

Using these numbers, I could use the equation that I found in the Hemingway app.

    // Each sentence is scored on its own, so the sentence count is 1.
    let level = Math.round(4.71 * (letters / words) + 0.5 * (words / 1) - 21.43);

With the level and number of words for each sentence, I could set its difficulty level.

    if (words < 14) {
        return sentence;
    } else if (level >= 10 && level < 14) {
        return `<span class="hardSentence">${sentence}</span>`;
    } else if (level >= 14) {
        return `<span class="veryHardSentence">${sentence}</span>`;
    } else {
        return sentence;
    }

This code says that if a sentence has 14 or more words and a level of at least 10 but below 14, then it's hard; if it has 14 or more words and a level of 14 or up, then it's very hard. I used template strings again, but this time included a class in the span tags. This is how I'm going to define the highlighting.

The CSS file is really simple; it just has each of the classes (adverb, passive, hardSentence) and sets their background colour. I took the exact colours from the Hemingway app.

Once the sentences have been returned, I join them all together to make each of the paragraphs.

At this point, I realised that there were a few problems in my code.

  • There were no full stops. When I split the paragraphs into sentences, I had removed all of the full stops.
  • The numbers of letters in the sentence included the commas, dashes, colons and semi-colons.

My first solution was very primitive but it worked. I used split('symbol') and join('') to remove the punctuation and then appended '.' onto the end. Whilst it worked, I searched for a better solution. Although I don't have much experience using regex, I knew that it would be the right tool. After some Googling I found a much more elegant solution.

    let cleanSentence = sent.replace(/[^a-z0-9. ]/gi, "") + ".";

With this done, I had a partially working product.
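Putting the pieces so far together, the sentence pass might look roughly like this. It's a sketch, not a copy of my code: the helper names `calculateLevel` and `highlightSentences` and the `veryHardSentence` class name are assumptions for illustration, each sentence is scored with a sentence count of 1, and any trailing full stop is dropped before one is appended so the last sentence doesn't end up with two.

```javascript
// Reading-level formula found in the Hemingway source.
function calculateLevel(letters, words, sentences) {
  if (words === 0 || sentences === 0) return 0;
  const level = Math.round(
    4.71 * (letters / words) + 0.5 * (words / sentences) - 21.43
  );
  return level <= 0 ? 0 : level;
}

// Hypothetical helper: wrap the hard sentences of one paragraph in spans.
function highlightSentences(paragraph) {
  return paragraph
    .split(". ")
    .filter(raw => raw.trim() !== "")
    .map(raw => {
      // Remove punctuation, drop any trailing full stop, then add exactly one back.
      const sentence =
        raw.replace(/[^a-z0-9. ]/gi, "").replace(/\.+$/, "") + ".";
      const words = sentence.split(" ").filter(w => w !== "").length;
      // Count only letters and digits, not spaces or full stops.
      const letters = sentence.replace(/[^a-z0-9]/gi, "").length;
      const level = calculateLevel(letters, words, 1);

      if (words < 14) return sentence;
      if (level >= 10 && level < 14) {
        return `<span class="hardSentence">${sentence}</span>`;
      }
      if (level >= 14) {
        return `<span class="veryHardSentence">${sentence}</span>`;
      }
      return sentence;
    })
    .join(" ");
}

// Short sentences pass through unchanged, with their full stops restored.
console.log(highlightSentences("Short one. Another short one."));
// Short one. Another short one.
```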

The next thing I decided to tackle was the adverbs. To find an adverb, Hemingway just finds words that end in ‘ly’ and then checks that it isn’t on a list of non-adverb ‘ly’ words. It would be bad if ‘apply’ or ‘Italy’ were tagged as adverbs.

To find these words, I took the sentences and split them into an array of words. I mapped over this array and used an if statement.

    if (word.match(/ly$/) && !lyWords[word]) {
        return `<span class="adverb">${word}</span>`;
    } else {
        return word;
    }

Whilst this worked most of the time, I found a few exceptions. If a word was followed by a punctuation mark, it no longer matched as ending in 'ly'. For example, "The crocodile glided elegantly; its prey unaware" would have the word 'elegantly;' in the array. To solve this I reused the .replace(/[^a-z0-9. ]/gi, "") functionality to clean each of the words.

Another exception was if the word was capitalised, which was easily solved by calling toLowerCase() on the string.
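Combined, the adverb check with both fixes could look something like the sketch below. The `lyWords` entries are a tiny illustrative subset rather than the real list, `highlightAdverbs` is a hypothetical name, and I strip full stops as well when cleaning so that a word like 'quickly.' at the end of a sentence still matches.

```javascript
// Illustrative subset of 'ly' words that are not adverbs; the real list is much longer.
const lyWords = { apply: 1, italy: 1, family: 1, reply: 1 };

// Hypothetical helper: wrap the adverbs of a sentence in a span with the 'adverb' class.
function highlightAdverbs(sentence) {
  return sentence
    .split(" ")
    .map(word => {
      // Normalise for matching only; the original word is what gets displayed.
      const clean = word.replace(/[^a-z0-9]/gi, "").toLowerCase();
      if (clean.match(/ly$/) && !lyWords[clean]) {
        return `<span class="adverb">${word}</span>`;
      }
      return word;
    })
    .join(" ");
}

console.log(highlightAdverbs("The crocodile glided elegantly; its prey unaware"));
// The crocodile glided <span class="adverb">elegantly;</span> its prey unaware
```

Matching against a cleaned copy while returning the original word keeps the punctuation and capitalisation intact in the output.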

Now I had a result that worked with adverbs and highlighting individual words. I then implemented a very similar method for complex and qualifying words. That was when I realised that I was no longer just looking for individual words, I was looking for phrases. I had to change my approach from checking if each word was in the list to seeing if the sentence contained each of the phrases.

To do this I used the .indexOf() function on the sentences. If there was an index of the word or phrase, I inserted an opening span tag at that index and then the closing span tag after the key length.

    let qualifiers = getQualifyingWords();
    let wordList = Object.keys(qualifiers);
    wordList.forEach(key => {
        let index = sentence.toLowerCase().indexOf(key);
        if (index >= 0) {
            sentence =
                sentence.slice(0, index) +
                '<span class="qualifier">' +
                sentence.slice(index, index + key.length) +
                "</span>" +
                sentence.slice(index + key.length);
        }
    });

With that working, it’s starting to look more and more like the Hemingway editor.

The last piece of the highlighting puzzle to implement was the passive voice. Hemingway used a 30-line function to find all of the passive phrases. I chose to use most of the logic that Hemingway implemented, but to order the process differently. They looked for any words that were in a list (is, are, was, were, be, been, being) and then checked whether the next word ended in 'ed'.

I looped through each of the words in a sentence and checked whether it ended in 'ed'. For every 'ed' word I found, I checked whether the previous word was in the list of pre-words. This seemed much simpler, but may be less performant.
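A rough sketch of that loop is below. The name `highlightPassive` is hypothetical, and the details are my own simplification: it tracks the position by loop index and cleans each word before matching, so punctuation and capitals don't hide a match.

```javascript
// Pre-words from the list described above.
const preWords = ["is", "are", "was", "were", "be", "been", "being"];

// Lower-case a word and strip anything that is not a letter or digit.
function cleanWord(word) {
  return word.replace(/[^a-z0-9]/gi, "").toLowerCase();
}

// Hypothetical helper: wrap "<pre-word> <word ending in ed>" pairs in a 'passive' span.
function highlightPassive(sentence) {
  const words = sentence.split(" ");
  const output = [];
  for (let i = 0; i < words.length; i++) {
    const previous = i > 0 ? cleanWord(words[i - 1]) : "";
    if (cleanWord(words[i]).match(/ed$/) && preWords.includes(previous)) {
      // The pre-word was already pushed on its own; replace it with the wrapped pair.
      output.pop();
      output.push(`<span class="passive">${words[i - 1]} ${words[i]}</span>`);
    } else {
      output.push(words[i]);
    }
  }
  return output.join(" ");
}

console.log(highlightPassive("The report has been marked and the form was signed."));
// The report has <span class="passive">been marked</span> and the form <span class="passive">was signed.</span>
```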

With that working I had an app that highlighted everything I wanted. This is my MVP.

Then I hit a problem

As I was writing this post I realised that there were two huge bugs in my code.

    // from getQualifier and getComplex
    let index = sentence.toLowerCase().indexOf(key);

    // from getPassive
    let index = words.indexOf(match);

These will only ever find the first instance of the key or match. In my test text, 'Perhaps' and 'been marked' should each have been highlighted twice, but they weren't.

To fix the bug in getQualifier and getComplex, I decided to use recursion. I created a findAndSpan function which uses .indexOf() to find the first instance of the word or phrase. It splits the sentence into 3 parts: before the phrase, the phrase, after the phrase. The recursion works by passing the ‘after the phrase’ string back into the function. This will continue until there are no more instances of the phrase, where the string will just be passed back.

    function findAndSpan(sentence, key, type) {
        let index = sentence.toLowerCase().indexOf(key);
        if (index >= 0) {
            sentence =
                sentence.slice(0, index) +
                `<span class="${type}">` +
                sentence.slice(index, index + key.length) +
                "</span>" +
                findAndSpan(sentence.slice(index + key.length), key, type);
        }
        return sentence;
    }

Something very similar had to be done for the passive voice. The recursion was in an almost identical pattern, passing the leftover array items instead of the leftover string. The result of the recursion call was spread into an array that was then returned. Now the app can deal with repeated adverbs, qualifiers, complex phrases and passive voice uses.
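To see the recursion at work on a repeated qualifier, here is a small runnable version with a sample call. The sample sentence and the `qualifier` class name are made up for illustration, and the guard against an empty key is my addition, since an empty key would otherwise recurse forever.

```javascript
// Recursive highlighter as described above; 'key' is expected in lower case.
function findAndSpan(sentence, key, type) {
  // Added guard: an empty key matches at index 0 and would never terminate.
  if (key === "") return sentence;
  const index = sentence.toLowerCase().indexOf(key);
  // Base case: no more matches, so hand the string back untouched.
  if (index < 0) return sentence;
  return (
    sentence.slice(0, index) +
    `<span class="${type}">` +
    sentence.slice(index, index + key.length) +
    "</span>" +
    // Recurse on the remainder only, so the inserted tags are never rescanned.
    findAndSpan(sentence.slice(index + key.length), key, type)
  );
}

console.log(findAndSpan("Perhaps it rained. Perhaps not.", "perhaps", "qualifier"));
// <span class="qualifier">Perhaps</span> it rained. <span class="qualifier">Perhaps</span> not.
```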

Statistics Counter

The last thing that I wanted to get working was the nice line of boxes informing you on how many adverbs or complex words you’d used.

To store the data I created an object with keys for each of the parameters I wanted to count. I started by having this variable as a global variable but knew I would have to change that later.

Now I had to populate the values. This was done by incrementing the value every time it was found.

data.sentences += sentence.length or data.adverbs += 1

The values needed to be reset every time the scan was run, to make sure they didn't keep increasing.
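One way to get that reset for free, and to avoid the global variable, is to build a fresh object at the start of every scan. The sketch below is an assumption about how that could look, not my actual code: the key names and the helpers `newCounts` and `countText` are illustrative, and the adverb check skips the exclusion list to keep it short.

```javascript
// Hypothetical shape for the counters; the key names are illustrative.
function newCounts() {
  return { paragraphs: 0, sentences: 0, words: 0, adverbs: 0 };
}

// Each scan starts from a fresh object, so repeated scans never accumulate.
function countText(text) {
  const data = newCounts();
  const paragraphs = text.split("\n").filter(p => p.trim() !== "");
  data.paragraphs += paragraphs.length;

  paragraphs.forEach(paragraph => {
    const sentences = paragraph.split(". ").filter(s => s.trim() !== "");
    data.sentences += sentences.length;

    sentences.forEach(sentence => {
      const words = sentence.split(" ").filter(w => w !== "");
      data.words += words.length;

      words.forEach(word => {
        const clean = word.replace(/[^a-z0-9]/gi, "").toLowerCase();
        // Simplified check: no list of non-adverb 'ly' words in this sketch.
        if (/ly$/.test(clean)) data.adverbs += 1;
      });
    });
  });
  return data;
}

console.log(countText("He ran quickly. She slept.\nThey left."));
// { paragraphs: 2, sentences: 3, words: 7, adverbs: 1 }
```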

With the values I needed, I had to get them rendering on the screen. I altered the structure of the HTML file so that the input box and output area were in a div on the left, leaving a div on the right for the counters. These counters are empty divs with an appropriate id and class as well as a 'counter' class.

With these divs in place, I used document.querySelector to set the innerHTML of each counter using the data that had been collected. With a little bit of styling of the 'counter' class, the web app was complete. Try it out here or look at my code here.