Word counter

Aogbogcog
27 Oct 2024, 21:25

?


ice4
03 Nov 2024, 16:14

?


daeun
03 Nov 2024, 16:19

In the room object,
under the "After entering the room" script,
click the code view
and type:

get input {
  text = result
  totalLength = LengthOf(text)
  textWithoutSpaces = Replace(text, " ", "")
  lengthWithoutSpaces = LengthOf(textWithoutSpaces)
  spaceCount = totalLength - lengthWithoutSpaces
  msg ("Number of spaces: " + spaceCount)
  numberofwords = spaceCount+1
  msg ("Number of words: " + numberofwords)
}

Test the code:
type in "test test test"
and you will get
Number of spaces: 2
Number of words: 3

Obviously, the number of spaces is redundant, but I wanted to show how the method works.


Create a function named wordcounter
and type in the slimmed-down code, without the msg that tells the player the number of spaces:

get input {
  text = result
  totalLength = LengthOf(text)
  textWithoutSpaces = Replace(text, " ", "")
  lengthWithoutSpaces = LengthOf(textWithoutSpaces)
  spaceCount = totalLength - lengthWithoutSpaces
  numberofwords = spaceCount+1
  msg ("Number of words: " + numberofwords)
}

This gives you a more flexible word counter: you can just call the function whenever you need it.


mrangel
04 Nov 2024, 09:50

Counting words isn't necessarily the same as counting spaces. This function will give odd counts if you feed it a string with multiple spaces between words; or with spaces at the beginning or end. Or punctuation marks instead of spaces.
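
A quick way to see those odd counts, sketched in Python rather than Quest script. The helper name spaces_plus_one is made up for illustration; it only mimics the spaces-plus-one method from the earlier post.

```python
def spaces_plus_one(text: str) -> int:
    """Mimic the spaces-plus-one method: count the spaces, then add 1."""
    return text.count(" ") + 1


samples = [
    "test test test",  # single spaces between words
    "test  test",      # two spaces between words
    " test test ",     # leading and trailing space
    "test,test",       # punctuation instead of a space
    "",                # nothing typed at all
]

for sample in samples:
    print(repr(sample), "->", spaces_plus_one(sample))
```

Only the first sample comes out right (3). The others give 3, 4, 1 and 1, where a reader would count 2, 2, 2 and 0 words.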

Here's a (somewhat slower) way to count words in a string:

<function name="CountWords" parameters="input" type="int"><![CDATA[
  result = 0
  while (IsRegexMatch ("\\w", input)) {
    result = result + 1
    split = Populate ("^\\W*\\w++\\W*(?<remainder>.*)", input, "firstword")
    input = DictionaryItem (split, "remainder")
  }
  return (result)
]]></function>

This uses the regular expression patterns \w++ (matches any complete word) and \W* (matches a block of nonword characters - including spaces and punctuation). So the call to IsRegexMatch checks if there is a word character (\w, any letter or digit) in the string. If so, Populate removes the first word and any spaces/punctuation from either side of it, and stores the part of the string that still needs to be counted in the remainder subpattern.
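
For anyone who wants to try the loop outside Quest, here is a rough Python sketch of the same idea. It is not Quest code: re.search stands in for IsRegexMatch, a named group stands in for Populate and DictionaryItem, and it uses plain \w+ rather than \w++. The name count_words is only for illustration.

```python
import re


def count_words(text: str) -> int:
    """Count words by repeatedly stripping the first word off the string."""
    count = 0
    # Keep going while at least one word character is left.
    while re.search(r"\w", text):
        count += 1
        # Skip leading non-word characters, consume one word plus any
        # non-word characters after it, and capture everything else.
        match = re.match(r"^\W*\w+\W*(?P<remainder>.*)", text, re.DOTALL)
        text = match.group("remainder")
    return count


print(count_words("test test test"))
print(count_words("  hello,  world! "))
```

Extra spaces and punctuation no longer change the count, and an empty string gives 0. One caveat of \w: an apostrophe is not a word character, so "don't" is counted as two words.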


ice4
04 Nov 2024, 14:07

I changed mrangel's code into the following, which can be copied and pasted into code view.
I am not sure if I did it right:

  1. I rearranged 'input' and 'result', as the Quest app uses 'result' for the typed text
  2. I wrapped it in a get input {} block
  3. The Quest app gave an error, so I changed \w++ to \w+

get input {
  text = result
  count = 0
  while (IsRegexMatch("\\w", text)) {
    count = count + 1
    split = Populate("^\\W*\\w+\\W*(?<remainder>.*)", text, "firstword")
    text = DictionaryItem(split, "remainder")
  }
  msg ("Number of words: " + count)
}

K.V.
04 Nov 2024, 14:13

You could also probably add his function, then just use it in your code like this:

get input {
  msg("Number of words: " + CountWords(result))
}

mrangel
04 Nov 2024, 19:17

Huh… an error? Does that mean the regex engine that Quest uses doesn't support possessive (sticky) quantifiers?
I don't think it'll make any difference in this case, \w+ should work just as well. But I thought that had been part of the regex standard for a very long time.

If anyone is wondering about the distinction:

  • \w matches a single word character
  • \w+ matches one or more word characters
  • \w++ matches one or more word characters that are not followed by any more word characters
  • (\W is the opposite of \w, matching non-word characters in the same way that \S matches non-space characters, and \D matches non-digits)
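
The difference between these classes is easy to see by running them over a sample string. This is a small Python sketch, not Quest script, and the sample text is made up. \w++ is left out because not every regex engine accepts it.

```python
import re

sample = "Room 12, north door"

print(re.findall(r"\w", sample)[:4])  # single word characters, first four only
print(re.findall(r"\w+", sample))     # runs of word characters
print(re.findall(r"\W+", sample))     # runs of non-word characters
print(re.findall(r"\D+", sample))     # runs of non-digits
print(re.findall(r"\S+", sample))     # runs of non-space characters
```

Note that \w+ splits "12," into the word "12" and leaves the comma out, while \S+ keeps the comma attached because it only cares about spaces.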

K.V.
04 Nov 2024, 22:31

EDIT: I initially copied and pasted the wrong thing here.

This is nifty.

I made that one change and wrapped it in <![CDATA[[]]>, and it seems to work flawlessly.


Gng
05 Nov 2024, 07:53

\w matches a single word character
\w+ matches one or more word characters
\w++ matches one or more word characters that are not followed by any more word characters
(\W is the opposite of \w, matching non-word characters in the same way that \S matches non-space characters, and \D matches non-digits)

We need more tutorials on regular expressions ._.