Author Topic: Word ladder - Rosetta Code  (Read 1242 times)

0 Members and 2 Guests are viewing this topic.

Offline bplus

  • Forum Resident
  • Posts: 7129
  • b = b + ...
Re: Word ladder - Rosetta Code
« Reply #60 on: September 14, 2021, 08:22:18 PM »
@SMcNeill 
All your changes are fine including the t substitution for jk + 1, except using other than default screen that adds about a .1 sec to time on my system. Average is 2.27 secs on my system.

@david_uwi 
Yours are running best times on my system right around 2 secs on my system when I add
Code: QB64: [Select]
as Steve has done in his mods.

The path coming out is not the first alphabetically, it might be the last. Are you still doing multiple paths with that mod of yours?

I am also having problems incorporating your mod with Steve's.

Update: oh reverse the for loop! I think I have it. I will add Steve's changes to yours.
« Last Edit: September 14, 2021, 08:26:20 PM by bplus »

Offline bplus

  • Forum Resident
  • Posts: 7129
  • b = b + ...
Re: Word ladder - Rosetta Code
« Reply #61 on: September 14, 2021, 09:44:31 PM »
Most odd! I could not get Steve's mod with ASC to work with david_uwi's latest code??? I left the code block in commented out. With david_uwi's latest mod mixed with Steve's changes I get 2.01 secs average on my system.
I expect it will be much lower on Steve's and johnno56.

Code: QB64: [Select]
  1. _TITLE "Word ladder v david_uwi 2021-09-14" '2021-09-10 david has already improved his first version!
  2. ' ref:  https://www.qb64.org/forum/index.php?topic=4157.msg135293#msg135293
  3. ' get original version above, I am modifying below to study
  4. ' 2021-09-11 in v3 b+ mod david_uwi I modify to compare whole set to my orignal whole set
  5. ' 2021-09-11 in v4 I think I can load the words faster
  6. ' 2021-09-12 in v5 lets just call the external file once, use david's idea instead of _continue,
  7. ' I think it looks cleaner without _continue which tool out skipping a For loop by a THEN (GOTO) line #.
  8. ' Ave of 5 runs before this (mod 4) was 11.27.. see if we improve still more, also DefLng A-Z  oh wow.
  9. ' Ave of 5 runs now 8.74 another 2.5 secs shaved off
  10. ' 2021-09-12 in v6 though using asc instead of mid$ might go faster.
  11. ' Ave of 5 runs now is 4.45 secs another 4.29 sec off!
  12.  
  13. ' 2021-09-13 v7 b+ mod david_uwi = v6 b+ mod but I will attempt to translate letter variables into
  14. ' more meaningful in attempts to understand what is being done in his code.
  15. ' Shaved significant time by making store$ (used to be q4) not fixed!
  16. ' Yes run a quick check that target word connects to something does save us so much time that
  17. ' we get a better overall time for the whole set even though we added a tiny bit of time to
  18. ' the ones that do connect.
  19. ' Ave of 5 runs is now 2.57 sec another 1.88 secs shaved off.
  20.  
  21. ' Steve's mods, mainly with $checking:off saves about .3 sec for me with ave 2.27
  22.  
  23. ' v david_uwi 2021-09-14 tried to work in Steves mods with david_uwi's latest improvement
  24. ' plus I insist the shortest path is the one that sorts out first alphabetically :)
  25. ' Running an average of 2.01 secs for my (bplus) system.
  26.  
  27. DEFLNG A-Z ' 2 secs off from old Single default!
  28. REDIM SHARED Fwords$(1 TO 1), UBF ' ubound of Fwords$
  29. start! = TIMER(.001)
  30.  
  31. ' do just once for all ladder calls
  32. OPEN "unixdict.txt" FOR BINARY AS #1 ' take the file in a gulp
  33. buf$ = SPACE$(LOF(1))
  34. GET #1, , buf$
  35. Split buf$, CHR$(10), Fwords$()
  36. UBF = UBOUND(fwords$)
  37.  
  38. ' test set of ladder calls
  39. ladder "boy", "man" '     quick
  40. ladder "girl", "lady" '   this takes awhile
  41. ladder "john", "jane" '   quick enough
  42. ladder "alien", "drool" ' cool but takes a long long time!
  43. ladder "child", "adult" ' and this takes awhile
  44. ladder "play", "ball" '   goes quick
  45. ladder "fun", "job" '     ditto
  46. PRINT: PRINT "Total time including one disk file access:"; TIMER(.001) - start!
  47.  
  48. SUB ladder (q1$, q2$)
  49.     tt! = TIMER(.001) 'include time taken to load to RAM bplus mod to accuracy (.001)
  50.     DIM w(10000) AS STRING * 5 ' words from file not String * 5 'make sure we have enough storage!!
  51.     DIM store$(10000, 100) ' wow  went from fixed string! storing connect words  to not fixed string and shaved a sec off time!!!
  52.     DIM storeIndexs(100)
  53.     DIM z4(10000, 100) AS STRING
  54.     FI = 1
  55.     wordLength = LEN(q1$)
  56.     IF wordLength < 5 THEN q1$ = q1$ + SPACE$(5 - wordLength): q2$ = q2$ + SPACE$(5 - wordLength)
  57.     WHILE FI <= UBF
  58.         IF LEN(Fwords$(FI)) = wordLength THEN
  59.             maxWordIndex = maxWordIndex + 1
  60.             IF Fwords$(FI) = q1$ THEN w(maxWordIndex) = "*****" ELSE w(maxWordIndex) = Fwords$(FI)
  61.         END IF
  62.         FI = FI + 1
  63.     WEND
  64.  
  65.     'q2$ needs to have at least one connect or skip to end
  66.     '(this block will add a little more time to each ladder but save over a sec on adult or any target word with no connects)
  67.     FOR i = 1 TO maxWordIndex
  68.         IF w(i) <> q2$ THEN ' just check before entering loop
  69.             cnt = 0
  70.             FOR j = 1 TO wordLength
  71.                 IF ASC(w(i), j) <> ASC(q2$, j) THEN cnt = cnt + 1
  72.             NEXT j
  73.             IF cnt = 1 THEN ' q2$ has a connect good to go
  74.                 targetOK = -1: EXIT FOR
  75.             END IF
  76.         END IF
  77.     NEXT i
  78.     IF targetOK = 0 THEN PRINT "No path found! from "; q1$; " to "; q2$;: GOTO skip
  79.  
  80.     ' carry on
  81.     jk = 1
  82.     storeIndexs(jk) = 1
  83.     store$(1, 1) = q1$
  84.     DO
  85.         FOR i = 1 TO maxWordIndex
  86.             IF w(i) <> "*****" THEN '_Continue '  500  just check before entering loop
  87.                 cnt = 0
  88.                 FOR kk = 1 TO storeIndexs(jk)
  89.                     cnt = 0
  90.                     FOR j = 1 TO wordLength
  91.                         IF ASC(w(i), j) = ASC(store$(kk, jk), j) THEN cnt = cnt + 1 ELSE zz = j
  92.                     NEXT j
  93.                     IF cnt = wordLength - 1 THEN
  94.                         t = jk + 1
  95.                         storeIndexs(t) = storeIndexs(t) + 1
  96.                         store$(storeIndexs(t), t) = w(i)
  97.                         z4(storeIndexs(t), t) = z4(kk, jk) + MID$(w(i), zz, 1) + CHR$(zz + 48) + " " 'stores a letter and change position
  98.                         w(i) = "*****"
  99.                     END IF
  100.                 NEXT kk
  101.             END IF
  102.         NEXT i
  103.         kflag = 0
  104.         ''*****new routine!!!
  105.         cnu = 0
  106.         FOR i = storeIndexs(t) TO 1 STEP -1 ' b+ reversed this for shortest path alphabetically
  107.             cnu = 0
  108.             FOR iq = 1 TO wordLength
  109.                 IF ASC(store$(i, t), iq) = ASC(q2$, iq) THEN cnu = cnu + 1
  110.             NEXT iq
  111.             IF cnu = wordLength - 1 THEN kflag = 99: final$ = z4(i, t)
  112.         NEXT i
  113.  
  114.         IF storeIndexs(t) = 0 THEN kflag = 99
  115.         jk = jk + 1
  116.         IF kflag = 99 THEN EXIT DO
  117.     LOOP
  118.     IF storeIndexs(jk) = 0 THEN PRINT "No path found! from "; q1$; " to "; q2$; ' b+ removed a print blank line
  119.  
  120.     IF storeIndexs(jk) > 0 THEN 'this block works the next (commented out) wont
  121.         xlen = LEN(final$)
  122.         'Print:
  123.         t$ = q1$
  124.         FOR i = 1 TO xlen STEP 3
  125.             c1$ = MID$(final$, i, 1)
  126.             c2$ = MID$(final$, i + 1, 1)
  127.             MID$(q1$, VAL(c2$), 1) = c1$
  128.             t$ = t$ + " " + q1$
  129.         NEXT i
  130.         PRINT t$ + " " + q2$
  131.     END IF
  132.  
  133.  
  134.     'If storeIndexs(jk) > 0 Then ' this is Steve's substitution but wont work here  ???
  135.     '    xlen = Len(final$)
  136.     '    t$ = q1$ + " "
  137.     '    For i = 1 To xlen Step 3
  138.     '        c1 = Asc(final$, i)
  139.     '        c2 = Asc(final$, i + 1)
  140.     '        Asc(q1$, c2) = c1 ' >>>>>>>>>>>>>>> errors on this line    ???
  141.     '        t$ = t$ + q1$ + " "
  142.     '    Next i
  143.     '    Print t$
  144.     'End If
  145.  
  146.  
  147.     skip:
  148.     PRINT "time taken = "; TIMER(.001) - tt!; " seconds"
  149.  
  150. SUB Split (SplitMeString AS STRING, delim AS STRING, loadMeArray() AS STRING)
  151.     DIM curpos AS LONG, arrpos AS LONG, LD AS LONG, dpos AS LONG 'fix use the Lbound the array already has
  152.     curpos = 1: arrpos = LBOUND(loadMeArray): LD = LEN(delim)
  153.     dpos = INSTR(curpos, SplitMeString, delim)
  154.     DO UNTIL dpos = 0
  155.         loadMeArray(arrpos) = MID$(SplitMeString, curpos, dpos - curpos)
  156.         arrpos = arrpos + 1
  157.         IF arrpos > UBOUND(loadMeArray) THEN REDIM _PRESERVE loadMeArray(LBOUND(loadMeArray) TO UBOUND(loadMeArray) + 1000) AS STRING
  158.         curpos = dpos + LD
  159.         dpos = INSTR(curpos, SplitMeString, delim)
  160.     LOOP
  161.     loadMeArray(arrpos) = MID$(SplitMeString, curpos)
  162.     REDIM _PRESERVE loadMeArray(LBOUND(loadMeArray) TO arrpos) AS STRING 'get the ubound correct
  163.  

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3679
    • Steve’s QB64 Archive Forum
Re: Word ladder - Rosetta Code
« Reply #62 on: September 14, 2021, 10:16:30 PM »
You’re getting the error you commented on from this line:

                        z4(storeIndexs(t), t) = z4(kk, jk) + MID$(w(i), zz, 1) + CHR$(zz + 48) + " " 'stores a letter and change position

Change CHR$(zz + 48) to just CHR$(zz) as I mentioned previously.with:

“I also swapped out the CHR$(zz + 48) to simply store the CHR$(zz), and then used ASC(final$, i + 1) to get that value back directly, rather than having to store it as a string and then take the VAL of it...”

By just storing zz, we can turn these 2 lines :

           c2$ = MID$(final$, i + 1, 1)
            MID$(q1$, VAL(c2$), 1) = c1$

Into:
     c2 = Asc(final$, i + 1)
    Asc(q1$, c2) = c1 ' >>>>>>>>>>>>>>> errors on this line    ???

We remove the use of VAL completely, and swap 2 MID$ for 2 ASC commands, as well as making c2 an integer instead of a string.



Basically iclly you’re storing numeric values as string characters “1”, “2”, “3”, while I’m storing them as ASCII characters CHR$(1), CHR$(2), CHR$(3).

You get your return values back via VAL(MID$…) while I get mine back via ASC.
« Last Edit: September 14, 2021, 10:21:58 PM by SMcNeill »
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline bplus

  • Forum Resident
  • Posts: 7129
  • b = b + ...
Re: Word ladder - Rosetta Code
« Reply #63 on: September 14, 2021, 10:33:26 PM »
Ah goot getting rid of 48 works!

Looks like string concatenation takes a tiny bit longer than just printing words? Maybe?
Code: QB64: [Select]
  1. _TITLE "Word ladder v david_uwi 2021-09-14" '2021-09-10 david has already improved his first version!
  2. ' ref:  https://www.qb64.org/forum/index.php?topic=4157.msg135293#msg135293
  3. ' get original version above, I am modifying below to study
  4. ' 2021-09-11 in v3 b+ mod david_uwi I modify to compare whole set to my orignal whole set
  5. ' 2021-09-11 in v4 I think I can load the words faster
  6. ' 2021-09-12 in v5 lets just call the external file once, use david's idea instead of _continue,
  7. ' I think it looks cleaner without _continue which tool out skipping a For loop by a THEN (GOTO) line #.
  8. ' Ave of 5 runs before this (mod 4) was 11.27.. see if we improve still more, also DefLng A-Z  oh wow.
  9. ' Ave of 5 runs now 8.74 another 2.5 secs shaved off
  10. ' 2021-09-12 in v6 though using asc instead of mid$ might go faster.
  11. ' Ave of 5 runs now is 4.45 secs another 4.29 sec off!
  12.  
  13. ' 2021-09-13 v7 b+ mod david_uwi = v6 b+ mod but I will attempt to translate letter variables into
  14. ' more meaningful in attempts to understand what is being done in his code.
  15. ' Shaved significant time by making store$ (used to be q4) not fixed!
  16. ' Yes run a quick check that target word connects to something does save us so much time that
  17. ' we get a better overall time for the whole set even though we added a tiny bit of time to
  18. ' the ones that do connect.
  19. ' Ave of 5 runs is now 2.57 sec another 1.88 secs shaved off.
  20.  
  21. ' Steve's mods, maily with $checking:off saves about .3 sec for me with ave 2.27
  22.  
  23. ' v david_uwi 2021-09-14 tried to work in Steves mods with david_uwi's latest improvement
  24. ' plus I insist the shortest path is the one that sorts out first alphabetically :)
  25. ' Running an average of 2.01 secs for my (bplus) system.
  26.  
  27. DEFLNG A-Z ' 2 secs off from old Single default!
  28. REDIM SHARED Fwords$(1 TO 1), UBF ' ubound of Fwords$
  29. start! = TIMER(.001)
  30.  
  31. ' do just once for all ladder calls
  32. OPEN "unixdict.txt" FOR BINARY AS #1 ' take the file in a gulp
  33. buf$ = SPACE$(LOF(1))
  34. GET #1, , buf$
  35. Split buf$, CHR$(10), Fwords$()
  36. UBF = UBOUND(fwords$)
  37.  
  38. ' test set of ladder calls
  39. ladder "boy", "man" '     quick
  40. ladder "girl", "lady" '   this takes awhile
  41. ladder "john", "jane" '   quick enough
  42. ladder "alien", "drool" ' cool but takes a long long time!
  43. ladder "child", "adult" ' and this takes awhile
  44. ladder "play", "ball" '   goes quick
  45. ladder "fun", "job" '     ditto
  46. PRINT: PRINT "Total time including one disk file access:"; TIMER(.001) - start!
  47.  
  48. SUB ladder (q1$, q2$)
  49.     tt! = TIMER(.001) 'include time taken to load to RAM bplus mod to accuracy (.001)
  50.     DIM w(10000) AS STRING * 5 ' words from file not String * 5 'make sure we have enough storage!!
  51.     DIM store$(10000, 100) ' wow  went from fixed string! storing connect words  to not fixed string and shaved a sec off time!!!
  52.     DIM storeIndexs(100)
  53.     DIM z4(10000, 100) AS STRING
  54.     FI = 1
  55.     wordLength = LEN(q1$)
  56.     IF wordLength < 5 THEN q1$ = q1$ + SPACE$(5 - wordLength): q2$ = q2$ + SPACE$(5 - wordLength)
  57.     WHILE FI <= UBF
  58.         IF LEN(Fwords$(FI)) = wordLength THEN
  59.             maxWordIndex = maxWordIndex + 1
  60.             IF Fwords$(FI) = q1$ THEN w(maxWordIndex) = "*****" ELSE w(maxWordIndex) = Fwords$(FI)
  61.         END IF
  62.         FI = FI + 1
  63.     WEND
  64.  
  65.     'q2$ needs to have at least one connect or skip to end
  66.     '(this block will add a little more time to each ladder but save over a sec on adult or any target word with no connects)
  67.     FOR i = 1 TO maxWordIndex
  68.         IF w(i) <> q2$ THEN ' just check before entering loop
  69.             cnt = 0
  70.             FOR j = 1 TO wordLength
  71.                 IF ASC(w(i), j) <> ASC(q2$, j) THEN cnt = cnt + 1
  72.             NEXT j
  73.             IF cnt = 1 THEN ' q2$ has a connect good to go
  74.                 targetOK = -1: EXIT FOR
  75.             END IF
  76.         END IF
  77.     NEXT i
  78.     IF targetOK = 0 THEN PRINT "No path found! from "; q1$; " to "; q2$;: GOTO skip
  79.  
  80.     ' carry on
  81.     jk = 1
  82.     storeIndexs(jk) = 1
  83.     store$(1, 1) = q1$
  84.     DO
  85.         FOR i = 1 TO maxWordIndex
  86.             IF w(i) <> "*****" THEN '_Continue '  500  just check before entering loop
  87.                 cnt = 0
  88.                 FOR kk = 1 TO storeIndexs(jk)
  89.                     cnt = 0
  90.                     FOR j = 1 TO wordLength
  91.                         IF ASC(w(i), j) = ASC(store$(kk, jk), j) THEN cnt = cnt + 1 ELSE zz = j
  92.                     NEXT j
  93.                     IF cnt = wordLength - 1 THEN
  94.                         t = jk + 1
  95.                         storeIndexs(t) = storeIndexs(t) + 1
  96.                         store$(storeIndexs(t), t) = w(i)
  97.                         z4(storeIndexs(t), t) = z4(kk, jk) + MID$(w(i), zz, 1) + CHR$(zz) + " " 'stores a letter and change position
  98.                         w(i) = "*****"
  99.                     END IF
  100.                 NEXT kk
  101.             END IF
  102.         NEXT i
  103.         kflag = 0
  104.         ''*****new routine!!!
  105.         cnu = 0
  106.         FOR i = storeIndexs(t) TO 1 STEP -1 ' b+ reversed this for shortest path alphabetically
  107.             cnu = 0
  108.             FOR iq = 1 TO wordLength
  109.                 IF ASC(store$(i, t), iq) = ASC(q2$, iq) THEN cnu = cnu + 1
  110.             NEXT iq
  111.             IF cnu = wordLength - 1 THEN kflag = 99: final$ = z4(i, t)
  112.         NEXT i
  113.  
  114.         IF storeIndexs(t) = 0 THEN kflag = 99
  115.         jk = jk + 1
  116.         IF kflag = 99 THEN EXIT DO
  117.     LOOP
  118.     IF storeIndexs(jk) = 0 THEN PRINT "No path found! from "; q1$; " to "; q2$; ' b+ removed a print blank line
  119.  
  120.     'If storeIndexs(jk) > 0 Then 'this block works the next (commented out) wont
  121.     '    xlen = Len(final$)
  122.     '    'Print:
  123.     '    t$ = q1$
  124.     '    For i = 1 To xlen Step 3
  125.     '        c1$ = Mid$(final$, i, 1)
  126.     '        c2$ = Mid$(final$, i + 1, 1)
  127.     '        Mid$(q1$, Val(c2$), 1) = c1$
  128.     '        t$ = t$ + " " + q1$
  129.     '    Next i
  130.     '    Print t$ + " " + q2$
  131.     'End If
  132.  
  133.  
  134.     IF storeIndexs(jk) > 0 THEN ' this is Steve's substitution
  135.         xlen = LEN(final$)
  136.         t$ = q1$ + " "
  137.         FOR i = 1 TO xlen STEP 3
  138.             c1 = ASC(final$, i)
  139.             c2 = ASC(final$, i + 1)
  140.             ASC(q1$, c2) = c1
  141.             t$ = t$ + q1$ + " "
  142.         NEXT i
  143.         PRINT t$ + " " + q2$
  144.     END IF
  145.  
  146.  
  147.     skip:
  148.     PRINT "time taken = "; TIMER(.001) - tt!; " seconds"
  149.  
  150. SUB Split (SplitMeString AS STRING, delim AS STRING, loadMeArray() AS STRING)
  151.     DIM curpos AS LONG, arrpos AS LONG, LD AS LONG, dpos AS LONG 'fix use the Lbound the array already has
  152.     curpos = 1: arrpos = LBOUND(loadMeArray): LD = LEN(delim)
  153.     dpos = INSTR(curpos, SplitMeString, delim)
  154.     DO UNTIL dpos = 0
  155.         loadMeArray(arrpos) = MID$(SplitMeString, curpos, dpos - curpos)
  156.         arrpos = arrpos + 1
  157.         IF arrpos > UBOUND(loadMeArray) THEN REDIM _PRESERVE loadMeArray(LBOUND(loadMeArray) TO UBOUND(loadMeArray) + 1000) AS STRING
  158.         curpos = dpos + LD
  159.         dpos = INSTR(curpos, SplitMeString, delim)
  160.     LOOP
  161.     loadMeArray(arrpos) = MID$(SplitMeString, curpos)
  162.     REDIM _PRESERVE loadMeArray(LBOUND(loadMeArray) TO arrpos) AS STRING 'get the ubound correct
  163.  
  164.  

Offline bplus

  • Forum Resident
  • Posts: 7129
  • b = b + ...
Re: Word ladder - Rosetta Code
« Reply #64 on: September 14, 2021, 10:47:00 PM »
Quote
Looks like string concatenation takes a tiny bit longer than just printing words? Maybe?

Eh, not? The runs are too close in times.

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3679
    • Steve’s QB64 Archive Forum
Re: Word ladder - Rosetta Code
« Reply #65 on: September 14, 2021, 11:15:25 PM »
Eh, not? The runs are too close in times.

Maybe a difference in SCREEN 0 vs SCREEN 32?  A single time printing is about .1 to .2 seconds faster on my machine, when I tested it.

Or a difference in graphics cards?
« Last Edit: September 14, 2021, 11:16:47 PM by SMcNeill »
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline david_uwi

  • Newbie
  • Posts: 69
Re: Word ladder - Rosetta Code
« Reply #66 on: September 15, 2021, 02:58:07 AM »
I'm not sure that checking the last word for a path is a good idea. It works in this case, but I think that ADULT is something of an anomaly as there is no one letter substitution that will form a word - how often is that going to happen?

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3679
    • Steve’s QB64 Archive Forum
Re: Word ladder - Rosetta Code
« Reply #67 on: September 15, 2021, 04:44:22 AM »
I'm not sure that checking the last word for a path is a good idea. It works in this case, but I think that ADULT is something of an anomaly as there is no one letter substitution that will form a word - how often is that going to happen?

A lot.  Check the list Iposted on the other page.

acts, ally, ohm, ugh….  There’s a ton of them!

Just about any word with more than 15 letters can be automatically eliminated, as they only connect to one partner word directly.  (Example: underclassmAn ONLY connects to underclassmEn.) It’s hard to claim there’s any “chain” between them when they connect directly.



One thing I’m curious about is why there’s a need to sort and copy the list over and over each time.

    WHILE FI <= UBF
        IF LEN(Fwords$(FI)) = wordLength THEN
            maxWordIndex = maxWordIndex + 1
            IF Fwords$(FI) = q1$ THEN w(maxWordIndex) = "*****" ELSE w(maxWordIndex) = Fwords$(FI)
        END IF
        FI = FI + 1
    WEND

Why not sort the wordlist by length ONCE at the time when you load the data in from the disk, rather than repeatedly each time it’s called?

(Pseudocode follows)
OPEN file$
GET word
wordnum(LEN(word)) = wordnum(LEN(word)) + 1 ‘counter for words of same length
wordlist(LEN(word),wordnum) = word ‘2d array to store words by length, wordnum
CLOSE

Words are then sorted in one pass and don’t have to be ever again.
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline SMcNeill

  • QB64 Developer
  • Forum Resident
  • Posts: 3679
    • Steve’s QB64 Archive Forum
Re: Word ladder - Rosetta Code
« Reply #68 on: September 15, 2021, 05:02:09 AM »
A second thing I’m curious about is with regards to why various data isn’t removed completely from the word lists at load time?  The routine that you guys are optimizing is obviously only intended to work with a very specific subset of the data, so why keep all the unnecessary data?

For example, the solved list you were generating stored values in a letter + position combo.  (b2 a1 y3, for example when going from “boy” to “bay”) 

At this point, you’re only storing a single digit for letter position, so you’ve decided to automatically reduce the dataset to words shorter than 10 letters.

Why not filter those out automatically to save processing them repeatedly?



      IF Fwords$(FI) = q1$ THEN w(maxWordIndex) = "*****" ELSE w(maxWordIndex) = Fwords$(FI)

With the way the above is configured, is it even possible to process a 6 letter word?  “*” is used when a letter matches the desired position, correct?  Or am I reading what’s going on here wrongly?

Seems to me that 6 or seven letter words wouldn’t work just because they have too many letters.  Will this work to see if “cheese” can turn into “butter”?

If not, then the dataset could be permanently reduced down to < 6, and not just < 10.
https://github.com/SteveMcNeill/Steve64 — A github collection of all things Steve!

Offline bplus

  • Forum Resident
  • Posts: 7129
  • b = b + ...
Re: Word ladder - Rosetta Code
« Reply #69 on: September 15, 2021, 11:28:58 AM »
Quote
One thing I’m curious about is why there’s a need to sort and copy the list over and over each time.

    WHILE FI <= UBF
        IF LEN(Fwords$(FI)) = wordLength THEN
            maxWordIndex = maxWordIndex + 1
            IF Fwords$(FI) = q1$ THEN w(maxWordIndex) = "*****" ELSE w(maxWordIndex) = Fwords$(FI)
        END IF
        FI = FI + 1
    WEND

Why not sort the wordlist by length ONCE at the time when you load the data in from the disk, rather than repeatedly each time it’s called?

(Pseudocode follows)
OPEN file$
GET word
wordnum(LEN(word)) = wordnum(LEN(word)) + 1 ‘counter for words of same length
wordlist(LEN(word),wordnum) = word ‘2d array to store words by length, wordnum
CLOSE

Words are then sorted in one pass and don’t have to be ever again.

OK I did have the file unloaded into an array of word length strings but building those strings is concatenation and that takes time. I can't use w3$(), w4$(), w5$()... arrays because that would make it a nightmare using david_uwi's algo for all those different arrays.

Plus david_uwi's algo seems to need to mark the start word with "*****" (which BTW can be generalized to string$(wordLength, "*") for trying words > 5 letters to answer a question in next reply) so when we are copying  words into an array, we catch that starring of start word in the process, other wise we'd have to go through the list of wN$() find and star that word and then unstar it for next puzzle with same amount of letters.


Offline bplus

  • Forum Resident
  • Posts: 7129
  • b = b + ...
Re: Word ladder - Rosetta Code
« Reply #70 on: September 15, 2021, 11:46:36 AM »
It is right not to build the solution exclusively to solve the particular test set of words to ladder. If a set of ladder words had more than a couple repeat word lengths it may save time running such a set by doing the word length strings and splitting that string in the ladder sub. I think it would take allot more than 2 puzzles of same word length to realize savings and in meantime the times for 1 or 2 puzzles of same letter length times will suffer.

Offline bplus

  • Forum Resident
  • Posts: 7129
  • b = b + ...
Re: Word ladder - Rosetta Code
« Reply #71 on: September 16, 2021, 01:41:03 PM »
@david_uwi, @SMcNeill, @johnno56 and whoever else following this thread:

Are you ready to have your socks knocked off?

I now have Word Ladder code with david_uwi's algorithm generalized to any N letter words.
Here I have added
"cheese", "butter"  (doesn't connect)
and
"seaman", "skater" (does connect)
to our test of set and average .67 secs for the entire set!!!

Code: QB64: [Select]
  1. _TITLE "Word ladder v 2021-09-16" '2021-09-10 david has already improved his first version!
  2. ' ref:  https://www.qb64.org/forum/index.php?topic=4157.msg135293#msg135293
  3. ' get original version above, I am modifying below to study
  4. ' 2021-09-11 in v3 b+ mod david_uwi I modify to compare whole set to my orignal whole set
  5. ' 2021-09-11 in v4 I think I can load the words faster
  6. ' 2021-09-12 in v5 lets just call the external file once, use david's idea instead of _continue,
  7. ' I think it looks cleaner without _continue which tool out skipping a For loop by a THEN (GOTO) line #.
  8. ' Ave of 5 runs before this (mod 4) was 11.27.. see if we improve still more, also DefLng A-Z  oh wow.
  9. ' Ave of 5 runs now 8.74 another 2.5 secs shaved off
  10. ' 2021-09-12 in v6 though using asc instead of mid$ might go faster.
  11. ' Ave of 5 runs now is 4.45 secs another 4.29 sec off!
  12.  
  13. ' 2021-09-13 v7 b+ mod david_uwi = v6 b+ mod but I will attempt to translate letter variables into
  14. ' more meaningful in attempts to understand what is being done in his code.
  15. ' Shaved significant time by making store$ (used to be q4) not fixed!
  16. ' Yes run a quick check that target word connects to something does save us so much time that
  17. ' we get a better overall time for the whole set even though we added a tiny bit of time to
  18. ' the ones that do connect.
  19. ' Ave of 5 runs is now 2.57 sec another 1.88 secs shaved off.
  20.  
  21. ' Steve's mods, maily with $checking:off saves about .3 sec for me with ave 2.27
  22.  
  23. ' v david_uwi 2021-09-14 tried to work in Steves mods with david_uwi's latest improvement
  24. ' plus I insist the shortest path is the one that sorts out first alphabetically :)
  25. ' Running an average of 2.01 secs for my (bplus) system.
  26.  
  27. ' v 2021-09-16 can we get 6 letters going?  I can't seem to get more that a 3 word connect by eye
  28. ' ah goot! seaman to skater works  Hurray! with the added 6 letter words to ladder it is taking even
  29. ' less time than before I generalized to do more than 5 letter words. This is because w() array no
  30. ' longer is fixed String * 5 but now variable length. Also reason #3 why we have to load the n letter
  31. ' words every time is because we change w(i) with starUsed$ signal as we go through it to
  32. ' indicate we've processed this word already. Average 1.49 secs per run!!!
  33. ' Holy moley! another 2 huge cuts in time!!! both storage and changes arrays don't have to be 10000 long!
  34. ' Now .67 secs average on my (b+) older laptop
  35.  
  36. DEFLNG A-Z ' 2 secs off from old Single default!
  37. REDIM SHARED Fwords$(1 TO 1), UBF ' ubound of Fwords$
  38. start! = TIMER(.001)
  39.  
  40. ' do just once for all ladder calls, Fwords$() contain the entire list of dictionary
  41. OPEN "unixdict.txt" FOR BINARY AS #1 ' take the file in a gulp
  42. buf$ = SPACE$(LOF(1))
  43. GET #1, , buf$
  44. Split buf$, CHR$(10), Fwords$()
  45. UBF = UBOUND(fwords$) ' track the ubound of Fwords$
  46.  
  47. ' test set of ladder calls
  48. ladder "boy", "man" '     quick
  49. ladder "girl", "lady" '   this takes awhile
  50. ladder "john", "jane" '   quick enough
  51. ladder "alien", "drool" ' cool but takes a long long time!
  52. ladder "child", "adult" ' and this takes awhile
  53. ladder "play", "ball" '   goes quick
  54. ladder "fun", "job" '     ditto
  55. ' These 6 letter words added to our test set to show it has been generalized past 5 letter words.
  56. ladder "cheese", "butter" ' Steve challnges to do more than 5 letter words  not going to connect
  57. ladder "seaman", "skater" ' I think this will connect
  58. PRINT: PRINT "Total time including one disk file access:"; TIMER(.001) - start!
  59.  
  60. SUB ladder (q1$, q2$)
  61.     tt! = TIMER(.001) ' time each ladder call, doesn't include one time download of words to Fwords$()
  62.     DIM w(10000) AS STRING '* 5   <<< no fixed string huge time savings!!!! w() contains all words of Len(q1$)
  63.     DIM store$(100, 100) ' wow  went from fixed string storing connect words to not fixed string and shaved a sec off time!!!
  64.     DIM storeIndexs(100) ' tracking indexes to changes
  65.     DIM changes(100, 100) AS STRING ' tracking the change letter and position going from one word to next
  66.     ' does changes have to be 10000? no! and another huge time cut!!!
  67.     ' does store$ have to be 10000? no! and another huge time cut!!!
  68.     FI = 1
  69.     wordLength = LEN(q1$)
  70.     'If wordLength < 5 Then q1$ = q1$ + Space$(5 - wordLength): q2$ = q2$ + Space$(5 - wordLength)
  71.     starUsed$ = STRING$(wordLength, "*") ' this is signal that word is used
  72.     WHILE FI <= UBF
  73.         IF LEN(Fwords$(FI)) = wordLength THEN
  74.             maxWordIndex = maxWordIndex + 1
  75.             IF Fwords$(FI) = q1$ THEN w(maxWordIndex) = starUsed$ ELSE w(maxWordIndex) = Fwords$(FI)
  76.         END IF
  77.         FI = FI + 1
  78.     WEND
  79.  
  80.     'q2$ needs to have at least one connect or skip to end
  81.     '(this block will add a little more time to each ladder but save over a sec on adult or any target word with no connects)
  82.     FOR i = 1 TO maxWordIndex
  83.         IF w(i) <> q2$ THEN ' just check before entering loop
  84.             cnt = 0
  85.             FOR j = 1 TO wordLength
  86.                 IF ASC(w(i), j) <> ASC(q2$, j) THEN cnt = cnt + 1
  87.             NEXT j
  88.             IF cnt = 1 THEN ' q2$ has a connect good to go
  89.                 targetOK = -1: EXIT FOR
  90.             END IF
  91.         END IF
  92.     NEXT i
  93.     IF targetOK = 0 THEN PRINT "No path found! from "; q1$; " to "; q2$: GOTO skip
  94.  
  95.     ' carry on with daid_uwi's original algo modified by b+ for more general use and speed help from SMcNeill
  96.     jk = 1
  97.     storeIndexs(jk) = 1
  98.     store$(1, 1) = q1$
  99.     DO
  100.         FOR i = 1 TO maxWordIndex
  101.             IF w(i) <> starUsed$ THEN
  102.                 cnt = 0
  103.                 FOR kk = 1 TO storeIndexs(jk)
  104.                     cnt = 0
  105.                     FOR j = 1 TO wordLength
  106.                         IF ASC(w(i), j) = ASC(store$(kk, jk), j) THEN cnt = cnt + 1 ELSE zz = j
  107.                     NEXT j
  108.                     IF cnt = wordLength - 1 THEN
  109.                         t = jk + 1
  110.                         storeIndexs(t) = storeIndexs(t) + 1
  111.                         store$(storeIndexs(t), t) = w(i)
  112.                         changes(storeIndexs(t), t) = changes(kk, jk) + MID$(w(i), zz, 1) + CHR$(zz) + " " ' try Steve's T substitution version
  113.                         w(i) = starUsed$
  114.                     END IF
  115.                 NEXT kk
  116.             END IF
  117.         NEXT i
  118.         kflag = 0
  119.         ''*****new routine!!! by david_uwi
  120.         cnu = 0
  121.         FOR i = storeIndexs(t) TO 1 STEP -1 ' b+ reversed this for shortest path alphabetically
  122.             cnu = 0
  123.             FOR iq = 1 TO wordLength
  124.                 IF ASC(store$(i, t), iq) = ASC(q2$, iq) THEN cnu = cnu + 1
  125.             NEXT iq
  126.             IF cnu = wordLength - 1 THEN kflag = 99: final$ = changes(i, t)
  127.         NEXT i
  128.  
  129.         IF storeIndexs(t) = 0 THEN kflag = 99
  130.         jk = jk + 1
  131.         IF jk > 100 THEN PRINT "No path found! from "; q1$; " to "; q2$: GOTO skip 'b+ added this for words that wont connect else error's out
  132.         IF kflag = 99 THEN EXIT DO
  133.     LOOP
  134.     IF storeIndexs(jk) = 0 THEN PRINT "No path found! from "; q1$; " to "; q2$ ' b+ removed a print blank line
  135.     IF storeIndexs(jk) > 0 THEN ' this is Steve's t substitution for david_uwi's jk + 1 using Asc instead of Mid$
  136.         xlen = LEN(final$)
  137.         t$ = q1$
  138.         FOR i = 1 TO xlen STEP 3
  139.             c1 = ASC(final$, i)
  140.             c2 = ASC(final$, i + 1)
  141.             ASC(q1$, c2) = c1
  142.             t$ = t$ + " " + q1$
  143.         NEXT i
  144.         PRINT t$; " "; q2$
  145.     END IF
  146.     skip:
  147.     PRINT "time taken = "; TIMER(.001) - tt!; " seconds"
  148.  
  149. SUB Split (SplitMeString AS STRING, delim AS STRING, loadMeArray() AS STRING)
  150.     DIM curpos AS LONG, arrpos AS LONG, LD AS LONG, dpos AS LONG 'fix use the Lbound the array already has
  151.     curpos = 1: arrpos = LBOUND(loadMeArray): LD = LEN(delim)
  152.     dpos = INSTR(curpos, SplitMeString, delim)
  153.     DO UNTIL dpos = 0
  154.         loadMeArray(arrpos) = MID$(SplitMeString, curpos, dpos - curpos)
  155.         arrpos = arrpos + 1
  156.         IF arrpos > UBOUND(loadMeArray) THEN REDIM _PRESERVE loadMeArray(LBOUND(loadMeArray) TO UBOUND(loadMeArray) + 1000) AS STRING
  157.         curpos = dpos + LD
  158.         dpos = INSTR(curpos, SplitMeString, delim)
  159.     LOOP
  160.     loadMeArray(arrpos) = MID$(SplitMeString, curpos)
  161.     REDIM _PRESERVE loadMeArray(LBOUND(loadMeArray) TO arrpos) AS STRING 'get the ubound correct
  162.  
  163.  

 

Offline bplus

  • Forum Resident
  • Posts: 7129
  • b = b + ...
Re: Word ladder - Rosetta Code
« Reply #72 on: September 16, 2021, 01:45:33 PM »
Ha! if I do child to adult 10,000 more times maybe it wont take any time. ;-))

BTW thanks to Steve's post of all the word connects I was able to guess that seaman to skater might be a good 6 letter test.

Is it the best? (best = longest path you can get for 6 letter words)
« Last Edit: September 16, 2021, 01:50:25 PM by bplus »

Offline bplus

  • Forum Resident
  • Posts: 7129
  • b = b + ...
Re: Word ladder - Rosetta Code
« Reply #73 on: September 16, 2021, 04:45:50 PM »
Nutz, girl to lady picked up 2 words and alien to drool picked up one.

Offline bplus

  • Forum Resident
  • Posts: 7129
  • b = b + ...
Re: Word ladder - Rosetta Code
« Reply #74 on: September 16, 2021, 05:08:09 PM »
Aha! If I run it without error checking it skips over places where the subscript goes over the array size, with error checking the change and store arrays are clearly seen as too small.