rjbs forgot what he was saying

learning some new languages (maybe)

by rjbs, created 2014-07-16 22:44
tagged with: @markup:md journal programming

I wanted to make an effort to learn some more languages, old and new, more or less continually as time went on. I started with Forth and Go, and did a tolerable job at getting the basics of each down. I didn't do so well at writing anything of consequence, which isn't too surprising for Forth, but I'm pretty sure I could be writing a lot more Go to get things done. I really do mean to get back to that.

Although I feel like I'm only about 85% done with what I wanted to do with Forth, I'm thinking about what I'll look at next.

On one hand, Pragmatic Press is putting out Seven More Languages in Seven Weeks. The previous book covered Clojure, Haskell, Io, Prolog, Scala, Erlang, and Ruby. I thought it was just okay, but in part it was because I knew the languages well enough to see how the chapters could've been better. The new book is languages that I know, at best, by reputation: Lua, Factor, Elixir, Elm, Julia, MiniKanren, and Idris. Even if the book's only so-so, it's enough to get me going on a few weird-o languages. Also, from Forth to Factor? Woo!

On the subject of useless-but-influential languages, I just ended up with a pile of books on the matter. Stevan Little, Moose author and devotee of all programming languages past and present, is moving to the Netherlands, and couldn't take all his stuff. "You know who'd like a book on Algol-60?" he asked himself. "Rik!"

books from stevan

I'm not sure where I'll start. The Forth book isn't likely, as I've already got some. PostScript might be out for a while, since it's a bit Forthy on its own. Algol is a good possibility, or maybe Eiffel. (I just finished reading Smalltalk Best Practice Patterns, so I may be in the mood for more "here's how you do OO, kid" books.) I'm very interested in SNOBOL4 patterns, although I'll have to see if I can find an implementation I can run.

Probably all of this will have to wait, though. This weekend, OSCON begins, and I'll be taking in whatever I can there, rather than reading any dusty old books.

I went to YAPC::NA in Orlando!

by rjbs, created 2014-07-09 22:26
last modified 2014-07-10 13:20
tagged with: @markup:md journal perl yapc

A few weeks ago was YAPC in Orlando, Florida, and I attended! I already wrote about the amazingly great video recording of the conference. It was great, and I was amazed. I watched some of the videos just today!

I thought I'd write down a couple more things that I remember.

I ate good food.

I like good food, and lucked out finding places to eat at YAPC.

Monday, I found a little banh mi place, Banh Mi Nha Trang. It was great. One sandwich was about three bucks. I ate too much. Walt took a nibble of a pepper and declared it way too hot, so I ate the whole thing. Shortly thereafter, I was seriously considering whether I needed to request medical attention. I toughed it out, though.

On Tuesday, we got Cuban food at Latin Square Cuisine. I really miss having great Cuban food in Bethlehem at the long-gone Cafe Havana. Walt and I split halves of our sandwiches: one Cubano and one Pepito. The Pepito had potato sticks on it, which was a big deal when eating at Cafe Havana. I had a pineapple soda. It was great. (I was strong and avoided getting empanadas or tres leches, because I had to go right from lunch to giving a talk. It was hard, but I was strong.)

Wednesday, we got Asian/Mexican fusion tacos at Tako Cheena. It was the least awesome place, but I had no real problems with it. The water was a bit too cucumberry for me. I'd definitely go again, though. It seemed like a place worth going to try the specials every week.

Wednesday night was the big pay-off, though. Frew and Jerry Gay and I went to Cask & Larder, and it was great. The food was very good and the cocktails were great, and I would've loved to stay for another few rounds, except for the whole "alcohol is a poison" thing. I was especially pleased to have a drink that combined green Chartreuse and celery bitters, which I'd known in my heart would be a winning combination. It was!

I didn't get a shirt.

Oops. Frankly, that's fine. I rarely wear YAPC shirts, although the "Chicago flag with sigils" is an exception. That's a great design. I brought home very little swag from YAPC, and that's fine. As time goes on, swag becomes less interesting to me unless it's really useful. Best conference swag ever? Notepads. (OSCON's bookbag for speakers has also been a huge success.) Grant Street gave us USB batteries for charging phones (etc.) on the fly. I haven't tried mine yet, so the jury's still out. It could be a big winner.

I played Ogre!

Once again, there was a YAPC game night. I was taken aback (and flattered) by the number of people who approached me to try to get a seat at my presumed-certain-to-occur D&D game. I hadn't planned one! Whoops. Next year I better get back on the ball!

Instead, I planned to play and teach Steve Jackson's Ogre, a tank battle board game of some fame.

YAPC Game Night

It was a success! I think I lost every game, but I'm not sure. Anyway, I had a great time, and at least 2-3 games of Ogre occurred without me after I showed the table how to play. I realized that one problem with running a game at game night was that I couldn't try all the other cool games people brought, but that was okay. It was good to think that I may have created at least one or two new Ogre players.

I talked to other perl5 porters!

We had a little sit-down in the hotel bar and talked about what's been going on. I previewed the yet-unreleased-at-the-time civility policy and did not get any panicked abort instructions.

I'm not sure what else was discussed, anymore.

I saw some talks!

I did. I also skipped a bunch, knowing that I could watch them later. I finally started doing the "watch stuff later" part today, and look forward to doing some more of it tomorrow. Among other things, I want to see Jesse Luehr's talk on Rust and Scott Walters' talk on programming the Atari 2600, but I'm pretty sure that reviewing the YAPC playlist will remind me of other stuff I really wanted to see.

I did not attend an auction!

So, so grateful.

I stubbed my toe.

I'm really tempted to post a photo of my toe, which still looks just awful, but I will refrain. Nobody needs to see my nasty toe.

I got zero programming done.

Every YAPC, I head out thinking that I'll get a bunch of programming done. It basically never happens. Maybe next year, when I'm confident in the idea that there will be video, I'll just sit in one place and chat and code. This year… I don't think I wrote a single line.

I didn't meet enough new people.

When I wrote about going to !!Con, I said that it was hard to meet people when I didn't know anybody to start with, and that I wanted to try to meet more new people at YAPC. I didn't do a great job of following through with that, although I did better than I might have. It's very hard, I am reminded, to say, "I'm sorry good friend whom I see only once a year, I need to spurn your company to go meet new people!"

I'm not sure I know the middle route, here.

Maybe I can form a marauding band of friends who go befriend new people and continue the pillage until the entire conference is one mass of friendly mayhem.

Sounds good.

YAPC::NA is on YouTube

by rjbs, created 2014-07-02 19:43
last modified 2014-07-02 19:47
tagged with: @markup:md journal perl yapc

I went to YAPC::NA! It was in Orlando, Florida, and I had a lovely time. I'll write more about it later. I wanted to say one thing as soon as possible, though, and I thought I'd say it in isolation from anything else, because I think it's a much more important thing than some may realize.

The immediate availability of YAPC::NA's content as streamable video is an incredibly good thing.

Many conferences have promised that talks would be available online, but in almost every case, I assume that this is not going to happen. It's been too often that I find out that only a few talks will actually go up, or that the only person with the video files has suddenly found a higher calling or… whatever. At YAPC::Asia, they established a great track record, but YAPC::NA didn't have one, so I had no hope.

Then, as each talk started, they were immediately available, live. Because they were streaming live to YouTube, it meant that there was no question about whether it would be uploaded later. It was being uploaded now!

First of all, this meant that people who could not attend the conference could view the material. I view this as a good thing, but I don't even care about selling that. You can think about how amazingly great that is later, on your own time.

What I found amazingly great is that the availability of recordings granted me immense freedom at the conference. There were two talks opposite one another, both of which I wanted to see. I could pick the one where I thought the speaker might want a friendly face, or where I knew I'd be more likely to have questions. Another time, I got into a conversation with Karen Pauley about Perl Foundation business just as a talk that I really wanted to see was beginning. I didn't even have to think about it: I stayed in the hall and finished my conversation, because I could watch the talk later.

The availability of the talks online later meant that I could spend much more time engaged in face-to-face conversations that simply could not happen any other time of the year. It is my great hope that the stellar performance of streaming this year sets a standard to which future conferences must adhere.

I feel like I must anticipate the objection that if all talks are streamed, some talks will get no attendance. I don't think that will really happen, and if it does, I'd suggest that the free market has spoken.

So, to recap: publishing the videos from the conference, and establishing up front the reliable expectation that it will really happen, is amazingly great. Thanks, YAPC::NA 2014!

just how much data did I lose?

by rjbs, created 2014-06-14 22:33
last modified 2014-06-14 23:08
tagged with: @markup:md hardware journal

For a few years, I've kept most of my "stuff" on a two terabyte hard drive in a little tiny desktop computer running Ubuntu. It's got another 2T drive connected via USB, and once in a while I'd run an rsync job. This was basically my whole "storage solution" for my media files: ripped movies, ripped music, and books. Recently, I've been getting close to filling the drive, and I thought I should improve the whole setup with something less ad hoc.

I settled on getting a Synology DS214play NAS. The price seemed pretty good, and it could serve as a DLNA server so I could stop syncing files to an external drive plugged into my Roku. I had two 1T drives sitting around from my last upgrade, and I figured I'd start with those, migrate some files, and then move up to 3T drives if everything went well. This would probably have been a good plan, but it turns out that I just couldn't leave well enough alone.

I started by moving my music collection, which was very roughly around 175 GB. (I bought a lot of CDs in college.) I started by rsyncing the music to an external drive and then moving it from that drive onto the NAS. I encountered one pretty obnoxious problem which would continue to crop up over the rest of the experience. The NAS has a really neat web-based GUI that acts like a standard WIMP interface. To migrate the files in, I'd select the directory I wanted to move over and take "Move to ... [target]". Hours later, it would report completion, but less than 100% of the files would be moved. For example, an album might have been moved with only 8 of its 10 tracks. I'd run several "move" operations in a row, until all files were moved. Unfortunately, sometimes empty directories were left behind, which made it a bit harder to verify what was going on.

This really did not fill me with confidence.

To determine whether I'd gotten everything copied over, I'd run find on both volumes, then compare the output. This would sometimes be a bit less than perfect because I'd get different encodings back for non-ASCII filenames. Worse, I found that some badly-encoded filenames would simply be unavailable via the NAS. I think the problem is that some filenames were double-encoded, but I'm not positive. What I do know is that I'd see "Hasta Ma�ana" in the GUI, but I'd be totally unable to access the directory in any way.
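
For what it's worth, the comparison itself is nothing fancy. Here's a minimal sketch of the idea, assuming the two find listings have been saved to files (the filenames are made up); normalizing to NFC papers over some encoding mismatches, though it won't rescue genuinely double-encoded names:

use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Compare two "find" listings and report paths that exist on only one side.
my %seen;
for my $pair ([ source => 'source-files.txt' ], [ nas => 'nas-files.txt' ]) {
  my ($label, $file) = @$pair;
  open my $fh, '<:encoding(UTF-8)', $file or die "can't open $file: $!";
  while (my $path = <$fh>) {
    chomp $path;
    $seen{ NFC($path) }{ $label } = 1;
  }
}

for my $path (sort keys %seen) {
  next if $seen{$path}{source} and $seen{$path}{nas};
  printf "%s only: %s\n", ($seen{$path}{source} ? 'source' : 'nas'), $path;
}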

This problem had already been haunting me on my old drive. I think it started because of a Samba upgrade years ago, but it affected relatively few files. Once in a while, iTunes would try to play one and I'd go sort out that directory. This migration gave me a reason to fix them all. I scanned for broken directories. When I found one, I'd delete it on the NAS, fix the filenames (with mv) on my Linux box, and then re-copy it individually.

This is where my first major problem probably crept in.

I had some problems with tracks by Cuban son artist "Compay Segundo," and deleted his artist directory. I believe that I accidentally cmd-clicked the directory below his while working. That directory was "Compilations," where iTunes was storing albums made up of many artists working together. This included things like musicals, soundtracks, and tribute albums. It was about one tenth of my music. I didn't notice that I'd deleted it, and I deleted it while fixing discrepancies. When I finished fixing problems, I didn't do a second comparison, which would've detected this huge loss.

I took my old 2T drive, which had been the rsync backup of the master drive, and slotted it into the NAS. That way, I'd be able to grow the RAID faster, later. Now the master drive was the only copy of "all my stuff."

I fixed some similar encoding problems in my books, but far fewer. At this point, I had a 1T and a 2T drive in the two-bay NAS, acting as a RAID1. I copied my video collection onto an external drive, along with some other random stuff. At this point, I could destroy the master 2T drive by slotting it into the NAS and letting the RAID repair itself, which would get me another terabyte of storage. Then I'd dump the video archive onto it and I'd be done. "Later", I thought, "I can do the upgrade to 3TB drives."

This was stupid. There was no reason to rush other than impatience and a little bit of miserliness. I could have ordered two 3T drives, had them on Tuesday, and rebuilt the RAID in two steps then, never destroying the master data. Instead, I decided that I'd been careful enough and stuck the old master drive into the NAS, utterly destroying the only remaining copy of 10% of my music. Oops.

I realized my error soon enough. While the RAID rebuilt, I decided to play a little music, but whatever it was I picked — I don't remember — it wouldn't play. I went to check what had happened, and I found that not only was the album missing, so was the entire Compilations directory. It didn't take long to realize that I'd lost a whole lot of music.

Fortunately, thanks to iTunes' database, it was easy to print out a listing of lost albums. It filled seven pages, and I went through it, highlighting the things I was interested in re-ripping. This probably totals about half of the lost music. I imagine it will take me weeks to get it all done. I'll also lose all the work I did getting album art and ratings onto things. (I've saved the rating data, but getting it restored later will be a huge pain.)

One of the things I noticed missing didn't make any sense. One track from Bad Religion's "The Gray Race" was missing. Why? It, and no other track, had been flagged as being part of a compilation. Bizarre. While investigating, I noticed some tracks from their "New America" were also missing. Now I began to panic! Had the "not all files copied" bug caused problems? Was I going to find that I was actually missing a huge random selection of all my data?

Kinda.

That is, plenty of stuff seems missing, but when I went back to the big find that I had done on the source data, I found stuff like this:

./Radiohead/Pablo Honey/01 You.mp3
./Radiohead/Pablo Honey/02 Creep.mp3
./Radiohead/Pablo Honey/03 How Do You Do_.mp3
./Radiohead/Pablo Honey/04 Stop Whispering.mp3
./Radiohead/Pablo Honey/06 Anyone Can Play Guitar.mp3
./Radiohead/Pablo Honey/08 Vegetable.mp3
./Radiohead/Pablo Honey/09 Prove Yourself.mp3
./Radiohead/Pablo Honey/11 Lurgee.mp3
./Radiohead/Pablo Honey/12 Blow Out.mp3
./Radiohead/Pablo Honey/13 Creep (Acoustic Version).mp3
./Radiohead/The Bends/02 The Bends.mp3
./Radiohead/The Bends/03 High And Dry.mp3
./Radiohead/The Bends/04 Fake Plastic Trees.mp3
./Radiohead/The Bends/05 Bones.mp3
./Radiohead/The Bends/07 Just.mp3
./Radiohead/The Bends/08 My Iron Lung.mp3
./Radiohead/The Bends/09 Bullet Proof...I Wish I Was.mp3
./Radiohead/The Bends/10 Black Star.mp3
./Radiohead/The Bends/11 Sulk.mp3
./Radiohead/The Bends/12 Street Spirit (Fade Out).mp3
./Radiohead/OK Computer/01 Airbag.mp3
./Radiohead/OK Computer/02 Paranoid Android.mp3
./Radiohead/OK Computer/03 Subterranean Homesick Alien.mp3
./Radiohead/OK Computer/04 Exit Music (For A Film).mp3
./Radiohead/OK Computer/05 Let Down.mp3
./Radiohead/OK Computer/06 Karma Police.mp3
./Radiohead/OK Computer/08 Electioneering.mp3
./Radiohead/OK Computer/09 Climbing Up The Walls.mp3
./Radiohead/OK Computer/10 No Surprises.mp3
./Radiohead/OK Computer/11 Lucky.mp3
./Radiohead/OK Computer/12 The Tourist.mp3

Notice that: Pablo Honey is missing tracks 5, 7, and 10; The Bends is missing tracks 1 and 6; OK Computer is missing track 7. How long have these been missing? I have no idea! It can't be that long, since Ripcord (track 7 of Pablo Honey) is on my iPhone, which was synchronized from this share.

I have no idea how much data has been lost, nor when, but I am just gutted. At least if I'd kept the original drive, I'd be able to go look at something more concrete than a dump of find output to see what was up. I am definitely paying for my stupid, pointless impatience.

I don't think the NAS is actually to blame. If it was, I'm not sure what that would get me, anyway. I burned my own bridge, here. What I need to do next is finish getting my data onto the NAS, and then build a complete backup just in case.

Finally: in the unlikely event that you recently broke into my home, duplicated my media drive, and now have a backup I don't know about, please let me know. I won't press charges.

UPDATE: Immediately upon lying down in bed, I realized what happened with the randomly missing files.

While migrating, I saw that the rsync from the master drive was syncing not just (for example) ./music/Bad Religion but also ./music/Music/Bad Religion. Surely, I thought, I had at some point partially duplicated the entire music store within itself. A quick look showed that the artists and albums under music were also under music/Music. I deleted the "duplicate" without a thought and promptly forgot about it until just now.

Today, I noticed a song vanish when I played it. Later, it turned out that it had not vanished. iTunes decided to move it. It moved it from my music library in /Volumes/music to the new-style location of /Volumes/music/Music. It has presumably been doing that silently since the most recent upgrade to iTunes. So, when I deleted the nested Music directory, I deleted, from the master, all files that I'd played since upgrading iTunes most recently.

The Great Infocom Replay: The Witness

by rjbs, created 2014-06-07 15:30

It's been over a month since I last tried to do anything on my still-crawling-along (re)playthrough of all of Infocom's games. The next up, for me, was The Witness. I'm going to be straight with you: I didn't play it.

I made a map, and I got to the game's introduction, and I read all of the manual and feelies. I played a few turns of what could fairly be called "the game." The fact of the matter is that I just could not work up any enthusiasm for it at all. I like murder mysteries, but I like to watch them. I never try to solve them ahead of time. It doesn't interest me in the least.

From Suspect, I knew that's what would be expected of me, and I knew it wouldn't be fun.

So I'm skipping it. I played enough to know I wouldn't have any fun at all, and if I ever write more IF, I won't be writing a mystery, so nevermind, okay?

Next up is Planetfall, which I've played before, but I hope to enjoy it a bit more this time. I had very mixed feelings about it, last time.

I played Ogre!

by rjbs, created 2014-05-31 22:22
last modified 2014-05-31 22:22
tagged with: @markup:md games journal

Steve Jackson is a famous game designer and his company produced big hits like GURPS, Munchkin, and Car Wars. Lesser known, to me, was his first game, Ogre. It's a fairly simple tabletop war game, and it had a long and successful life beginning with its launch as a super cheap pocket game in 1977.

In 2012, Steve Jackson Games launched a Kickstarter campaign for a massive "designer's edition" and I somehow ended up backing it. I don't know what got into my head, but I somehow decided that I really wanted to play this game and that I'd rather buy a 28 pound steamer trunk of a game than a cheap edition off of eBay. My copy arrived about six months ago, and has been waiting to be played ever since then.

Ogre!

Actually, it did get played just once, but the game was me against my then six year old daughter. Although she's got some strategic thinking in her, at the time she was just having fun moving the pieces around in random legal moves. It was hard to judge the game from that.

When my long-running D&D 4E campaign petered out, I suggested that maybe we'd start playing board games instead. I picked a date and declared we'd play Ogre. People seemed interested… but then didn't deliver. No surprise, that's why the D&D game fell apart. Fortunately, I'd intentionally overbooked the evening. My brother-in-law was the sole arrival, and that was just fine. Ogre, after all, is a two-player game.

The idea behind Ogre is that in the grim darkness of the future, there is only war. Or, at least, there's a lot of it, and it's being fought, in part, by gigantic artificially-intelligent cybertanks. What could go wrong??

In the basic scenarios, one player has an army composed of about three dozen infantry and armor units. The other player has a single massive tank — the titular "Ogre." The ogre's job is to reach and destroy the defender's command post, then escape. The defender's job is to prevent one or both of those things from happening. It is tough!

The basic rules are simple and similar to many other war games. Every unit has an attack strength and a defense strength. The ratio of attack to defense strength determines the odds of victory. All combat is resolved with a single roll of a six-sided die. Attackers can combine their attack strength by making joint attacks. For example, two infantry units (1 ATK each) and a hovercraft (2 ATK) can make a single joint attack at strength 4. They attack the ogre's railgun, which has a defense of 6. The ratio is 4:6, which is rounded down (always in the favor of the defender) to 1:2. The attackers will need a 6 to destroy the gun.
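
Just to make the arithmetic concrete, here's a tiny sketch of that rounding rule. The function name is mine, and the mapping from a given odds column to the roll you need comes from the game's combat results table, which I'm not reproducing here:

use strict;
use warnings;
use POSIX qw(ceil);

# Reduce an attack/defense pair to an odds column, always rounding in the
# defender's favor.
sub odds_column {
  my ($atk, $def) = @_;
  return $atk >= $def
    ? int($atk / $def) . ':1'    # e.g. 7 ATK vs 3 DEF rounds down to 2:1
    : '1:' . ceil($def / $atk);  # e.g. 4 ATK vs 6 DEF rounds down to 1:2
}

print odds_column(4, 6), "\n";   # "1:2" -- the joint attack on the railgun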

Figuring out what to attack, and with what, is tough. You can swarm the ogre with infantry and do some significant damage, but the ogre's antipersonnel weapons will decimate your troops. You can focus on destroying the ogre's mobility, but you'll take a beating from its weapons while you do so. If you focus on destroying its weapons, it can still move, and that means it can still ram you.

Over the years, many more unit types were added to the game. New maps were created with different kinds of terrain that confer different bonuses or penalties. Different scenarios were devised to change the goals of the game and the resources available to either side. The big box contains everything, and then some. Just finding a raised surface on which one could play with the fully assembled set of geomorphic maps is no small challenge.

Ogre map!

I'm not sure when I'll get a chance to play more Ogre, but the huge game in its huge box is interesting enough to folks that I'm pretty sure I can get more play out of it on that angle alone. Fortunately, too, it came with a pocket edition very much like the 1977 original. That one, I can drop in my messenger bag and take everywhere. I probably will do that for a while, too — at least until I lose too many pieces.

I went to !!con!!

by rjbs, created 2014-05-23 11:15
tagged with: @markup:md journal

Months ago, Mark Jason Dominus said to me, "Hey, I heard about a conference in New York that's going to just be two days of lightning talks!" I thought it sounded cool and promptly forgot about it. As it grew closer, though, I realized that I'd be able to go, and it sounded pretty fun. Tickets were free, but only about 30 were open to the public. I was very lucky to get one in the first pass. Almost everyone I met at the conference had gotten theirs through the wait list.

(Actually, it wasn't just luck. I felt like a bit of a jerk by interrupting a movie night with Gloria to try and buy the ticket right at 20:00, but it only took about a minute, thankfully. I later learned that all the tickets sold in about that one minute. Yow!)

The conference was !!Con, aka Bangbangcon, and talks were meant to address "what excites us about programming." I thought this was a good topic, and the speakers did a good job sticking to the stuff that excited them, which meant we had a lot of excited speakers, and that's a good thing. The topics ranged broadly, from the history of computing to interesting instructions on Intel CPUs to computer-identified accidental poetry.

The best part of the conference, generally speaking, was the speakers' excitement about their topics. In many cases, the topics were not particularly new to me, but it was fun to see how different speakers' excitement would manifest in their talks. I did make a to-do list of things to try or investigate after the conference, and I hope I follow through with the items on it. First up is probably a nice simple one: implement my own LZ77. From there, maybe I'll go on to the next few algorithms in that family.

The talks were all transcribed. At first, like many other attendees, I thought that there was some very good speech-to-text software being used. Later, I learned that Mirabai Knight was serving as our stenographer. I've often wondered about stenography, and Mirabai was happy to answer all my questions and to let me peck at the keys on her stenotype. "How much does this machine cost?" I asked, and she told me that while her stenotype ran several grand, she had a project for open source stenography using commodity hardware. That went on the to-do list, too.

The most difficult part of the conference, for me, was socializing. Out of the hundred-odd attendees, I knew one — Mark — who was only there on the second day. I found it difficult to strike up conversations with a bunch of complete strangers, although I did try. In fact, I had a number of nice conversations, but it was difficult and uncomfortable to get started. I'm not sure whether there's anything to be done about that, but it didn't help that it seemed like half of the conference attendees knew each other already.

This experience really made me think again about YAPC and other conferences that I attend where I already know half the attendees and, even if I don't, am in a privileged position by virtue of my position within the community. Remember, fellow conference veterans: go talk to the new people and make them feel welcome. It's important.

It also reminded me of something of which I'm already quite aware: despite futzing about in other languages and with other tools, almost all the "rep" that I have is within the Perl community. This seems silly. I feel like I could make a lot more friends and contacts by just spending a little more effort interacting with the other projects that I am already touching.

The best meal of the conference was at S'Mac: their Parisienne macaroni and cheese, made with brie, roasted figs, roasted shiitakes, and rosemary. I ate too much of it, but only because it was great. Momofuku milk bar, where I went later, was a big disappointment. Both of these were a "group dinner," which was very nice. I think it's easier to start talking to a bunch of new people when the parameters of interaction are pretty well defined. There were six of us. I think I was the only person in the group who didn't already know everyone else, but it was just fine.

If there's a !!Con in the next year or two, I'll try to go again. I'll probably submit a talk, next time, too.

my life as a pile of stuff

by rjbs, created 2014-05-15 22:17
last modified 2014-05-15 22:19
tagged with: @markup:md journal

I took this photo of my desk the other day:

my desk

I was pretty happy with how well it summed up all my hobby activities. I could probably write a blog post on each of these things and feel okay about it. Here's a breakdown of the junk on my desk.

Role-playing game stuff

Other games

Computer stuff

Random stuff to read

I went to DCBPW!

by rjbs, created 2014-05-06 22:15
last modified 2014-05-06 22:16
tagged with: @markup:md dcbpw journal perl

This past weekend was the DC-Baltimore Perl Workshop in Silver Spring, Maryland, and I was in attendance! The venue was good, and the location was awesome, in downtown Silver Spring. Highlights, for me, included:

  • getting to see my family between conference events
  • Nick Patch's talk on CLDR::Number
  • Philip Hood reintroducing me to pentominoes
  • catching up with Mo Chaudhry, whom I last saw in some southern airport
  • lamb tartare for lunch
  • really excellent roast chicken with fingerlings and kale

I gave the closing talk, and was happy with how it went, which was just a bit of a surprise, given how significantly I rewrote it, repeatedly, from my original plans. I was also pleased to finally get a shirt with the way-cool DCBPW logo on it:

Unfortunately, I vainly requested a large, which I've grown a bit out of again. Getting into my cool eaglecrab shirt is just one more thing to motivate me to get back down to 180.

I look forward to the next DC conference!

The Great Infocom Replay: Suspended

by rjbs, created 2014-04-27 22:13

I sat on the idea of writing this replay entry for a long time, because my replay of Suspended was almost necessarily perfunctory. I have played the game many, many times. Before writing this entry, I sat down to do a run after months of not playing it, and beat the game in ten minutes. (I got a lousy score, but I'm pretty sure that with a picture of the map in front of me, I could probably get a perfect score with a few more tries, from memory.)

Actually, I do have one aid: a text file with these notes in it:

1 - 4/12
2 - 9/14
FOO MUM BLE BAR KLA CON BOZ TRA

Those notes save me about six turns, but they're really boring turns.

I've played Suspended many times. I'd guess at least a hundred times. I've done all the "AMUSING" stuff suggested in the clues. I've read a disassembly of the machine code to see how it was all put together and what I might have missed. (It's impressive!) I've built crude replicas of the game grid and the custom bits of the game, and I still hope to improve on them. Suspended is almost certainly my favorite computer game ever.

I don't know why.

I like the setting and the conceit of the game. The actor is in suspended animation, with only their consciousness active. They direct robots around an underground complex to solve problems (read: puzzles), but each robot is worthless in one way or another. I always think of them this way: not that each one is good at something, but each one stinks at something else. Poet can't carry the wedge and the cutter at the same time. Iris can't leave her home rooms. Despite that, I like them. Their tiny amount of text gives them enough personality to make them endearing.

I like the challenge of not just solving the game, but solving it over and over to figure out the right path of actions.

What else is there to Suspended, though? I don't know. There are plenty of other games with fun problems and good ideas. Maybe it's that I first played Suspended just before I was ready, so it sat around in the back of my brain as an impossibly hard game for a few years before I tried it again. Maybe some part of my subconscious still views beating Suspended as a rite of passage. My conscious brain doesn't think so, though.

Maybe it's that I've never seen another game that was quite the same. Suspended stands alone. I hope that someday I can get myself in gear and build something like it myself, but I feel like it will be a lot harder than I think. I already think it will be pretty hard.

The next game in the replay for me is The Witness. I was pretty disappointed by Deadline, but I'm hoping that they learned a lot about making a better mystery game in the year between the games. We'll see.

changing the rules to change the gameplay

by rjbs, created 2014-04-26 23:09

I still use The Daily Practice to track the things I'm trying to do reliably. I like it. It helps.

It's got a very simple implied philosophy, which is something like "you should always have every streak active." This is good, because it's simple, and it's not some weird method that you have to accept and internalize. It's a bunch of lines, and you should probably keep them solid.

It's worked well for me for nine months or so, but I'm starting to feel like I'm hitting problems I knew I'd hit from the beginning.

The way the scoring works is that every time you "do" a goal, you get a point. Points add up as long as the streak is alive. If you have to do something once a week, and you do it once a week for a year, you end up with 52 points. If you did it twice a week (even though you didn't have to) you end up with 104. Then, if after you score that 104th point, you miss a week, you're back to zero points. All gone!

Once you're at zero points, it doesn't get any worse. This means that once you've got a streak going, you're really motivated to keep it going, but once it's broken, it's not worth that much to start it again, unless you can keep it going. Another instance of a long-lived unimportant goal is worth a lot more than a streak-starting instance of something you care about.

You don't have to buy into the idea that points are really important to get value out of TDP, but I've tried to, because I thought it would make me feel more motivated. Unfortunately, I think it's motivated me in some of the wrong ways. To fix it, I wanted to make it more important to restart dead goals, and I've made a first pass at a system to do that.

For a long time, I've been bugging the author of TDP, the excellent and inimitable Jay Shirley, to add a way to see a simpler view of a given task's current status. He added it recently, and I got to work. The idea is this:

  • a live goal is worth its length in days, plus the number of times it got done
  • for every day a goal is dead, it's worth one more negative point than the day before

In other words, on the first day it's dead, you lose 1 point. On the second, you lose 2. On the third, 3. The first ten days of missing a goal look like this:

day  1 -  -1
day  2 -  -3
day  3 -  -6
day  4 - -10
day  5 - -15
day  6 - -21
day  7 - -28
day  8 - -36
day  9 - -45
day 10 - -55
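
In case the rule reads better as code than as a table, here's a minimal sketch of the scoring I just described. The function names are mine, not TDP's:

use strict;
use warnings;

# A live goal: its streak length in days, plus the number of times it got done.
sub live_score {
  my ($streak_days, $times_done) = @_;
  return $streak_days + $times_done;
}

# A dead goal: lose 1 point the first day, 2 more the second, 3 more the
# third, and so on.
sub dead_score {
  my ($days_dead) = @_;
  return -1 * $days_dead * ($days_dead + 1) / 2;
}

printf "day %2d - %4d\n", $_, dead_score($_) for 1 .. 10;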

This gets pretty brutal pretty fast. For example, here's my scoreboard as of earlier today:

                 review p5p commits:  562
                  catch up with p5p:  547
   get to RSS reader to 10 or lower:  451
                      drink no soda:  348
                  step on the scale:  332
           close some github issues:  191
                     spin my wheels:  160
           review p5p smoke reports:  111
         review and update perlball:   89
               post a journal entry:   48
  have no overdue todo items in RTM:   24
              no unhealthy snacking:   22
                  read things later:   22
                    read literature:   20
                   read unread mail:   17
            respond to flagged mail:   15
            work on my upload queue:   14
        do a session of code-review:    4
              do a writing exercise: - 28
                    play a new game: - 45
           close an old task in RTM: - 45
                    read humanities: - 66
          work on Code Wars program: - 78
          plan the next RPG session: -120
           email the Code Wars list: -253
            read science/technology: -465
make progress on the Infocom Replay: -820
                              TOTAL: 1057

What does this tell me? My score would go up 78% instantly if I'd just make some progress on my "Great Infocom Replay", which I've ignored horribly since declaring I'd do it. (It's been over a year and I've only played six of the thirty-ish games.) In other words: if I make something a goal, I should do it, even if I'm not doing it as frequently as I wanted. If I fall off the wagon, I need to get back on, even if I can't stay on for long.

I'd also wanted to change the result of missing a day. As I said, missing day 1000 of a 999-day streak drops you back to zero. Right now, I get sorely tempted to use "vacation days" as mulligans if I can remotely justify it. That is: the scoring model is driving me to game the system rather than live within its rules. This is my problem and not TDP's, but I'd like to address it. My idea was that each day a goal was dead, I'd lose a fraction of its points. Maybe half, maybe a quarter. This would add up quickly. For example, given that 1000 point streak, it would look like this:

day  1 - 500
day  2 - 250
day  3 - 125
day  4 -  62
day  5 -  31
day  6 -  15
day  7 -   7
day  8 -   3
day  9 -   1
day 10 -   0
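
The same idea as a sketch, assuming a decay factor of one half (the factor, like the function name, is my own placeholder):

use strict;
use warnings;

# A dead goal keeps a fraction of its points each day instead of dropping
# straight to zero. Starting from 1000 points, halving daily gives the
# table above.
sub decayed_score {
  my ($score_at_death, $days_dead, $factor) = @_;
  $factor //= 0.5;
  return int($score_at_death * $factor ** $days_dead);
}

printf "day %2d - %3d\n", $_, decayed_score(1000, $_) for 1 .. 10;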

Unfortunately, this isn't quite possible using only TDP's available-to-me data. I could implement it if I stored more of my own data locally, but I think I'll put that off for now.

The problem is that I can only see the length of the current streak. To implement the "I'm bleeding points" system, I'd need to look before the current streak to see how many points were left over from that. I think I'll be fine without it for now.

I've published the code to compute a list like the one above, in case you use TDP and want to be graded harshly, too.

I still hate email

by rjbs, created 2014-04-18 23:15
tagged with: @markup:md email journal

Last week, Yahoo! changed their DMARC policy. Since that event, I have grown to loathe email even more.

You can find a domain's DMARC policy, if any, by checking DNS:

~$ dig -t txt _dmarc.yahoo.com | grep TXT
;_dmarc.yahoo.com.    IN  TXT
_dmarc.yahoo.com. 1793  IN  TXT "v=DMARC1\; p=reject\; sp=none\; pct=100\;
rua=mailto:dmarc-yahoo-rua@yahoo-inc.com, mailto:dmarc_y_rua@yahoo.com\;"

DMARC gives instructions on how to check whether a message is really from a domain, and how to deal with messages that aren't. First you check a message's DKIM signature and SPF results, then you use the DMARC policy to decide what to do.
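
If you want to poke at a record like the one above, the tag-value format is easy to pull apart. This is just a sketch of the record's shape, not a validator; real code should reach for something like Mail::DMARC:

use strict;
use warnings;

# Split a DMARC TXT record into its tag=value pairs.
sub parse_dmarc {
  my ($txt) = @_;
  my %tag;
  for my $pair (split /;\s*/, $txt) {
    next unless $pair =~ /\A\s*([a-z]+)\s*=\s*(.+?)\s*\z/i;
    $tag{ lc $1 } = $2;
  }
  return \%tag;
}

my $policy = parse_dmarc('v=DMARC1; p=reject; sp=none; pct=100');
print "policy: $policy->{p}\n";   # "reject"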

DKIM is a kind of digital signature. For example, here's one from a message I sent myself recently:

DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=subject:from
  :to:date:mime-version:content-type:content-transfer-encoding
  :message-id;
  s=sasl; bh=Thy1S1zI40m42mTl74YTuMseXt4=; b=W/XF275Z
  Es+/l8eC+TeiRiBerAmSbYV7zFTTQfxP4dPtws7xVo3bPxb+E1mZ4dQXbzv6b92N
  QREJ9lOSeET42toRjh37uDN8OhPZRqK37TfSSy2yplDC/1cpswW1Girg3FoUZ03q
  FVRtfzsJNABmAhg8tP5ajrCVaAFvUpHuig8=

The h=... stuff says which headers are part of the message. To verify the signature, look up the public key (found with dig -t txt sasl._domainkey.pobox.com) and verify the signature against the content of the body and selected headers.
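
You don't want to do that by hand; something like Mail::DKIM will do the lookup and the crypto for you. Roughly (and I'm going from memory of its synopsis, so treat this as a sketch):

use strict;
use warnings;
use Mail::DKIM::Verifier;

my $dkim = Mail::DKIM::Verifier->new;

# Feed the verifier the raw message with CRLF line endings.
while (my $line = <STDIN>) {
  $line =~ s/\015?\012\z//;
  $dkim->PRINT("$line\015\012");
}
$dkim->CLOSE;

print $dkim->result, "\n";   # "pass", "fail", "none", ...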

Without DMARC, DKIM can tell you that there's a valid signature, but not why or what to do about it. DMARC lets you say "if there's no valid signature, consider the message suspect, and please tell me about such messages." DMARC is designed to be used on "transactional email," like receipts and order status updates, on newsletters, or on other kinds of mail from an organization to a recipient. It's a reasonable way to attack phishing, because phishers won't be able to produce a valid DKIM signature without your private key. Even if they had it, it would be much more expensive to send out mail that requires a digital signature.

DMARC also lets you give the instruction to reject mail that doesn't authenticate. This is useful if you're a bank and you're really sure that you're getting DKIM right.

Last week, Yahoo! changed their DMARC policy to "reject when DKIM doesn't match." This is a big problem, because they made this change on yahoo.com addresses. DMARC is applied only and always to whatever address is in the From header of an email message. Addresses at yahoo.com are used not just by Yahoo!'s internal services, but also by end users, who can get such addresses by signing up for Yahoo! Mail. Then they can do things like join discussion mailing lists.

Mailing lists will almost always change the headers of the message, and changing the body is quite common, too. Either of these will break the DKIM signature. That means that if someone with a Yahoo! Mail account sends a message to your mailing list, quite a lot of the subscribers will immediately bounce the message. (Specifically, those subscribers whose mail servers respect DMARC will bounce.)

You can't just strip the DKIM signature to prevent there from being an invalid one. The policy requires a valid signature — or a valid SPF record. SPF records are designed to say which IP addresses may send mail for a domain. Since a mailing list will be sending from the list server's IP and not the original sender's IP, that fails too. One email expert described the situation as "Yahoo! has declared war on mailing lists." It's pretty accurate.

The most common solution that's being put into play is From header rewriting. Mailing lists are changing their From headers, so that when you used to see:

From: "Xavier Ample" <xample@example.com>

You'll now see:

From: "Xavier Ample via Fun List" <funlist@heaven.af.mil>

Of course, this screws with replies, so Reply-To needs munging, too. It screws with lots of stuff, though, and it makes everyone angry — and rightfully so, I think. Certainly, I've had to spend quite a lot of time trying to deal with the fallout of this decision. DMARC just isn't good for individual mail accounts. I worry that some of the mechanisms that are being introduced to deal with this are going to create a much less open system for email exchange. This isn't a good thing! Email has a lot of problems, but being a network that one can join without permission is a good thing.
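
What the munging looks like, roughly, using Email::MIME; the list address and the rewritten display name are the made-up ones from the example above, and a real list manager would build the new From out of the original one:

use strict;
use warnings;
use Email::MIME;

sub munge_from {
  my ($message_text, $list_addr) = @_;
  my $email = Email::MIME->new($message_text);

  my $orig_from = $email->header('From');

  # Keep replies going to the original author...
  $email->header_str_set('Reply-To' => $orig_from);

  # ...but put the list in the From, so DMARC checks the list's domain.
  $email->header_str_set('From' => qq{"Xavier Ample via Fun List" <$list_addr>});

  return $email->as_string;
}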

DMARC does a fair job at the thing for which it was intended, but it makes everything else much trickier. This is the nature of many email "improvements," which were added without careful consideration. Or, often, with careful consideration but not much concern. It's understandable. Email seems impossible to replace and impossible to really fix, so we bodge it over and over.

I mentioned SPF, above. SPF also broke parts of the pre-existing email system. Specifically, forwarding. SPF lets you say "only the machines that are MXes for example.com may send mail with an SMTP sender at 'example.com'". This broke forwarding servers. That is, imagine that you've got mydomain.com and its MX sends mail on to your private host myhost.mydomain.com. On that last hop, mx.mydomain.com might be sending you mail FROM an address someone@example.com, while the SPF records for example.com only allow mx.example.com to send such mail.

A second standard was introduced to fix this problem: SRS. With SRS, mx.mydomain.com would be required to rewrite the address to something like SRS0=xyz=abc=example.com=someone@srs.mydomain.com. There are just a holy ton of problems with this setup, but I'll stick to the one that I had to fight with today.

This is a valid email address for use in SMTP interchange:

"your \"best\" friend"@example.com

It's not often used, and it's basically totally awful, but it exists, it's legal, and you should generally try to cope with it if you're doing something as wide-ranging in effect as SRS. Sadly, the reference implementation of SRS totally drops the ball, rewriting to:

SRS0=xyz=abc=example.com="your \"best\" friend"@srs.example.com

This is not a legal address, to say the least.
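
To be clear about what "drops the ball" means: the rewrite itself is simple string assembly. A sketch (the xyz hash and abc timestamp are the placeholder values from the earlier example; a real implementation computes them):

use strict;
use warnings;

# The naive SRS0 rewrite: paste the original local part and domain into a
# new local part at the forwarding domain.
sub naive_srs0 {
  my ($orig_local, $orig_domain, $forwarder) = @_;
  return sprintf 'SRS0=%s=%s=%s=%s@%s',
    'xyz', 'abc', $orig_domain, $orig_local, $forwarder;
}

# Fine for a plain address:
print naive_srs0('someone', 'example.com', 'srs.mydomain.com'), "\n";

# But pasting in a quoted local part yields the illegal address above; the
# whole rewritten local part would need re-quoting.
print naive_srs0(q{"your \"best\" friend"}, 'example.com', 'srs.example.com'), "\n";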

If anybody needs to reach me, just send a fax.

I bought a Wii U!

by rjbs, created 2014-04-10 23:11
last modified 2014-04-10 23:11
tagged with: @markup:md journal videogame wii

On Tuesday, Gloria and I celebrated our 14th anniversary! We went out to Tulum (yum!) and Vegan Treats (yum!) and it wasn't quite late enough that we wanted to go pick up the kid, so we decided to go walk around Target. I said I'd been thinking about buying a Wii U, and Gloria said I should. (Or maybe she just didn't say "I strongly object." I'm not splitting hairs, here.)

I bought one. Today, we went looking for some games to use the Target credit that we got by buying the console toward this week's buy-2-get-1 promotion on video games. In the end, Gloria ended up driving to Quakertown to buy video games for me while I sat at home poking at security code. I owe her big time — nothing new there!

So, now I've played a handful of Wii U games, done a Wii-to-Wii-U transfer, used the Wii eShop, and tried out a few of the non-game features on the Wii U. This is my preliminary report.

I bought the Wii U because I wanted to play Nintendo games. I have almost no interest in playing non-Nintendo games on it. (I kind of want to play ZombiU, though.) Eventually, there were enough games for the Wii U from Nintendo that I thought it would be worth the investment. We now own:

  • Super Mario Bros. Wii U
  • Super Mario 3-D World
  • NES Remix
  • Super Luigi Bros. Wii U ☺
  • Pikmin 3
  • Donkey Kong Country: Tropical Freeze
  • NintendoLand
  • Scribblenauts Unmasked (not Nintendo, but I really wanted it)
  • Scribblenauts Unlimited (which I got because it was effectively free)

I've played the first three, although none of them very much. They are all excellent in the way that I expect from Nintendo. It amazes me how they are able to produce such consistently great games! The only major franchise Nintendo game I remember disliking in the last ten years is Metroid: The Other M, which was outsourced. (By the way, I loathed that game!)

The big problem so far is the controller. The Wii U gamepad is way cool, but as a controller it's just a little weird. I think I'm very used to my hands being at a bit of an angle when playing games, and the gamepad makes me hold them straight. I'm not sure whether the distance between them is really an issue.

On the other hand, I can play those games with the Wiimote, too. It's not bad, especially NES Remix. I'm left feeling, though, that the Wii U gamepad is a poor replacement for a "normal" gamepad, and the Wiimote is a poor replacement for an NES controller. I think I'll probably much prefer playing both the Mario games with the Wii U "pro" controller, which is much more like an Xbox or PS3 controller. As for NES Remix, I think I'll stick with the Wiimote, but I'd like something a bit more substantial. The height:width ratio on the Wiimote isn't quite right, and I feel like I'm holding the thing the wrong way. The Wii U menu doesn't help with that: it assumes that you can use the cross pad and buttons like you're holding the remote vertically, even when you're deep in playing a game that uses it horizontally.

Still, I bought the Wii U for the games, and so far they're just great. I expect that trend to continue, so I'm sure I'll be delighted with the purchase over time.

Finally, there's the matter of everything that is neither the hardware nor the games. For example, the menu system, the social network, the system setup, and so on. In short: it's all bizarre.

There's this pervasive idea in Wii U that everybody who plays Wii U is your buddy, and you want them to post sticky notes on your game. When you reach a new level in any game, you might see notes from other players, including typed or hand-drawn notes. These range from the relevant to the insipid to the bizarre. They can be turned off, but they're weird. Weirder, these occur on the main screen. Instead of a menu like the Wii had, showing me all my options, the default is to show a swarming mass of Mii avatars who chatter amongst themselves about nothing much. Can you imagine if you were using Windows, and little random speech bubbles popped up here and there talking about cool new programs that were going to be released soon? It's just weird.

On the other hand, it seems like shutting these off shuts off some kind of avenue to receiving news and advance information. I'll probably do it anyway.

What would be cool, though, would be to get this chatter from just my actual circle of friends. I'd love to be able to use the Wii main menu and "Miiverse" as a sort of bulletin board with friends. With the whole Internet, though? Not so much.

I haven't yet set up any friendships, but I will. I'm looking forward to a Nintendo gaming experience where I don't need to tell my friends a 16 digit code to be befriended. My assumption is that Nintendo is still several years behind the curve, but hopefully they're at last on the acceptable side of it. The Wii code experience was indefensible.

My goal is to play through Super Mario Bros. Wii U first, or at least first-ish. After that, Mario 3-D Land. I'm not too worried about playing in order, though. Everything looks good. I mostly bought Scribblenauts: Unmasked to see whether I can stump it. My daughter demanded that I start it up to see if it could make Raven, and was delighted to see that it could. I'm looking forward to seeing whether it can make Doctor Phosphorus.

lazyweb request tracker

by rjbs, created 2014-04-03 23:22
last modified 2014-04-03 23:23
tagged with: @markup:md journal

I like using Remember the Milk. It's a to do list tracker. I use it for lots of little one-off tasks (blog ideas, games to try) and for simple projects that don't have GitHub repositories. It's got an API (which is kind of weird) and an iOS app (which is very good) and a bunch of other interesting little services.

I'd like to use it for more things. Most of them would be a little tough to do, because of the particulars of RTM. Today, though, I realized a useful thing to do: every time I think, "I wonder how I can do XYZ?" and tweet it, I'll also put it in a list in RTM. Then if I get an answer, I can record it, and if I don't, I can remember what I was looking for and ask again later. Or maybe figure out a solution on my own!

so long, module list!

by rjbs, created 2014-03-26 10:48
tagged with: @markup:md cpan journal perl

There's a file in every CPAN mirror called 03modlist.data that contains the "registered module list." It's got no indenting, but if it did, it would look something like this:

sub data {
  my $result  = {};
  my $primary = "modid";
  for (@$CPAN::Modulelist::data) {
    my %hash;
    @hash{@$CPAN::Modulelist::cols} = @$_;
    $result->{ $hash{$primary} } = \%hash;
  }
  $result;
}
$CPAN::Modulelist::cols = [
  'modid',       'statd', 'stats', 'statl', 'stati', 'statp',
  'description',
  'userid',      'chapterid',
];
$CPAN::Modulelist::data = [
  [
    'ACL::Regex', 'b', 'd', 'p', 'O', 'b',
    'Validation of actions via regular expression',
    'PBLAIR', '11'
  ],
  ...
];

It's an index of some of the stuff on the CPAN, broken down into categories, sort of like the original Yahoo index. Or dmoz, which is apparently still out there! Just like those indices, it's only a subset of the total available content. Unlike those, things only appear on the module list when the author requests it. Over the years, authors have become less and less likely to register their modules, so the list became less relevant to finding the best answer, which meant authors would be even less likely to bother using it.

Some things that don't appear in the module list: DBIx::Class, Moose, Plack, Dancer, Mojolicious, cpanminus, and plenty of other things you have heard of and use.

Rather than keep the feature around, languishing, it's being shut off so that we can, eventually, delete a bunch of code from PAUSE.

The steps are something like this:

  • stop putting any actual module data into 03modlist
  • stop regenerating 03module list altogether, leave it a static file
  • convert module registration permissions to normal permissions
  • delete all the code for handling module registration stuff
  • Caribbean vacation

There's a pull request to make 03modlist empty already, just waiting to be applied… and it should be applied pretty soon. Be prepared!

the 2014 Perl QA Hackathon in Lyon: the work

by rjbs, created 2014-03-18 22:37
last modified 2014-03-19 11:09

Today is my first full day back in Pennsylvania after the Perl QA Hackathon in Lyon, and I'm feeling remarkably recovered from four long days of hacking and conferring followed by a long day of travel. I can only credit my quick recovery to my significantly increased intake of Chartreuse over the last week.

The QA Hackathon is a small, tightly-focused event where the programmers currently doing work on the CPAN toolchain get together to get a lot of work done and to collaborate in ways that simply aren't possible for most of the rest of the year. I always come out of it feeling refreshed and invigorated, partly because I feel like I get so much done and partly because it's such an inspiration to see so many other people getting even more done all in one place.

I'm not going to recount the work I did this year in the order that I did it. You might be able to reconstruct this by looking at my git logs, but I'll leave that up to you. Also, I'm sticking to technical stuff here. I might make a second post later about non-code topics.

For this year's hackathon, I wasn't sure exactly what my agenda would be. I thought I might be working on the conversion of the perl 5 core's podcheck.t to Pod::Checker 1.70. In the end, though, I spent most of my time on PAUSE. PAUSE, the Perl Author Upload SErver, is a cluster of programs and services, mostly thought of as two parts:

  • a web site for managing user accounts and receiving archive uploads
  • a program that scans for new uploads and decides whether to put their contents into the CPAN indexes

I didn't do any work on the web site this year. (I would like to do that next, though!) Instead, I worked entirely on the indexer. The indexer is responsible for:

  • deciding whether to index new uploads at all
  • deciding what packages, at what versions, a new upload contains
  • checking the uploading user's authorization to update the index for those packages
  • actually updating the master database and the on-disk flatfile indexes

In the summer of 2011, David Golden and I got together for a one-day micro-hackathon. Our goal was to make it possible to write tests for the indexer's behavior, and I think we succeeded. I'm proud of what we accomplished that day, and it's made all my subsequent work on PAUSE possible.

I also worked on PAUSE last year, and a few of the things we'd done then had never quite been finished. I decided that my first course of action would be to try to get those sorted out.

PAUSE Work

03modlist

The "module list" isn't talked about a lot these days. It's a header-and-body format file where the body is a "safe to eval" (ha ha) Perl document that describes "registered" modules and their properties. You can see it here: http://www.cpan.org/modules/03modlist.data

Most modules on the CPAN are only in "the index," also known as "02packages." There's very little information indexed for these. Given a package name, you can find out which file seems to have the latest version of it, and you can find who has permission to update it. That's it. Registered modules, on the other hand, list a description, a category, a programming style, and other things. The module list isn't much used anymore, and the kinds of data that it reports are now found, instead, in the META.json files meant to be included with every distribution.
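
For comparison, an entry in 02packages is just three whitespace-separated columns: the package, the latest indexed version, and the path to the file that provides it (the permission data lives in a separate file, 06perms.txt). A made-up example:

Pie::Eater                         1.002  R/RJ/RJBS/Pie-Eater-1.002.tar.gz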

I had filed a pull request to produce an empty 03modlist in 2013, but it wasn't in place. Andreas, David, and I all remembered that there was a reason we hadn't put it in place, but we couldn't remember specifics or find any evidence. We decided to push forward. I got in touch with a few people who I knew would be affected, rebased my work, and got a schedule in place. There wasn't much more to do on this front, but it will happen soon. The remaining steps are:

  1. write an announcement
  2. apply the patch
  3. post the announcement that it's been done

I expect this to be done by April.

After that's done, and a month or two have passed with no trouble, we'll be able to start deleting indexer code, converting "m" permissions to "f" or "c" permissions (more on that later), and eliminating unneeded user interface.

dist name permissions

Generally, if I'm going to release a module called Foo::Bar at version 1.002, it will get uploaded in a file called Foo-Bar-1.002.tar.gz. In that filename, Foo-Bar is the "dist name." Sometimes people name their files differently. For example, one might upload LWP::UserAgent in lwp-perl-5.123.tar.gz. This shouldn't matter, but does. The PAUSE indexer only checks permissions on packages, and nothing else. Unfortunately, some tools work based on dist names. One of these is the CPAN Request Tracker instance. It would allow distribution queues to clash and merge because of the lax (read: entire lack of) permissions around distribution names.
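
To make "dist name" concrete: it's the part of the uploaded filename before the version. CPAN::DistnameInfo, which comes up again below, will pull a filename apart for you. This is just an illustration, not the indexer's own parsing:

use CPAN::DistnameInfo;

for my $path (qw(
  R/RJ/RJBS/Foo-Bar-1.002.tar.gz
  R/RJ/RJBS/lwp-perl-5.123.tar.gz
)) {
  my $d = CPAN::DistnameInfo->new($path);
  printf "%-32s dist=%-10s version=%s\n", $path, $d->dist, $d->version;
}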

Last year, I began work to address this. The idea was that you may only use a distribution name if you have permissions on the matching module name. If you want to call your distribution Pie-Eater, you need permissions on Pie::Eater. We didn't get the work merged last year, because only at the last minute did we realize that there were over 1,000 cases where this wasn't satisfied. It was far more than we'd suspected. (This year, when I reminded Andreas of this, he was pretty dubious. I wasn't: I remembered the stunned disbelief I'd already worked through last year!)

A small group of us discussed the situation and realized that about 99% of the cases could be solved easily: we'd just give module permissions out as needed. A few other cases could be fixed automatically or were not, actually, problematic. The rest were so convoluted that we left them to be fixed as needed. Some of them dated to the 1990's, so it seemed unlikely that it would come up.

I filed a pull request to make this change, in large part based on the work from last year. It was merged and deployed.

Unfortunately, there was a big problem!

PAUSE does not (yet!) have a very robust transaction model, and its database updates were done one by one, with AutoCommit enabled. There was no way to entirely reject a file after starting work, prior to this commit, and I thought the simplest thing to do would be to wrap the indexing of each dist in a transaction. It made it quite easy to write the new check safely, although it required some jiggery-pokery with $dbh disconnect times. In the end, all the tests were successful.
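
The shape of that change is nothing exotic. Roughly, as a generic DBI sketch rather than the actual indexer code (index_one_dist here is made up):

sub index_in_txn {
  my ($dbh, $dist) = @_;

  # wrap one dist's indexing in a transaction so a late failure can throw
  # the whole thing away; begin_work suspends AutoCommit until commit/rollback
  $dbh->begin_work;

  my $ok = eval {
    index_one_dist($dbh, $dist);   # made-up stand-in for the per-dist work
    1;
  };

  if ($ok) { $dbh->commit; return 1 }

  my $error = $@;
  $dbh->rollback;
  warn "did not index $dist: $error";
  return 0;
}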

Unfortunately, the tests and production behaved differently, using different database systems. Andreas and I spent about an hour on things before rolling back the changes and having dinner. The next morning, everything was clear. We knew that a child process was disconnecting from the database, but couldn't find out where. We'd set InactiveDestroy on the handle, so it shouldn't have been disconnecting… but it turned out that another object in the system had its own DESTROY method which disconnected explicitly. That fixed things, and after nearly a year, the feature was in place!

package name case-changing

Last year, we did a fair bit of work to make permission checks case-insensitive. The goal was that if "Foo" was already registered, nobody else could claim "foo". We wanted to prevent case-insensitive filesystems from screwing up where case-sensitive filesystems would work. Of course, this isn't a real solution, but it helps discourage the problem.

When we did this, we had to decide what to do when someone who had permissions on Foo tried to switch to using "foo". We decided that, hey, it's your package and you can change it however you like. This turned out to be a mistake, best demonstrated by some recent trouble with the Perl ElasticSearch client. We decided that if you want to change case, you have to be very deliberate about it. Right now, that means dropping permissions and re-indexing. In the future, I hope to make it a bit simpler, but I'm in no rush. This is not a common thing to want to do. I filed a pull request to forbid case-mismatching updates.

I also filed a pull request to issue a warning when package and module names case-mismatch. That is, if you upload a dist containing lib/Foo/Bar.pm with package foo::bar in it, you'll get a warning. In the future, we may start rejecting uploads like this, but for now the check isn't good enough for that: it only catches some of the cases where the problem might exist, though probably most of them.
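
The heart of the check is a simple comparison. Here's a sketch of it, not the actual patch: derive a module name from the file's path and see whether it matches the declared package only once you ignore case.

# true when lib/Foo/Bar.pm declares something like "package foo::bar"
sub case_mismatch {
  my ($pm_file, $package) = @_;    # e.g. "lib/Foo/Bar.pm", "foo::bar"

  (my $from_path = $pm_file) =~ s{^lib/}{};
  $from_path =~ s{\.pm$}{};
  $from_path =~ s{/}{::}g;         # lib/Foo/Bar.pm becomes Foo::Bar

  return lc $from_path eq lc $package && $from_path ne $package;
}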

Indexing warnings are a new thing. I'm not sure what warnings we might add in the future, but it's easy to do so. Given the kinds of strictness we've talked about adding, being able to warn about it first will probably come in useful later.

fixing bizarro permissions

In the middle of some of the work above, while I was deep in some other discussion, somebody leaned over and said, "Hey, did you see the blog post about how to steal permissions on PAUSE distributions?" I blanched. I read the post, which seemed to describe something that should definitely not be possible, and decided it was now my top priority. What luck to have this posted during the hackathon!

In PAUSE, there are three kinds of permission:

  • first-come permission, given to the first person to upload a package
  • co-maintainer permission, handed out by the first-come user
  • module list permission, given to the registered owner in the module list

Let's ignore the last one for now, since they're going to go away.

The bug was that when nobody had first-come permissions on a package, the PAUSE code assumed that nobody could have any permissions on it, and would re-issue first-come. It wasn't the only bug that inspection turned up, though.

It might sound, from above, like a given package owner would only need either first-come or co-maint, but actually you always need co-maint. First-come is meant to be granted in addition to that. This was required, but not enforced, and if a user ended up with only f permissions, they'd sometimes seem not to exist, and permissions could be mangled. I filed a pull request to prevent dist hijacking along with some tests.

While running the tests, I started seeing something really bizarre. Normally, permission lines in the permissions index test file would look like this:

Some::Package,USER1,f
Some::Package,USER2,c

...but in the tests, I was sometimes seeing this:

Some::Package,USER1,1
Some::Package,USER2,2

Waaaah? I was baffled for a while until something nagged at me. I noticed that the SQL generating the data to output was using double-quote characters for string literals, rather than standard single-quotes. This is fine in MySQL, which is used in production, but not in SQLite, which is used in the tests. I filed a pull request to switch the quotes. I'll probably file more of those in the future. Really, it would be good to test with the same system as is used in production, but that's further off.

package NAME VERSION

Almost a year ago, Thomas Sibley reported that PAUSE didn't handle new-style package declaration. That is, it only worked with packages like this:

package Foo::Bar;
our $VERSION = '1.001';

...but not any of these:

package Foo::Bar 1.001;

package Foo::Bar 1.001 { ... }

package Foo::Bar {
  our $VERSION = '1.001';
}

I strongly prefer package NAME VERSION when possible, but "possible" didn't include "anything released to the CPAN" because of this bug. I filed a pull request to support all forms of package. I'm really happy about this one, and look forward to making it possible for more of my dists to use the newer forms of package!
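
Just to make the target concrete, here's roughly what recognizing those forms looks like. This is a sketch (with a made-up filehandle, $fh), not the actual PAUSE parser, which has a lot more to worry about:

# $fh: a handle on the .pm file being scanned
while (my $line = <$fh>) {
  next unless $line =~ m{
    ^ \s* package \s+
    ( [A-Za-z_][A-Za-z0-9_]* (?: :: [A-Za-z0-9_]+ )* )   # package name
    (?: \s+ ( v? [0-9][0-9_.]* ) )?                      # optional inline version
    \s* (?: ; | \{ )
  }x;

  my ($package, $version) = ($1, $2);
  printf "%s declared at version %s\n", $package, $version // '(none)';
}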

respecting release_status in the META.json file

The META.json file has a field called release_status. It's meant to be a way to put a definitive statement in the distribution itself that it's a trial release, not meant for production use. Right now, there are two chief ways to indicate this, both related only to the name of the file uploaded to the CPAN. That file doesn't stick around, and we want a way to decide what to do based on the contents of the dist, not the archive name.

Unfortunately, PAUSE totally ignored this field. I filed a pull request to respect the release_status field. Andreas suggested that we should inform users why we've done this, so I filed a pull request to add "why we skipped your dist" reports. I used that facility for the "dist name must match module name" feature above, and I suspect we'll start issuing those reports for more situations in the future, too.
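
Honoring the field is a one-line decision once the metadata is loaded. Something like this, using CPAN::Meta for illustration rather than whatever PAUSE itself uses:

use CPAN::Meta;

my $meta   = CPAN::Meta->load_file('META.json');
my $status = $meta->release_status;   # "stable", "testing", or "unstable"

die "not indexing: release_status is '$status'\n"
  unless $status eq 'stable';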

spreading the joy of testing

Neil Bowers was at the hackathon, and had asked a question or two about how the indexer did stuff. I took this as a request for me to pester him mercilessly about learning how to write tests with the indexer's testing code. Eventually, and presumably to shut me up, he stopped by and I walked him through the code. In the process of doing so, we realized that half the tests — while all seemingly correct — had been mislabeled. I filed a pull request to fix all the test names.

I'm hoping to file some other related pulls to refactor the test file to make it easier to write new indexer tests in their own files. Right now, the single file is just a bit too long.

fixes of opportunity

Lots of the other work exposed little bugs to fix.

Because I was doing all my testing on perl 5.19.9, one of our new warnings picked up a precedence error in the code. I filed a pull request to replace an or with a ||.
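
That kind of bug always has the same shape. This isn't the exact line from PAUSE, just the classic example:

# "or" binds more loosely than "return", so $fallback is never reached:
sub pick {
  my ($value, $fallback) = @_;
  return $value or $fallback;    # parsed as: (return $value) or $fallback;
}

# "||" binds tightly enough to do what was meant:
sub pick_fixed {
  my ($value, $fallback) = @_;
  return $value || $fallback;
}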

Every time I ran the tests, I got an obnoxious flood of logging output. Sometimes it was useful. Usually, it was a distraction. I filed a pull request to shut up the noise unless I was running the tests in verbose mode.

Peter Rabbitson had noticed that when PAUSE skips a "dev release" because of the word TRIAL in the release filename, it was happy for that string to appear anywhere in the name. For example, MISTRIAL-1.234.tar.gz would have been skipped. I filed a pull request to better anchor the substring. I filed a matching pull request with CPAN::DistnameInfo that fixed the same bug, plus some other little things. I'm glad I did this (it was David Golden's idea!) because Graham Barr pointed out that historically people have used not just ...-TRIAL.tar.gz but also ...-TRIAL1.tar.gz.
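
The fix is mostly a matter of anchoring. Here's my sketch of its shape, matching against version-names with the archive extension already stripped, not the exact patch:

my $too_loose = qr/TRIAL/;          # matches MISTRIAL-1.234, too
my $anchored  = qr/-TRIAL[0-9]*$/;  # allows -TRIAL and -TRIAL1 style names

for my $distvname (qw( MISTRIAL-1.234 Foo-Bar-1.23-TRIAL Foo-Bar-1.23-TRIAL2 )) {
  printf "%-20s loose=%d anchored=%d\n", $distvname,
    ($distvname =~ $too_loose ? 1 : 0),
    ($distvname =~ $anchored  ? 1 : 0);
}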

I found some cases where we were interpolating undef instead of … anything else. I filed a pull request to use a default string when no module owner could be found.

PAUSE has a one second sleep after each newly-indexed distribution. I'm not sure why, and assume it's because of some hopefully long dead race condition. Still, in testing, I knew it wouldn't be needed, and it slowed the test suite down quite a lot every time I added a new test run of the indexer. I filed a pull request to update the TestPAUSE system to skip the sleep, shaving off a good 90% of the indexer tests' runtime.

While testing something unrelated, Andreas and I simultaneously noticed a very weird alignment issue with some otherwise nicely-formatted text. I filed a pull request to eliminate some accidental indenting.

Dist::Zilla

I had hoped to spend the last day plowing through relevant tickets in the Dist::Zilla queue, but it just didn't happen. I did get to merge and tweak some work from David Golden to make it easier to run test suites in parallel. With the latest Dist::Zilla and @RJBS bundle, my test suites run nine jobs at once, which should speed up both testing and releasing.

Version Numbers

One night, Graham Knop, Peter Rabbitson, David Golden, Leon Timmermans, Karen Etheridge, and I sat down over an enormous steak to discuss how Perl 5's abysmal handling of module versioning could be fixed. I hope that we can make some forward movement on some of the ideas we hammered out. They can all get presented later, once they're better transcribed. I have a lot of them on the back of a huge paper place-mat, right now.

perl5.git

I did almost nothing on the perl core, which is as I expected. On Friday morning, though, I was on the train to and from the Chartreuse distillery, with no network access, so I wanted to work on something requiring nothing but my editor and git. I knew just what to do!

Perl's lexical warnings are documented in two places: warnings, which documents a few things about the warnings pragma, and perllexwarn, which documents other stuff about using lexical warnings. There really didn't seem to be any reason to divide the content, and it has led, over and over, to people being unable to find useful documentation. I merged everything from perllexwarn into warnings. Normally, this would have been trivial, but warnings.pm is a generated file and perllexwarn.pod was an auto-updated file, so I had to update the program that did this work. It was not very hard, but it kept me busy on the train so that I was still working even while off to do something a bit more tourist-y.

Is that all?

I know there was some more to all this, and it might come back to me. I certainly had plenty of interesting discussions about a huge range of topics with many different groups of attendees. They ranged from the wildly entertaining to the technically valuable. I'll probably recount some of them in a future post. As for this post, meant only to recount the work that I did, I think I've gotten the great majority of it.

Thanks!

I was able to attend the 2014 Perl QA Hackathon because of the donations of the generous sponsors and the many donors to The Perl Foundation, which paid for my travel. Those subsidies, though, would not have been very useful if there hadn't been a conference, so I also want to thank Philippe "BooK" Bruhat and Laurent Bolvin who took on the organization of the hackathon. Finally, thanks to Wendy van Dijk, who began each day with a run to the market for fresh lunch foods. I had plenty of good food while in Lyon, but the best was the daily spread of bread and cheese. (Wendy also brought an enormous collection of excellent liquor, on which I will write more another day.)

I'm looking forward to next year's hackathon already. I hope that it will stick to the same size as this year, which was back to the excellent and intimate group of the first few years. Until then, I will just have to stay productive through other means.

today's timezone rant (body)

by rjbs, created 2014-03-07 19:02
last modified 2014-03-08 20:52

Everybody knows, I hope, that you have to be really careful when dealing with time in programs. This isn't a problem only in Perl. Things are bad all over. If you know what you're doing when you start, you can avoid many, many problems. Unfortunately, not all our code is being built anew by our present selves. Quite a lot of it exists already, written by other, less experienced programmers, and often (to our great shame) our younger selves.

Every morning, I look at any unusual exceptions that have been reported overnight. Last night, I saw a few complaining about "invalid datetime values," and I saw that they were about times around two something in the morning. A chill went up my spine. I knew what was going to be the case. I checked with MySQL:

mysql> update my_table set expires = '20140309015959' where id = 134866408;
Query OK, 1 row affected (0.00 sec)

mysql> update my_table set expires = '20140309030000' where id = 134866408;
Query OK, 1 row affected (0.00 sec)

mysql> update my_table set expires = '20140309020000' where id = 134866408;
ERROR: Invalid TIMESTAMP value in column 'expires' at row 1

So, 01:59:59 is okay. 03:00:00 is okay. 02:00:00 through 02:59:59 is not okay. Why? Time zones! Daylight saving time causes that hour to not exist in America/New_York, and the field in question is storing local times. You can't store March 9th, 2014 2:00 in the field because no such moment in time exists. The lesson here is that you shouldn't be storing your time in a local format. Obviously! I tend to store timestamps as integers, but storing them as universal time would have avoided this problem.
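
The discipline that avoids this is cheap. As an illustration (not the code in question): store a plain epoch or UTC value, and localize only when a human needs to read it.

use DateTime;

my $expires = time + 86_400;   # epoch seconds; no local zone baked in

my $when = DateTime->from_epoch(
  epoch     => $expires,
  time_zone => 'America/New_York',   # localize only at the display edge
);

print $when->strftime('%Y-%m-%d %H:%M:%S %Z'), "\n";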

Of course, since there's a lot of data already stored in local times, and it can't always be "just fixed," we also have a bunch of tools that work with times, being careful to avoid time zone problems. Unfortunately, that's not always easy. This problem, though, came from a dumb little routine that looks something like this:

sub plusdays {
  Date::Calc::Add_Delta_YMDHMS( Date::Calc::Today_and_Now(), 0, 0, 0, 0, 0, 86_400 * $_[0] );
}

So, you want a time a week in the future? plusdays(7)! You want a time 12 hours from now? plusdays(0.5). Crude, but effective and useful. Unfortunately, when it's currently 2014-03-08 02:30 and you ask for one day later, you get 2014-03-09 02:30 — a non-time.

The solution to this was trivial. We already use DateTime extensively; the conversion just hadn't been done to this one little piece of code. I wrote this:

sub plusdays {
  DateTime->local_now->add(seconds => $_[0] * 86_400)
}

It's a good thing that we did this in terms of seconds. See, this does what we want:

my $dt = DateTime->new(
  time_zone => 'America/New_York',
  year => 2014, month  => 3, day => 8, hour => 2, minute => 30,
);

say $dt->clone->add(seconds => 86_400);

It prints 2014-03-09T03:30:00.

On the other hand, if we replace the last line with

say $dt->clone->add(days => 1);

then we get this fatal error:

Invalid local time for date in time zone: America/New_York

This is totally understandable. It's the kind of thing that lets us distinguish between adding "a month" and adding "30 days," which are obviously distinct operations. Not all calendar days are 86,400 seconds long, for example.

Actually, this problem wouldn't have affected us, because we don't use DateTime. We use a subclass of DateTime that avoids these problems by doing its math in UTC. Unfortunately, this has other bizarre effects.

While I was doing the above edit, I saw some other code that was also using Date::Calc when it could've been using DateTime. (Or, as above, our internal subclass of DateTime.) This code generated months in a span, so if you say:

my @months = month_range('200106', '200208');

You get:

('200106', '200107', '200108', '200109', ..., '200208')

Great! Somewhere in there, I ended up writing this code:

my $next_month = $curr_month->clone->add(months => 1);

...and something bizarre happened! The test suite entered an infinite loop as it tried to get from the starting month to the ending month. I added more print statements and got this:

CURRENTLY (2001-10-01 00:00) PLUS ONE MONTH: (2001-10-31 23:00)

What??

Well, as I said above, our internal subclass does its date math in UTC to avoid one kind of problem, but it creates another kind. Because the offset to UTC changes over the course of October, the endpoint seems one hour off when it's converted back to local time. The month in local time, effectively, is an hour shorter than the month in UTC. So, in this instance, I opted not to use our internal subclass.
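
You can reproduce the effect with stock DateTime by doing the math the way the subclass does: convert to UTC, add the month there, and convert back. This is a demonstration, not the subclass itself:

use DateTime;

my $dt = DateTime->new(
  time_zone => 'America/New_York',
  year => 2001, month => 10, day => 1,
);

$dt->set_time_zone('UTC');
$dt->add(months => 1);
$dt->set_time_zone('America/New_York');

print $dt->strftime('%Y-%m-%d %H:%M'), "\n";   # 2001-10-31 23:00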

Now, the real problem here isn't DateTime being hard to use or date problems being intractably hard. The problem is that when not handled properly from the start, date representations can become a colossal pain. We're only stuck with most of the stupid problems above because the code in question started with a few innocent-seeming-but-actually-terrible decisions which then metastasized throughout the code base. If all of the time representations had been in universal time, with localization only done as needed, these problems could have been avoided.

Of course, you probably knew that, so in the end, I guess I'm just venting. I feel better now.

that might be good enough for production... (body)

by rjbs, created 2014-02-27 10:30
last modified 2014-02-27 11:29

Sometimes, the example code in documentation or teaching material is really bad. When the code's dead wrong, that might not be the worst. The worst may be code that's misleading without being wrong. The code does just what it says it does, but it doesn't keep its concepts clear, and students get annoyed and write frustrated blog posts. This code might be good enough for production, but not for pedagogy.

I'm back to learning Forth, which I'm really enjoying. The final example in the chapter on variables is to write a tic-tac-toe board. (By the way, more evidence that Forth is strange: variables aren't introduced until chapter nine, more than halfway through the book.)

The exercise calls for the board state to be stored in a byte array, initialized to zeroes, with 1 used for X and -1 used for O. I thought nothing of this and got to work, but no matter what, when I played an "O" I would get a "?" in my output board — an indication that my code was finding none of -1, 0, or 1 in the byte in memory. Why?

Well, bytes are 0 to 255, so -1 isn't a natural value, but "everybody knows" the convention is that -1 is a way of writing 255. I wrote this code which, given a number on the stack, returns the character that should display it:

: BOARD.CHAR
  DUP 0 =  IF '- ELSE
  DUP 1  = IF 'X ELSE
  DUP -1 = IF 'O ELSE '? THEN THEN THEN SWAP DROP ;

The -1 there is a cell value, not a byte value, so on my Forth it's not 255 but 18,446,744,073,709,551,615. Oops.

The answer should be easy: I want a way to say CHAR -1 or something. We didn't see that yet in the book. How does the author do it? At this point, I'm already a little annoyed that I'm going to have to look at the author's answer, but that's life. My guess is that either he's using something he didn't show us, or he's using a literal 255.

It's neither. His factoring of the problem's a bit different, but:

: .BOX  ( square# -- )
  SQUARE C@  DUP 0= IF  2 SPACES
      ELSE  DUP 1 = IF ." X "
            ELSE ." O "
            THEN
            THEN
  DROP ;

He totally punts! If there's anything in a cell other than 0 or 1, he displays an O. Bah!

I found absolutely no value in this use of -1, so I stored all of O's moves as 2. All tests successful.

fixing my accidentally strict mail rules… or not (body)

by rjbs, created 2014-02-24 20:36

I recently made some changes to Ywar, my personal goal tracker, and I couldn't be happier! Mostly.

Ywar is configured with a list of "checks." Each check looks up some datum, compares it to previous measurements, decides whether the goal state was met, and saves the current measurement. The checks used to run once a day, at 23:00. This meant that, for the most part, the feedback I got was the next morning in my daily agenda mail. I could hit refresh at 23:05, if I wanted, and if I was awake. If I did something at 8:00, I'd just have to remember. For the most part, this wasn't a big problem, but I wanted to be able to run things more often.

Last week, when I was working on my Goodreads/Ywar integration, I also made the changes needed to run ywar update more often. There were two main changes: every measurement now carries a log of whether it resulted in goal completion, and checks don't get the last measured value, but the "last state," which contains both the last value measured and the value measured at the last completion.

While I was at it, I added Pushover notifications. Now, when I get up in the morning, I step on my scale. A few minutes later, my phone bleeps, telling me, "Good job! You stepped on the scale!" Over breakfast, I might read an article I've saved to Instapaper. While I do the dishes, or maybe while I read a second article, my iPad bleeps. "Good job! You read something from Instapaper!"

This is surprisingly motivating. I'm completing goals much more often than I used to, now. (The Goodreads integration has also been really motivating.)

This change also inadvertently introduced a pretty significant change in my email rules. Most of them follow the same pattern, which is something like this:

  • at least once every five days, have less unread mail than the previous day

Some of them say "flagged" instead of "unread," or limit their checks to specific folders, but the pattern is pretty much always like the one above. When I started passing each check both the "last measured" and "last completion" values, I had to decide which they'd use for computing whether the goal was completed. In every case, I chose "last completion." That means that the difference checked is always between the now and the last time we met our goal. This has a massive impact here.

It used to be that all I had to do to keep my "keep reading email" goal alive was to reduce my unread mail count from the previous date. Imagine the following set of end-of-day unread message counts:

  • Sunday: 50
  • Monday: 100
  • Tuesday: 70
  • Wednesday: 100
  • Thursday: 75
  • Friday: 80
  • Saturday: 70

Under the old rules, I would get three completions in that period. On each of Tuesday, Thursday, and Saturday, the count of unread messages goes down from the previous day.

Under the new rules, I would get only one completion: Tuesday. After that, the only way, ever, to get another completion is to get down to below 70 unread messages. Maybe in a few days, I get to 60, and now that's my target. This gets pretty unforgiving pretty fast! My current low water mark for unread mail is 28, and I get an average of 126 new messages each day. These goals actually have a minimum threshold, so that anything under the threshold counts, even if last time I was further below it. Right now, it's set at 10 for my unread mail goal.
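
Written out as rough code (this is a sketch, not Ywar's actual check, and the names are made up), the rule now amounts to something like this:

sub unread_goal_met {
  my ($unread_now, $state) = @_;
  my $threshold = 10;   # anything this low counts, no matter what

  return 1 if $unread_now <= $threshold;
  return 1 if $unread_now < $state->{at_last_completion};   # beat the ratchet
  return 0;
}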

It would be pretty easy to fix this to work like it used to work. I'd get the latest measurement made yesterday and compare to that. I'm just not sure that I should restore that behavior. The old behavior made it very easy to read the easy mail and ignore the stuff that really needed my time. I could let some mail pile up on Wednesday, read the light stuff on Thursday, and I'd still get my point. I kept thinking that I needed something "more forgiving," but I don't think that's true. I don't even think it makes sense. What would "more forgiving" mean? I'm not sure.

One thing to consider is that if I can never keep a streak alive, I won't bother trying. It can't be too difficult. It has to seem possible, and to be possible, without being a huge chore. It just shouldn't be so easy that no progress is really being made.

Also, I need to make sure that once I've broken my streak, any progress starts me up again. If I lose my streak and end up with 2000 messages, having to get back to 25 is going to be a nightmare. My original design was written with this in mind: any progress was progress enough. The new behavior ratchets the absolute maximum down, so that once I've gotten rid of most of those 2000 messages, I can't let them pile back up by ignoring 5 one day, 5 the next, and then reading six the third. Maybe the real solution will be to keep exactly the behavior I have, but to fiddle with the minimum threshold.

The other thing I want to think about, eventually, is message age. I don't want to ding myself for mail received "just now." If a message hasn't been read for a week, I should really read it immediately. If it's just come in this afternoon, though, it should basically be ignored in my counts. For now, though, I think I can ignore that. After all, my goal here is to read email, not to spend my email reading time on writing tools to remind me of that goal!

freemium: the abomination of desolation (body)

by rjbs, created 2014-02-24 10:38
tagged with: @markup:md games journal

I've never been a fan of "freemium," although I understand that game developers need to get paid. It often feels like the way freemium games are developed goes something like this:

  • design a good game
  • focus on making the player want to keep playing
  • insert arbitrary points at which the player must stop playing for hours
  • allow the user to pay money to continue playing immediately

This model drives me batty. It's taking a game and making it worse to encourage the user to pay more. It is, in my mind, the opposite of making a good game that you can make better by paying more. I gladly fork over money for add-on content on games that were good to start with. I never, ever pay to repair a game that has been broken on purpose.

The whole thing reminds me of the 486SX processor, where you could buy a disabled 486 processor now and later upgrade it with a completely new processor that was pinned to only fit into the add-on slot. At least the 486SX could be somewhat explained away as a means to make some money on processors that didn't pass post-production inspection. These freemium games are just broken on purpose from the start.

I think the deciding factor for me is whether I can play the game as much as I want without hitting the pay screen. Years ago, everyone at work was playing Travian. It's a simple browser-based nation-building game, something like a very simplified Civilization. Your workers collect resources and you use them to build cities, troops, and so on. The game is multiplayer and online, so you are in competition with other nations with whom you may eventually go to war or with whom you may establish trade routes. You can keep playing as long as you have resources to spend and free workers. By paying money, you could speed up work or acquire more resources, but the game didn't throw up a barrier every half hour forcing you to wait. It was all a natural part of the game's design, and made sense to have in a simultaneous-play multiplayer game. (Of course, the problem here is that players willing to spend more money have a tactical advantage. That's a different kind of problem, though.)

I used to play an iOS game called Puzzle Craft. The basic game play is tile-matching, and it's all built around the idea that you're the founder of a village that you want to grow into a thriving kingdom. At first, you tile-match to grow crops. Over time, new kinds of tiles are added, and you can respond by developing new tools and by changing the matching rules. You can also build a mine, for a similar but not identical tile matching game. You'll need to deal with both resources to progress along your quest.

I was very excited to see that the makers of Puzzle Craft released a new game this week, Another Case Solved. It's a tile-matcher built in a larger framework, just like Puzzle Craft, but this game is a silly hard boiled detective game. Matching tiles helps you solve mysteries. The game is fun to look at and listen to, but playing it has made me angrier and angrier.

Unlocking major cases requires solving minor cases. Solving minor cases requires a newspaper in which to find them. Newspapers are delivered every fifteen minutes, and you can't have more than three or four of them at a time. In other words, if you want to play more than four (very short) games an hour, you have to spend "candy" to get more newspapers, and you get a piece or two of candy every 12 hours. Also, after a little while, the minor cases become extremely difficult to solve, meaning that every hour you're allowed to play the game three or four times, and that you will probably lose most of them, because there is a low turn limit in each game. Of course, you can keep playing after the turn limit by paying candy.

The whole setup makes it completely transparent that the time and turn limits are there to cajole the player into paying to be allowed to play the free game. It sticks in my craw! I like the game. It is fun. I would pay for it, were it something I could buy at a fixed price. Microtransactions to continue playing the game, though, burn me up.

Maybe I should keep telling myself that I pumped a lot of quarters into Gauntlet when I was a kid. How different is this?

I think it's pretty different. I've seen people play for a very, very long time on one quarter.

integrating Ywar with Goodreads (body)

by rjbs, created 2014-02-17 20:37

Ywar is a little piece of productivity software that I wrote. I've written about Ywar before, so I won't rehash it much. The idea is that I use The Daily Practice to track whether I'm doing things that I've decided to do. I track a lot of these things already, and Ywar connects up my existing tracking with The Daily Practice so that I don't have to do more work on top of the work I'm already doing. In other words, it's an attempt to make my data work for me, rather than just sit there.

For quite a while now, only a few of my TDP goals needed manual entry, and most of them could clearly be automated. It wasn't clear, though, how to automate my "keep reading books" tasks. I knew Goodreads existed, but it seemed like using Goodreads would be just as much work as using TDP. Either way, I have to go to a site and click something for each book. I kept thinking about how to make my reading goals more motivating and more interesting, but nothing occurred to me until this weekend.

I was thinking about how it's hard for me to tell how long it will take me to finish a book. Lately, I'm taking an age to read anything. Catch-22 is about 500 pages and I've been working on it since January 2. Should I be able to do more? I'm not sure. My current reading goals have been very vague. I thought of them as, "spend 'enough time' reading a book from each shelf once every five days." This makes it easy to decide sloppily whether I've read enough, but it's always an all-at-once decision.

In Goodreads, I can keep track of my progress over several days. That means I can change my goal to "get at least 50 pages read a week." There's no fuzzy logic there, just simple page count. It might not be right for every book, but I can adjust it as needed. If it's too low or high, I can fix that too. It seemed like a marked improvement, and it also gave me a reason to consider looking at Goodreads a bit more, where I've seen some interesting recommendations.

With my mind made up, all I had to do was write the code. Almost every time that I've wanted to write code to talk to the developer API of a service that's primarily addressed not via the API, it's been sort of a mess that's usable, but weird and a little annoying. So it was with Goodreads. The code for my Goodreads/Ywar integration is on GitHub. Below is just some of the weirdness I got to encounter.

This request gets the books on my "currently reading" shelf as XML.

sprintf 'https://www.goodreads.com/review/list?format=xml&v=2&id=%s&key=%s&shelf=currently-reading',
  $user_id,
  $api_key;

The resource is review/list because it's a list of reviews. Go figure! That doesn't mean that there are actually any reviews, though. In Goodreads, a review represents the intersection of a user and a book. If it's on your shelf, it has a review. If there's no review in the usual sense of the word, it just means that the review's body is empty.

The XML document that you get in reply has a little bit of uninteresting data, followed by a <reviews> element that contains all the reviews for the page of results. Here's a review:

<review>
  <id>774476430</id>
  <book>
    <id type="integer">168668</id>
    <isbn>0684833395</isbn>
    <isbn13>9780684833392</isbn13>
    <text_reviews_count type="integer">7875</text_reviews_count>
    <title>Catch-22 (Catch-22, #1)</title>
    <image_url>https://d202m5krfqbpi5.cloudfront.net/books/1359882576m/168668.jpg</image_url>
    <small_image_url>https://d202m5krfqbpi5.cloudfront.net/books/1359882576s/168668.jpg</small_image_url>
    <link>https://www.goodreads.com/book/show/168668.Catch_22</link>
    <num_pages>463</num_pages>
    <format></format>
    <edition_information/>
    <publisher>Simon &amp; Schuster </publisher>
    <publication_day>4</publication_day>
    <publication_year>2004</publication_year>
    <publication_month>9</publication_month>
    <average_rating>3.96</average_rating>
    <ratings_count>355544</ratings_count>
    <description>...omitted by rjbs...</description>
    <authors>
      <author>
        <id>3167</id>
        <name>Joseph Heller</name>
        <image_url><![CDATA[https://d202m5krfqbpi5.cloudfront.net/authors/1197308614p5/3167.jpg]]></image_url>
        <small_image_url><![CDATA[https://d202m5krfqbpi5.cloudfront.net/authors/1197308614p2/3167.jpg]]></small_image_url>
        <link><![CDATA[https://www.goodreads.com/author/show/3167.Joseph_Heller]]></link>
        <average_rating>3.94</average_rating>
        <ratings_count>368314</ratings_count>
        <text_reviews_count>9588</text_reviews_count>
      </author>
    </authors>
    <published>2004</published>
  </book>

  <rating>5</rating>
  <votes>0</votes>
  <spoiler_flag>false</spoiler_flag>
  <spoilers_state>none</spoilers_state>
  <shelves>
    <shelf name="currently-reading" />
    <shelf name="literature" />
  </shelves>
  <recommended_for></recommended_for>
  <recommended_by></recommended_by>
  <started_at>Thu Jan 02 17:04:20 -0800 2014</started_at>
  <read_at></read_at>
  <date_added>Tue Nov 26 08:37:09 -0800 2013</date_added>
  <date_updated>Thu Jan 02 17:04:20 -0800 2014</date_updated>
  <read_count></read_count>
  <body>

  </body>
  <comments_count>review_comments_count</comments_count>
  <url><![CDATA[https://www.goodreads.com/review/show/774476430]]></url>
  <link><![CDATA[https://www.goodreads.com/review/show/774476430]]></link>
  <owned>0</owned>
</review>

It's XML. It's not really that bad, either. One problem, though, was that it didn't include my current position. My current position in the book is not a function of my review, but of my status updates. I'll need to get those, too.

I was intrigued, though by the format=xml in the URL. Maybe I could get it as JSON! I tried, and I got this:

  [...,
  {"id":774476430,"isbn":"0684833395","isbn13":"9780684833392",
  "shelf":"currently-reading","updated_at":"2014-01-02T17:04:20-08:00"}
  ...]

Well! That's certainly briefer. It's also, obviously, missing a ton of data. It doesn't include book titles, total page count, or any shelves other than the one that I requested. That is: note that in the XML you can see that the book is on both currently-reading and literature. In the JSON, only currently-reading is listed. Still, it turns out that this is all I need, so it's all I fetch. I get the JSON contents of my books in progress, and then once I have them, I can get each review in full from this resource:

  sprintf 'https://www.goodreads.com/review/show.xml?key=%s&id=%s',
    $api_key,
    $review_id;

Why does that help? I mean, what I got in the first request was a review, too, right? Well, yes, but when you get the review via review/show.xml, you get a very slightly different set of data. In fact, almost the only difference is the inclusion of comment and user_status items. It's a bit frustrating, because in both cases you're getting a review element, and their ids are the same, but their contents are not. It makes it a bit less straightforward to write an XML-to-object mapper.

When I get review 774476430, which is my copy of Catch-22, this is the first user status in the review:

  <user_status>
    <chapter type="integer" nil="true"/>
    <comments_count type="integer">0</comments_count>
    <created_at type="datetime">2014-02-16T12:47:14+00:00</created_at>
    <id type="integer">39382590</id>
    <last_comment_at type="datetime" nil="true"/>
    <note_updated_at type="datetime" nil="true"/>
    <note_uri nil="true"/>
    <page type="integer" nil="true"/>
    <percent type="integer">68</percent>
    <ratings_count type="integer">0</ratings_count>
    <updated_at type="datetime">2014-02-16T12:47:14+00:00</updated_at>
    <work_id type="integer">814330</work_id>
    <body/>
  </user_status>

By the way, the XML you get back isn't nicely indented as above. It's not entirely unindented, either. It's sometimes properly indented and sometimes just weird. I think I'd be less weirded out if it just stuck to being a long string of XML with no indentation at all, but mostly libxml2 reads the XML, not me, so I should shut up.

The important things above are the page and percent items. They tell me how far through the book I am as of that status update. If I gave a page number when updating, the page element won't have "true" as its nil attribute, and the text content of the element will be a number. If I gave a percentage when updating, as I did above, you get what you see above. I can convert a percentage to a page count by using the num_pages found on the book record. The whole book record is present in the review, as it was the first time, so I just get all the data I need this time via XML.
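
That bit of juggling is about all the logic there is. Roughly, using the element names from the XML above and made-up variables for the parsed data:

# $book and $status: hashrefs built from the review XML
# prefer an explicit page number; otherwise derive one from the percentage
my $num_pages = $book->{num_pages};   # 463 for this copy of Catch-22
my $page      = defined $status->{page}
              ? $status->{page}
              : int($num_pages * $status->{percent} / 100);   # 68% of 463 is 314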

Actually, though, there's a reason to get the XML the first time. Each time that I do this check, it's for in-progress books on a certain shelf. If I start by getting the XML, I can then proceed only with books that are also on the right shelf, like, above, "literature." Although you can specify multiple shelves to the review/list endpoint, only one of them is respected. If there are four books on my "currently reading" shelf, but only one is "literature," then by getting XML first, I'll do two queries instead of five.

So I guess I should go back and start with the XML.

By the way, did you notice that review/list takes a query parameter called format, which can be either XML or JSON, and maybe other things... but that review/show.xml includes the type in the path? You can't change the xml to json and get JSON. You just get a 404 instead.

In the end, making Ywar get data from Goodreads wasn't so bad. It had some annoying moments, as is often the case when using a mostly-browser-based web service's API. It made me finally use XML::LibXML for some real work, and hopefully it will lead to me using Goodreads more and getting some value out of that.

games I've played lately (body)

by rjbs, created 2014-02-03 21:51
tagged with: @markup:md games journal

About a year ago, I told Mark Dominus that I wanted to learn to play bridge, but that it was tough to find friends who were also interested. (I'd rather play with physically-present people whom I know than online with strangers.) He said, "Sackson's Gamut of Games has a two-player bridge variant." I had never heard of Sackson, A Gamut of Games, or the two-player variant. I said, "oh, cool," and went off to look into it all.

So, A Gamut of Games is a fantastic little book that you can get at Amazon for about ten bucks. (That's a kickback-to-rjbs link, by the way.) It's got about thirty games in it, most of which you've probably never heard of. I've played only about a fifth of them so far, or less, and so far they've been a lot of fun. The first game in the book is called Mate, and is meant to feel a bit like chess, which it does. It's a game of pure skill, which is quite unusual for a card game. When I first got the book, I would teach the game to everybody. It was easy to learn and play, fun, and a novelty. I also taught Martha how to play, using a Bavarian deck of cards, which made the game more fun for her. We'd go to the neighborhood bar, get some chicken tenders and beer (root and otherwise) and play a few hands.

I got to wondering why more of the games in Gamut weren't available electronically. I found a Lines of Action app for iOS, but it was a bit half-baked, on the network side. Then, through Board Game Geek, I found a site called Super Duper Games offering online Mate, in the guise of "Chess Cards." It's a really mixed experience, but I am a big fan.

The site's sort of ugly, and incredibly slow. There are some weird display issues and some things that, if not bugs, are darn close. On the other hand, it's got dozens of cool board games that you haven't played before, and you can play them online, against your friends or strangers. If you join, challenge me to something. I'm rjbs.

So far, I've only played a small number of the available games.

One of my favorites is Abande. Like many SDG games, I think it would be even better played in real time, and I'm hoping to produce a board for playing it. Another great one is Alfred's Wyke, which should be easy to play at a table with minimal equipment. I think I'll play it with wooden coins, which I bought in bulk several years ago.

There's also Amazons, Archimedes, Aries, and many more.

In many cases, the games would be improved by realtime play, I think, but only one game has stricken me as greatly hampered by its electronic form. Tumblewords is a cross between Scrabble and Connect Four, which sounds pretty great. It seems like it probably is pretty great, too, but it's got a problem. In some games, like cribbage, part of the goal is to correctly identify what you've just scored. Similarly, in Tumblewords, part of the challenge should be noting all the words that each move introduces. On Super Duper Games, the computer does this for you, using a dictionary. It means you get points for all sorts of words that you'd never have noticed otherwise. I think I may have to play this one in real space before anything else!

Check out Super Duper Games, even if only to read the rules for the games there and play them. Or, maybe try playing something! If you don't want to challenge me, there are dozens of open challenges sitting around at any time.

Dist::Zilla is for lovers (body)

by rjbs, created 2014-01-25 11:21
last modified 2014-01-25 14:20
tagged with: @markup:md journal perl

I don't like getting into the occasional arguments about whether Dist::Zilla is a bad thing or not. Tempers often seem to run strangely high around this point, and my position should, at least along some axes, be implicitly clear. I wrote it and I still use it and I still find it to have been worth the relatively limited time I spent doing it. Nonetheless, as David Golden said, "Dist::Zilla seems to rub some people the wrong way." These people complain, either politely or not, and that rubs people who are using Dist::Zilla the wrong way, and as people get irritated with one another, their arguments become oversimplified. "What you're doing shows that you don't care about users!" or "Users aren't inconvenienced at all because there are instructions in the repo!" or some other bad over-distillation.

The most important thing I've ever said on this front, or probably ever will, is that Dist::Zilla is a tool for adjusting the trade-offs in maintaining software projects. In many ways, it was born as a replacement for Module::Install, which was the same sort of thing, adjusting trade-offs from the baseline of ExtUtils::MakeMaker. I didn't like either of those, so I built something that could make things easier for me without making install-time weird or problematic. This meant that contributing to my repository would get weird or problematic for some people. That was obvious, and it was something I weighed and thought about and decided I was okay with. It also meant, for me, that if somebody wanted to contribute and was confused, it would be up to me to help them, because I wanted, personally, to make them feel like I was interested in working with them¹. At any rate, of course it's one more thing to know, to know what the heck to do when you look at a git repository and see no Makefile.PL or Build.PL, and having to know one more thing is a drag. Dist::Zilla imposes that drag on outsiders (at least in its most common configurations), and it has to be used with that in mind.

Another thing I've often said is that Dist::Zilla is something to be used thoughtfully. If it was a physical tool, it would be yellow with black stripes, with a big high voltage glyph on it. It's a force multiplier, and it lets you multiply all kinds of force, even force applied in the wrong direction. You have to aim really carefully before pulling the trigger, or you might shoot quite a lot of feet, a surprising number of which will belong to you.

If everybody who was using Dist::Zilla thought carefully about the ways that it's shifting around who gets inconvenienced by what, I like to imagine that there would be fewer inconsiderate straw man arguments about how nobody's really being inconvenienced. Similarly, if everybody who felt inconvenienced by an author's choice in build tools started from the idea that the author has written and given away their software to try and help other users, there might be fewer ungracious complaints that the author's behavior is antisocial and hostile.

Hopefully my next post will be about some fun code or maybe D&D.

1: My big failure on this front, I think, is replying promptly, rather than not being a big jerk. I must improve, I must improve, I must improve...

Dist::Zilla and line numbering (body)

by rjbs, created 2014-01-14 11:22

brian d foy wrote a few times lately about potential annoyances distributed across various parties through the use of Dist::Zilla. I agree that Dist::Zilla can shuffle around the usual distribution of annoyances; I'm happy with the trade-offs I think I'm making, and other people want different trade-offs. What I don't like, though, is adding annoyance for no gain, or when it can be easily eliminated. Most of the time, if I write software that does something annoying and leave it that way for a long time, it's actually a sign that it doesn't annoy me. That's been the case, basically forever, with the fact that my Dist::Zilla configuration builds distributions where the .pm files' line numbers don't match the line numbers in my git repo. That means that when someone says "I get a warning from line 10," I have to compare the released version to the version in git. Sometimes, that someone is me. Either way, it's a cost I decided was worth the convenience.

Last week, just before heading out for dinner with ABE.pm, I had the sudden realization that I could probably avoid the line number differences in my shipped dists. The realization was sparked by a little coincidence: I was reminded of the problem just after having to make some unrelated changes to an unsung bit of code responsible for creating most of the problem.

Pod::Elemental::PerlMunger

Pod::Weaver is the tool I use to rewrite my sort-of-Pod into actual-Pod and to add boilerplate. I really don't like working with Pod::Simple or Pod::Parser, nor did I like a few of the other tools I looked at, so when building Pod::Weaver, I decided to also write my own lower-level Pod-munging tool. It's something like HTML::Tree, although much lousier, and it stops at the paragraph level. Formatting codes (aka "interior sequences") are not handled. Still, I've found it very useful in helping me build other Pod tools quickly, and I don't regret building it. (I sure would like to give it a better DAG-like abstraction, though!)

The library is Pod::Elemental, and there's a tool called Pod::Elemental::PerlMunger that bridges the gap between Dist::Zilla::Plugin::PodWeaver and Pod::Weaver. Given some Perl source code, it does this:

  1. make a PPI::Document from the source code
  2. extract the Pod elements from the PPI::Document
  3. build a Pod::Elemental::Document from the Pod
  4. pass the Pod and (Pod-free) PPI document to an arbitrary piece of code, which is expected to alter the documents
  5. recombine the two documents, generally by putting the Pod at the end of the Perl

The issue was that step two, extracting Pod, was deleting all the Pod from the source code. Given this document:

package X;

=head1 OVERVIEW

X is the best!

=cut

sub do_things { ... }

...we would rewrite it to look like this:

package X;

sub do_things { ... }
__END__
=head1 OVERVIEW

X is the best!

=cut

...we'd see do_things as being line 9 in the pre-munging Perl, but line 3 in the post-munging Perl. Given a more realistic piece of code with interleaved Pod, you'd expect to see the difference in line numbers to increase as you got later into the munged copy.

I heard the suggestion, many times, to insert #line directives to keep the reported line numbers matching. I loathed this idea. Not only would it be an accounting nightmare in case anything else wanted to rewrite the file, but it meant that the line numbers in errors wouldn't match the file that the user would have installed! It would make it harder to debug problems in an emergency, which is never okay with me.

There was a much simpler solution, which occurred to me out of the blue and made me feel foolish for not having thought of it when writing the original code. I'd rewrite the document to look like this:

package X;

# =head1 OVERVIEW
#
# X is the best!
#
# =cut

sub do_things { ... }
__END__
=head1 OVERVIEW

X is the best!

=cut

Actually, my initial idea was to insert stretches of blank lines. David Golden suggested just commenting out the Pod. I implemented both and started off using blank lines myself. After a little while, it became clear that all that whitespace was going to drive me nuts. I switched my code to producing comments, instead. It's not the default, though. The default is to keep doing what it has been doing.

It works like this: PerlMunger now has an attribute called replacer, which refers to a subroutine or method name. It's passed the Pod token that's about to be removed, and it returns a list of tokens to put in its place. The default replacer returns nothing. Other replacers are built in to return blank lines or commented-out Pod. It's easy to write your own, if you can think of something you'd like better.

Karen Etheridge suggested another little twist, which I also implemented. It may be the case that you've got Pod interleaved with your code, and that some of it ends up after the last bits of code. Or, maybe in some documents, you've got all your Pod after the code, but in others, you don't. If your concern is just keeping the line numbers of code the same, who cares about the Pod that won't affect those line numbers? You can specify a separate post_code_replacer for replacing the Pod tokens after any relevant code. I decided not to use that, though. I just comment it all out.

PkgVersion

Pod rewriting wasn't the only thing affecting my line numbers. The other thing was the insertion of a $VERSION assignment, carried out by the core plugin PkgVersion. Its rules are simple:

  1. look for each package statement in each Perl file
  2. skip it if it's private (i.e., there's a line break between package and the package name)
  3. insert a version assignment on the line after the package statement

...and a version assignment looked like this:

{
  $My::Package::VERSION = '1.234';
}

Another version-assignment-inserter exists, OurPkgVersion. It works like this:

  1. look for each comment like # VERSION
  2. put, on the same line: our $VERSION = '1.234';

I had two objections to just switching to OurPkgVersion. First, the idea of adding a magic comment that conveyed no information, and served only as a marker, bugged me. This is not entirely rational, but it bugged me, and I knew myself well enough to know that it would keep bugging me forever.

The other objection is more practical. Because the version assignment uses our and does not wrap itself in a bare block, it means that the lexical environment of the rest of the code differs between production and test. This is not likely to cause big problems, but when it does cause problems, I think they'll be bizarre. Best to avoid that.

Of course, I could have written a patch to OurPkgVersion to insert braces around the assignment, but I didn't, because of that comment thing. Instead, I changed PkgVersion. First off, I changed its assignment to look like this:

$My::Package::VERSION = '1.234';

Note: no enclosing braces. They were an artifact of an earlier time, and served no purpose.

Then, I updated its rules of operation:

  1. look for each package statement in each Perl file
  2. skip it if it's private (i.e., there's a line break between package and the package name)
  3. skip forward past any full-line comments following the package statement
  4. if you ended up at a blank line, put the version assignment there
  5. otherwise, insert a new line

This means that as long as you leave a blank line after your package statement, your code's line numbers won't change. I'm now leaving this code after the # ABSTRACT comment after my package statements. (Why do the VERSION comments bug me, but not the ABSTRACT comments? The ABSTRACT comments contain more data — the abstract — that can't be computed from elsewhere.) Now, this can still fall back to inserting lines, but that's okay, because what I didn't include in the rules above is this: if configured with die_on_line_insertion = 1, PkgVersion will throw an exception rather than insert lines. This means that as I release the next version of all my dists, I'll hit cases once in a while where I can't build because I haven't made room for a version assignment. That's okay with me!
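
To make that concrete with a made-up module, the source in git looks like this:

package My::Package;
# ABSTRACT: do stuff with things

use strict;
use warnings;

1;

...and the built copy comes out like this, with every line of code right where it was:

package My::Package;
# ABSTRACT: do stuff with things
$My::Package::VERSION = '1.234';
use strict;
use warnings;

1;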

I'm very happy to have made these changes. I might never notice the way in which I benefit from them, because they're mostly going to prevent me from having occasional annoyances in the future, but I feel good about that. I'm so sure that they're going to reduce my annoyance, that I'll just enjoy the idea of it now, and then forget, later, that I ever did this work.

making my daemon share more memory (body)

by rjbs, created 2014-01-10 19:45
last modified 2014-01-10 19:45

Quick refresher: when you've got a unix process and it forks, the new child can share memory with its parent until either of them starts making changes. Lots of stuff is in that memory, including your program's compiled code. This means that if you're going to `require` a lot of Perl modules, you should strongly consider loading them early, rather than later. Although a runtime `require` statement can make a program start faster, it's often a big loss for a forking daemon: the module gets re-compiled in every forked child, multiplying both the time and the memory cost.
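
As a tiny illustration (Heavy::Module is a made-up stand-in for anything slow to load):

# compiled once, pre-fork; every child shares these memory pages
use Heavy::Module;

sub handle_request {
  # a lazy "require Heavy::Module;" here would instead get recompiled
  # in each forked child, costing time and unshared memory
  ...
}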

Today I noticed that one of the daemons I care for was loading some code post-fork, and I thought to myself, "You know, I've never audited that program to see whether it does a good job at loading everything pre-fork." I realized that it might be a place to quickly get a lot of benefit, assuming I could figure out what was getting loaded post-fork. So I wrote this:

use strict;
use warnings;
package PostForkINC;

sub import {
  my ($self, $code) = @_;

  # remember which process loaded us
  my $pid = $$;

  my $callback = sub {
    # still in the original process: stay quiet
    return if $pid == $$;

    # we're in a fork; report the file being loaded, then decline to
    # handle it, so the rest of @INC gets to load it as usual
    my (undef, $filename) = @_;
    $code->($filename);
    return;
  };

  # put the hook at the front of @INC so it sees every require
  unshift @INC, $callback;
}

1;

When loaded, PostForkINC puts a callback at the head of @INC so that any subsequent attempt to load a module hits the callback. As long as the process hasn't forked (that is, $$ is still what it was when PostForkINC was loaded), nothing happens. If it has forked, though, something happens. That "something" is left up to the user.
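
Loading it looks something like this; the warn is just an example of what the callback might do:

# in the daemon, before any forking happens
use PostForkINC (sub {
  my ($filename) = @_;
  warn "post-fork load: $filename\n";
});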

Sometimes I find a branch of code that I don't think is being traversed anymore. I love deleting code, so my first instinct is to just delete it… but of course that might be a mistake. It may be that the code is being run but that I don't see how. I could try to figure it out through testing or inspection, but it's easier to just put a little wiretap in the code to tell me when it runs. I built a little system called Alive. When called, it sends a UDP datagram about what code was called, where, and by whom (and what). A server receives the datagram (usually) and makes a note of it. By using UDP, we keep the impact on the code being inspected very low. This system has helped find a bunch of code being run that was thought long dead.
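
That's not the actual Alive code, but the core idea is small enough to sketch; the tick() interface and the endpoint below are made up for illustration:

use strict;
use warnings;
package Alive;

use IO::Socket::INET;
use Sys::Hostname;

# fire-and-forget UDP socket; a lost datagram just means one missed report
my $sock = IO::Socket::INET->new(
  Proto    => 'udp',
  PeerAddr => $ENV{ALIVE_HOST} || 'localhost:8049',   # made-up endpoint
);

sub tick {
  my ($note) = @_;
  my (undef, $file, $line) = caller;
  $sock->send(join("\t", hostname(), $$, $file, $line, $note // '')) if $sock;
  return;
}

1;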

I combined PostForkINC with Alive and restarted the daemon. Within seconds, I had dozens of reports of libraries — often quite heavy ones — being loaded after the fork.
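
Wiring the two together is a couple of lines, using the hypothetical sketch of Alive above:

use Alive;
use PostForkINC (sub { Alive::tick("post-fork require: $_[0]") });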

This is great! I now have lots of little improvements to make to my daemon.

There is one place where it's not as straightforward as it might have been. Sometimes a program tries to load an "optional" module, and if that fails, no problem. PostForkINC can seem to produce a false positive here: it reports that Optional::Module is being loaded post-fork, but in reality no new code is being added to the process.

When I told David Golden what I was up to, he predicted this edge case and said, "but you might not care." I didn't, and said so. Once I saw that this was happening in my program, though, I started to care. Even if I wasn't using more memory, I was looking all over @INC to try to find files that I knew couldn't exist. Loading them pre-fork wasn't going to work, but there are ways around it. I could put something in %INC to mark them as already loaded, but instead I opted to fix the code that was looking for them, avoiding the whole idea of optional modules, which was a pretty poor fit for the program in question, anyway.
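
For the record, the %INC trick would have looked something like this, with Optional::Module standing in for whatever module is being probed:

# claim Optional::Module is already loaded, so later require()s of it
# become no-ops instead of scanning all of @INC
BEGIN { $INC{'Optional/Module.pm'} = __FILE__ }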

I've still got a bunch of tweaking to do before I've fixed all the post-fork loading, but I got quite a lot of it already, and I'm definitely going to apply this library to more daemons in the very near future.
