Continuing my Science Hack Day 2010 theme for this week’s posts, I’m cross-posting the following idea I came up with this evening:

Science-friendly URL shortener. The idea is that scientists could use per-article URLs for the references they cite, in the same way that advertisers give specific codes to different types of ad during a campaign so they can track people’s interest in a product. Whenever somebody reads an article and follows a link, they are first taken to the URL shortener site, which then forwards them to the URL actually cited. The site should only allow people to shorten URLs if they have registered, which they must do via some type of single sign-on system such as OpenID or ORCID. This registration means that whenever someone follows a link the system knows exactly who followed it, allowing the scientist who wrote the article to gather detailed information about the interests of those who read their articles. Of course, anybody would be allowed to unshorten a link, although this still allows stats to be gathered. The actual format of the shortened URL needs careful thought to ensure it’s acceptable to journals.
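Here is a rough sketch of how the redirect-and-record flow might work, written as a toy in-memory model in Python. Everything concrete in it is an assumption rather than part of the idea above. Codes are random URL-safe tokens from `secrets.token_urlsafe`. Identities are plain strings standing in for an OpenID or ORCID login, and anonymous visitors are recorded as `None`. The class and method names are hypothetical. A real service would also need persistence, an HTTP redirect layer and the sign-on integration itself.

```python
import secrets
from collections import Counter, defaultdict


class ScienceShortener:
    """Toy in-memory model of a per-article, click-tracking URL shortener.

    Hypothetical design: identities are plain strings standing in for an
    OpenID/ORCID login, and an anonymous visitor is recorded as None.
    """

    def __init__(self):
        self._links = {}                  # code -> (author, article, target URL)
        self._clicks = defaultdict(list)  # code -> visitor identities, in order
        self._lookups = Counter()         # code -> number of unshorten requests

    def shorten(self, author, article, target_url):
        """Create a fresh code for one citation in one article (registered users only)."""
        if not author:
            raise PermissionError("only registered users may shorten URLs")
        code = secrets.token_urlsafe(6)
        while code in self._links:        # regenerate on the (unlikely) collision
            code = secrets.token_urlsafe(6)
        self._links[code] = (author, article, target_url)
        return code

    def follow(self, code, visitor=None):
        """Record who followed the link, then return the URL to redirect to."""
        _, _, target_url = self._links[code]
        self._clicks[code].append(visitor)
        return target_url

    def unshorten(self, code):
        """Anyone may see where a code points; the lookup is still counted."""
        self._lookups[code] += 1
        return self._links[code][2]

    def stats(self, author, article):
        """Clicks and lookups per cited URL for one article, visible only to its author."""
        report = {}
        for code, (owner, art, target_url) in self._links.items():
            if owner == author and art == article:
                entry = report.setdefault(target_url, {"clicks": Counter(), "lookups": 0})
                entry["clicks"].update(self._clicks[code])
                entry["lookups"] += self._lookups[code]
        return report
```

Because each `shorten` call issues a new code, citing the same reference from two different articles gives two independently tracked links. That is the per-article property the idea depends on.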


I’ve just come back from two days of hacking at HackCamp 2010, hosted at Google in London. It was great to see the diversity of projects people were working on. The project I decided to tackle turned out to be far more ambitious than was possible in under 24 hours of coding, but my collaborator @leipie and I made great progress in identifying the components of the stack needed for a future implementation. Since next weekend is Science Hack Day 2010, I believe this project would be suitable to take on then. During the week I intend to work through most of the additional problems raised by this weekend’s work in a series of blog posts, with the next one in particular discussing the chosen components and the thinking behind each. In the meantime, below is a dump of the idea and its motivation:

  • There are plenty of great examples of long-standing open problems in theoretical computer science and math; for many of these there is a strong belief, based on past experience, that the solution (should one actually exist) requires thinking somehow “outside of the box”.
  • I make this “thinking outside of the box” concept more concrete in the following way: almost all purported solutions to these open problems follow standard patterns, although their details differ. Hence, given a steady stream of these potential solutions, if you can find a way to annotate each new one and compare its pattern of proof with those of the ones you’ve already received, then as soon as you encounter a solution that doesn’t follow these patterns it will stick out like a sore thumb.
  • These anomalous solutions may not actually solve the problem, but they may signal potential new avenues of attack, hopefully meaning the solution is reached far more quickly. In fact, certain lemmas within a proof may be entirely correct while the rest of the proof is totally bogus; there should be a way of reusing just the correct parts, assuming they actually help with finding a solution.
  • Anyone will be allowed to submit as many solutions (read: published papers) as they wish. Annotations will be done by the community, and anyone in the community will be able to contribute them. As a result, solutions are judged on their merit against each other, and the winners of this “competition” are those that contribute novel ideas, in the sense that they rise to the top of any listing of solutions.
  • The annotations are at a fairly coarse granularity (compared to formal proofs supplied to verification procedures), roughly at the level of proof technique (e.g. “this section is proof by induction” or “here they used diagonalisation”). Another way to think of this is that it’s kind of at the level of “hand-wavy” styles of proof :)
  • (To allow reuse there could be a concept of ‘forking’ someone else’s “paper” submitted to the system)
  • This system is supposed to contrast directly with the arXiv, where such “out there” solutions are less likely to appear due to the filtering process, although Perelman’s recent solution of the Poincaré conjecture is one very notable exception.
  • The situation is even worse with traditional publishing, because nobody gets to see the rejects and so they cannot be used to train a filter. The reason for this is simple: a small group of reviewers simply does not scale. I actively want this system to attract many of those who might be considered “cranks” or “nut jobs”, i.e. the sorts of people who think they have solved the hardest problem in a certain field in a single page of text with no equations. Even if they haven’t, they may have interesting ideas that are worth filtering on. I want this system to be scalable enough to cope with, and indeed thrive on, the data they provide.
  • The key open problem I have chosen here is “P vs. NP”, because there are lots of papers out there from a diversity of sources that I can use to test the system from the outset.
  • One implementation issue I’d like to address is the centralized, gatekeeper-like nature of the arXiv, although in the first instance the system will have to be centralized so that code development can be bootstrapped. The eventual hope is that anyone who wishes to submit solutions can do so on a server of their choice, and the system will aggregate feeds from each such server.
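One very simple way to make the “sticks out like a sore thumb” test concrete is to reduce each submission’s annotations to a set of coarse technique tags. Each newcomer is then compared against everything seen so far using Jaccard similarity, and it is flagged if even its closest match is weak. The tag names, the 0.5 threshold and the class and method names below are illustrative assumptions, not part of the proposal. Treating a proof as an unordered set of tags is also a simplification, and a real system would probably need something more order-aware.

```python
def jaccard(a, b):
    """Overlap between two sets of technique tags: 1.0 identical, 0.0 disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class ProofPatternMonitor:
    """Flags submissions whose annotations resemble nothing seen so far.

    Hypothetical design: each submission is reduced to a set of coarse
    proof-technique tags, which ignores order and repetition.
    """

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.seen = {}  # submission id -> frozenset of tags

    def submit(self, sub_id, tags):
        """Store a submission; return True if it stands out from all earlier ones."""
        tags = frozenset(tags)
        # With no history, max() falls back to 0.0, so the first submission is flagged.
        best = max((jaccard(tags, old) for old in self.seen.values()), default=0.0)
        self.seen[sub_id] = tags
        return best < self.threshold

    def fork(self, parent_id, new_id, extra_tags=()):
        """Reuse another submission's annotated steps as the basis of a new one."""
        return self.submit(new_id, self.seen[parent_id] | frozenset(extra_tags))
```

For example, a submission tagged only with an unfamiliar technique scores 0.0 against every earlier diagonalisation-style attempt and is flagged. A fork that adds one tag to an existing submission stays close to its parent and is not flagged.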


In May, Craig Venter announced to an assembled press pack that he and his team had successfully created a synthetic genome and implanted it into a cell to create the world’s first synthetic bacterium, M. mycoides JCVI-syn1.0. The work was published in the journal Science in the article “Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome”.

At this press conference he gave a few details of certain “watermarks” placed in the base-pair sequence to clearly distinguish the synthetic organism from any potential contamination in their experimental results, but he revealed neither their entire contents nor how they were encoded. Instead he threw open a challenge to anyone to decode the watermarks and uncover an email address, to which they were invited to send an email to prove that they had decoded it correctly. The watermarks are given in the supporting online materials of the article cited above.

The challenge is also described in the following video:

(Jump to the specific section.)

On Sunday I decided to see if I could crack the code, and after about six hours of coding I managed to do it! It turns out I was the 44th person to do so. Rather than explain the technique I used, I’ll simply present the following as conclusive proof that I did indeed crack the code, though of course you’re going to have to crack it yourself to verify this.

TTAACTAGCTAATGATCACTGGCTATAACACTACGTTTGTAAGCTATAATTTAGTGCATATCATAGTACCGTTGCATATTTCTATAGTTTGCATAAATTATATGATCATAAATATTGTAATGCTGATAACTAATATACTAATGCCGTCAATAAATAGTCTAGTGATAACTACAATAGCTAGCACGAATATTATAAATTCGAGGCGCGCCTTAACTAGCTAA

If you’re lucky enough to be going to one of the barcamps I’m also attending this summer, I’ll most likely present the method in one of my talks, and eventually on this blog.
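The decoding method is deliberately not described here, and the sketch below is not that method. It shows only a generic first step for this kind of puzzle, under the assumption (not stated in the post) that the watermark is a substitution code over base triplets. The idea is to count the non-overlapping triplets in each of the three reading frames and look at which ones dominate, much as letter frequencies are used against a simple substitution cipher. The function names are hypothetical.

```python
from collections import Counter


def triplet_counts(seq, frame=0):
    """Count non-overlapping base triplets, starting at offset `frame` (0, 1 or 2)."""
    seq = seq.strip().upper()
    # Stop early enough that every slice is a full three bases long.
    return Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))


def frame_summaries(seq, top=5):
    """Most common triplets in each of the three reading frames."""
    return {frame: triplet_counts(seq, frame).most_common(top) for frame in range(3)}
```

Pasting the sequence above into a string `seq` and calling `frame_summaries(seq)` gives three frequency tables to compare. A frame in which a few triplets occur far more often than the rest might be a reasonable place to start guessing.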


Last Saturday I was saddened to hear of the passing of Martin Gardner. My own relationship with Martin’s work was through his republication of, and commentary on, Silvanus P. Thompson’s Calculus Made Easy. From this little book, over the course of a summer holiday (I can’t remember now whether it was 2002 or 2003), I learned calculus for the first time. Martin’s commentary helped put the subject into perspective in light of the radical changes in style that had taken place since Thompson first published the work, emphasizing that the core ideas were essentially no different. I can’t emphasize enough just how pivotal this was in my journey through mathematics.

When I was at school doing A-Levels the first time around, I had little interest in studying the subject. In fact I was far more interested in doing art on my computer. However, as I got further into the techniques of computer graphics and tried to code my own graphics tools, I very quickly realized I didn’t know enough math to progress any further. There was no getting around it: before I could do any of the cool things I wanted to do with my computer I’d have to learn calculus, and the book that was recommended to me (I think it was in some SIGGRAPH course notes) was Calculus Made Easy – “what one fool can do, another can.”

There are several obituaries and tributes you can read to learn more about his life and work (for instance here, here, here and here). However, the best way to appreciate his work is simply to pick up one of his books and start reading.

News of Gardner’s death broke late on Saturday evening. The following day I headed over to the Jam Factory in Oxford for a day of coding at Oxford Geek Jam 6. In the lead-up to the day we had discussed various formats we could follow but hadn’t really agreed on any of them, so I suggested that we code up one of Martin Gardner’s puzzles as a tribute to him.

Once a critical mass of coders had arrived, and after a little searching, we decided that the Game of Hip would work really well. I think we were all initially more interested in coding up solutions to puzzles, since at the previous geek jam, which took the form of a coding dojo, we’d worked on a problem that we could tackle computationally. However, we could find only one example of someone having coded the game before, and that was in Pascal. We therefore also agreed to implement a JavaScript UI for the game, which would be SVG rendered on an HTML5 canvas. This would make it possible for us to get a version of the game running on the iPhone (although it turned out that there were some HTML5 and/or SVG issues that meant it didn’t really work on Android).

Oxford Geek Jam 6 in full flow

This was the hottest weekend of the year so far, and the Sunday felt even hotter than the Saturday, but pitchers of Pimm’s got us through. (Incidentally, that reminds me that the Cambridge puntcon is coming up fairly soon. Unfortunately I missed it last year, and it may well fall on a weekend I can’t make again this year. Perhaps we need an Oxford puntcon?) Even after getting kicked out of the Jam Factory earlier than normal because of staff training, we managed to find a pub with good wifi and continued coding. You can see the end result of our day here.

The format on Sunday differed somewhat from previous Oxford Geek Jams (I’d previously only been to Oxford Geek Jam 5), where hackers mostly worked individually on projects, although there was some pair programming at the last one. As we launched into developing the Game of Hip, we decided to tackle the problem on three fronts: core game engine, front-end UI and game-playing AI (sadly we had insufficient time to fully develop the latter). We then sub-divided the five-man team into smaller groups around these themes and programmed the various parts in pairs. As a result I feel we really gelled as a team, and this was helped in part by our use of the Oxford Geek Jam svn code repository. We committed changes fairly frequently, but everything was committed to the project trunk, so we often found we needed to resolve conflicts. In one respect I think this shows just how closely we were working together as a team, though it’s clearly not ideal to have to fix conflicts manually. However, I believe it provides a good reason to try out git next time round and compare the effect on our workflow.
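The rules of Hip aren’t spelled out above. The game is usually described as being played on a 6×6 board, where a player loses on completing the four corners of a square in their own colour, tilted squares included. Assuming those rules, the core engine only needs to check a move against squares that contain the newly placed cell. Every such square has a side running from the new cell to one of the player’s existing cells, so for each existing cell it is enough to try the two squares built on that side, one on either side of it. This Python sketch is not the code written on the day, and the function name is hypothetical.

```python
def completes_square(cells, new):
    """Return True if placing `new` makes a square with three of `cells`.

    `cells` is a set of (row, col) pairs already owned by the player who is
    moving. Squares may be tilted, e.g. corners (1, 0), (2, 1), (1, 2), (0, 1).
    """
    r0, c0 = new
    for r1, c1 in cells:
        if (r1, c1) == new:
            continue
        dr, dc = r1 - r0, c1 - c0              # the side from `new` to this cell
        for pr, pc in ((-dc, dr), (dc, -dr)):  # that side rotated 90° either way
            # The other two corners are both ends of the side shifted by (pr, pc).
            if (r0 + pr, c0 + pc) in cells and (r1 + pr, c1 + pc) in cells:
                return True
    return False
```

Calling this before adding each move to the mover’s set of cells is enough to detect the losing move. The check costs only two set lookups per existing cell, which is trivial on a 36-cell board.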

I’m really proud of what we were able to achieve and the day has reaffirmed my belief in the principle that if you get enough bright and motivated people in a room collaborating on a great idea you can do amazing things.


At most points in my life I have had a good idea about what my goals are and a plan about the ways I could make that happen. Perhaps the clearest goal I have ever had was when I sat down one afternoon in mid-2003 and decided there and then I would secure myself a place at Cambridge on the Computer Science course. Although I didn’t realise it at the time, this would take me two years of extremely intensive work, but with this singular goal in mind throughout, I finally achieved it.

As goals go this one was pretty good: the result was very specific, and the route to achieving it, through studying for A-Levels, was also very clear and well understood. For various reasons that aren’t important here, I had missed out on going to university directly from school, and so at the time I felt this was the most challenging goal I could set myself to progress my career. These qualities lend themselves strongly to goals that are ultimately achievable.

The only problem was that all my goal said was that I would get into Cambridge. In particular, it said absolutely nothing about what I was going to do once I had got in. I was so focussed on this one aim that I had left no room to consider anything beyond it. So when I actually found myself there, I very quickly lost direction.

This might seem rather strange, because surely the goal was to study hard and do well in my finals? That is certainly a goal, and a jolly good one at that, but it was never explicitly mine. Nor, for that matter, was the opposite my goal. Simply put, I just didn’t have any particular goal in mind. I was, however, motivated by a vague desire to explore and understand as much as possible, initially about Computer Science and later about Mathematics more generally, in the greatest possible detail, by following my interests wherever they took me within those fields. At best this could be described as a yearning. It was certainly not a goal. It was no basis for a plan, especially not one involving studying hard for exams.

It is perhaps glib to observe that without a goal I had nothing to aim for. However, the key driving force that gave me the confidence to keep striving for the highest grades I could achieve in my A-Levels the second time around was that in my mind’s eye I could visualise myself achieving that goal, because it was so clearly defined, even though the grades I got doing A-Levels at school said I hadn’t a chance. In contrast, at no point during my degree did I buy into, and truly believe (although plenty of people around me did), that I could achieve First Class grades at the end of each year in Cambridge. Sure enough, I didn’t.

None of this is in any way intended as an excuse, but there are important observations here that are worth making. This is especially so now, almost two years after the end of my course (where I may have missed out on the benefits of the PhD I might have pursued had I given myself the chance), since I’m in a place where I’m ready to be serious again about the direction I’m going in and what I want to achieve along the way. I currently have lots of goals. In fact I have too many of them, and most of them I have yet to fully buy into. So in my next post on this subject I shall describe the most important of these, outlining for each a specific goal, a simple plan and a reason why it’s challenging and worthwhile. In effect I will be pitching these goals to myself as a way of buying into them, so that I can believe I will eventually achieve them and hence start the process of making that happen.


The following post is a response to Peter Murray-Rust’s post “Time flies like an arrow; fruit flies like a banana. Or do they?”

Peter, I fully agree on the fundamental importance of NLP to AI (for me it’s the most important of the so-called AI-complete problems). Indeed, it’s interesting to note that Chomsky’s work on natural-language linguistics gave rise to the subject of formal languages, which includes all the computer languages in which AI solutions must somehow be written. Clearly, NLP would be extremely beneficial for efficient human-computer interaction.

However, I strongly believe we should continually strive for increasing formalism in the end products of our labours, independently of how we arrived at them (I’m mainly thinking of scientific end products here). My definition of formalism in this context includes some reduction in ambiguity through agreement on the meaning and prescribed usage of terms.

I base this last assertion on what I think is the key scientific example of the importance of clarity in meaning and in the logical consequences of that meaning. I speak, of course, of the revolution in thought about space and time brought about by Einstein’s Theory of Relativity. Post-Einstein, wherever you needed to talk about “time flying” (at least in scientific discourse, and more specifically in physics), what was meant was necessarily fundamentally different from what had been meant before. The previous sloppy usage was now simply unacceptable.

All of which affords me the opportunity to quote in extenso the following passage from Eddington’s The Mathematical Theory of Relativity (p. 8):

Those who still insist on the existence of a unique “true time” generally rely on the possibility that the resources of experiment are not yet exhausted and that some day a discriminating test may be found. But the off-chance that a future generation may discover some significance in our utterances is scarcely an excuse for making meaningless noises.

Conversely, I think NLP techniques have even greater potential, beyond simply working out what someone said, in the following two ways:

  1. In a grammatically correct passage such as “All men are mortal. Socrates is a man. Socrates is not mortal.” it should be possible for a machine to identify the obvious logical error. Much scientific discourse essentially comes down to formulae expressible in simple logics, in which it is possible for a machine to tease out seemingly subtle flaws. (Any formal structure captured in this process must form the basis of the end product if it is to be worthwhile.)
  2. Although ambiguity is problematic if we are trying to understand what a person has said in an automated way, there are classes of case in which ambiguity has beneficial consequences. I can draw an analogy here with an abstract algebra such as group theory, where the ambiguity about exactly what sorts of thing the elements of a group are enables one to prove general theorems about groups that apply whatever the elements happen to be. Alternatively, take Dirac’s bra-ket notation, where the individual components of a complete bra-ket have different interpretations, which means we can view the complete bra-ket ambiguously from different perspectives, although it turns out that these are in any case equivalent.

    So my hope would be that a machine encountering such ambiguity is able either to abstract from it a more general concept, or to let the ambiguity pass whilst acknowledging that the interpretations it admits are equally valid and possibly intended. Without this latter observation, much of poetry would be impossible.
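As a toy illustration of point 1, here is a deliberately tiny checker. It assumes only three sentence shapes (“All Xs are Y.”, “N is a X.” and “N is not Y.”) and a hand-written plural lexicon; real NLP would need an actual parser, and all the names here are hypothetical. The checker forward-chains the universal rules over the stated facts until nothing new can be derived, then reports any individual that ends up both having and lacking a property.

```python
import re

# Hypothetical toy lexicon: plural nouns used in "All ... are ..." sentences.
SINGULAR = {"men": "man", "dogs": "dog", "birds": "bird"}


def find_contradictions(text):
    """Return 'X is both P and not P' for every contradiction derivable from `text`.

    Sentences that match none of the three supported shapes are ignored.
    """
    rules = []       # (category, property) pairs from "All <plural> are <property>."
    facts = set()    # (individual, predicate, truth value)
    for sentence in filter(None, (s.strip() for s in text.split("."))):
        if m := re.fullmatch(r"All (\w+) are (\w+)", sentence):
            plural, prop = m.groups()
            rules.append((SINGULAR.get(plural, plural), prop))
        elif m := re.fullmatch(r"(\w+) is not (?:an? )?(\w+)", sentence):
            facts.add((m.group(1), m.group(2), False))
        elif m := re.fullmatch(r"(\w+) is (?:an? )?(\w+)", sentence):
            facts.add((m.group(1), m.group(2), True))

    # Forward-chain: if N is a C and all Cs are P, then N is P. Repeat to a fixpoint.
    changed = True
    while changed:
        changed = False
        for who, what, truth in list(facts):
            for category, prop in rules:
                if truth and what == category and (who, prop, True) not in facts:
                    facts.add((who, prop, True))
                    changed = True

    return sorted(f"{who} is both {what} and not {what}"
                  for who, what, truth in facts
                  if truth and (who, what, False) in facts)
```

On the Socrates example this reports that Socrates is both mortal and not mortal. Replacing the last sentence with “Socrates is mortal.” yields no contradictions.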

Ping

So once again I return to this blog, and I’m going to start another post with an empty promise to myself to update it more often. At some point since the last post I silently changed the name and subtitle of this blog – feel free to lampoon as you see fit. Here are five things I’ve been up to since last time:

  1. Handwriting tweets using the Wacom Bamboo Pen and Touch tablet.
  2. Creating a revised version of the chem visualiser gadget – it’s not ready for the samples gallery yet but I’m working on it.
  3. I started a FriendFeed conversation about minimal scientific artefacts, which I then synthesised into a Wave – I think the minimality criterion is actually quite powerful, for reasons I go into in the Wave.
  4. We had a Wave hack day at RAL, which was very successful.
  5. I bought a larger antenna for my wifi card – I know this sounds really minor, but it’s made a massive difference to my workstation’s connectivity, which in turn has made me a happy bunny.

So prepare for the deluge – I’m back in the blogosphere and this time I have something to say. Hopefully…

Really quick post about this because I’m really squeezed for time (I shouldn’t have been working on this today but couldn’t resist it, especially when I worked out how it could be done).

I’ve now got a proof-of-concept LaTeX gadget for Google Wave to try out:

(Image: the LaTeX gadget)

The URL is here: http://www.danhagon.me.uk/Wave/LaTeX.xml

To install it, click the jigsaw icon when editing a Wave. Enjoy!

I have to say a massive thanks to Cameron Neylon for this because without his support in getting me a Wave account this wouldn’t have been possible. I’ll blog in more detail about my first impressions hopefully later this week.


Had a great day in Oxford on Friday at Social Media Convention 2009. I was hoping to blog during the conference, but given the number of issues and ideas being thrown about at the time, that now seems overly optimistic.

Below are summaries of what I think are the main themes of the day from my own perspective.

Conference-craft, backchannels and the battle for attentionspace

This was the first conference I had been to with a laptop, so it was my first real exposure to live-tweeting, although I had seen the feeds at other conferences. Perhaps because I was unfamiliar with the process (or maybe I was just a little tired after an early start that day), I found it incredibly difficult at first to keep up with the discussion from both the panelists and the twitterers.

In a way this made the sessions more practical than they otherwise would have been, since quite a bit of the “official” discussion centered on how the backchannel can often be more important. Another issue that got discussed, I think started by Nigel Shadbolt, was how we as individual users find it increasingly difficult to service all the demands on our attention that social media generates: “The battle is for our attention space.” One striking metric I picked up on was that 20 minutes of video are uploaded to YouTube every second. The discussion also emphasised that there is an incredible amount of noise in these (back)channels.
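To put that upload figure in more familiar units, here is a quick conversion. It is simple arithmetic on the quoted rate and nothing more:

```latex
\begin{aligned}
20\,\frac{\text{min}}{\text{s}} \times 60\,\frac{\text{s}}{\text{min}}
  &= 1200\,\frac{\text{min}}{\text{min}} = 20\,\frac{\text{h}}{\text{min}},\\
20\,\frac{\text{h}}{\text{min}} \times 1440\,\frac{\text{min}}{\text{day}}
  &= 28\,800\,\frac{\text{h}}{\text{day}} = 1200\ \text{days of footage per day}.
\end{aligned}
```

In other words, more than three years of continuous viewing is added every day.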

Since the same hashtag (#oxsmc09) was used for both of the parallel sessions, there was an interesting multiplexing of comments from differing discussions. However, the discussions were not so dissimilar that the comments didn’t make sense in the context of the one I was listening to. During the first session, one member of the audience, when asked what their Twitter ID was, said they had never used Twitter, to which there was rapturous applause from the rest of the audience.

During the session on “The growth of the corporate blog – ‘Letting go’ of information control or maintaining the official line?” I took a different approach, twittering less and instead creating a mind map, which you can download from here (please note: to read this file you will need a copy of FreeMind installed on your machine). A slightly unreadable PDF rendering is available from here. [Disclaimer: these documents are released under the same Creative Commons BY-NC-SA License as the rest of this site at time of publication. I lay no claim to and/or responsibility for the accuracy of the details of who said what and when during the discussion those documents purport to portray, but would be more than happy to correct any errors that might have crept in.]

Barriers to entry and flat namespaces

One take-home message from the points Bill Thompson (@billt) made was that breaking down barriers to entry is what facilitates innovation by people who could never have imagined they could do the things they have done. A quote I like from him was “There are no conferences about fax machines.” (i.e. it’s a virtually obsolete technology.)

Nigel Shadbolt, perhaps unsurprisingly, argued that unusual names help to unify the otherwise flat namespace used across all the various social media sites people use. I wondered about the merits of a more hierarchical namespace along the lines of the Domain Name System. However, this issue is probably less important than solving the problem of maintaining a genuine, authentic and verifiable online persona/identity. That matters for all users of the web, but for scientists in particular, who are attempting to build a reputation through work published online, the issue is highly relevant.
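To make the flat-versus-hierarchical contrast concrete, here is a toy model of a DNS-style namespace. It is purely illustrative, and the design is an assumption rather than something proposed at the convention. Names are dot-separated with the most specific label first, as in DNS. Each level only has to keep its own children unique, so the same short name can be reused under different parents without colliding. The `Zone` class is hypothetical, and the sketch says nothing about the harder problem of verifying who owns a name.

```python
class Zone:
    """One node in a toy hierarchical namespace (hypothetical design)."""

    def __init__(self):
        self.children = {}   # label -> Zone
        self.owner = None    # who has claimed this exact name, if anyone

    def _walk(self, dotted_name, create):
        """Follow labels right to left, e.g. 'dan.oxford' visits 'oxford' then 'dan'."""
        zone = self
        for label in reversed(dotted_name.split(".")):
            nxt = zone.children.get(label)
            if nxt is None:
                if not create:
                    return None
                nxt = zone.children[label] = Zone()
            zone = nxt
        return zone

    def register(self, dotted_name, owner):
        """Claim a name; only the exact same full name can collide."""
        zone = self._walk(dotted_name, create=True)
        if zone.owner is not None:
            raise KeyError(f"{dotted_name!r} is already taken")
        zone.owner = owner

    def lookup(self, dotted_name):
        """Return the owner of a name, or None if it is unclaimed."""
        zone = self._walk(dotted_name, create=False)
        return None if zone is None else zone.owner
```

In a flat namespace a second “dan” would simply be refused. Here, “dan.oxford” and “dan.cambridge” coexist because the uniqueness check is delegated to the “oxford” and “cambridge” levels, much as DNS delegates each zone to its own administrator.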

Making Science Public – some thoughts on the panel session

Cameron Neylon drove home the point that we all have a stake in science, since it is funded from government coffers, and thus we should be encouraging scientists to engage with us, and with each other, in ways that make their methods and results more transparent and available.

I think Ben Goldacre accurately described the current state of science communication and public understanding when he said that the “Mainstream media cover science badly.” I totally agree. Most mainstream science programs on TV, for instance, are a total joke and have next to zero pure science content. They could more accurately be reclassified as technology programs, because that is where most of their focus lies.

Maxine Clarke, setting out her position at the start of the discussion, explained that Nature’s role is essentially to manage the peer-review process and heavily filter the stream of possible publications. Later in the discussion, the other panelists gave counter-arguments along the lines that readers are actually quite good at doing this filtering themselves anyway.

I was interested to hear Cameron plug a book which I think is Beautiful Data from O’Reilly. He cited it as giving examples of the advantages of online science. I’ll have to get me a copy.

I know there was someone in the audience from GalaxyZoo because Cameron gave them a shout-out, and they retweeted one of my comments about how citizen science could be key to engaging anyone who’s currently not a scientist. I was hoping to talk with them after the session, but they left before I had a chance.

What the rest of the blogosphere is saying

The tweeter who captured most of the limelight and put the “disruptive” back into “disruptive technologies” was @caffeinebomb. You can read her blog post about the event here.

There’s a really good in-depth summary of some of the sessions on Sara Fletcher’s blog. Another comment piece and link nexus is Brian Kelly’s post about the event.



This work by Daniel Hagon is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales.