Discussion:
[fonc] Hacking Maru
Faré
2013-10-19 03:51:44 UTC
Dear Ian,

having exhausted my sources of procrastination in Common Lisp,
I'm now just starting to look at maru seriously.

Is there a mailing list specifically about technical details of maru,
or is FoNC the thing?

Stupid questions:

* I see the gc reserves 8 bits for flags, but uses only 3. Is that a
typo? Is that done with the intent of addressing that byte directly?
On the other hand, given the opportunity for 5 extra bits, I'd gladly
grab one for a one-bit reference count (i.e. is the object linear so
far or not), and another one for a can-we-duplicate capability,
if/when implementing a linear lisp on top of it.

* The source could use comments and documentation. Would you merge in
patches that provide some?

* Speaking of documentation, could you produce an initial README, with
a roadmap of which files are which and depend on which, and how to run
the various tests, etc.? Maybe separating things in subdirectories
could help. Or not. Many Makefile targets are obsolete. A working
test-suite could help.

* Is the idea that everyone should be doing/forking his own,
CipherSaber style, or is there an intent to share and build a common
platform?

—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org
If it's not worth doing, it's not worth doing well — Donald Hebb
If it's not worth doing right, it's not worth doing. — Scott McKay
Attila Lendvai
2013-10-19 06:52:18 UTC
Post by Faré
the various tests, etc.? Maybe separating things in subdirectories
could help. Or not. Many Makefile targets are obsolete. A working
+1 for the subdirectories. what was especially confusing to me when i
looked at it is that it took a lot of time to decipher which files are
automatically generated, which are generated and then hand-edited, and
which are genuine sources.

i'd suggest at least 3 dirs (or a naming convention) for these
categories, and if generated files end up hand-edited afterwards, then
either keeping the vanilla outputs around, or using patch files to
apply the edits from the makefile.

but then, as i understand it, maru is more about being a proof of
concept than a v0.1 platform that should gather external hacking power
and slowly turn into another common lisp... :)
--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“Time is really the only capital that any human being has and the
thing that he can least afford to waste or lose.”
— Thomas A. Edison (1847–1931)
Ian Piumarta
2013-10-21 05:40:32 UTC
Dear Faré,

I'm sorry (but also impressed ;) to hear that you have reached the limit of procrastination within Common Lisp.

I'm a little preoccupied until next week, but brief answers to some of your questions are inline below...
Post by Faré
Is there a mailing list specifically about technical details of maru,
or is FoNC the thing?
No and not really (at least not for any kind of sustained planning, design and development discussion).

Throwing caution to the wind and misplaced trust into the toxic smog^H^H^H^H^H^H^H^H^H^H cloud, I created maru-***@googlegroups.com (with https://groups.google.com/forum/#!forum/maru-dev possibly being the place to sign up, I think, assuming some cookie or other isn't poisoning my view of that page compared to everyone else's view of it).

I am tempted to write a big introductory post containing my vision for where Maru might go or how it might be reborn, given enough community support, but that will not happen before next week.
Post by Faré
* I see the gc reserves 8 bits for flags, but uses only 3. Is that a
typo? Is that done with the intent of addressing that byte directly?
No and not really (at least not explicitly). An ancient gcc generated wretched code when the boundary was not byte-aligned. Things may be different now. You're right that 16 Mbytes is a stingy limit for certain kinds of application.
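
For the curious, the shape being discussed is roughly this. It is only a sketch: the flag and field names are invented here, and gc.c may differ in detail. With the flags padded to a byte-aligned 8 bits, 24 bits remain for the object size, which is where the 16 Mbyte ceiling comes from.

#include <stdio.h>
#include <stdint.h>

enum {
    GC_MARK   = 1 << 0,   /* the three flags in use (names hypothetical) */
    GC_ATOM   = 1 << 1,
    GC_OPAQUE = 1 << 2,
    GC_SHARED = 1 << 3,   /* spare: the proposed one-bit reference count */
    GC_NODUP  = 1 << 4    /* spare: a "may not be duplicated" capability */
};

typedef struct {
    uint32_t flags : 8;   /* byte-aligned, for the gcc reason given above */
    uint32_t size  : 24;  /* what remains for the size: under 16 Mbytes */
} header_t;

int main(void) {
    header_t h = { GC_MARK | GC_SHARED, (1u << 24) - 1 };
    printf("flags=%x, max size=%u bytes\n", (unsigned)h.flags, (unsigned)h.size);
    return 0;
}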
Post by Faré
On the other hand, given the opportunity for 5 extra bits, I'd gladly
grab one for a one-bit reference count (i.e. is the object linear so
far or not), and another one for a can-we-duplicate capability,
if/when implementing a linear lisp on top of it.
They're yours for the taking. I was waiting to have some time to reply properly to the linear lisp thread but, since you mention it, one of the reasons for having all accesses to object fields go through get() and set() was to make it ridiculously easy to add read and write barriers.
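
A sketch of what I mean (invented names and layout, not the actual gc.c API): funnelling every field access through two functions leaves exactly two places to hook, and one spare header bit is enough for the one-bit reference count you describe.

#include <stdio.h>

typedef struct object object;
struct object {
    unsigned flags;           /* stands in for the spare gc header bits */
    object  *slot[4];
};

#define FLAG_SHARED 1         /* the one-bit reference count: set = not linear */

static object *get(object *o, int i) {
    /* a read barrier hook would go here */
    return o->slot[i];
}

static void set(object *o, int i, object *v) {
    /* write barrier hook; crude policy for illustration only:
       any stored reference marks v shared, and re-storing an
       already-shared object is where a linear lisp would trap */
    if (v) {
        if (v->flags & FLAG_SHARED)
            printf("%p is no longer linear\n", (void *)v);
        v->flags |= FLAG_SHARED;
    }
    o->slot[i] = v;
}

int main(void) {
    object a = {0}, b = {0};
    set(&a, 0, &b);           /* first stored reference: b marked shared */
    set(&a, 1, get(&a, 0));   /* duplicating it trips the barrier */
    return 0;
}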
Post by Faré
* The source could use comments and documentation.
I suffer from being able to make sense of my own code many years after writing it. Not very conducive to community process, I know.
Post by Faré
Would you merge in patches that provide some?
Of course! I might not merge them very quickly, though, since I'd be so unreasonable as to want to read them all first and elaborate if necessary. :)
Post by Faré
* Speaking of documentation, could you produce an initial README, with
a roadmap of which files are which and depend on which, and how to run
the various tests, etc.? Maybe separating things in subdirectories
could help. Or not. Many Makefile targets are obsolete. A working
test-suite could help.
You almost answered your own question. I try to keep README.examples (which is a script) working properly. It exercises a huge amount of the code, but to find out what exactly that code is you do need to obtain some kind of transitive closure over all the 'load's and 'require's that pull in lots of source files.

Many Makefile targets were for temporary experiments, long abandoned, and should be removed. Others are for exercising interesting code (that still works and yet is not portable enough to be part of README.examples) and should be made more prominent. (Targets that exercise a statically-typed compiler for Maru are hiding in there somewhere, with obscure names.) Other interesting code has no Makefile or README.examples presence at all. (Minimal SDL bindings to open a window, render some text and then paint with the mouse are one example, prompted by a recent discussion on this list.)

The whole thing is teetering on the edge of being rewritten in itself (as the original three-file version of Maru was written in itself). My intention was always to tidy things up considerably at that time.

FWIW, there is a sketch of how Maru's generalised eval works (http://piumarta.com/freeco11/freeco11-piumarta-oecm.pdf) which is entirely accurate in intent and approach, if a little different in some implementation details.
Post by Faré
* Is the idea that everyone should be doing/forking his own,
CipherSaber style, or is there an intent to share and build common
platform?
I'd love to build a common platform. Maru is in particular trying to be malleable at the very lowest levels, so any special interest that cannot be accommodated easily within the common platform would be a strong indicator of a deficiency within the platform that should be addressed rather than disinherited.

Where there is a choice between forking to add a fix or feature and clamouring to get that fix or feature adopted into some kind of central repository, I believe clamouring always benefits vastly more people in the long run. I intensely dislike the github/gitorious 'clone fest' mindset because it dilutes and destroys progress, encouraging trivial imitation rather than radical innovation -- which, if there is any at all, finds itself fighting an intractably high noise floor.

Forking will always split a community (even if one side is only left with a community of one), so if you want what you are doing to be relevant one day and benefit the most people, then trying to limit forks and clones is a good thing, IMO. If anyone wants to do something hugely incompatible with Maru, with no guarantee of eventual success, I'm happy to make a branch in the repository for it.

While public forking might be a viable model for development that is closely linked with, and intends to contribute back to, a mature project, where the gravitational field of the parent repository is irresistibly high, I don't think it is very helpful or efficient for getting a small and unestablished project off the ground.

Regards,
Ian
Tom Novelli
2013-10-21 13:00:31 UTC
Post by Ian Piumarta
I'd love to build a common platform. Maru is in particular trying to be
malleable at the very lowest levels, so any special interest that cannot be
accommodated easily within the common platform would be a strong indicator
of a deficiency within the platform that should be addressed rather than
disinherited.
Sounds like an idea I can get behind. I was writing my own minimal
Lisp/APL compiler but I don't have enough time to do it justice, let alone
build anything (like a little OS) with it. Never even started GC. Just
glancing at Maru I'm like "yup, I wrote pretty much the same stuff from
scratch." :)

-Tom
Loup Vaillant-David
2013-10-21 15:36:30 UTC
Post by Ian Piumarta
Post by Faré
* Is the idea that everyone should be doing/forking his own,
CipherSaber style, or is there an intent to share and build common
platform?
I'd love to build a common platform. Maru is in particular trying
to be malleable at the very lowest levels, so any special interest
that cannot be accommodated easily within the common platform would
be a strong indicator of a deficiency within the platform that
should be addressed rather than disinherited.
Where there is a choice between forking to add a fix or feature and
clamouring to get that fix or feature adopted into some kind of
central repository, I believe clamouring always benefits vastly more
people in the long run. I intensely dislike the github/gitorious
'clone fest' mindset because it dilutes and destroys progress,
encouraging trivial imitation rather than radical innovation --
which, if there is any at all, finds itself fighting an intractably
high noise floor. Forking will always split a community (even if
one side is only left with a community of one) so if you want what
you are doing to be relevant one day and benefit the most people
then trying to limit forks and clones is a good thing, IMO. If
anyone wants to do something hugely incompatible with Maru, with no
guarantee of eventual success, I'm happy to make a branch in the
repository for it. While public forking might be a viable model for
development that is closely linked with and intends to contribute back
to a mature project, where the gravitational field of the parent
repository is irresistibly high, I don't think it is very helpful or
efficient for getting a small and unestablished project off the
ground.
If I may, there is one non-trivial argument in favour of forking:
learning. I have found that building my own stuff helps me learn. For
instance, I had to write my own metacompilers[1] to really understand
the magic behind parsing expression grammars. Reading OMeta's source
code[2] simply wasn't enough.

I'm now doing the same with Earley Parsing[3]. I have a working toy
recogniser, and now seek to reconstruct a tree (or several). Since
the Wikipedia article didn't help much on that front, I have sought
the simplest implementation I could find[4]. But again, no luck
understanding how it reconstructs the parse tree. (Can't I read
source code?)
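
(For concreteness, here is roughly what such a toy looks like. This is an illustrative C sketch, not my actual code: a recogniser for the deliberately ambiguous grammar S -> S + S | a. The item sets it builds are the states I mention below.)

#include <stdio.h>
#include <string.h>

#define MAXITEMS 256

typedef struct { int rule, dot, origin; } Item;   /* a dotted rule + start set */

static const char *rules[] = { "S+S", "a" };      /* both rules have lhs S */
static const char *input   = "a+a+a";

static Item sets[16][MAXITEMS];                   /* sets[i]: items after i tokens */
static int  count[16];

static void add(int i, Item it) {                 /* add item to set i, deduplicated */
    for (int k = 0; k < count[i]; k++)
        if (!memcmp(&sets[i][k], &it, sizeof it)) return;
    sets[i][count[i]++] = it;
}

int main(void) {
    int n = (int)strlen(input);
    add(0, (Item){0, 0, 0});                      /* predict S at position 0 */
    add(0, (Item){1, 0, 0});
    for (int i = 0; i <= n; i++) {
        for (int k = 0; k < count[i]; k++) {      /* count[i] grows as we go */
            Item it = sets[i][k];
            int len = (int)strlen(rules[it.rule]);
            if (it.dot < len) {
                char sym = rules[it.rule][it.dot];
                if (sym == 'S') {                 /* PREDICT the nonterminal */
                    add(i, (Item){0, 0, i});
                    add(i, (Item){1, 0, i});
                } else if (i < n && input[i] == sym) {
                    add(i + 1, (Item){it.rule, it.dot + 1, it.origin});  /* SCAN */
                }
            } else {                              /* COMPLETE: an S ended here */
                for (int j = 0; j < count[it.origin]; j++) {
                    Item p = sets[it.origin][j];
                    if (p.dot < (int)strlen(rules[p.rule])
                        && rules[p.rule][p.dot] == 'S')
                        add(i, (Item){p.rule, p.dot + 1, p.origin});
                }
            }
        }
    }
    for (int k = 0; k < count[n]; k++) {          /* accept: completed S over 0..n */
        Item it = sets[n][k];
        if (it.origin == 0 && it.dot == (int)strlen(rules[it.rule])) {
            puts("recognised");
            return 0;
        }
    }
    puts("rejected");
    return 1;
}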

So I tried to reconstruct a parse tree manually, from the states
generated by my recogniser. Surprisingly, it worked. The Wikipedia
article says that "the recogniser can be easily modified to create a parse
tree as it recognises, and in that way can be turned into a parser".
Turns out I don't even need to. And now, I think I understand the
reconstruction algorithm well enough to implement it. I predict some
minor trouble with ambiguous parses, though.

[1]: http://loup-vaillant.fr/projects/metacompilers/
[2]: http://www.tinlizzie.org/ometa/
[3]: https://en.wikipedia.org/wiki/Earley_parser
[4]: https://github.com/tomerfiliba/tau/blob/master/earley3.py

---

Now, back to Maru.

A central repository for serious work is probably best. But we also
need a reliable way to learn. I simply cannot trust myself with a
COLA system until I know I can build one, for two reasons:

- It's a new, untested, immature technology (or so many people will
think). If it breaks, I must be able to fix it myself, because
like Linus, Ian probably doesn't scale.
- Its abstractions are "leaky" by design. Unlike with C or Java, the
actual machine behind the language is for me to take over. There
is no layer I cannot get past. (Or so I guess.)

So if we want to have a chance to spread something like Maru, we
probably need to favour deep learning as well. Surface understanding
is likely to get users into trouble, and they will then blame the
tool. Off the top of my head, I see a few ways one could learn Maru:

- Dive into the source code of the real thing. I may try, but I will
likely fail miserably, just like I did with OMeta.
- Read scientific papers. I gathered a surface understanding of some
principles, but nothing solid yet.
- Build a toy from scratch. I'll probably do that, since it worked
so far.
- Learn from an existing toy. That toy would be the "useful fork".
Bonus points if the toy can lift itself into the real thing. Extra
bonus points if the real thing is _actually_ lifted up from the
toy.
- Learn from tutorials, like "Write yourself a Maru in 48 hours"[5].
Bonus points if there's a second tutorial to lift your toy into
something serious.

[5]: https://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours

---

Here is how I would imagine my dream world. It would be a central
repository with:

- A toy Maru, optimised for clarity.
- A tutorial for writing your own toy.
- A serious Maru, lifted up from the toy.
- A tutorial for lifting your own toy up.
- The hand-written bootstrap compilers (for understanding, and the
Trusting Trust problem).

Does this dream world sound possible? Is it even a good idea?

Loup.
Ian Piumarta
2013-10-22 01:18:26 UTC
Loup,

By 'fork' I meant to imply creating a publicly-visible repository that pops up in google searches and prevents you finding the place where the progress is being made, unless you happen to spot the tiny icon hidden in the corner that takes you to the repo from which your current page was forked. (That's a personal gripe and might be blowing my dislike of github, et al., out of all proportion. :) Cloning a repo and experimenting/breaking/repairing in order to understand is not the same, nor is using your local Mercurial repository clone to work locally and then contribute back to a parent repo. If that's what Faré meant by "fork" then I'm all for it.
Post by Loup Vaillant-David
I'm now doing the same with Earley Parsing[3].
The Wikipedia article's presentation is not the clearest, and it gives about the minimum needed, with some reading between the lines, to make a working recogniser.

Earley's thesis and original papers are known to contain errors. I recommend you get hold of "Parsing Techniques: A Practical Guide" (Grune and Jacobs, Springer, 2008) which presents lots of parsing algorithms (including several chart parsers) clearly and concisely. There are a few papers building on Earley's work that contain clear presentations of the original algorithm, parse tree reconstruction and their compact representations; e.g., "SPPF-Style Parsing from Earley Recognisers" (Elizabeth Scott, Elsevier, 2008) and "Practical Earley Parsing" (Aycock and Horspool, The Computer Journal, 45(6), 2002).

I agree entirely: once you notice that following the causality of predict and scan steps (backwards from the final states) gives all the derivations, the rest is relatively easy.
Post by Loup Vaillant-David
- Read scientific papers. I gathered a surface understanding of some
principles, but nothing solid yet.
- Build a toy from scratch. I'll probably do that, since it worked
so far.
These two are fun to do in parallel. They feed each other very well.
Post by Loup Vaillant-David
Here is how I would imagine my dream world. It would be a central
- A toy Maru, optimised for clarity.
- A tutorial for writing your own toy.
- A serious Maru, lifted up from the toy.
- A tutorial for lifting your own toy up.
- The hand-written bootstrap compilers (for understanding, and the
Trusting Trust problem).
Does this dream world sound possible? Is it even a good idea?
I hope so, and I think so. At some point you could consider literate programming. Jones Forth is one example of how this can be attempted even from the very first point (http://rwmj.wordpress.com/2010/08/07/jonesforth-git-repository). By the time you're on the third step, the above hierarchy could begin to support source code representations intended for ease of understanding.

Regards
Ian
Loup Vaillant-David
2013-10-23 08:41:34 UTC
Post by Ian Piumarta
I recommend you get hold of
- Parsing Techniques: A Practical Guide
- SPPF-Style Parsing from Earley Recognisers
- Practical Earley Parsing
Whoa, thanks. Will do right away.
Post by Ian Piumarta
- Read scientific papers. […]
- Build a toy from scratch.[…]
These two are fun to do in parallel. They feed each other very well.
Okay.
Post by Ian Piumarta
Here is how I would imagine my dream world. It would be a central
- A toy Maru, optimised for clarity.
- A tutorial for writing your own toy.
- A serious Maru, lifted up from the toy.
- A tutorial for lifting your own toy up.
- The hand-written bootstrap compilers (for understanding, and the
Trusting Trust problem).
Does this dream world sound possible? Is it even a good idea?
I hope so, and I think so. […]
By the time you're on the third step, the above hierarchy could
begin to support source code representations intended for ease of
understanding.
By the third step, I gather you mean "serious Maru".

What do you mean by "supporting source code representations intended
for ease of understanding"? Structure editing? Code folding?
Something like Bret Victor's work (learnable programming, visual
representations for Nile…)?

Loup.
shawnmorel
2013-10-23 15:36:27 UTC
Since we're on the topic of forking...

About a year and a half ago, I basically took maru-2.1 apart and rebuilt it from scratch as a learning experiment. Also, in the spirit of fonc and the sciences of the artificial, I wanted to write experiments against the maru system; these of course boil down to expressing a new system / language IN maru. (I've called that set of experiments / language / system "modernity".)

Given the recent interest in trying to understand maru, I figured I'd share my heavily documented experience of doing this:
https://github.com/strangemonad/modernity

In many ways this was an adventure in software forensics and trying to put myself in Ian's mind / frame of thinking.
Post by Ian Piumarta
(That's a personal gripe and might be blowing my dislike of github, et al., out of all proportion. :)
Or maybe a gripe against the way the Linux kernel devs organize their work?
Post by Ian Piumarta
Post by Loup Vaillant-David
Here is how I would imagine my dream world. It would be a central
- A toy Maru, optimised for clarity.
Hopefully, what you'll find in modernity/tools fits this. Let me know if it doesn't.
Post by Ian Piumarta
Post by Loup Vaillant-David
- A tutorial for writing your own toy.
That is basically the intent with the modernity system in modernity/src. You'll see a few references to "books". My hope is to have a rich enough gui to load an active-essay that is the description of the system, similar to the Physically Based Rendering book (http://www.pbrt.org/).
Post by Ian Piumarta
Post by Loup Vaillant-David
- The hand-written bootstrap compilers (for understanding, and the
Trusting Trust problem).
See modernity/tools/maru-bootstrap for the C version, and modernity/tools/maru for the maru-in-maru implementation.

I'd be curious to see what parts are more understandable and what parts are still confusing to folks. Since I spent a good 4-5 months steeped in this, much of it makes a lot of sense to me, but I'm sure a newcomer would still need lots of handholding.

shawn
Loup Vaillant-David
2013-10-23 17:01:47 UTC
Terrific work! I have just cloned your git repository; I will check
it out.

But first, I need to crack generalised Earley parsing. I love OMeta,
but the hack it uses to get around PEGs' limitations on left recursion
is ugly (meaning, not fully general).

I basically want PEGs that run on Earley parsing. If we treat
functions that return rules as infinite sets of rules, I believe this
should work. The main difficulty is that the states' rules are now
effectively closures, and we still need to compare them.
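
To illustrate the comparison problem (with an invented representation, not my recogniser's code): a parameterised rule is a closure of a rule-constructing function over its captured arguments. Extensional equality of closures is undecidable, so we compare the parts instead.

#include <stdbool.h>
#include <stdio.h>

typedef struct rule rule;            /* opaque: whatever a rule body is */

typedef struct {
    rule *(*fn)(int);                /* which rule constructor was called */
    int arg;                         /* the argument it captured */
} rule_closure;

static bool same_rule(rule_closure a, rule_closure b) {
    return a.fn == b.fn && a.arg == b.arg;
}

static rule *rep(int n) { (void)n; return 0; }   /* stand-in constructors */
static rule *seq(int n) { (void)n; return 0; }

int main(void) {
    rule_closure r1 = { rep, 3 };
    rule_closure r2 = { rep, 3 };    /* same constructor, same argument */
    rule_closure r3 = { seq, 3 };    /* different constructor */
    printf("%d %d\n", same_rule(r1, r2), same_rule(r1, r3));   /* prints: 1 0 */
    return 0;
}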

Loup
Attila Lendvai
2013-10-21 14:23:16 UTC
Post by Ian Piumarta
Where there is a choice between forking to add a fix or feature and
clamouring to get that fix or feature adopted into some kind of central
repository, I believe clamouring always benefits vastly more people in the
long run. I intensely dislike the github/gitorious 'clone fest' mindset
because it dilutes and destroys
github is only a tool to communicate, and IMO a very good one at that. how
well people cooperate using it depends on the people themselves.

when i have my author hat on, i welcome being able to easily see who
forked my repo and what changes they made; i can even pull a patch
from them with a click of the mouse if i like it, all on my own initiative.

just my 0.02, let's keep this thread on track.
--
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39